Skip to main navigation Skip to main content
  • KSBS
  • E-Submission

Plant Breed. Biotech. : Plant Breeding and Biotechnology

OPEN ACCESS
ABOUT
BROWSE ARTICLES
EDITORIAL POLICIES
FOR CONTRIBUTORS

Articles

Research Article

Positively Selected Orthologous Genes Identified in Sesame (Sesamum indicum) by Deep Resequencing

Plant Breeding and Biotechnology 2019;7(1):24-33.
Published online: March 1, 2019

1Department of Plant Resources, College of Industrial Science, Kongju National University, Yesan 32439, Korea

2State Key Laboratory of Tea Plant Biology and Utilization, Anhui Agricultural University, Hefei 230036, China

3Center of Crop Breeding on Omics and Artificial Intelligence, Kongju National University, Yesan 32439, Korea

*Corresponding author: Kyu-Won Kim, sh.kyuwon@gmail.com, Tel: +82-41-330-1213, Fax: +82-41-330-1209
*Yong-Jin Park, yjpark@kongju.ac.kr, Tel: +82-41-330-1201, Fax: +82-41-330-1209
• Received: January 21, 2019   • Revised: February 20, 2019   • Accepted: February 20, 2019

Copyright © 2019 The Korean Society of Breeding Science

This is an Open-Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

  • 3 Views
  • 0 Download
prev next
  • Sesame (Sesamum indicum L.) is the queen of oil seed crops and is cultivated widely in tropical and subtropical areas. The availability of the sesame genome sequence presents unprecedented opportunities for studying its genetics, genomics, and evolution. In this report, we conducted a genome resequencing-based identification of sesame orthologs; in total, 26,379 coding sequences (CDSs) were isolated. Using a reciprocal best hit, we ultimately identified a total of 639 orthologs sets after one-to-one orthologs extraction across seven Pentapetalae plant species. These orthologs were considered to be the most credible between the two species, and in sesame. Furthermore, we performed a branch model-based maximum likelihood estimation of dN/dS of the orthologs, resulting in the identification of 198 evolutionarily accelerated orthologs and 66 positively selected genes (P-value and FDR < 0.05). An enrichment analysis and protein interaction network suggested 19 genes with important functions of the orthologs specific in sesame development and domestication. The method we used here provides a case study for identifying orthologous genes between sesame and other plants species that are distributed in equilibrium phylogenetically, which can be used in other plants.
Sesame (Sesamum indicum L.) is one of the crop has been cultivated for the longest time by human and now wildly cultivated in tropical and subtropical zones (Bedigian 2003). It is a diploid species with 13 pairs of chromosomes (2n = 2x = 26) (Bremer et al. 2009), which belongs to the Sesamum genus of the family Pedaliaceae and the order Lamiales. Sesame is called “queen of the oil seeds” because of the highest oil content (~55% in the seeds) among oilseed crops, including peanut, soybean, and rapeseed (Anilakumar et al. 2010). Among the oilseed crops, sesame has a high oil content and a small diploid genome (~259.73 Mb) (Wang et al. 2014; Wang et al. 2016), making it an attractive species for genetic studies. Sesame is known to be the core lineage of Sesamum in Lamiales using phylogenetic analysis with chloroplast genome (Yi and Kim 2012). It was estimated to have diverged from the Lamiales species Utricularia gibba ~98 million years ago, and from the tomato potato lineage ~125 MYA (Wang et al. 2014).
Orthologs are homologous sequences that started to diverge through a speciation event (as with paralogs and duplication events). Consequently, the better we approximate the evolution of such sequences, the better ortholog predictions will be. In general, phylogenetic history can be parsed by examining nucleotide sequences, instead of amino acid sequences, and then separating non-synonymous substitutions (resulting in amino acid substitutions) from synonymous substitutions (silent or neutral substitutions). Selection pressure can be estimated by scaling the ratio of non-synonymous substitutions to synonymous substitutions to that expected with neutral divergence. This calculation is normalized as the number of non-synonymous substitutions per potential non-synonymous sites (dN) divided by the number of synonymous substitutions per potential synonymous sites (dS), referred to as dN/dS or Ka/Ks (Hurst 2002). Extant proteins have been selected for optimal function over millions of years, the dN value of their genes becomes depressed; thus, dN/dS < 1 indicates “purifying” selection. When dN/dS = 1, the gene may be experiencing neutral selection. While dN/dS > 1, the gene may be undergoing “positive selection” resulting adaptive evolution, where amino acid replacements are advantageous and synonymous substitutions become stabilized at a high rate (McPartland et al. 2007; Dalloul et al. 2010). Positive selection of Embryophytes plants occurred in a wide variety of gene families with different functions, including in the self in-compatibility locus, in defense against pathogens, in embryogenesis, in cold acclimation, and in electron-transport (Roth and Liberles 2006). In Brassica juncea, 13 GSL-metabolism-related genes and 22 lipid-metabolism-related genes were differentially selected between the vegetable- and oil-use subvarieties of B. juncea, which indicating that positive selection of some genes could be a major impact on plant domestication (Yang et al. 2016). An EID1 gene, responsible for a phase delay, has been under positive selection during tomato domestication or improvement (Muller et al. 2016), revealing the key position of positively selected genes during the domestication.
In this study, we developed a detailed strategy (Supplementary Fig. S1) using an evolutionary-based method to mine orthologs between different species from a genome resequencing project. Using this method, we identified orthologs in sesame, including five sesame accessions from a core set (RDA-Genebank, Republic of Korea) and several other plant species. Complemented with a branch model-based maximum likelihood estimation of dN/dS (Kim et al. 2013; Ahn et al. 2014) of the orthologs in these several species that were distributed in equilibrium phylogenetically, we identified evolutionarily accelerated genes in sesame. The methods in this study provide a case study for identifying orthologous genes in sesame and other plant species.
Plant material and whole genome resequencing
Five sesame accessions (Table 1) from a core set stored in National Genebank of the Rural Development Administration, Republic of Korea (RDA-Genebank) were selected for current study (Park et al. 2014). We conducted a field experiment during the 2015 sesame growing season at Kongju National University experimental farm. Young leaves from a single plant were sampled and stored at −80°C prior to genomic DNA extraction using the DNeasy plant mini kit (Qiagen, Germany). At least 5 μg of genomic DNA was used for each accession to construct paired-end sequencing libraries with insert sizes of approximately 500 bp according to the manufacturer’s instructions (Illumina Inc., USA). We sequenced the genomes of the collected sesame varieties in deep coverage (~19.8×) on an Illumina HiSeq 2500 Sequencing System Platform (Illumina Inc., USA) with paired-end 2 × 100 bp read length.
Genotype calling and CDS extraction
Raw sequences were first trimmed using the SICKLE tool (https://github.com/najoshi/sickle) to remove low-quality reads. Then, high-quality reads were aligned against the sesame reference genome (ver. 2.0) (Wang et al. 2016) using the Burrows-Wheeler Aligner (BWA; ver. 0.7.5a) software (Li and Durbin 2009). Reads that did not meet BWA quality criteria or did not match the reference genome were removed. PCR duplicate reads were also removed using PICARD tool (ver. 1.88; http://broadinstitute.github.io/picard/). Applied with the Genome Analysis Toolkit (GATK; ver. 2.3.9 Lite) (McKenna et al. 2010), regional realignment and quality score recalibration were first carried out and then single nucleotide polymorphisms (SNPs) and insertion/deletion polymorphisms (indels) were called with ≥ 3× of read depth coverage. Coding sequences (CDSs) of the five sesame core accessions were extracted, based on the annotations of the sesame reference genome using the variant call format (VCF) file generated from genotype calling of the five accessions.
Ortholog identification in sesame
A branch model-based phylogenetic tree was constructed with the TimeTree database (Hedges et al. 2015), and several Pentapetalae plant species (Solanum tuberosum, Solanum lycopersicum, Populus trichocarpa, Brassica rapa, Arabidopsis thaliana, and Vitis vinifera) that distributed in equilibrium phylogenetically with sesame were selected for comparison with S. indicum. The coding sequences of Solanum tuberosum were downloaded from NCBI (http://www.ncbi.nlm.nih.gov) and the other six Pentapetalae species were downloaded from Ensembl Plants (release 30; http://plants.ensembl.org). Ortholog information for the other Pentapetalae plants was also downloaded from BioMart of Ensembl Plants, satisfying orthologs confidence = 1. Since orthologous relationship between S. tuberosum and other Pentapetalae species had not been established, we performed a reciprocal best hits test between S. indicum and S. tuberosum, the closest species to S. indicum among six Pentapetalae species, using the options, E-value < 10 −5, 50 < coverage (%) < 150, and identity (%) > 60 (Brudno et al. 2003; Schwede et al. 2003; Katoh et al. 2005).
Branch model-based evolutionarily accelerated gene mining
To identify rapidly evolving genes in sesame, we conducted a branch model-based maximum likelihood estimation of dN/dS values among seven Pentapetalae species, including S. indicum. The coding sequences of all the seven Pentapetalae species and the five accessions of S. indicum were aligned using PRANK (ver. 150803) software (Löytynoja and Goldman 2005), which is known for accurately align insertion and deletion (Fletcher and Yang 2010). Poor alignments were eliminated using Gblocks (ver. 0.91b) (Castresana 2000). Selected positive genes for S. indicum and each of the five accessions were tested using a maximum likelihood estimation of dN/dS ratio with the phylogenetic tree (Fig. 1) (Yang 2007). Maximum likelihood estimations of dN/dS ratios were determined with branch-site model A with codon frequency F3x4 using PAML (ver. 4.8a) software (Yang 2007). The log-likelihood ratio test was used for determining P-values, and multiple testing errors were controlled using the false discovery rate (FDR).
GO enrichment, pathway, and network analysis
All the identified evolutionarily accelerated genes were subjected to Gene Ontology (GO) enrichment, pathway, and network analyses. GO enrichment analyses were performed using AgriGO (http://bioinfo.cau.edu.cn/agriGO/) with Arabidopsis thaliana as the background species (Du et al. 2010). The P-values for the considered enrichment GO terms were < 0.05 and the minimum number of mapping entries was set to 5. Plant metabolic pathway enrichment analysis was carried out online using the Plant Gene Set Enrichment Analysis Toolkit (GSEA, http://structuralbiology.cau.edu.cn/PlantGSEA/analysis.php), based on the corresponding genes in A. thaliana (Yi et al. 2013). Network analyses were also conducted for the genes by integrating with the Biological General Repository for Interaction Datasets (BioGRID, http://thebiogrid.org/) (Chatr-Aryamontri et al. 2015), which can provide genetic and protein interaction data from model organisms. Sesame orthologs were mapped onto interactome data using cross-reference tables assembled from identifiers of A. thaliana and were displayed with Cytoscape software (Shannon et al. 2003).
Whole-genome resequencing and genotype calling
We conducted resequencing of five core set accessions in this study and generated a total of 297.8 million reads (~42 Gb) (Table 1). After removing low-quality reads, ~38 million clean reads were retained as high-quality data and were used for further analysis. About 80% of the clean reads of each accession were mapped to the sesame reference genome, resulting in an average genome coverage of ~19.8 (Table 1). We performed genotype calling of the five accessions and, in total, ~1.96 million SNP/indels were identified (Supplementary Table S1). Of them, ~86% (~1.69 million) were high-quality variations (sites without missing and heterozygotes). This suggested a large amount of high-quality data, resulting reliable results in the rest of the study. By complementing with the reference genome of sesame, and vice versa, we could see more genetic variations across the sesame genome, which would promote genetic and genomic studies of sesame to be increasingly active. For example, in this study, with the high-depth resequencing and genotype calling data, ~1.69 million high-quality variations were generated across the genome. However, an uneven distribution of these SNPs/indels was observed across some chromosomes (Supplementary Fig. S2). In contrast, several chromosomes with shorter lengths (chromosomes 1, 4, 5 and 13) had more SNPs/indels than the chromosomes with longer lengths (chromosomes 3, 6, 8, 9, and 10). This suggested that some genes in the chromosomes with high-density variations might play more important roles in the development and evolution of sesame.
Ortholog assignments
Based on the annotation information of the sesame reference genome, we identified 26,379 CDSs in each of the five accessions among a total of 27,148 protein-encoding genes in the sesame reference genome. These CDSs were mined by mapping to the sesame reference genome with the positions of each gene CDS. In this way, all the SNPs/indels could be included in the CDS specific to each accession if some SNPs/indels existed specifically in one accession. To analyze genes of S. indicum by comparing orthologs with other species in a monophyletic group, the six Pentapetalae species were carefully selected to be distributed in equilibrium phylogenetically with S. indicum.
However, the orthologs among S. indicum and the other six Pentapetalae species have not been identified, whereas the orthologs between the other six Pentapetale species were identified. Thus, we made entire orthologs including S. indicum and the other six Pentapetalae species by identifying orthologs between S. indicum and its closest neighbor among the other six Pentapetalae species, S. tuberosum. The orthologs sets between S. indicum and S. tuberosum were identified by reciprocal best hits (RBH). Candidate orthologs of sesame were then intersected with the orthologs of the other six species, downloaded from BioMart of Ensembl Plants. As a result, 8,281 pair wise orthologs sets were identified between S. indicum and S. tuberosum. Finally, 639 orthologs sets remained after extracting one-to-one (1:1) ortholog sets across all seven Pentapetalae species (Supplementary Table S2). In present study, only 1:1 orthologs were used for further dN/dS tests. One-to-many, many-to-many, and many-to-one orthologs were excluded to eliminate influences from genes that were duplicated one or more times in an ancestor of the current or other species after a speciation event. The 1:1 ortholog relationship is used in many comparative genomics studies, especially in the identification of functional orthologs (Bandyopadhyay et al. 2006); in evolutionary, comparative, and systematic studies in plants (Wu et al. 2006); and the mapping of biological pathways (Mao et al. 2006). We also used 1:1 orthologs which are suitable for comparative method based on proportion of functional changes (Sabeti et al. 2006).
Clearly, it would be easier to identify orthologs between two species if the duplication history of the genes in the two genomes was known (relative to their speciation event). Unfortunately, this evolutionary process is unknown in many cases. Thus, to find the most probable 1:1 orthologs assignment between two genomes should to be carried out based on sequence similarity. When predefined orthologs sets of a species group exist without target species, the strategy that identify pairwise orthologs between target species and its closest species in members of the predefined orthologs sets and construct whole orthologs sets had been used in genome project (Dalloul et al. 2010).
dN/dS test to identify specific, evolutionarily accelerated genes
Given the ortholog sets detected, we conducted a branch model-based maximum likelihood estimation of dN/dS in the seven species (total 12 accessions). Considering the insertion and deletion, PRANK was used to align the CDSs of all seven species. After removing poor alignments, 623 of 639 alignments remained. Under a phylogenetic model tree (Fig. 1), which indicated the tested branches for adapted genes (overall branch of S. indicum and each branch of each accession), we investigated the maximum likelihood estimation of dN/dS ratios with branch-site model A with codon frequency F3x4. In total, 198 evolutionarily accelerated orthologs were identified in S. indicum, including the five accessions, using a maximum likelihood estimation of the dN/dS ratio (Supplementary Table S3) with a value greater than 1. The difference in the maximum likelihood ratio was calculated as: LR = 2 (ln ML (dN/dS = 1)/ln ML (dN/dS)). LR was then compared against the χ2 distribution with one degree of freedom to test whether the dN/dS ratio was significantly > 1. The significance level was set at 0.05. Genes that passed this test are reported as positively selected. After adjusting the threshold with a P-value and FDR < 0.05, 66 positively selected genes were retained, and were used in further analyses.
Based on the sesame reference genome annotation, we mapped the 198 orthologs and 66 positively selected genes to the chromosomes of the sesame genome (Fig. 2). The accelerated orthologs showed a relatively even distribution across the chromosomes, with 15, 11, 19, 21, 12, 23, 11, 17, 20, 10, 13, 21, and 5 orthologs assigned to chromosomes 1 to 13, respectively. Except chromosome 13, on which only positively selected genes were located, the other chromosomes contained both positively selected orthologs and other accelerated orthologs. The distribution of these accelerated orthologs across the genome suggested that the branch model method we used here was reliable and these genes may play important roles in the development and domestication of sesame.
Enrichment and network analysis of accelerated genes
We performed a functional GO enrichment test for the accelerated genes and found 126 GO terms in three categories (Biological Process, Molecular Function, and Cellular Component) with 19 have P-values < 0.05. In these terms, the cytoplasmic part and cytoplasm in Cellular Components have highest numbers of genes (37 and 41, respectively), while the response to oxidative stress and DNA metabolic processes in Biological Processes only covered five genes (Fig. 3A). However, for the Molecular Function category, no significant GO term was found (Supplementary Table S4). Using the Plant Gene Set Enrichment Analysis web server, the genes were enriched to curated gene sets (including PlantCyc gene sets, Kyoto Encyclopedia of Genes and Genomes (KEGG) gene sets, Plant Ontology gene sets, and References collected gene sets). Except for six genes (SIN_1022276, SIN_1017317, SIN_1002603, SIN_1002195, SIN_1015149, and SIN_1004833) that did not map some gene sets, most of the other (~50) contributed to the Plant Ontology gene sets (Supplementary Fig. S3). The genes were related to the development of plant sperm cells, cytoplasmic parts, stems, carpels, leaves, roots, and seeds, indicating the important roles of these accelerated genes in sesame or other plants.
We then constructed a protein interaction network using the 66 positively selected genes as the nodes and protein-protein interaction data from the BioGRID database (http://thebiogrid.org/download.php) as the edges. The network contained 139 nodes and 400 edges, as visualized using Cytoscape with A. thaliana as the reference. Colors from green to red indicating dN/dS values from low to high in the network and the 19 positively selected genes with dN/dS values > 1 were indicated in Fig. 4. Detail information and annotation for these 19 genes are listed in Table 2. Many putative functions were characterized among these genes, including zinc ion binding, response to stress, membrane related, transferase activity, cytokinin-activated signaling pathway, fatty acid biosynthetic process, oxidation-reduction process, and abscisic acid-activated signaling pathway. It indicated that these positively selected genes could play important roles in the domestication of sesame for its specific characteristics like drought tolerance and high oil content.
In addition, we also mapped these genes to KEGG pathways (http://www.genome.jp/kegg/pathway.html). Apart from some regular pathways, we found one gene (SIN_1024502) that mapped to Arabidopsis orthologs in fatty acid metabolism and fatty acid elongation, which was also highlighted in the network analysis, and may contribute to the fatty acid biosynthetic process and oxidation-reduction process in sesame. The gene SIN_1022581(AT3G15150, AtMMS21), which encodes E3 SUMO-protein ligase, may regulates drought tolerance in Arabidopsis thaliana (Zhang et al. 2013) was also highlighted here. Since sesame usually has a relative high level of drought resistance, this gene could be contributed to drought in sesame and positively selected in the domestication. Another gene SIN_1026256, which is a ortholog of Arabidopsis SLAC1 that associated with stomatal closure, was also likely responsible for the difference of drought tolerance (Imai et al. 2015). The gene SIN_1002052 (AT5G08650), which is the homolog of translation factor GUF1, is critical for chloroplast protein synthesis under suboptimal conditions in Arabidopsis thaliana (Ji et al. 2012). Overall, these results suggested that the genes identified may play important roles in the development and domestication of sesame.
In this study, we used a branch model method to identify evolutionarily accelerated orthologs in sesame based on deep resequencing. In case of a shortage of gene annotation in sesame, we first assigned the 1:1 orthologs between sesame and six other Pentapetalae species that were distributed in equilibrium phylogenetically with S. indicum. In total, 639 1:1 orthologs were characterized, and 198 accelerated orthologs were identified after a branch model-based maximum likelihood estimation of dN/dS. Among these accelerated orthologs, 66 genes were positively selected under a significance level test. Functional enrichment of the positively selected genes using several methods suggested that these genes may play important roles in plant development during the sesame domestication. To our knowledge, this case study provides the example for orthologs identification in species that lack gene annotations.
This work was supported by the Ministry of Education of the Republic of Korea and the National Research Foundation of Korea (No. NRF-2014R1A1A2057073).
This work was supported by the Ministry of Education of the Republic of Korea and the National Research Foundation of Korea (No. NRF-2014R1A1A2057073).
Fig. 1
Branch model phylogenetic tree of 7 Pentapetalae plants and 5 Sesamum indicum accessions for the evolution analysis. Numbers in the brackets indicating the tested branches for adapted genes, (1) is for Sesamum indicum and (2)–(7) are for each core accessions. The grey color indicating the species tree.
pbb-7-024f1.jpg
Fig. 2
The location and distribution of all the accelerated orthologs and positively selected genes across the chromosomes of sesame. Each red or blue vertical bar represented one orthologs, and the red indicated the positively selected ones, and the blue indicated the other orthologs. LG01-LG13 represents the chromosomes, and the bold horizontal line indicates the length of each chromosome.
pbb-7-024f2.jpg
Fig. 3
Functional enrichment analysis of positively selected orthologs. Yellow bar is accelerated selected genes, blue bar is the Arabidopsis thaliana reference genome genes. GO enrichment analyses were performed using AgriGO (http://bioinfo.cau.edu.cn/agriGO/) with Arabidopsis thaliana as the background species.
pbb-7-024f3.jpg
Fig. 4
Cytoscape protein-protein interaction network of the positively selected genes using Arabidopsis thaliana as reference (data from BioGRID). The color from green to red indicating the dN/dS value ranging from low to high. Total 19 positively selected genes that with dN/dS value more than 1 were marked and displayed in Table 2.
pbb-7-024f4.jpg
Table 1
Summary of the accessions and resequencing status based on the sesame reference genome.
Table 1
Accessions KNUS124 KNUS158 KNUS160 KNUS161 KNUS162
Name Heukseon SADOVO 1 Nevada Ansan Milsung
Ecotypes S. indicum L. S. indicum L. S. indicum L. S. indicum L. S. indicum L.
Origin Korea Korea Korea Korea Korea
Flower color Light red Light red Pink White White
Total raw reads 51,928,276 68,070,228 60,951,160 53,119,468 63,775,198
Clean reads 42,036,751 53,923,699 49,859,367 42,786,007 50,639,996
Mapped reads 33,967,713 42,565,414 40,064,706 34,053,742 40,022,382
Mapping depth 17.297 22.129 20.875 17.833 20.82
Mapping rate (%) 80.8 78.94 80.36 79.59 79.03
Table 2
Positively selected orthologs that were associated in the network analysis.
Table 2
Sesame A. thaliana P-value FDR dN/dS Annotation
SIN_1016274 AT4G15545 0.010602 0.035579 7.05644 Transmembrane transporter activity; integral component of membrane
SIN_1013043 AT3G11460 0.00329 0.017394 4.79731 Zinc ion binding; mitochondrion; cytidine to uridine editing
SIN_1022318 AT2G18760 0.005225 0.022633 13.07842 DNA or RNA binding; response to stress; cellular processes
SIN_1006130 AT2G24220 3.49E-11 3.45E-09 5.48865 Purine nucleobase transport; integral component of membrane
SIN_1014290 AT1G55740 1.53E-28 3.03E-26 1.12951 Carbohydrate metabolic processes; cytoplasm; hydrolase activity
SIN_1023518 AT4G35910 0.000967 0.008707 6.29376 Transferase activity; lateral root formation; DNA or RNA binding; protein urmylation
SIN_1015526 AT1G63660 0.008183 0.030544 3.61322 GMP synthase (glutamine-hydrolyzing) activity; ATP binding
SIN_1024931 AT1G02160 0.000668 0.007786 999 Unknown molecular functions
SIN_1022581 AT3G15150 1.62E-07 0.000008 182.0115 Cytokinin-activated signaling pathway; SUMO transferase activity; ligase activity
SIN_1025029 AT2G44360 0.000897 0.008459 65.66277 Unknown molecular functions
SIN_1013015 AT2G36305 0.000834 0.008256 464.5285 CAAX-box protein maturation; hydrolase activity
SIN_1023551 AT3G13770 0.002761 0.017321 3.71444 Mitochondrion; zinc ion binding
SIN_1024502 AT3G45770 0.003338 0.017394 8.71415 Fatty acid biosynthetic process; oxidation-reduction process
SIN_1022837 AT2G02410 0.013785 0.042647 998.9973 Unknown molecular functions
SIN_1009102 AT1G09850 0.002858 0.017321 11.21159 Extracellular; hydrolase activity
SIN_1026256 AT1G12480 0.001418 0.010802 9.73025 Abscisic acid-activated signaling pathway
SIN_1002052 AT5G08650 0.004158 0.020584 159.029 GTPase activity; nucleotide binding
SIN_1022276 AT3G24660 0.011071 0.036536 19.87913 Protein serine/threonine kinase activity
SIN_1002603 AT3G55760 0.003092 0.017394 126.5731 Chloroplast stroma
  • Ahn H, Kim K-W, Kim HJ, Cho S, Kim H. 2014. Differential Evolution between Monotocous and Polytocous Species. Asian-australas. J Anim Sci. 27: 464-470.
  • Anilakumar KR, Pal A, Khanum F, Bawa AS. 2010. Nutritional, medicinal and industrial uses of sesame (Sesamum indicum L.) seeds-an overview. Agriculturae Conspectus Scientificus (ACS). 75: 159-168.
  • Bandyopadhyay S, Sharan R, Ideker T. 2006. Systematic identification of functional orthologs based on protein network comparison. Genome Res. 16: 428-435.
  • Bedigian D. 2003. Evolution of sesame revisited: domestication, diversity and prospects. Genet Resour Crop Evol. 50: 779-787.
  • Bremer B, Bremer K, Chase M, Fay M, Reveal J, Soltis D, et al. 2009. An update of the Angiosperm Phylogeny Group classification for the orders and families of flowering plants: APG III. Bot J Linn Soc. 161: 105-121.
  • Brudno M, Do CB, Cooper GM, Kim MF, Davydov E, et al. NISC Comparative Sequencing Program2003. LAGAN and Multi-LAGAN: efficient tools for large-scale multiple alignment of genomic DNA. Genome Res. 13: 721-731.
  • Castresana J. 2000. Selection of conserved blocks from multiple alignments for their use in phylogenetic analysis. Molr Biol Evol. 17: 540-552.
  • Chatr-Aryamontri A, Breitkreutz B-J, Oughtred R, Boucher L, Heinicke S, Chen D, et al. 2015. The BioGRID interaction database: 2015 update. Nucleic Acids Res. 43: D470-D478.
  • Dalloul RA, Long JA, Zimin AV, Aslam L, Beal K, Blomberg LA, et al. 2010. Multi-platform next-generation sequencing of the domestic turkey (Meleagris gallopavo): genome assembly and analysis. PLoS Biol. 8: e1000475
  • Du Z, Zhou X, Ling Y, Zhang Z, Su Z. 2010. agriGO: a GO analysis toolkit for the agricultural community. Nucleic Acids Res. 38: W64-W70.
  • Fletcher W, Yang Z. 2010. The effect of insertions, deletions, and alignment errors on the branch-site test of positive selection. Mol Biol Evol. 27: 2257-2267.
  • Hedges SB, Marin J, Suleski M, Paymer M, Kumar S. 2015. Tree of life reveals clock-like speciation and diversification. Mol Biol Evol. 32: 835-845.
  • Hurst LD. 2002. The Ka/Ks ratio: diagnosing the form of sequence evolution. Trends Genet. 18: 486
  • Imai H, Noda Y, Tamaoki M. 2015. Alteration of Arabidopsis SLAC1 promoter and its association with natural variation in drought tolerance. Plant Signal Behav. 10: e989761
  • Ji DL, Lin H, Chi W, Zhang LX. 2012. CpLEPA is critical for chloroplast protein synthesis under suboptimal conditions in Arabidopsis thaliana. PLoS One. 7: e49746
  • Katoh K, Kuma K, Toh H, Miyata T. 2005. MAFFT version 5: improvement in accuracy of multiple sequence alignment. Nucleic Acids Res. 33: 511-518.
  • Kim KW, Burt DW, Kim H, Cho S. 2013. Identification of differentially evolved genes: an alternative approach to detection of accelerated molecular evolution from genome-wide comparative data. Evol Bioinform Online. 9: 285-299.
  • Li H, Durbin R. 2009. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics. 25: 1754-1760.
  • Löytynoja A, Goldman N. 2005. An algorithm for progressive multiple alignment of sequences with insertions. Proc Natl Acad Sci USA. 102: 10557-10562.
  • Mao F, Su Z, Olman V, Dam P, Liu Z, Xu Y. 2006. Mapping of orthologous genes in the context of biological pathways: An application of integer programming. Proc Natl Acad Sci USA. 103: 129-134.
  • McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, et al. 2010. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20: 1297-1303.
  • McPartland JM, Norris RW, Kilpatrick CW. 2007. Tempo and mode in the endocannaboinoid system. J Mol Evol. 65: 267-276.
  • Muller NA, Wijnen CL, Srinivasan A, Ryngajllo M, Ofner I, Lin T, et al. 2016. Domestication selected for deceleration of the circadian clock in cultivated tomato. Nat Genet. 48: 89-93.
  • Park JH, Suresh S, Cho GT, Choi NG, Baek HJ, Lee CW, et al. 2014. Assessment of molecular genetic diversity and population structure of sesame (Sesamum indicum L.) core collection accessions using simple sequence repeat markers. Plant Genet Resour. 12: 112-119.
  • Roth C, Liberles DA. 2006. A systematic search for positive selection in higher plants (Embryophytes). BMC Plant Biol. 6: 12
  • Sabeti PC, Schaffner SF, Fry B, Lohmueller J, Varilly P, Shamovsky O, et al. 2006. Positive natural selection in the human lineage. Science. 312: 1614-1620.
  • Schwede T, Kopp J, Guex N, Peitsch MC. 2003. SWISS-MODEL: An automated protein homology-modeling server. Nucleic Acids Res. 31: 3381-3385.
  • Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, et al. 2003. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 13: 2498-2504.
  • Wang L, Xia Q, Zhang Y, Zhu X, Zhu X, Li D, et al. 2016. Updated sesame genome assembly and fine mapping of plant height and seed coat color QTLs using a new high-density genetic map. BMC Genomics. 17: 31
  • Wang L, Yu S, Tong C, Zhao Y, Liu Y, Song C, et al. 2014. Genome sequencing of the high oil crop sesame provides insight into oil biosynthesis. Genome Biol. 15: R39
  • Wu F, Mueller LA, Crouzillat D, Pétiard V, Tanksley SD. 2006. Combining bioinformatics and phylogenetics to identify large sets of single-copy orthologous genes (COSII) for comparative, evolutionary and systematic studies: a test case in the euasterid plant clade. Genetics. 174: 1407-1420.
  • Yang J, Liu D, Wang X, Ji C, Cheng F, Liu B, et al. 2016. The genome sequence of allopolyploid Brassica juncea and analysis of differential homoeolog gene expression influencing selection. Nat Genet. 48: 1225-1232.
  • Yang Z. 2007. PAML 4: phylogenetic analysis by maximum likelihood. Mol Biol Evol. 24: 1586-1591.
  • Yi DK, Kim KJ. 2012. Complete chloroplast genome sequences of important oilseed crop Sesamum indicum L. PLoS One. 7: e35872
  • Yi X, Du Z, Su Z. 2013. PlantGSEA: a gene set enrichment analysis toolkit for plant community. Nucleic Acids Res. 41: W98-W103.
  • Zhang S, Qi Y, Liu M, Yang C. 2013. SUMO E3 ligase AtMMS21 regulates drought tolerance in Arabidopsis thaliana. J Integr Plant Biol. 55: 83-95.

Download Citation

Download a citation file in RIS format that can be imported by all major citation management software, including EndNote, ProCite, RefWorks, and Reference Manager.

Format:

Include:

Positively Selected Orthologous Genes Identified in Sesame (Sesamum indicum) by Deep Resequencing
Plant Breed. Biotech.. 2019;7(1):24-33.   Published online March 1, 2019
Download Citation

Download a citation file in RIS format that can be imported by all major citation management software, including EndNote, ProCite, RefWorks, and Reference Manager.

Format:
Include:
Positively Selected Orthologous Genes Identified in Sesame (Sesamum indicum) by Deep Resequencing
Plant Breed. Biotech.. 2019;7(1):24-33.   Published online March 1, 2019
Close

Figure

  • 0
  • 1
  • 2
  • 3
Positively Selected Orthologous Genes Identified in Sesame (Sesamum indicum) by Deep Resequencing
Image Image Image Image
Fig. 1 Branch model phylogenetic tree of 7 Pentapetalae plants and 5 Sesamum indicum accessions for the evolution analysis. Numbers in the brackets indicating the tested branches for adapted genes, (1) is for Sesamum indicum and (2)–(7) are for each core accessions. The grey color indicating the species tree.
Fig. 2 The location and distribution of all the accelerated orthologs and positively selected genes across the chromosomes of sesame. Each red or blue vertical bar represented one orthologs, and the red indicated the positively selected ones, and the blue indicated the other orthologs. LG01-LG13 represents the chromosomes, and the bold horizontal line indicates the length of each chromosome.
Fig. 3 Functional enrichment analysis of positively selected orthologs. Yellow bar is accelerated selected genes, blue bar is the Arabidopsis thaliana reference genome genes. GO enrichment analyses were performed using AgriGO (http://bioinfo.cau.edu.cn/agriGO/) with Arabidopsis thaliana as the background species.
Fig. 4 Cytoscape protein-protein interaction network of the positively selected genes using Arabidopsis thaliana as reference (data from BioGRID). The color from green to red indicating the dN/dS value ranging from low to high. Total 19 positively selected genes that with dN/dS value more than 1 were marked and displayed in Table 2.
Positively Selected Orthologous Genes Identified in Sesame (Sesamum indicum) by Deep Resequencing

Summary of the accessions and resequencing status based on the sesame reference genome.

Accessions KNUS124 KNUS158 KNUS160 KNUS161 KNUS162
Name Heukseon SADOVO 1 Nevada Ansan Milsung
Ecotypes S. indicum L. S. indicum L. S. indicum L. S. indicum L. S. indicum L.
Origin Korea Korea Korea Korea Korea
Flower color Light red Light red Pink White White
Total raw reads 51,928,276 68,070,228 60,951,160 53,119,468 63,775,198
Clean reads 42,036,751 53,923,699 49,859,367 42,786,007 50,639,996
Mapped reads 33,967,713 42,565,414 40,064,706 34,053,742 40,022,382
Mapping depth 17.297 22.129 20.875 17.833 20.82
Mapping rate (%) 80.8 78.94 80.36 79.59 79.03

Positively selected orthologs that were associated in the network analysis.

Sesame A. thaliana P-value FDR dN/dS Annotation
SIN_1016274 AT4G15545 0.010602 0.035579 7.05644 Transmembrane transporter activity; integral component of membrane
SIN_1013043 AT3G11460 0.00329 0.017394 4.79731 Zinc ion binding; mitochondrion; cytidine to uridine editing
SIN_1022318 AT2G18760 0.005225 0.022633 13.07842 DNA or RNA binding; response to stress; cellular processes
SIN_1006130 AT2G24220 3.49E-11 3.45E-09 5.48865 Purine nucleobase transport; integral component of membrane
SIN_1014290 AT1G55740 1.53E-28 3.03E-26 1.12951 Carbohydrate metabolic processes; cytoplasm; hydrolase activity
SIN_1023518 AT4G35910 0.000967 0.008707 6.29376 Transferase activity; lateral root formation; DNA or RNA binding; protein urmylation
SIN_1015526 AT1G63660 0.008183 0.030544 3.61322 GMP synthase (glutamine-hydrolyzing) activity; ATP binding
SIN_1024931 AT1G02160 0.000668 0.007786 999 Unknown molecular functions
SIN_1022581 AT3G15150 1.62E-07 0.000008 182.0115 Cytokinin-activated signaling pathway; SUMO transferase activity; ligase activity
SIN_1025029 AT2G44360 0.000897 0.008459 65.66277 Unknown molecular functions
SIN_1013015 AT2G36305 0.000834 0.008256 464.5285 CAAX-box protein maturation; hydrolase activity
SIN_1023551 AT3G13770 0.002761 0.017321 3.71444 Mitochondrion; zinc ion binding
SIN_1024502 AT3G45770 0.003338 0.017394 8.71415 Fatty acid biosynthetic process; oxidation-reduction process
SIN_1022837 AT2G02410 0.013785 0.042647 998.9973 Unknown molecular functions
SIN_1009102 AT1G09850 0.002858 0.017321 11.21159 Extracellular; hydrolase activity
SIN_1026256 AT1G12480 0.001418 0.010802 9.73025 Abscisic acid-activated signaling pathway
SIN_1002052 AT5G08650 0.004158 0.020584 159.029 GTPase activity; nucleotide binding
SIN_1022276 AT3G24660 0.011071 0.036536 19.87913 Protein serine/threonine kinase activity
SIN_1002603 AT3G55760 0.003092 0.017394 126.5731 Chloroplast stroma
Table 1 Summary of the accessions and resequencing status based on the sesame reference genome.
Table 2 Positively selected orthologs that were associated in the network analysis.