Positively Selected Orthologous Genes Identified in Sesame (Sesamum indicum) by Deep Resequencing

Jie Yu; Myeong-Hyeon Min; Sang-Ho Chu; Kyu-Won Kim; Yong-Jin Park

doi:10.9787/PBB.2019.7.1.24

Articles

Page Path

Research Article

Positively Selected Orthologous Genes Identified in Sesame (Sesamum indicum) by Deep Resequencing

Jie Yu^1,2, Myeong-Hyeon Min¹, Sang-Ho Chu³, Kyu-Won Kim^3,*, Yong-Jin Park^1,3,*

Plant Breeding and Biotechnology 2019;7(1):24-33.
Published online: March 1, 2019

DOI: https://doi.org/10.9787/PBB.2019.7.1.24

¹Department of Plant Resources, College of Industrial Science, Kongju National University, Yesan 32439, Korea

²State Key Laboratory of Tea Plant Biology and Utilization, Anhui Agricultural University, Hefei 230036, China

³Center of Crop Breeding on Omics and Artificial Intelligence, Kongju National University, Yesan 32439, Korea

*Corresponding author: Kyu-Won Kim, sh.kyuwon@gmail.com, Tel: +82-41-330-1213, Fax: +82-41-330-1209

*Yong-Jin Park, yjpark@kongju.ac.kr, Tel: +82-41-330-1201, Fax: +82-41-330-1209

• Received: January 21, 2019 • Revised: February 20, 2019 • Accepted: February 20, 2019

This is an Open-Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

3 Views
0 Download

prev next

Full Article

Download PDF

Abstract
INTRODUCTION
MATERIALS AND METHODS
RESULTS
DISCUSSION
Supplementary Information
ACKNOWLEDGEMENTS
ACKNOWLEDGEMENTS
References

Abstract

Sesame (Sesamum indicum L.) is the queen of oil seed crops and is cultivated widely in tropical and subtropical areas. The availability of the sesame genome sequence presents unprecedented opportunities for studying its genetics, genomics, and evolution. In this report, we conducted a genome resequencing-based identification of sesame orthologs; in total, 26,379 coding sequences (CDSs) were isolated. Using a reciprocal best hit, we ultimately identified a total of 639 orthologs sets after one-to-one orthologs extraction across seven Pentapetalae plant species. These orthologs were considered to be the most credible between the two species, and in sesame. Furthermore, we performed a branch model-based maximum likelihood estimation of dN/dS of the orthologs, resulting in the identification of 198 evolutionarily accelerated orthologs and 66 positively selected genes (P-value and FDR < 0.05). An enrichment analysis and protein interaction network suggested 19 genes with important functions of the orthologs specific in sesame development and domestication. The method we used here provides a case study for identifying orthologous genes between sesame and other plants species that are distributed in equilibrium phylogenetically, which can be used in other plants.
Key words: Sesame, Orthologs, Resequencing, dN/dS, Evolution

INTRODUCTION

Sesame (Sesamum indicum L.) is one of the crop has been cultivated for the longest time by human and now wildly cultivated in tropical and subtropical zones (Bedigian 2003). It is a diploid species with 13 pairs of chromosomes (2n = 2x = 26) (Bremer et al. 2009), which belongs to the Sesamum genus of the family Pedaliaceae and the order Lamiales. Sesame is called “queen of the oil seeds” because of the highest oil content (~55% in the seeds) among oilseed crops, including peanut, soybean, and rapeseed (Anilakumar et al. 2010). Among the oilseed crops, sesame has a high oil content and a small diploid genome (~259.73 Mb) (Wang et al. 2014; Wang et al. 2016), making it an attractive species for genetic studies. Sesame is known to be the core lineage of Sesamum in Lamiales using phylogenetic analysis with chloroplast genome (Yi and Kim 2012). It was estimated to have diverged from the Lamiales species Utricularia gibba ~98 million years ago, and from the tomato potato lineage ~125 MYA (Wang et al. 2014).

Orthologs are homologous sequences that started to diverge through a speciation event (as with paralogs and duplication events). Consequently, the better we approximate the evolution of such sequences, the better ortholog predictions will be. In general, phylogenetic history can be parsed by examining nucleotide sequences, instead of amino acid sequences, and then separating non-synonymous substitutions (resulting in amino acid substitutions) from synonymous substitutions (silent or neutral substitutions). Selection pressure can be estimated by scaling the ratio of non-synonymous substitutions to synonymous substitutions to that expected with neutral divergence. This calculation is normalized as the number of non-synonymous substitutions per potential non-synonymous sites (dN) divided by the number of synonymous substitutions per potential synonymous sites (dS), referred to as dN/dS or Ka/Ks (Hurst 2002). Extant proteins have been selected for optimal function over millions of years, the dN value of their genes becomes depressed; thus, dN/dS < 1 indicates “purifying” selection. When dN/dS = 1, the gene may be experiencing neutral selection. While dN/dS > 1, the gene may be undergoing “positive selection” resulting adaptive evolution, where amino acid replacements are advantageous and synonymous substitutions become stabilized at a high rate (McPartland et al. 2007; Dalloul et al. 2010). Positive selection of Embryophytes plants occurred in a wide variety of gene families with different functions, including in the self in-compatibility locus, in defense against pathogens, in embryogenesis, in cold acclimation, and in electron-transport (Roth and Liberles 2006). In Brassica juncea, 13 GSL-metabolism-related genes and 22 lipid-metabolism-related genes were differentially selected between the vegetable- and oil-use subvarieties of B. juncea, which indicating that positive selection of some genes could be a major impact on plant domestication (Yang et al. 2016). An EID1 gene, responsible for a phase delay, has been under positive selection during tomato domestication or improvement (Muller et al. 2016), revealing the key position of positively selected genes during the domestication.

In this study, we developed a detailed strategy (Supplementary Fig. S1) using an evolutionary-based method to mine orthologs between different species from a genome resequencing project. Using this method, we identified orthologs in sesame, including five sesame accessions from a core set (RDA-Genebank, Republic of Korea) and several other plant species. Complemented with a branch model-based maximum likelihood estimation of dN/dS (Kim et al. 2013; Ahn et al. 2014) of the orthologs in these several species that were distributed in equilibrium phylogenetically, we identified evolutionarily accelerated genes in sesame. The methods in this study provide a case study for identifying orthologous genes in sesame and other plant species.

MATERIALS AND METHODS

Plant material and whole genome resequencing

Five sesame accessions (Table 1) from a core set stored in National Genebank of the Rural Development Administration, Republic of Korea (RDA-Genebank) were selected for current study (Park et al. 2014). We conducted a field experiment during the 2015 sesame growing season at Kongju National University experimental farm. Young leaves from a single plant were sampled and stored at −80°C prior to genomic DNA extraction using the DNeasy plant mini kit (Qiagen, Germany). At least 5 μg of genomic DNA was used for each accession to construct paired-end sequencing libraries with insert sizes of approximately 500 bp according to the manufacturer’s instructions (Illumina Inc., USA). We sequenced the genomes of the collected sesame varieties in deep coverage (~19.8×) on an Illumina HiSeq 2500 Sequencing System Platform (Illumina Inc., USA) with paired-end 2 × 100 bp read length.

Genotype calling and CDS extraction

Raw sequences were first trimmed using the SICKLE tool (https://github.com/najoshi/sickle) to remove low-quality reads. Then, high-quality reads were aligned against the sesame reference genome (ver. 2.0) (Wang et al. 2016) using the Burrows-Wheeler Aligner (BWA; ver. 0.7.5a) software (Li and Durbin 2009). Reads that did not meet BWA quality criteria or did not match the reference genome were removed. PCR duplicate reads were also removed using PICARD tool (ver. 1.88; http://broadinstitute.github.io/picard/). Applied with the Genome Analysis Toolkit (GATK; ver. 2.3.9 Lite) (McKenna et al. 2010), regional realignment and quality score recalibration were first carried out and then single nucleotide polymorphisms (SNPs) and insertion/deletion polymorphisms (indels) were called with ≥ 3× of read depth coverage. Coding sequences (CDSs) of the five sesame core accessions were extracted, based on the annotations of the sesame reference genome using the variant call format (VCF) file generated from genotype calling of the five accessions.

Ortholog identification in sesame

A branch model-based phylogenetic tree was constructed with the TimeTree database (Hedges et al. 2015), and several Pentapetalae plant species (Solanum tuberosum, Solanum lycopersicum, Populus trichocarpa, Brassica rapa, Arabidopsis thaliana, and Vitis vinifera) that distributed in equilibrium phylogenetically with sesame were selected for comparison with S. indicum. The coding sequences of Solanum tuberosum were downloaded from NCBI (http://www.ncbi.nlm.nih.gov) and the other six Pentapetalae species were downloaded from Ensembl Plants (release 30; http://plants.ensembl.org). Ortholog information for the other Pentapetalae plants was also downloaded from BioMart of Ensembl Plants, satisfying orthologs confidence = 1. Since orthologous relationship between S. tuberosum and other Pentapetalae species had not been established, we performed a reciprocal best hits test between S. indicum and S. tuberosum, the closest species to S. indicum among six Pentapetalae species, using the options, E-value < 10 ⁻⁵, 50 < coverage (%) < 150, and identity (%) > 60 (Brudno et al. 2003; Schwede et al. 2003; Katoh et al. 2005).

Branch model-based evolutionarily accelerated gene mining

To identify rapidly evolving genes in sesame, we conducted a branch model-based maximum likelihood estimation of dN/dS values among seven Pentapetalae species, including S. indicum. The coding sequences of all the seven Pentapetalae species and the five accessions of S. indicum were aligned using PRANK (ver. 150803) software (Löytynoja and Goldman 2005), which is known for accurately align insertion and deletion (Fletcher and Yang 2010). Poor alignments were eliminated using Gblocks (ver. 0.91b) (Castresana 2000). Selected positive genes for S. indicum and each of the five accessions were tested using a maximum likelihood estimation of dN/dS ratio with the phylogenetic tree (Fig. 1) (Yang 2007). Maximum likelihood estimations of dN/dS ratios were determined with branch-site model A with codon frequency F3x4 using PAML (ver. 4.8a) software (Yang 2007). The log-likelihood ratio test was used for determining P-values, and multiple testing errors were controlled using the false discovery rate (FDR).

GO enrichment, pathway, and network analysis

All the identified evolutionarily accelerated genes were subjected to Gene Ontology (GO) enrichment, pathway, and network analyses. GO enrichment analyses were performed using AgriGO (http://bioinfo.cau.edu.cn/agriGO/) with Arabidopsis thaliana as the background species (Du et al. 2010). The P-values for the considered enrichment GO terms were < 0.05 and the minimum number of mapping entries was set to 5. Plant metabolic pathway enrichment analysis was carried out online using the Plant Gene Set Enrichment Analysis Toolkit (GSEA, http://structuralbiology.cau.edu.cn/PlantGSEA/analysis.php), based on the corresponding genes in A. thaliana (Yi et al. 2013). Network analyses were also conducted for the genes by integrating with the Biological General Repository for Interaction Datasets (BioGRID, http://thebiogrid.org/) (Chatr-Aryamontri et al. 2015), which can provide genetic and protein interaction data from model organisms. Sesame orthologs were mapped onto interactome data using cross-reference tables assembled from identifiers of A. thaliana and were displayed with Cytoscape software (Shannon et al. 2003).

RESULTS

Whole-genome resequencing and genotype calling

We conducted resequencing of five core set accessions in this study and generated a total of 297.8 million reads (~42 Gb) (Table 1). After removing low-quality reads, ~38 million clean reads were retained as high-quality data and were used for further analysis. About 80% of the clean reads of each accession were mapped to the sesame reference genome, resulting in an average genome coverage of ~19.8 (Table 1). We performed genotype calling of the five accessions and, in total, ~1.96 million SNP/indels were identified (Supplementary Table S1). Of them, ~86% (~1.69 million) were high-quality variations (sites without missing and heterozygotes). This suggested a large amount of high-quality data, resulting reliable results in the rest of the study. By complementing with the reference genome of sesame, and vice versa, we could see more genetic variations across the sesame genome, which would promote genetic and genomic studies of sesame to be increasingly active. For example, in this study, with the high-depth resequencing and genotype calling data, ~1.69 million high-quality variations were generated across the genome. However, an uneven distribution of these SNPs/indels was observed across some chromosomes (Supplementary Fig. S2). In contrast, several chromosomes with shorter lengths (chromosomes 1, 4, 5 and 13) had more SNPs/indels than the chromosomes with longer lengths (chromosomes 3, 6, 8, 9, and 10). This suggested that some genes in the chromosomes with high-density variations might play more important roles in the development and evolution of sesame.

Ortholog assignments

Based on the annotation information of the sesame reference genome, we identified 26,379 CDSs in each of the five accessions among a total of 27,148 protein-encoding genes in the sesame reference genome. These CDSs were mined by mapping to the sesame reference genome with the positions of each gene CDS. In this way, all the SNPs/indels could be included in the CDS specific to each accession if some SNPs/indels existed specifically in one accession. To analyze genes of S. indicum by comparing orthologs with other species in a monophyletic group, the six Pentapetalae species were carefully selected to be distributed in equilibrium phylogenetically with S. indicum.

However, the orthologs among S. indicum and the other six Pentapetalae species have not been identified, whereas the orthologs between the other six Pentapetale species were identified. Thus, we made entire orthologs including S. indicum and the other six Pentapetalae species by identifying orthologs between S. indicum and its closest neighbor among the other six Pentapetalae species, S. tuberosum. The orthologs sets between S. indicum and S. tuberosum were identified by reciprocal best hits (RBH). Candidate orthologs of sesame were then intersected with the orthologs of the other six species, downloaded from BioMart of Ensembl Plants. As a result, 8,281 pair wise orthologs sets were identified between S. indicum and S. tuberosum. Finally, 639 orthologs sets remained after extracting one-to-one (1:1) ortholog sets across all seven Pentapetalae species (Supplementary Table S2). In present study, only 1:1 orthologs were used for further dN/dS tests. One-to-many, many-to-many, and many-to-one orthologs were excluded to eliminate influences from genes that were duplicated one or more times in an ancestor of the current or other species after a speciation event. The 1:1 ortholog relationship is used in many comparative genomics studies, especially in the identification of functional orthologs (Bandyopadhyay et al. 2006); in evolutionary, comparative, and systematic studies in plants (Wu et al. 2006); and the mapping of biological pathways (Mao et al. 2006). We also used 1:1 orthologs which are suitable for comparative method based on proportion of functional changes (Sabeti et al. 2006).

Clearly, it would be easier to identify orthologs between two species if the duplication history of the genes in the two genomes was known (relative to their speciation event). Unfortunately, this evolutionary process is unknown in many cases. Thus, to find the most probable 1:1 orthologs assignment between two genomes should to be carried out based on sequence similarity. When predefined orthologs sets of a species group exist without target species, the strategy that identify pairwise orthologs between target species and its closest species in members of the predefined orthologs sets and construct whole orthologs sets had been used in genome project (Dalloul et al. 2010).

dN/dS test to identify specific, evolutionarily accelerated genes

Given the ortholog sets detected, we conducted a branch model-based maximum likelihood estimation of dN/dS in the seven species (total 12 accessions). Considering the insertion and deletion, PRANK was used to align the CDSs of all seven species. After removing poor alignments, 623 of 639 alignments remained. Under a phylogenetic model tree (Fig. 1), which indicated the tested branches for adapted genes (overall branch of S. indicum and each branch of each accession), we investigated the maximum likelihood estimation of dN/dS ratios with branch-site model A with codon frequency F3x4. In total, 198 evolutionarily accelerated orthologs were identified in S. indicum, including the five accessions, using a maximum likelihood estimation of the dN/dS ratio (Supplementary Table S3) with a value greater than 1. The difference in the maximum likelihood ratio was calculated as: LR = 2 (ln ML (dN/dS = 1)/ln ML (dN/dS)). LR was then compared against the χ² distribution with one degree of freedom to test whether the dN/dS ratio was significantly > 1. The significance level was set at 0.05. Genes that passed this test are reported as positively selected. After adjusting the threshold with a P-value and FDR < 0.05, 66 positively selected genes were retained, and were used in further analyses.

Based on the sesame reference genome annotation, we mapped the 198 orthologs and 66 positively selected genes to the chromosomes of the sesame genome (Fig. 2). The accelerated orthologs showed a relatively even distribution across the chromosomes, with 15, 11, 19, 21, 12, 23, 11, 17, 20, 10, 13, 21, and 5 orthologs assigned to chromosomes 1 to 13, respectively. Except chromosome 13, on which only positively selected genes were located, the other chromosomes contained both positively selected orthologs and other accelerated orthologs. The distribution of these accelerated orthologs across the genome suggested that the branch model method we used here was reliable and these genes may play important roles in the development and domestication of sesame.

Enrichment and network analysis of accelerated genes

We performed a functional GO enrichment test for the accelerated genes and found 126 GO terms in three categories (Biological Process, Molecular Function, and Cellular Component) with 19 have P-values < 0.05. In these terms, the cytoplasmic part and cytoplasm in Cellular Components have highest numbers of genes (37 and 41, respectively), while the response to oxidative stress and DNA metabolic processes in Biological Processes only covered five genes (Fig. 3A). However, for the Molecular Function category, no significant GO term was found (Supplementary Table S4). Using the Plant Gene Set Enrichment Analysis web server, the genes were enriched to curated gene sets (including PlantCyc gene sets, Kyoto Encyclopedia of Genes and Genomes (KEGG) gene sets, Plant Ontology gene sets, and References collected gene sets). Except for six genes (SIN_1022276, SIN_1017317, SIN_1002603, SIN_1002195, SIN_1015149, and SIN_1004833) that did not map some gene sets, most of the other (~50) contributed to the Plant Ontology gene sets (Supplementary Fig. S3). The genes were related to the development of plant sperm cells, cytoplasmic parts, stems, carpels, leaves, roots, and seeds, indicating the important roles of these accelerated genes in sesame or other plants.

We then constructed a protein interaction network using the 66 positively selected genes as the nodes and protein-protein interaction data from the BioGRID database (http://thebiogrid.org/download.php) as the edges. The network contained 139 nodes and 400 edges, as visualized using Cytoscape with A. thaliana as the reference. Colors from green to red indicating dN/dS values from low to high in the network and the 19 positively selected genes with dN/dS values > 1 were indicated in Fig. 4. Detail information and annotation for these 19 genes are listed in Table 2. Many putative functions were characterized among these genes, including zinc ion binding, response to stress, membrane related, transferase activity, cytokinin-activated signaling pathway, fatty acid biosynthetic process, oxidation-reduction process, and abscisic acid-activated signaling pathway. It indicated that these positively selected genes could play important roles in the domestication of sesame for its specific characteristics like drought tolerance and high oil content.

In addition, we also mapped these genes to KEGG pathways (http://www.genome.jp/kegg/pathway.html). Apart from some regular pathways, we found one gene (SIN_1024502) that mapped to Arabidopsis orthologs in fatty acid metabolism and fatty acid elongation, which was also highlighted in the network analysis, and may contribute to the fatty acid biosynthetic process and oxidation-reduction process in sesame. The gene SIN_1022581(AT3G15150, AtMMS21), which encodes E3 SUMO-protein ligase, may regulates drought tolerance in Arabidopsis thaliana (Zhang et al. 2013) was also highlighted here. Since sesame usually has a relative high level of drought resistance, this gene could be contributed to drought in sesame and positively selected in the domestication. Another gene SIN_1026256, which is a ortholog of Arabidopsis SLAC1 that associated with stomatal closure, was also likely responsible for the difference of drought tolerance (Imai et al. 2015). The gene SIN_1002052 (AT5G08650), which is the homolog of translation factor GUF1, is critical for chloroplast protein synthesis under suboptimal conditions in Arabidopsis thaliana (Ji et al. 2012). Overall, these results suggested that the genes identified may play important roles in the development and domestication of sesame.

DISCUSSION

In this study, we used a branch model method to identify evolutionarily accelerated orthologs in sesame based on deep resequencing. In case of a shortage of gene annotation in sesame, we first assigned the 1:1 orthologs between sesame and six other Pentapetalae species that were distributed in equilibrium phylogenetically with S. indicum. In total, 639 1:1 orthologs were characterized, and 198 accelerated orthologs were identified after a branch model-based maximum likelihood estimation of dN/dS. Among these accelerated orthologs, 66 genes were positively selected under a significance level test. Functional enrichment of the positively selected genes using several methods suggested that these genes may play important roles in plant development during the sesame domestication. To our knowledge, this case study provides the example for orthologs identification in species that lack gene annotations.

Supplementary Information

PBB-7-024_suppl.1.xlsx

PBB-7-024_suppl.2.doc

ACKNOWLEDGEMENTS

This work was supported by the Ministry of Education of the Republic of Korea and the National Research Foundation of Korea (No. NRF-2014R1A1A2057073).

ACKNOWLEDGEMENTS

This work was supported by the Ministry of Education of the Republic of Korea and the National Research Foundation of Korea (No. NRF-2014R1A1A2057073).

Fig. 1

Branch model phylogenetic tree of 7 Pentapetalae plants and 5 Sesamum indicum accessions for the evolution analysis. Numbers in the brackets indicating the tested branches for adapted genes, (1) is for Sesamum indicum and (2)–(7) are for each core accessions. The grey color indicating the species tree.

Fig. 2

The location and distribution of all the accelerated orthologs and positively selected genes across the chromosomes of sesame. Each red or blue vertical bar represented one orthologs, and the red indicated the positively selected ones, and the blue indicated the other orthologs. LG01-LG13 represents the chromosomes, and the bold horizontal line indicates the length of each chromosome.

Fig. 3

Functional enrichment analysis of positively selected orthologs. Yellow bar is accelerated selected genes, blue bar is the Arabidopsis thaliana reference genome genes. GO enrichment analyses were performed using AgriGO (http://bioinfo.cau.edu.cn/agriGO/) with Arabidopsis thaliana as the background species.

Fig. 4

Cytoscape protein-protein interaction network of the positively selected genes using Arabidopsis thaliana as reference (data from BioGRID). The color from green to red indicating the dN/dS value ranging from low to high. Total 19 positively selected genes that with dN/dS value more than 1 were marked and displayed in Table 2.

Table 1

Summary of the accessions and resequencing status based on the sesame reference genome.

Table 1
Accessions	KNUS124	KNUS158	KNUS160	KNUS161	KNUS162
Name	Heukseon	SADOVO 1	Nevada	Ansan	Milsung
Ecotypes	S. indicum L.	S. indicum L.	S. indicum L.	S. indicum L.	S. indicum L.
Origin	Korea	Korea	Korea	Korea	Korea
Flower color	Light red	Light red	Pink	White	White
Total raw reads	51,928,276	68,070,228	60,951,160	53,119,468	63,775,198
Clean reads	42,036,751	53,923,699	49,859,367	42,786,007	50,639,996
Mapped reads	33,967,713	42,565,414	40,064,706	34,053,742	40,022,382
Mapping depth	17.297	22.129	20.875	17.833	20.82
Mapping rate (%)	80.8	78.94	80.36	79.59	79.03

Table 2

Positively selected orthologs that were associated in the network analysis.

Table 2
Sesame	A. thaliana	P-value	FDR	dN/dS	Annotation
SIN_1016274	AT4G15545	0.010602	0.035579	7.05644	Transmembrane transporter activity; integral component of membrane
SIN_1013043	AT3G11460	0.00329	0.017394	4.79731	Zinc ion binding; mitochondrion; cytidine to uridine editing
SIN_1022318	AT2G18760	0.005225	0.022633	13.07842	DNA or RNA binding; response to stress; cellular processes
SIN_1006130	AT2G24220	3.49E-11	3.45E-09	5.48865	Purine nucleobase transport; integral component of membrane
SIN_1014290	AT1G55740	1.53E-28	3.03E-26	1.12951	Carbohydrate metabolic processes; cytoplasm; hydrolase activity
SIN_1023518	AT4G35910	0.000967	0.008707	6.29376	Transferase activity; lateral root formation; DNA or RNA binding; protein urmylation
SIN_1015526	AT1G63660	0.008183	0.030544	3.61322	GMP synthase (glutamine-hydrolyzing) activity; ATP binding
SIN_1024931	AT1G02160	0.000668	0.007786	999	Unknown molecular functions
SIN_1022581	AT3G15150	1.62E-07	0.000008	182.0115	Cytokinin-activated signaling pathway; SUMO transferase activity; ligase activity
SIN_1025029	AT2G44360	0.000897	0.008459	65.66277	Unknown molecular functions
SIN_1013015	AT2G36305	0.000834	0.008256	464.5285	CAAX-box protein maturation; hydrolase activity
SIN_1023551	AT3G13770	0.002761	0.017321	3.71444	Mitochondrion; zinc ion binding
SIN_1024502	AT3G45770	0.003338	0.017394	8.71415	Fatty acid biosynthetic process; oxidation-reduction process
SIN_1022837	AT2G02410	0.013785	0.042647	998.9973	Unknown molecular functions
SIN_1009102	AT1G09850	0.002858	0.017321	11.21159	Extracellular; hydrolase activity
SIN_1026256	AT1G12480	0.001418	0.010802	9.73025	Abscisic acid-activated signaling pathway
SIN_1002052	AT5G08650	0.004158	0.020584	159.029	GTPase activity; nucleotide binding
SIN_1022276	AT3G24660	0.011071	0.036536	19.87913	Protein serine/threonine kinase activity
SIN_1002603	AT3G55760	0.003092	0.017394	126.5731	Chloroplast stroma

References

Ahn H, Kim K-W, Kim HJ, Cho S, Kim H. 2014. Differential Evolution between Monotocous and Polytocous Species. Asian-australas. J Anim Sci. 27: 464-470.
Anilakumar KR, Pal A, Khanum F, Bawa AS. 2010. Nutritional, medicinal and industrial uses of sesame (Sesamum indicum L.) seeds-an overview. Agriculturae Conspectus Scientificus (ACS). 75: 159-168.
Bandyopadhyay S, Sharan R, Ideker T. 2006. Systematic identification of functional orthologs based on protein network comparison. Genome Res. 16: 428-435.
Article
PubMed
PMC
Bedigian D. 2003. Evolution of sesame revisited: domestication, diversity and prospects. Genet Resour Crop Evol. 50: 779-787.
Article
PDF
Bremer B, Bremer K, Chase M, Fay M, Reveal J, Soltis D, et al. 2009. An update of the Angiosperm Phylogeny Group classification for the orders and families of flowering plants: APG III. Bot J Linn Soc. 161: 105-121.
Article
Brudno M, Do CB, Cooper GM, Kim MF, Davydov E, et al. NISC Comparative Sequencing Program2003. LAGAN and Multi-LAGAN: efficient tools for large-scale multiple alignment of genomic DNA. Genome Res. 13: 721-731.
Article
PubMed
PMC
Castresana J. 2000. Selection of conserved blocks from multiple alignments for their use in phylogenetic analysis. Molr Biol Evol. 17: 540-552.
Article
Chatr-Aryamontri A, Breitkreutz B-J, Oughtred R, Boucher L, Heinicke S, Chen D, et al. 2015. The BioGRID interaction database: 2015 update. Nucleic Acids Res. 43: D470-D478.
Article
PubMed
Dalloul RA, Long JA, Zimin AV, Aslam L, Beal K, Blomberg LA, et al. 2010. Multi-platform next-generation sequencing of the domestic turkey (Meleagris gallopavo): genome assembly and analysis. PLoS Biol. 8: e1000475
PubMed
PMC
Du Z, Zhou X, Ling Y, Zhang Z, Su Z. 2010. agriGO: a GO analysis toolkit for the agricultural community. Nucleic Acids Res. 38: W64-W70.
Article
PubMed
PMC
Fletcher W, Yang Z. 2010. The effect of insertions, deletions, and alignment errors on the branch-site test of positive selection. Mol Biol Evol. 27: 2257-2267.
Article
PubMed
Hedges SB, Marin J, Suleski M, Paymer M, Kumar S. 2015. Tree of life reveals clock-like speciation and diversification. Mol Biol Evol. 32: 835-845.
Article
PubMed
PMC
Hurst LD. 2002. The Ka/Ks ratio: diagnosing the form of sequence evolution. Trends Genet. 18: 486
Article
PubMed
Imai H, Noda Y, Tamaoki M. 2015. Alteration of Arabidopsis SLAC1 promoter and its association with natural variation in drought tolerance. Plant Signal Behav. 10: e989761
Article
PubMed
Ji DL, Lin H, Chi W, Zhang LX. 2012. CpLEPA is critical for chloroplast protein synthesis under suboptimal conditions in Arabidopsis thaliana. PLoS One. 7: e49746
Article
PubMed
PMC
Katoh K, Kuma K, Toh H, Miyata T. 2005. MAFFT version 5: improvement in accuracy of multiple sequence alignment. Nucleic Acids Res. 33: 511-518.
Article
PubMed
PMC
Kim KW, Burt DW, Kim H, Cho S. 2013. Identification of differentially evolved genes: an alternative approach to detection of accelerated molecular evolution from genome-wide comparative data. Evol Bioinform Online. 9: 285-299.
Article
PubMed
PMC
PDF
Li H, Durbin R. 2009. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics. 25: 1754-1760.
Article
PubMed
PMC
PDF
Löytynoja A, Goldman N. 2005. An algorithm for progressive multiple alignment of sequences with insertions. Proc Natl Acad Sci USA. 102: 10557-10562.
Article
PubMed
PMC
Mao F, Su Z, Olman V, Dam P, Liu Z, Xu Y. 2006. Mapping of orthologous genes in the context of biological pathways: An application of integer programming. Proc Natl Acad Sci USA. 103: 129-134.
Article
PubMed
McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, et al. 2010. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20: 1297-1303.
Article
PubMed
PMC
McPartland JM, Norris RW, Kilpatrick CW. 2007. Tempo and mode in the endocannaboinoid system. J Mol Evol. 65: 267-276.
Article
PubMed
PDF
Muller NA, Wijnen CL, Srinivasan A, Ryngajllo M, Ofner I, Lin T, et al. 2016. Domestication selected for deceleration of the circadian clock in cultivated tomato. Nat Genet. 48: 89-93.
Article
PubMed
PDF
Park JH, Suresh S, Cho GT, Choi NG, Baek HJ, Lee CW, et al. 2014. Assessment of molecular genetic diversity and population structure of sesame (Sesamum indicum L.) core collection accessions using simple sequence repeat markers. Plant Genet Resour. 12: 112-119.
Article
Roth C, Liberles DA. 2006. A systematic search for positive selection in higher plants (Embryophytes). BMC Plant Biol. 6: 12
Article
PubMed
PMC
Sabeti PC, Schaffner SF, Fry B, Lohmueller J, Varilly P, Shamovsky O, et al. 2006. Positive natural selection in the human lineage. Science. 312: 1614-1620.
Article
PubMed
Schwede T, Kopp J, Guex N, Peitsch MC. 2003. SWISS-MODEL: An automated protein homology-modeling server. Nucleic Acids Res. 31: 3381-3385.
Article
PubMed
PMC
Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, et al. 2003. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 13: 2498-2504.
Article
PubMed
PMC
Wang L, Xia Q, Zhang Y, Zhu X, Zhu X, Li D, et al. 2016. Updated sesame genome assembly and fine mapping of plant height and seed coat color QTLs using a new high-density genetic map. BMC Genomics. 17: 31
Article
PubMed
PMC
Wang L, Yu S, Tong C, Zhao Y, Liu Y, Song C, et al. 2014. Genome sequencing of the high oil crop sesame provides insight into oil biosynthesis. Genome Biol. 15: R39
Article
PubMed
PMC
PDF
Wu F, Mueller LA, Crouzillat D, Pétiard V, Tanksley SD. 2006. Combining bioinformatics and phylogenetics to identify large sets of single-copy orthologous genes (COSII) for comparative, evolutionary and systematic studies: a test case in the euasterid plant clade. Genetics. 174: 1407-1420.
Article
PubMed
PMC
PDF
Yang J, Liu D, Wang X, Ji C, Cheng F, Liu B, et al. 2016. The genome sequence of allopolyploid Brassica juncea and analysis of differential homoeolog gene expression influencing selection. Nat Genet. 48: 1225-1232.
Article
PubMed
PDF
Yang Z. 2007. PAML 4: phylogenetic analysis by maximum likelihood. Mol Biol Evol. 24: 1586-1591.
Article
PubMed
Yi DK, Kim KJ. 2012. Complete chloroplast genome sequences of important oilseed crop Sesamum indicum L. PLoS One. 7: e35872
Article
PubMed
PMC
Yi X, Du Z, Su Z. 2013. PlantGSEA: a gene set enrichment analysis toolkit for plant community. Nucleic Acids Res. 41: W98-W103.
Article
PubMed
PMC
Zhang S, Qi Y, Liu M, Yang C. 2013. SUMO E3 ligase AtMMS21 regulates drought tolerance in Arabidopsis thaliana. J Integr Plant Biol. 55: 83-95.
PubMed

Figure & Data

References

Citations

Citations to this article as recorded by

Cite

CITE

export

Copy Download

Format
XML Download

Download Citation

Download a citation file in RIS format that can be imported by all major citation management software, including EndNote, ProCite, RefWorks, and Reference Manager.

Format:

RIS — For EndNote, ProCite, RefWorks, and most other reference management software
BibTeX — For JabRef, BibDesk, and other BibTeX-specific software

Include:

Citation for the content below
Citation and abstract for the content below

Positively Selected Orthologous Genes Identified in Sesame (Sesamum indicum) by Deep Resequencing

Plant Breed. Biotech.. 2019;7(1):24-33. Published online March 1, 2019

DOI: https://doi.org/10.9787/PBB.2019.7.1.24

Download Citation

Download a citation file in RIS format that can be imported by all major citation management software, including EndNote, ProCite, RefWorks, and Reference Manager.

Format:

RIS — For EndNote, ProCite, RefWorks, and most other reference management software
bib — For JabRef, BibDesk, and other BibTeX-specific software

Include:

Citation for the content below
Citation and abstract for the content below

Positively Selected Orthologous Genes Identified in Sesame (Sesamum indicum) by Deep Resequencing

Plant Breed. Biotech.. 2019;7(1):24-33. Published online March 1, 2019

DOI: https://doi.org/10.9787/PBB.2019.7.1.24

Figure

Positively Selected Orthologous Genes Identified in Sesame (Sesamum indicum) by Deep Resequencing

Fig. 1 Branch model phylogenetic tree of 7 Pentapetalae plants and 5 Sesamum indicum accessions for the evolution analysis. Numbers in the brackets indicating the tested branches for adapted genes, (1) is for Sesamum indicum and (2)–(7) are for each core accessions. The grey color indicating the species tree.

Fig. 2 The location and distribution of all the accelerated orthologs and positively selected genes across the chromosomes of sesame. Each red or blue vertical bar represented one orthologs, and the red indicated the positively selected ones, and the blue indicated the other orthologs. LG01-LG13 represents the chromosomes, and the bold horizontal line indicates the length of each chromosome.

Fig. 3 Functional enrichment analysis of positively selected orthologs. Yellow bar is accelerated selected genes, blue bar is the Arabidopsis thaliana reference genome genes. GO enrichment analyses were performed using AgriGO (http://bioinfo.cau.edu.cn/agriGO/) with Arabidopsis thaliana as the background species.

Fig. 4 Cytoscape protein-protein interaction network of the positively selected genes using Arabidopsis thaliana as reference (data from BioGRID). The color from green to red indicating the dN/dS value ranging from low to high. Total 19 positively selected genes that with dN/dS value more than 1 were marked and displayed in Table 2.

Fig. 1

Fig. 2

Fig. 3

Fig. 4

Positively Selected Orthologous Genes Identified in Sesame (Sesamum indicum) by Deep Resequencing

Table 1

Summary of the accessions and resequencing status based on the sesame reference genome.

Accessions	KNUS124	KNUS158	KNUS160	KNUS161	KNUS162
Name	Heukseon	SADOVO 1	Nevada	Ansan	Milsung
Ecotypes	S. indicum L.	S. indicum L.	S. indicum L.	S. indicum L.	S. indicum L.
Origin	Korea	Korea	Korea	Korea	Korea
Flower color	Light red	Light red	Pink	White	White
Total raw reads	51,928,276	68,070,228	60,951,160	53,119,468	63,775,198
Clean reads	42,036,751	53,923,699	49,859,367	42,786,007	50,639,996
Mapped reads	33,967,713	42,565,414	40,064,706	34,053,742	40,022,382
Mapping depth	17.297	22.129	20.875	17.833	20.82
Mapping rate (%)	80.8	78.94	80.36	79.59	79.03

Table 2

Positively selected orthologs that were associated in the network analysis.

Sesame	A. thaliana	P-value	FDR	dN/dS	Annotation
SIN_1016274	AT4G15545	0.010602	0.035579	7.05644	Transmembrane transporter activity; integral component of membrane
SIN_1013043	AT3G11460	0.00329	0.017394	4.79731	Zinc ion binding; mitochondrion; cytidine to uridine editing
SIN_1022318	AT2G18760	0.005225	0.022633	13.07842	DNA or RNA binding; response to stress; cellular processes
SIN_1006130	AT2G24220	3.49E-11	3.45E-09	5.48865	Purine nucleobase transport; integral component of membrane
SIN_1014290	AT1G55740	1.53E-28	3.03E-26	1.12951	Carbohydrate metabolic processes; cytoplasm; hydrolase activity
SIN_1023518	AT4G35910	0.000967	0.008707	6.29376	Transferase activity; lateral root formation; DNA or RNA binding; protein urmylation
SIN_1015526	AT1G63660	0.008183	0.030544	3.61322	GMP synthase (glutamine-hydrolyzing) activity; ATP binding
SIN_1024931	AT1G02160	0.000668	0.007786	999	Unknown molecular functions
SIN_1022581	AT3G15150	1.62E-07	0.000008	182.0115	Cytokinin-activated signaling pathway; SUMO transferase activity; ligase activity
SIN_1025029	AT2G44360	0.000897	0.008459	65.66277	Unknown molecular functions
SIN_1013015	AT2G36305	0.000834	0.008256	464.5285	CAAX-box protein maturation; hydrolase activity
SIN_1023551	AT3G13770	0.002761	0.017321	3.71444	Mitochondrion; zinc ion binding
SIN_1024502	AT3G45770	0.003338	0.017394	8.71415	Fatty acid biosynthetic process; oxidation-reduction process
SIN_1022837	AT2G02410	0.013785	0.042647	998.9973	Unknown molecular functions
SIN_1009102	AT1G09850	0.002858	0.017321	11.21159	Extracellular; hydrolase activity
SIN_1026256	AT1G12480	0.001418	0.010802	9.73025	Abscisic acid-activated signaling pathway
SIN_1002052	AT5G08650	0.004158	0.020584	159.029	GTPase activity; nucleotide binding
SIN_1022276	AT3G24660	0.011071	0.036536	19.87913	Protein serine/threonine kinase activity
SIN_1002603	AT3G55760	0.003092	0.017394	126.5731	Chloroplast stroma

Table 1 Summary of the accessions and resequencing status based on the sesame reference genome.

Table 2 Positively selected orthologs that were associated in the network analysis.

Articles

Page Path

Positively Selected Orthologous Genes Identified in Sesame (Sesamum indicum) by Deep Resequencing

Full Article

Abstract

INTRODUCTION

MATERIALS AND METHODS

RESULTS

DISCUSSION

Supplementary Information

ACKNOWLEDGEMENTS

ACKNOWLEDGEMENTS

References

Figure & Data

References

Citations

CITE

Download Citation

Format:

Include:

Figure

Fig. 1

Fig. 2

Fig. 3

Fig. 4

Table 1

Table 2

ABOUT

BROWSE ARTICLES

EDITORIAL POLICIES

FOR CONTRIBUTORS