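Genome Evolution of Domesticated Plants and Animals under Artificial Selection (人工选择下家养动植物基因组的进化)
Document type: Dissertation
Author | Xu Xun (徐讯) |
Degree | Doctoral |
Defense date | 2015-04 |
Degree-granting institution | Graduate University of Chinese Academy of Sciences (中国科学院研究生院) |
Place of conferral | Beijing |
Advisor | Wang Wen (王文) |
Keywords | rice; domestication; whole-genome sequencing; genomic variation; domestication-related genes; goat |
Alternative title | Genome Evolution of Domesticated Plants and Animals under Artificial Selection |
Major | Genetics |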
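Chinese abstract (English translation) | Domestication is the process by which humans modify wild plants and animals under artificial selection. Through domestication, many traits of wild plants and animals have changed markedly to meet human needs. Domestication is an important foundation of human society and civilization and an indispensable part of agricultural development. As human society develops further, and especially as the population grows and environmental conditions change, further improvement of crops and livestock will be critical. A deeper understanding of domestication will help us improve crops and livestock efficiently; in particular, elucidating the genomic basis of domestication will help us understand how agronomic traits are determined and thereby raise the efficiency of species improvement. My doctoral research focused on domesticated plants and animals, chiefly rice and goat, and investigated the patterns of genome evolution under artificial domestication.
Rice is a species of the genus Oryza in the grass family and an important food crop today. Rice has a long history of cultivation, during which many traits of cultivated rice changed greatly relative to wild rice; these changes made rice better suited to farming and gradually made it one of the staple food crops of Asia. In crop genetics, exploring the genetic basis of domestication helps reveal the genetic basis of important agronomic traits and lays the groundwork for further crop improvement. Meanwhile, the development of second-generation sequencing has made large-scale whole-genome studies possible, which greatly accelerates genetic research: analysis of large-scale genomic data makes it possible to study genetic bases at the whole-genome level, and it also speeds up research overall and supports its translation into industry. In this thesis, we selected a representative set of forty cultivated and twenty-five wild rice strains for whole-genome sequencing. Fifty of them were sequenced to a depth of 15×, giving good coverage of the whole genome. By aligning the sequencing data of each strain to the reference sequence, we identified genome-wide differences among these strains, including 6.5 million high-confidence single nucleotide polymorphisms (SNPs), 0.8 million small insertions and deletions (InDels), 95 thousand large structural variations (SVs), and more than 1,600 copy number variations (CNVs), which together constitute a genome-wide variation map of rice. We found that the genome-wide nucleotide diversity (π) of cultivated rice was 5.4×10⁻³; it was lower in japonica (3.7×10⁻³) and higher in indica (5.7×10⁻³). Diversity in wild rice was as high as 7.7×10⁻³, with 7.2×10⁻³ in O. rufipogon and 6.3×10⁻³ in O. nivara. In addition to these variants, we assembled the sequences that could not be aligned to the reference genome and, after excluding contamination and other confounding factors, identified 1,415 novel genes that are present in some strains but absent from the reference genome. We also identified 1,327 reference genes that have been lost in some rice strains. These novel genes and gene-loss events reflect, from one angle, the genome-level differences among rice strains.
Using the high-accuracy SNPs, we first performed principal component analysis (PCA). In the PCA, the four main rice types (wild rufipogon, wild nivara, cultivated japonica, and cultivated indica) separated from one another, showing that the four populations can be distinguished at the genome level. We then built a phylogenetic tree of the sequenced individuals. The tree shows that the two main types of cultivated rice (japonica and indica) differ clearly and cluster with the two wild rices (rufipogon and nivara), respectively, indicating that the two cultivated types underwent different domestication processes. The tree lends more support to two models of rice domestication: two independent origins, or a single origin followed by large-scale gene flow between indica and wild nivara. Population structure analysis likewise indicates different domestication processes for the different cultivated types. We also analyzed linkage in the rice genome. By computing linkage disequilibrium (LD), we found that LD extends over short distances in wild rice: in both wild species LD decays to half of its maximum within about 10 kb, whereas the distance is 65 kb in indica and 200 kb in japonica. These distances indicate the resolution that quantitative trait locus (QTL) mapping may achieve in cultivated rice.
By comparing the diversity of cultivated and wild rice across genomic regions, we identified regions in which diversity in cultivated rice is significantly lower than in wild rice; these regions are more likely to have been under selection during domestication. Measured by another statistic (FST), they are also regions of strong differentiation between cultivated and wild rice. In this way we identified 739 and 750 candidate selected regions in japonica and indica, respectively. Several known domestication-related genes, including sh4 and prog1, lie in these regions, which supports the view that the regions were indeed under selection. We found 73 genes located in selected regions of both cultivated types; these genes may determine important traits of cultivated rice and were therefore selected in both. Through whole-genome resequencing we studied the domestication history of rice and found genes that may be associated with important agronomic traits. This work explains, from a genetic perspective, the genomic basis of rice domestication and of trait changes in cultivated rice, provides data for subsequent rice research, and will help accelerate the further improvement of rice varieties.
Beyond rice, we also carried out genomic research on goat, to explore the genomic basis of animal domestication in addition to that of plant domestication. Goats are widely reared around the world, especially in China, India, and other developing countries. They are an important source of meat, milk, fiber, and pelts. Since the beginning of human civilization, goats have also played important agricultural, economic, cultural, and even religious roles. Despite this importance, genetic and breeding research on goat has lagged for lack of a reference genome, although a goat genome sequence would be of great value for marker-assisted breeding and for improving economically important traits. In this study we obtained the goat genome sequence using Illumina next-generation sequencing. In the absence of a complete genome-wide physical map, the study used a new approach to building a chromosome-level physical map: Optical Mapping, a whole-genome restriction mapping technology, was used to scaffold the genome at the chromosome level, and the result was integrated with the sequence assembly to produce a finished genome. Based on the annotated gene set, transcripts from primary and secondary hair follicles of the Inner Mongolia cashmere goat, in which the mechanism of cashmere production has been elucidated, were compared, and genes that evolved rapidly during domestication were confirmed.
These results from rice and goat suggest that whole-genome sequencing lets us understand more clearly how the genomes of domesticated plants and animals evolve under artificial domestication and helps identify selected genes associated with agronomic traits; these patterns and gene information may further improve the breeding of superior varieties. |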
English abstract | Domestication is the process by which humans select and modify wild plants and animals to make them suitable for human use. It is fundamental to the development of human society and agriculture. As human society develops further, and especially as the population grows and environments change, further improvement of crops and livestock becomes increasingly important. A better understanding of the mechanisms of domestication, and in particular of its genetic basis, would clarify how important agricultural traits are determined and so help improve these species more efficiently. Rice (Oryza sativa) is a grass species and one of the most important crops in the world. Rice was domesticated long ago and underwent substantial phenotypic changes during domestication that made it more suitable for large-scale cultivation. In crop genetics, exploring the genetic basis of domestication helps reveal the mechanisms underlying important agricultural traits and supports further crop improvement. Meanwhile, the rapid development of second-generation sequencing technologies has made large-scale whole-genome sequencing possible, which facilitates whole-genome genetic studies of crops, accelerates research, and aids agricultural development. In this thesis, we selected forty domesticated strains and twenty-five wild strains to represent the rice population and sequenced their whole genomes. Fifty of these strains were sequenced at high depth (15×), so their genomes were well covered.
By comparing the sequences of these strains with the reference genome, we identified genome-wide variations, including ~6.5 million single nucleotide polymorphisms (SNPs), ~0.8 million small insertions and deletions (InDels), ~95 thousand structural variations (SVs), and ~1.6 thousand copy number variations (CNVs). This was the most comprehensive genetic variation map of rice. The genome-wide nucleotide diversity (π) of cultivated rice was 5.4×10⁻³, with lower diversity in japonica (3.7×10⁻³) and higher diversity in indica (5.7×10⁻³). Wild rice had a higher diversity level (7.7×10⁻³) than cultivated rice, with 7.2×10⁻³ in rufipogon and 6.3×10⁻³ in nivara. In addition to these variations, we identified 1,415 novel genes by assembling the filtered unmapped sequences of these strains and 1,327 lost genes by extracting reference genes not covered by the sequencing reads. Using the high-confidence SNPs, we first conducted principal component analysis (PCA), which confirmed the four major groups of rice (wild rice O. rufipogon and O. nivara; cultivated rice japonica and indica). A phylogenetic tree constructed from these SNPs showed substantial differentiation between the two cultivated rice groups, with japonica more closely related to rufipogon and indica more closely related to nivara. This reflects different domestication processes for the two cultivated groups. Our result supports either the double-origin model or the single-origin model with substantial gene flow between indica and nivara. Furthermore, population structure analysis supported a complex history of rice domestication. We also assessed linkage disequilibrium (LD) in the different rice groups and found short LD in wild rice (LD decayed to half its maximum within 10 kb) and long LD in cultivated rice (65 kb for indica and 200 kb for japonica). These distances indicate the resolution achievable in quantitative trait locus (QTL) mapping studies in rice.
Finally, by comparing the diversity of cultivated and wild rice in regions along the genome, we identified regions where cultivated rice has lost substantial genetic diversity relative to wild rice; these are candidate regions under selection during domestication. The same regions were also highly differentiated between cultivated and wild rice according to another statistic, the fixation index (FST). In total, we identified 739 such regions in japonica and 750 in indica. Well-known domestication-related genes, such as sh4 and prog1, fall within these regions, supporting the inference that the regions were under selection during domestication. Through whole-genome resequencing, we analyzed the domestication history of rice and identified candidate regions and genes under selection that may be responsible for important changes in agricultural phenotypes. This study serves as an example of using next-generation sequencing to study crops, and it provides genome data for rice research and rice breeding. In addition to rice genome research, we also carried out sheep and goat genomic research to reveal the genomic changes during animal domestication. The domestic goat is widely reared throughout the world, especially in China, India, and other developing countries. Goats serve as an important source of meat, milk, fiber, and pelts, and have also fulfilled agricultural, economic, cultural, and even religious roles since very early times in human civilization. Evidence indicates that the goat might have been domesticated from two wild Capra species (Capra aegagrus and Capra falconeri) ~10,000 years ago within the Fertile Crescent and then spread quickly, following patterns of human migration and trade. Today, there are >1,000 goat breeds, and >830 million goats are kept around the world.
In addition to their value as domestic animals, goats are now used as animal models for biomedical research, to investigate the genetic basis of complex traits, and in the transgenic production of peptide medicines. Despite the agricultural and biological importance of goats, breeding and genetics studies have been hindered by the lack of a reference genome sequence. In this work, we combined Illumina next-generation sequencing technology and whole-genome mapping of large DNA molecules to obtain a genome sequence for the domestic goat. We then annotated the genome and identified rapidly evolving genes. Furthermore, based on an annotated set of goat genes, we generated transcriptome data from secondary hair follicles of the Inner Mongolia cashmere goat and compared them with data from primary hair follicles, shedding light on the genetic basis of the formation of cashmere fibers. |
Language | Chinese |
Source URL | [http://159.226.149.26:8080/handle/152453/10180] |
Collection | Kunming Institute of Zoology_Gene Origin Group (昆明动物研究所_基因起源组) |
Recommended citation (GB/T 7714) | 徐讯. 人工选择下家养动植物基因组的进化[D]. 北京. 中国科学院研究生院. 2015. |
Ingest method: OAI harvesting
Source: Kunming Institute of Zoology
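
The diversity comparison summarized in the abstract (nucleotide diversity π in cultivated versus wild rice, and regions where cultivated rice has lost diversity) can be illustrated with a small sketch. This record does not state the window size, thresholds, or software used in the thesis, so the function names, the toy allele counts, the 10 bp window, and the `min_ratio` cutoff below are all assumptions chosen for illustration, not the author's pipeline.

```python
from typing import List, Tuple

Site = Tuple[int, int]  # (0-based position, count of the alternate allele)


def site_diversity(alt_count: int, n_chrom: int) -> float:
    """Expected pairwise differences at one biallelic site.

    With n sampled chromosomes and k copies of the alternate allele,
    the per-site contribution to pi is 2*k*(n-k) / (n*(n-1)).
    """
    if n_chrom < 2:
        return 0.0
    return 2.0 * alt_count * (n_chrom - alt_count) / (n_chrom * (n_chrom - 1))


def windowed_pi(sites: List[Site], n_chrom: int, window: int, seq_len: int) -> List[float]:
    """Per-base-pair pi in consecutive non-overlapping windows."""
    n_windows = (seq_len + window - 1) // window
    totals = [0.0] * n_windows
    for pos, alt in sites:
        totals[pos // window] += site_diversity(alt, n_chrom)
    result = []
    for i, total in enumerate(totals):
        # The last window may be shorter than `window`.
        length = min(window, seq_len - i * window)
        result.append(total / length)
    return result


def candidate_windows(pi_wild: List[float], pi_cult: List[float], min_ratio: float) -> List[int]:
    """Indices of windows where pi_wild / pi_cult >= min_ratio (hypothetical cutoff).

    Windows with no diversity in the wild sample are skipped; windows with
    wild diversity but none in the cultivated sample count as candidates.
    """
    hits = []
    for i, (wild, cult) in enumerate(zip(pi_wild, pi_cult)):
        if wild == 0:
            continue
        if cult == 0 or wild / cult >= min_ratio:
            hits.append(i)
    return hits


# Toy data (invented): 4 sampled chromosomes per group, 30 bp sequence.
wild_sites = [(2, 2), (5, 1), (12, 2), (25, 2)]
cult_sites = [(2, 2), (5, 1), (25, 1)]
pi_wild = windowed_pi(wild_sites, n_chrom=4, window=10, seq_len=30)
pi_cult = windowed_pi(cult_sites, n_chrom=4, window=10, seq_len=30)
print(candidate_windows(pi_wild, pi_cult, min_ratio=2.0))
```

In the toy data only the middle window has lost its diversity in the cultivated sample, so it is the only window flagged. A real analysis would additionally need to handle missing genotypes, uncallable sites, and significance testing, none of which this sketch attempts.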
Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.