The bulk and the tail of minimal absent words in genome sequences
Document type: Journal article
Authors | Zhou, HJ |
Journal | PHYSICAL BIOLOGY |
Publication date | 2016 |
Volume | 13; Issue: 2; Pages: 26004 |
Keywords | Minimal Absent Words; Copy-mutation Evolution Model; Random Sequence |
DOI | http://dx.doi.org/10.1088/1478-3975/13/2/026004 |
Abstract | Minimal absent words (MAW) of a genomic sequence are subsequences that are absent themselves but the subwords of which are all present in the sequence. The characteristic distribution of genomic MAWs as a function of their length has been observed to be qualitatively similar for all living organisms, the bulk being rather short, and only relatively few being long. It has been an open issue whether the reason behind this phenomenon is statistical or reflects a biological mechanism, and what biological information is contained in absent words. In this work we demonstrate that the bulk can be described by a probabilistic model of sampling words from random sequences, while the tail of long MAWs is of biological origin. We introduce the concept of a core of a MAW, which are sequences present in the genome and closest to a given MAW. We show that in E. faecalis, E. coli and yeast the cores of the longest MAWs, which exist in two or more copies, are located in highly conserved regions, the most prominent example being ribosomal RNAs. We also show that while the distribution of the cores of long MAWs is roughly uniform over these genomes on a coarse-grained level, on a more detailed level it is strongly enhanced in 3' untranslated regions (UTRs) and, to a lesser extent, also in 5' UTRs. This indicates that MAWs and associated MAW cores correspond to fine-tuned evolutionary relationships, and suggests that they can be more widely used as markers for genomic complexity. |
Subject | Biochemistry & Molecular Biology; Biophysics |
Language | English |
Source URL | http://ir.itp.ac.cn/handle/311006/21704 |
Collection | Institute of Theoretical Physics_ITP Knowledge Output 1978-2010 |
Corresponding author | Innocenti, N (reprint author), Hebrew Univ Jerusalem, Sch Comp Sci & Engn, IL-91904 Jerusalem, Israel. |
Recommended citation (GB/T 7714) | Zhou, HJ, Aurell, E, Innocenti, N, et al. The bulk and the tail of minimal absent words in genome sequences[J]. PHYSICAL BIOLOGY, 2016, 13(2): 26004. |
APA | Zhou, HJ, Aurell, E, & Innocenti, N. (2016). The bulk and the tail of minimal absent words in genome sequences. PHYSICAL BIOLOGY, 13(2), 26004. |
MLA | Zhou, HJ, et al. "The bulk and the tail of minimal absent words in genome sequences". PHYSICAL BIOLOGY 13.2 (2016): 26004. |
Ingest method: OAI harvesting
Source: Institute of Theoretical Physics
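
The definition of a minimal absent word given in the abstract (a word that is absent from the sequence while all of its subwords are present) can be illustrated with a small brute-force sketch. This is not the authors' algorithm. The function name `minimal_absent_words`, the `max_len` cutoff, and the restriction to a single strand and to words of length 2 or more are assumptions made here for illustration only. The sketch relies on the fact that if both the longest proper prefix and the longest proper suffix of a word occur in the sequence, then every shorter subword occurs as well.

```python
def minimal_absent_words(seq, alphabet="ACGT", max_len=6):
    """Brute-force enumeration of minimal absent words (MAWs) of `seq`.

    Illustrative assumptions: single strand only, word lengths from 2 to
    `max_len`, and a fixed alphabet. A word w is reported when w does not
    occur in seq but both w[:-1] and w[1:] do.
    """
    maws = set()
    for n in range(2, max_len + 1):
        # All words of length n-1 and of length n that occur in seq.
        present_shorter = {seq[i:i + n - 1] for i in range(len(seq) - n + 2)}
        present_n = {seq[i:i + n] for i in range(len(seq) - n + 1)}
        # Extend each present (n-1)-word by one letter; the prefix w[:-1]
        # is then present by construction.
        for left in present_shorter:
            for letter in alphabet:
                word = left + letter
                if word not in present_n and word[1:] in present_shorter:
                    maws.add(word)
    return maws


if __name__ == "__main__":
    # Toy example over a two-letter alphabet.
    print(sorted(minimal_absent_words("AAC", alphabet="AC", max_len=4)))
```

For the toy sequence `AAC` over the alphabet `AC`, the words `CA`, `CC` and `AAA` are absent while all of their shorter subwords occur, so these are the words the sketch reports. Genome-scale analyses would need a far more efficient method (for example one based on suffix arrays) than this quadratic enumeration.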