Short-Read Sequence Assembly (Repost)

Posted: 2016-06-23 15:51:12


Reposted from: http://blog.sina.com.cn/s/blog_4af3f0d20100fq5i.html

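Short-read assembly (sequence assembly) has been perhaps the hottest topic in next-generation sequencing in recent years. Simply put, the long genome sequence is broken into short pieces (shotgun sequencing), because we cannot sequence an entire long molecule in a single read (single-molecule sequencing may offer new hope here), and so we do not know how the pieces are arranged (joined into one strand, ultimately forming a chromosome) or grouped (which pieces belong to which chromosome). We then use algorithms and computers to assemble these short sequences back into one complete, ordered sequence.
Suppose we have the following sentence: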

it is just a hypothesis, so don't be seriously!

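Suppose we do not actually know what this sentence says. It is like drawing a folded slip of paper from a box without opening it, then tearing it into pieces. Something else may also have happened along the way: all the spaces and punctuation have vanished (magic!). We are left with: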

itis ypo stah the sodo eriou siss ju ntbes sly……
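Tearing the slip into pieces is exactly what shotgun sequencing does to a genome. A minimal Python sketch of that step, assuming the cut points are chosen uniformly at random (the helper names `clean` and `shred` are hypothetical, chosen only for illustration):

```python
import random
import re


def clean(sentence: str) -> str:
    """Lowercase and drop spaces and punctuation (the 'magic' in the analogy)."""
    return re.sub(r"[^a-z]", "", sentence.lower())


def shred(sequence: str, n_pieces: int, seed: int = 0) -> list[str]:
    """Cut one copy of `sequence` at random positions into `n_pieces` fragments.

    Every letter ends up in exactly one fragment; only the order is lost.
    """
    rng = random.Random(seed)
    # n_pieces - 1 distinct cut points strictly inside the sequence,
    # so no fragment is empty.
    cuts = sorted(rng.sample(range(1, len(sequence)), n_pieces - 1))
    bounds = [0] + cuts + [len(sequence)]
    return [sequence[a:b] for a, b in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    genome = clean("it is just a hypothesis, so don't be seriously!")
    print(genome)  # itisjustahypothesissodontbeseriously
    pieces = shred(genome, 10, seed=1)
    random.Random(2).shuffle(pieces)  # the sequencer returns pieces in no particular order
    print(" ".join(pieces))
```

Joining the pieces in their original order always gives back the cleaned string; the whole difficulty of assembly comes from the shuffle.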

We sequence it several times to increase coverage, since higher coverage lets us be more confident in the result:

itis ypo stah the sodo eriou siss ju ntbes sly tis yopth sodon beser beser ssod iti sju……
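Coverage here means how many times, on average, each position is read: total sequenced letters divided by genome length (C = N*L/G in the Lander-Waterman model, which also predicts that a fraction e^(-C) of positions is never read at all). A small sketch using the fragments above, assuming the 36-letter sentence plays the role of the genome; the function names are hypothetical:

```python
import math


def coverage(reads: list[str], genome_length: int) -> float:
    """Mean coverage C = (total letters sequenced) / (genome length)."""
    return sum(len(r) for r in reads) / genome_length


def expected_uncovered_fraction(c: float) -> float:
    """Lander-Waterman (Poisson) estimate of the fraction of positions hit by no read."""
    return math.exp(-c)


if __name__ == "__main__":
    reads = ("itis ypo stah the sodo eriou siss ju ntbes sly "
             "tis yopth sodon beser beser ssod iti sju").split()
    c = coverage(reads, genome_length=36)  # len("itisjustahypothesissodontbeseriously")
    print(f"coverage = {c:.2f}x, expected uncovered = {expected_uncovered_fraction(c):.1%}")
```

The pieces in the analogy were not sampled uniformly at random, so the e^(-C) figure is only indicative; the point is that each extra round of sequencing raises C and shrinks the unread fraction.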

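In addition, a sequencing method called paired-end sequencing has been developed: a fixed length is read from each end of a fragment, while the insert between the two reads has a known, roughly constant size, like this: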

iti*****ahyp sju*****pot the*****don sod*****ser bes*****sly ……
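What a read pair adds is a distance constraint: the mate must begin a fixed number of unread positions after the first read ends. A sketch that checks the example pairs against the assembled sentence, assuming a gap of 5 (the five `*` characters); `pair_positions` and `simulate_pairs` are hypothetical helpers:

```python
def pair_positions(sequence: str, left: str, right: str, gap: int) -> list[int]:
    """Start positions where `left` is followed, after exactly `gap`
    unread letters, by `right`."""
    hits = []
    for i in range(len(sequence)):
        j = i + len(left) + gap  # where the mate must begin
        if sequence.startswith(left, i) and sequence.startswith(right, j):
            hits.append(i)
    return hits


def simulate_pairs(sequence: str, read_len: int, gap: int, step: int = 1):
    """(left, right) pairs read from both ends of fragments of length 2*read_len + gap."""
    frag_len = 2 * read_len + gap
    return [
        (sequence[i:i + read_len], sequence[i + read_len + gap:i + frag_len])
        for i in range(0, len(sequence) - frag_len + 1, step)
    ]


if __name__ == "__main__":
    genome = "itisjustahypothesissodontbeseriously"
    pairs = [("iti", "ahyp"), ("sju", "pot"), ("the", "don"), ("sod", "ser"), ("bes", "sly")]
    for left, right in pairs:
        print(left, right, pair_positions(genome, left, right, gap=5))
```

Measuring the gap rather than the total fragment length also accepts the first pair, whose right read (ahyp) is one letter longer than the others.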

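Using the approach shown in the figure, we can stitch the sentence back together by chaining overlapping fragments and using the paired ends to fix their order and spacing: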

itisjustahypothesissodontbeseriously
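One simple way to stitch fragments together is greedy overlap merging: repeatedly join the two pieces with the longest suffix-prefix overlap. This is a classic textbook strategy, not necessarily the exact method in the figure; the sketch assumes error-free reads and a minimum overlap of 3 letters, and its function names are hypothetical:

```python
import random


def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` equal to a prefix of `b` (0 if < min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads: list[str], min_overlap: int = 3) -> list[str]:
    """Merge the pair of contigs with the longest overlap until none remain."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicate reads, keep order
    while len(contigs) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_overlap)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            break  # nothing overlaps any more: return the separate contigs
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


if __name__ == "__main__":
    genome = "itisjustahypothesissodontbeseriously"
    # Error-free toy reads: 8 letters long, one starting every 4 letters.
    reads = [genome[i:i + 8] for i in range(0, len(genome) - 7, 4)]
    random.Random(0).shuffle(reads)  # the assembler never sees the true order
    print(greedy_assemble(reads))  # ['itisjustahypothesissodontbeseriously']
```

It works on this toy input because every 3-letter substring of the sentence occurs only once; when a repeat is longer than the minimum overlap, greedy merging can join the wrong pieces.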

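This is not the final result, though. Using our knowledge of grammar, we add back the spaces (gaps) and the punctuation (key information that was lost), and we can recover the original sentence!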

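First: the assembly methods
Method 1: for resequencing, the reads can be assembled with MAQ: map to a reference genome.
Method 2: for sequencing a new species (de novo), assemble with velvet: de novo assembly.
Second: assembly principles and workflow diagram: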

The difference between Method 1 and Method 2 is whether a reference genome is available. Below is an overview of the reference-based approach:

Mapping short reads to a reference
Eland
• aligner for Illumina data
• alignment policies:
  • allows up to 2 mismatches per alignment
  • non-unique alignments are discarded
Maq
• quality aware: takes sequence quality into account
• allows non-unique alignments
Index methods
• the reference genome is loaded into memory as an index (e.g. k-mer tables; Bowtie instead uses a Burrows-Wheeler index)
• very fast alignments
• SOAP
• Bowtie
SNP detection, paired-end mapping, RNA-seq, ChIP-seq, etc.
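The index methods above keep the reference in memory so that a read can be looked up instead of being compared against every position. A toy sketch of that idea, assuming a plain k-mer hash table (not how Eland, MAQ, SOAP or Bowtie are actually implemented) and the Eland-style policy from the slide: at most 2 mismatches, non-unique hits discarded. `build_index` and `map_read` are hypothetical names:

```python
from collections import defaultdict


def build_index(reference: str, k: int) -> dict[str, list[int]]:
    """Hash table from every k-mer of the reference to its start positions."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return dict(index)


def mismatches(a: str, b: str) -> int:
    """Hamming distance between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def map_read(read: str, reference: str, index: dict[str, list[int]],
             k: int, max_mismatches: int = 2):
    """Unique start of `read` with <= max_mismatches mismatches, else None.

    Pigeonhole seeding: max_mismatches + 1 non-overlapping seeds of length k
    cannot all contain a mismatch, so at least one hits the index exactly.
    """
    n_seeds = max_mismatches + 1
    if n_seeds * k > len(read):
        raise ValueError("read too short for this k and mismatch budget")
    hits = set()
    for s in range(n_seeds):
        seed = read[s * k:(s + 1) * k]
        for pos in index.get(seed, []):
            start = pos - s * k  # where the whole read would begin
            window = reference[start:start + len(read)]
            if start >= 0 and len(window) == len(read) \
                    and mismatches(read, window) <= max_mismatches:
                hits.add(start)
    # Policy from the slide: discard reads that map to more than one place.
    return hits.pop() if len(hits) == 1 else None


if __name__ == "__main__":
    reference = "itisjustahypothesissodontbeseriously"
    index = build_index(reference, 3)
    print(map_read("hypXthesiZ", reference, index, 3))  # 9 (two mismatches vs "hypothesis")
```

A real mapper builds the index once for millions of reads; the cost per read is then a few lookups plus a short verification, which is where the "very fast alignments" come from.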

Analysis depends on application
Mapping to a reference genome
• useful for interrogating the "known" genome
• RNA sequencing
• ChIP sequencing
• SNP detection (targeted and whole-genome)
• methyl-seq
• CNV detection (sometimes)
De novo assembly
• no genome sequence
• unbiased ascertainment of variation in a known genome by whole-genome resequencing

Third: short-read alignment by MAQ
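"Quality aware" means a mismatch at a base the sequencer was unsure about should count for less than one at a confident base. A minimal sketch of that idea, assuming Phred+33 FASTQ quality strings and scoring each placement by the sum of qualities at mismatched positions; this illustrates the principle only and is not MAQ's actual model, which also estimates a mapping quality. Function names are hypothetical:

```python
def phred_scores(quality: str, offset: int = 33) -> list[int]:
    """Decode a FASTQ quality string (Phred+33 assumed) into integer scores."""
    return [ord(ch) - offset for ch in quality]


def error_probability(q: int) -> float:
    """Phred definition Q = -10 * log10(p), hence p = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def mismatch_quality_sum(read: str, quality: str, ref_window: str) -> int:
    """Sum of base qualities at mismatching positions (lower is better)."""
    return sum(q for base, ref, q in zip(read, ref_window, phred_scores(quality))
               if base != ref)


def best_placement(read: str, quality: str, reference: str) -> tuple[int, int]:
    """Try every offset; return (score, position) with the smallest quality sum."""
    n = len(read)
    return min(
        (mismatch_quality_sum(read, quality, reference[i:i + n]), i)
        for i in range(len(reference) - n + 1)
    )


if __name__ == "__main__":
    # 'I' encodes Q40 (p = 0.0001); '#' encodes Q2 (a very unreliable base).
    print(best_placement("ACGT", "II#I", "ACTTGGGGACGA"))  # (2, 0)
```

Counting mismatches alone would tie positions 0 and 8 here (one mismatch each); weighting by quality prefers position 0, where the mismatch falls on the low-quality third base.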

Fourth: velvet schematic:
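velvet is built around a de Bruijn graph: each k-mer in the reads becomes an edge from its (k-1)-letter prefix to its (k-1)-letter suffix, and contigs are read off paths without branches. A deliberately simplified sketch, assuming error-free reads from one strand (velvet itself also handles reverse complements and corrects errors in the graph); the function names are hypothetical:

```python
from collections import defaultdict


def de_bruijn(reads: list[str], k: int) -> dict[str, set[str]]:
    """Nodes are (k-1)-mers; every k-mer seen in a read adds an edge prefix -> suffix."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)


def walk_contigs(graph: dict[str, set[str]]) -> list[str]:
    """Spell out maximal non-branching paths (simplified unitigs)."""
    indegree = defaultdict(int)
    for successors in graph.values():
        for nxt in successors:
            indegree[nxt] += 1
    nodes = set(graph) | set(indegree)

    def is_simple(node: str) -> bool:
        # Exactly one way in and one way out: the middle of an unambiguous chain.
        return indegree[node] == 1 and len(graph.get(node, ())) == 1

    contigs = []
    for start in sorted(nodes):
        if is_simple(start):
            continue  # contigs begin only at forks, joins or dead ends
        for nxt in sorted(graph.get(start, ())):
            contig = start + nxt[-1]
            while is_simple(nxt):
                nxt = next(iter(graph[nxt]))
                contig += nxt[-1]
            contigs.append(contig)
    return contigs


if __name__ == "__main__":
    genome = "itisjustahypothesissodontbeseriously"
    reads = [genome[i:i + 8] for i in range(0, len(genome) - 7, 4)]
    print(walk_contigs(de_bruijn(reads, 4)))  # ['itisjustahypothesissodontbeseriously']
```

Reads are never compared with each other pairwise; shared k-mers do the joining, which is why this approach scales to very large numbers of short reads. This simplified walk skips isolated cycles in which every node has exactly one input and one output.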

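With these two approaches, high-throughput short-read data can be assembled, but in practice it is far from simple: genomes contain many repeats, polymorphisms, and sequencing errors, and these three are the main sources of assembly errors.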
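Repeats are troublesome because they make the graph ambiguous. In a de Bruijn graph, a repeat at least k-1 letters long with different flanking letters produces a node followed by more than one distinct letter, i.e. a fork. A small sketch on a hypothetical toy genome that contains XYZW twice; `branching_nodes` is an illustrative name:

```python
from collections import defaultdict


def branching_nodes(sequence: str, k: int) -> set[str]:
    """(k-1)-mers followed by more than one distinct letter in `sequence`.

    Each one is a fork in the de Bruijn graph: k-mers alone cannot tell
    the assembler which way the genome continues.
    """
    successors = defaultdict(set)
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i:i + k]
        successors[kmer[:-1]].add(kmer[-1])
    return {node for node, nexts in successors.items() if len(nexts) > 1}


if __name__ == "__main__":
    genome = "abcXYZWdefXYZWghi"  # hypothetical genome with the repeat XYZW twice
    for k in (4, 5, 6):
        print(k, sorted(branching_nodes(genome, k)))
```

With k = 4 or 5 the repeat creates a fork (YZW, then XYZW); with k = 6, where k - 1 exceeds the repeat length of 4, the fork disappears. That is why longer k-mers, or paired ends that span a repeat, help; polymorphisms and sequencing errors create similar extra branches.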

References: http://blog.sina.com.cn/s/blog_4860086b0100dnos.html

http://seqanswers.com/forums/showthread.php?t=1024

Original post: http://www.cnblogs.com/steamed-bread/p/5611058.html