码迷,mamicode.com
首页 > 其他好文 > 详细

列出给出序列的crispr 引物

时间:2015-09-15 00:00:38      阅读:396      评论:0      收藏:0      [点我收藏+]

标签:

手动寻找cripsr 引物比较麻烦,而现在有些网站可以完成这一任务,但是,用python 去实现它也很简单。以下是脚本:

 1 #!/usr/bin/python
 2 # list all crispr-target(20 bp + NGG)
 3 
 4 import re
 5 from Bio.Seq import reverse_complement
 6 
 7 genome_seq = open("seq.txt")
 8 crispr_list = open("crispr_list.txt", "a")
 9 
10 sequence = ""
11 for line in genome_seq:
12     sequence = sequence + line.rstrip("\n")
13 
14 def GC_number(seq):
15     num_G = seq.count(G)
16     num_C = seq.count(C)
17     GC_number = num_G + num_C
18     return GC_number
19 
20 for i in range(0, len(sequence)):
21     target = sequence[i:i+23]
22     
23     if re.search(r[ATCG]{21}GG$, target):
24     start = i + 1
25     end   = i + 20
26         target = "+ " + target + " from " + str(start) + " to " + str(end) +  "---" + "GC_number = " +  str(GC_number(target))
27         crispr_list.write(target + "\n")
28     #print(target)
29 
30 RC_sequence = reverse_complement(sequence)
31 
32 for i in range(0, len(RC_sequence)):
33     target = RC_sequence[i:i+23]
34     
35     if re.search(r[ATCG]{21}GG$, target):
36     start = i + 1
37     end   = i + 20
38         target = "- " + target + " from " + str(start) + " to " + str(end) +  "---" + "GC_number = " +  str(GC_number(target))
39         crispr_list.write(target + "\n")
40     #print(target)
41 
42 genome_seq.close()
43 crispr_list.close()

理论上,这正则表达式更直接和简单,但是以下代码的结果远远少于上面方法的结果:

1 dna = sequence
2 runs = re.findall(r[ATCG]{21}GG, dna)
3 for match in runs:
4     #print(str(match))

目前还不是很清楚原因。

 

列出给出序列的crispr 引物

标签:

原文地址:http://www.cnblogs.com/OA-maque/p/4808859.html

(0)
(0)
   
举报
评论 一句话评论(0
登录后才能评论!
© 2014 mamicode.com 版权所有  联系我们:gaon5@hotmail.com
迷上了代码!