LeetCode Repeated DNA Sequences

Posted: 2015-04-19 22:46:06

#include <string>
#include <vector>
using std::string;
using std::vector;

class Solution {
private:
    char tbl[256];  // maps a base character to its 2-bit code
public:
    vector<string> findRepeatedDnaSequences(string s) {
        vector<string> res;

        int len = s.size();
        if (len < 10) {
            return res;
        }
        vector<bool> exist(1<<20, false);  // pattern has been seen at least once
        vector<bool> add(1<<20, false);    // pattern has already been added to res

        tbl['A'] = 0x00;
        tbl['C'] = 0x01;
        tbl['G'] = 0x02;
        tbl['T'] = 0x03;

        int mask = (1<<20) - 1;  // keeps the low 20 bits (10 bases x 2 bits each)
        int pattern = 0;

        // Encode the first 10 bases.
        for (int i=0; i<10; i++) {
            pattern = mask & ((pattern << 2) | tbl[s[i]]);
        }
        exist[pattern] = true;

        // Slide the window one base at a time; start is the window's first index.
        for (int i=10; i<len; i++) {
            int start = i - 10 + 1;
            pattern = mask & ((pattern << 2) | tbl[s[i]]);
            if (exist[pattern] && !add[pattern]) {
                res.push_back(s.substr(start, 10));
                add[pattern] = true;
            } else {
                exist[pattern] = true;
            }
        }
        return res;
    }
};

All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.

Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.

For example,

Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT",

Return:
["AAAAACCCCC", "CCCCCAAAAA"].
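As a sanity check on the example above, here is a minimal brute-force sketch that counts every 10-letter window in a std::map. It is not the method used in this post; the function name repeatedByCounting is hypothetical and exists only for illustration. It trades memory and speed for simplicity, so it is best used to cross-check the bit-encoded solution on small inputs.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical reference helper (not from the original post):
// counts each 10-letter window and reports it when it is seen a second time.
std::vector<std::string> repeatedByCounting(const std::string& s) {
    std::map<std::string, int> count;
    std::vector<std::string> res;
    for (std::size_t i = 0; i + 10 <= s.size(); ++i) {
        const std::string window = s.substr(i, 10);
        // Push exactly once per distinct window: when its count reaches 2.
        if (++count[window] == 2) {
            res.push_back(window);
        }
    }
    return res;
}
```

Called on the example string s, this should return the same two sequences listed above, in the order in which their second occurrences appear.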

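Treat each run of 10 consecutive DNA bases as a 10-digit base-4 number; there are 4^10 = 2^20 such numbers in total. Two vector<bool> instances then record, for each number, whether it has been seen and whether it has already been added to the result.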
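The base-4 encoding can be sketched in isolation as follows. The helper names baseCode, encode10, and slide are hypothetical and do not appear in the solution; the mapping A=0, C=1, G=2, T=3 matches tbl in the code, and the sketch assumes the input contains only those four letters.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper: 2-bit code for one base (A=0, C=1, G=2, T=3).
// Assumes the input contains only A, C, G, T; anything else is treated as T.
inline std::uint32_t baseCode(char c) {
    switch (c) {
        case 'A': return 0;
        case 'C': return 1;
        case 'G': return 2;
        default:  return 3;
    }
}

// Encode exactly 10 bases as a 10-digit base-4 number (20 bits).
inline std::uint32_t encode10(const std::string& seq) {
    assert(seq.size() == 10);
    std::uint32_t code = 0;
    for (char c : seq) {
        code = (code << 2) | baseCode(c);
    }
    return code;
}

// Slide the window one base to the right: the shift pushes the oldest base
// past bit 19, the mask discards it, and the new base fills the low 2 bits.
inline std::uint32_t slide(std::uint32_t code, char next) {
    const std::uint32_t mask = (1u << 20) - 1;
    return ((code << 2) | baseCode(next)) & mask;
}
```

Because every code fits in 20 bits, it can index a table of 2^20 entries directly, which is why two vector<bool> of that size suffice and no hashing of substrings is needed.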

Original article: http://www.cnblogs.com/lailailai/p/4440180.html
