[LeetCode] Repeated DNA Sequences

时间：2015-04-10 15:35:15 阅读：242 评论：0 收藏：0 [点我收藏+]

Repeated DNA Sequences

All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.

Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.

For example,

Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT",

Return:
["AAAAACCCCC", "CCCCCAAAAA"].

解题思路：

1、用map存储已经扫描过的子串，并对之计数。时间复杂度为O(n)。代码如下：

class Solution {
public:
    vector<string> findRepeatedDnaSequences(string s) {
        map<string, int> count;
        
        vector<string> result;
        
        int len = s.length();
        for(int i=0; i<len-10; i++){
            string str = s.substr(i, 10);
            map<string, int>::iterator it=count.find(str);
            if(it!=count.end()){
                count[str]=1;
            }else{
                if(it->second==1){
                    result.push_back(str);
                }
                count[str]++;
            }
        }
        
        return result;
    }
};

但是会报内存溢出错误。

2、为AGCT分别编码，共有4种，故只需两位便能编码。共十个字符，只需20位就能表示任何一种组合。int类型32位，因此可以用一个int类型来存储10个字符串。每检查一个字符之后，需要将最高位置为0。代码如下：

class Solution {
public:
    vector<string> findRepeatedDnaSequences(string s) {
        const int subStrLen = 10;
        const int mask = 0x3ffff;
        
        map<int, int> count;
        
        map<char, int> cCode;
        cCode['A']=0;
        cCode['C']=1;
        cCode['G']=2;
        cCode['T']=3;
        
        vector<string> result;
        
        int len = s.length();
        
        int code=0;
        
        if(len>subStrLen){
            string str = s.substr(0, subStrLen);
            for(int i=0; i<subStrLen; i++){
                code <<= 2;
                code |= cCode[str[i]];
            }
            count[code] = 1;
        }
        
        for(int i=subStrLen; i<len; i++){
            code &= mask;    //清空最高位
            code <<= 2;
            code |= cCode[s[i]];
            count[code]++;
            if(count[code]==2){
                result.push_back(s.substr(i-subStrLen + 1, subStrLen));
            }
        }
        
        return result;
    }
};

在leetCode中，我跑上面的代码需要269ms，看了很多其他人的时间，都比我快很多。不知各位有更好的方法没有。

[LeetCode] Repeated DNA Sequences

标签：c++ leetcode

原文地址：http://blog.csdn.net/kangrydotnet/article/details/44979491

踩

(0)

评论一句话评论（0）

分享档案

更多>

2021年07月29日 (22)
2021年07月28日 (40)
2021年07月27日 (32)
2021年07月26日 (79)
2021年07月23日 (29)
2021年07月22日 (30)
2021年07月21日 (42)
2021年07月20日 (16)
2021年07月19日 (90)
2021年07月16日 (35)

周排行