Repeated DNA Sequences [Unresolved]

Posted: 2015-04-11 19:22:05

All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.

Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.

For example,

Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT",

Return:
["AAAAACCCCC", "CCCCCAAAAA"].

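Approach: 1. Store each 10-letter sequence in a map. 2. Check whether the sequence is already in the map. If it is and its count is 1, add the sequence to the result.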
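
The two steps above can be sketched directly with the substring itself as the map key. This is only a minimal illustration of the idea, not the submitted solution; the function name findRepeatedByString is a hypothetical helper introduced here.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper: counts every 10-letter window, keyed by its text.
std::vector<std::string> findRepeatedByString(const std::string& s) {
    std::vector<std::string> res;
    if (s.size() < 10) return res;  // no 10-letter window fits

    std::unordered_map<std::string, int> seen;
    for (std::size_t i = 0; i + 10 <= s.size(); ++i) {
        std::string window = s.substr(i, 10);
        // Report a window exactly once, on its second occurrence.
        if (++seen[window] == 2) res.push_back(window);
    }
    return res;
}
```

Called on the example string from the problem statement, this should yield "AAAAACCCCC" followed by "CCCCCAAAAA".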

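Note: map<string,int> causes Memory Limit Exceeded.
Solution 1: replace A, C, G, T with digits. However,
map<int,int> overflows int (a 10-digit key can be as large as 3333333333), so use map<long long,int>.
Solution 2: bit manipulation [to do]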
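
Solution 2 is left as a to-do in these notes. One common way to do it, sketched here under assumptions rather than taken from the original post, is to give each nucleotide a 2-bit code so that a 10-letter window fits in 20 bits, and to update the key as the window slides. The names encodeBase and findRepeatedByBits are hypothetical, and the input is assumed to contain only A, C, G, and T.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper: 2-bit code per nucleotide (A=0, C=1, G=2, T=3).
static unsigned encodeBase(char c) {
    switch (c) {
        case 'A': return 0;
        case 'C': return 1;
        case 'G': return 2;
        default:  return 3;  // assumed to be 'T'
    }
}

// Hypothetical sketch: rolling 20-bit key over each 10-letter window.
std::vector<std::string> findRepeatedByBits(const std::string& s) {
    std::vector<std::string> res;
    if (s.size() < 10) return res;

    const unsigned mask = (1u << 20) - 1;  // keep only the last 10 letters
    std::unordered_map<unsigned, int> seen;
    unsigned key = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        // Shift in the new letter and drop the one that left the window.
        key = ((key << 2) | encodeBase(s[i])) & mask;
        if (i < 9) continue;  // the first full window ends at index 9
        // Report a window exactly once, on its second occurrence.
        if (++seen[key] == 2) res.push_back(s.substr(i - 9, 10));
    }
    return res;
}
```

Because the key never exceeds 20 bits, a plain unsigned int is enough and no long long is needed. Each step also costs constant time instead of building and parsing a 10-character string.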

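#include <string>
#include <unordered_map>
#include <vector>
using namespace std;

class Solution {
public:
    vector<string> findRepeatedDnaSequences(string s) {
        // Validate input
        vector<string> res;
        if (s.empty()) return res;
        // Special case: no 10-letter window fits
        int n = s.length();
        if (n < 10) return res;

        // General case: encode each nucleotide as a digit (A=0, C=1, G=2, T=3)
        string sbit;
        for (int i = 0; i < n; i++) {
            if (s[i] == 'A') sbit += "0";
            else if (s[i] == 'C') sbit += "1";
            else if (s[i] == 'G') sbit += "2";
            else sbit += "3";
        }

        // Key: the 10-digit window as a number; value: occurrences so far
        unordered_map<long long, int> counts;
        for (int i = 0; i < n - 9; i++) {
            // Must be long long: the largest key, 3333333333, does not fit in int
            long long key = stoll(sbit.substr(i, 10));
            // Add the sequence to the result on its second occurrence only
            if (counts.count(key) && counts[key] == 1) {
                res.push_back(s.substr(i, 10));
            }
            counts[key]++;
        }
        return res;
    }
};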

 






Original article: http://www.cnblogs.com/renrenbinbin/p/4418102.html
