
Repeated DNA Sequences

Date: 2015-02-06 16:30:24


All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.

Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.

For example,

Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT",

Return:
["AAAAACCCCC", "CCCCCAAAAA"].


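This problem is actually fairly simple:
1. For the input "AAAAAAAAAAAA", "AAAAAAAAAA" is also returned: the 10-letter substrings are allowed to overlap.
2. The real question is how to compute a hash code for a string of length 10. Each letter is one of only four values, so it fits in 2 bits, and a 10-letter window fits in 20 bits of an int. (If the HashMap uses String as the key, the solution runs out of memory.)
Similarly, calling substring to compute the hash code for every window also runs out of memory, so the code is updated incrementally as the window slides.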
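The encoding described in note 2 can be sketched in isolation as follows. This is a minimal illustration, not part of the original solution: the class name `DnaEncodingSketch` and the helpers `code`, `encode`, and `slide` are hypothetical names, and any character other than A, C, or G is assumed to be T.

```java
class DnaEncodingSketch {
    // 2-bit code per nucleotide (assumption: anything that is not A, C or G is T).
    static int code(char c) {
        switch (c) {
            case 'A': return 0;
            case 'C': return 1;
            case 'G': return 2;
            default:  return 3; // 'T'
        }
    }

    // Encodes the 10 letters starting at 'start' into the low 20 bits of an int.
    static int encode(String s, int start) {
        int val = 0;
        for (int i = start; i < start + 10; i++) {
            val = (val << 2) | code(s.charAt(i));
        }
        return val;
    }

    // Slides the window by one letter: keep the low 18 bits (the last 9 letters),
    // shift them up by 2 bits, and append the code of the incoming letter.
    static int slide(int val, char next) {
        return ((val & 0x3FFFF) << 2) | code(next);
    }

    public static void main(String[] args) {
        String s = "AAAAACCCCCA";
        int first = encode(s, 0);                // code of "AAAAACCCCC"
        int second = slide(first, s.charAt(10)); // code of "AAAACCCCCA"
        System.out.println(second == encode(s, 1)); // prints "true"
    }
}
```

Because the update needs only the previous value and the incoming character, no substring is allocated per window.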

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;

public class Solution {
    public List<String> findRepeatedDnaSequences(String s) {
        List<String> result = new ArrayList<String>();
        if (s == null || s.length() < 10) return result;
        // Key: 20-bit encoding of a 10-letter window; value: number of occurrences.
        HashMap<Integer, Integer> map = new HashMap<Integer, Integer>();
        // Encode the first window, 2 bits per letter.
        int val = 0;
        for (int i = 0; i < 10; i++) {
            val = val << 2;
            val |= toInt(s.charAt(i));
        }
        map.put(val, 1);
        // Slide the window: keep the low 18 bits (last 9 letters), then append the new letter.
        for (int i = 10; i < s.length(); i++) {
            val = ((val & 0x3ffff) << 2) | toInt(s.charAt(i));
            if (map.containsKey(val)) map.put(val, map.get(val) + 1);
            else map.put(val, 1);
        }
        // Decode every window that occurred more than once.
        for (Integer v : map.keySet())
            if (map.get(v) > 1) result.add(toDNA(v));
        return result;
    }

    // Maps a nucleotide to its 2-bit code.
    private int toInt(char c) {
        if (c == 'A') return 0;
        else if (c == 'C') return 1;
        else if (c == 'G') return 2;
        else return 3; // T
    }

    // Decodes a 20-bit value back into its 10-letter sequence, last letter first.
    private String toDNA(int i) {
        StringBuilder sb = new StringBuilder();
        for (int j = 0; j < 10; j++) {
            int tmp = i % 4;
            i = i / 4;
            char c = 'T';
            if (tmp == 0) c = 'A';
            else if (tmp == 1) c = 'C';
            else if (tmp == 2) c = 'G';
            sb.insert(0, c);
        }
        return sb.toString();
    }
}
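As a variation on the solution above (not the author's method), the counting map can be swapped for two sets of encoded windows, so that a sequence is reported the first time it is seen twice. The sketch below is an assumption-laden illustration: the class name `RepeatedDnaTwoSets` and the methods `code` and `findRepeated` are hypothetical, and characters other than A, C, or G are treated as T. It calls substring only for windows that are actually reported, so the decoding step is not needed.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class RepeatedDnaTwoSets {
    // 2-bit code per nucleotide (assumption: anything that is not A, C or G is T).
    static int code(char c) {
        switch (c) {
            case 'A': return 0;
            case 'C': return 1;
            case 'G': return 2;
            default:  return 3; // 'T'
        }
    }

    static List<String> findRepeated(String s) {
        List<String> result = new ArrayList<>();
        if (s == null || s.length() < 10) return result;
        Set<Integer> seen = new HashSet<>();     // windows seen at least once
        Set<Integer> reported = new HashSet<>(); // windows already added to result
        int val = 0;
        for (int i = 0; i < s.length(); i++) {
            // Rolling 20-bit code of the 10 letters ending at index i.
            val = ((val & 0x3FFFF) << 2) | code(s.charAt(i));
            if (i < 9) continue; // the first full window ends at index 9
            // add() returns false when the element was already present.
            if (!seen.add(val) && reported.add(val)) {
                result.add(s.substring(i - 9, i + 1));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Overlapping case from note 1: twelve A's contain "AAAAAAAAAA" three times.
        System.out.println(findRepeated("AAAAAAAAAAAA")); // prints [AAAAAAAAAA]
    }
}
```

Unlike iterating over a HashMap's key set, this variant returns the sequences in the order in which their second occurrence appears in the input.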

 




Original article: http://www.cnblogs.com/reynold-lei/p/4277311.html
