Repeated DNA Sequences: Solution

Date: 2015-09-15 07:01:19

Question

All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.

Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.

For example,

Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT", Return: ["AAAAACCCCC", "CCCCCAAAAA"].

Solution -- Bit Manipulation

The straightforward idea is to store every 10-letter substring in a set. Both time complexity and space cost are O(n). Looking at the space cost in more detail, though: a Java char is 2 bytes, so each 10-letter substring needs at least 20 bytes of character data, for roughly 20n bytes overall.
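As a baseline for comparison, the set-of-substrings idea might look like the sketch below. The class name NaiveRepeatedDna and the method name findRepeated are made up for illustration and are not part of the original solution.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical baseline: stores every 10-letter window as a String.
class NaiveRepeatedDna {
    static List<String> findRepeated(String s) {
        Set<String> seen = new HashSet<>();
        // LinkedHashSet keeps repeats in the order they are first detected.
        Set<String> repeated = new LinkedHashSet<>();
        for (int i = 0; i + 10 <= s.length(); i++) {
            String window = s.substring(i, i + 10);
            // add() returns false when the window was already in the set.
            if (!seen.add(window)) {
                repeated.add(window);
            }
        }
        return new ArrayList<>(repeated);
    }
}
```

For the example input from the question, this returns ["AAAAACCCCC", "CCCCCAAAAA"].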

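If we instead represent each DNA substring as an integer, the space drops to roughly 4n bytes: each of the four nucleotides fits in 2 bits, so a 10-letter substring fits in 20 bits of a 4-byte int.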
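To make the 2-bits-per-letter packing concrete, here is a minimal sketch that uses the same A=0, C=1, G=2, T=3 assignment as the solution. The DnaEncoding class and its encode, decode, and bits helpers are hypothetical names introduced only for this example.

```java
// Hypothetical helper showing how a DNA string packs into an int.
class DnaEncoding {
    // Maps one nucleotide to its 2-bit code: A=0, C=1, G=2, T=3.
    static int bits(char c) {
        switch (c) {
            case 'A': return 0;
            case 'C': return 1;
            case 'G': return 2;
            case 'T': return 3;
            default: throw new IllegalArgumentException("not a nucleotide: " + c);
        }
    }

    // Packs up to 16 letters into an int, first letter in the highest bits used.
    static int encode(String seq) {
        int code = 0;
        for (int i = 0; i < seq.length(); i++) {
            code = (code << 2) | bits(seq.charAt(i));
        }
        return code;
    }

    // Reverses encode() for a sequence of the given length.
    static String decode(int code, int length) {
        char[] out = new char[length];
        for (int i = length - 1; i >= 0; i--) {
            out[i] = "ACGT".charAt(code & 3); // lowest 2 bits hold the last letter
            code >>>= 2;
        }
        return new String(out);
    }
}
```

With this packing, encode("ACGT") is 0b00011011 = 27, and ten T's give (1 << 20) - 1, the largest value a 10-letter window can take.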

import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class Solution {
    public List<String> findRepeatedDnaSequences(String s) {
        List<String> result = new ArrayList<String>();

        int len = s.length();
        if (len < 10) {
            return result;
        }

        // 2-bit code for each nucleotide
        Map<Character, Integer> map = new HashMap<Character, Integer>();
        map.put('A', 0);
        map.put('C', 1);
        map.put('G', 2);
        map.put('T', 3);

        Set<Integer> temp = new HashSet<Integer>();  // windows seen so far
        Set<Integer> added = new HashSet<Integer>(); // windows already in result

        int hash = 0;
        for (int i = 0; i < len; i++) {
            // each of A, C, G, T fits in 2 bits, so shift left by 2 and append
            hash = (hash << 2) + map.get(s.charAt(i));
            if (i < 9) {
                continue; // the first 10-letter window is not complete yet
            }
            // keep only the lowest 20 bits, i.e. the last 10 letters
            hash = hash & ((1 << 20) - 1);

            if (temp.contains(hash) && !added.contains(hash)) {
                result.add(s.substring(i - 9, i + 1));
                added.add(hash); // track added so each repeat is reported once
            } else {
                temp.add(hash);
            }
        }

        return result;
    }
}
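The key step in the loop above is the shift-and-mask update, which slides the window one letter to the right without re-encoding all 10 letters. The sketch below isolates that step so it can be checked against a fresh encoding. The RollingWindowDemo class and its helpers are hypothetical, and the code assumes the input contains only A, C, G, and T.

```java
// Hypothetical demo of the shift-and-mask window update.
class RollingWindowDemo {
    static final int MASK = (1 << 20) - 1; // lowest 20 bits = 10 letters

    // 2-bit code of a nucleotide; assumes c is one of A, C, G, T.
    static int code(char c) {
        return "ACGT".indexOf(c);
    }

    // Encodes a whole window from scratch.
    static int encode(String window) {
        int h = 0;
        for (char c : window.toCharArray()) {
            h = (h << 2) | code(c);
        }
        return h;
    }

    // Appends the next letter and drops the oldest one via the mask.
    static int slide(int hash, char next) {
        return ((hash << 2) | code(next)) & MASK;
    }
}
```

Sliding the encoding of s.substring(0, 10) by s.charAt(10) yields the same value as encoding s.substring(1, 11) directly, which is why each loop iteration costs O(1).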

 

Original source: http://www.cnblogs.com/ireneyanglan/p/4809078.html
