Repeated DNA Sequences Solution

Date: 2015-09-15 07:01:19

Question

All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.

Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.

For example,

Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT", Return: ["AAAAACCCCC", "CCCCCAAAAA"].

Solution -- Bit Manipulation

The straightforward idea is to store every 10-letter substring in a set. Time complexity is O(n) and space complexity is O(n). Looking at the constant factor, though: a Java char takes 2 bytes, so each 10-letter substring needs 20 bytes of character data, which is about 20n bytes in total.
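For comparison, the set-of-substrings idea can be sketched as below. This is only a minimal illustration of the baseline, not code from the original post; the class name `SubstringSetBaseline` and the method name `findRepeated` are made up for this example.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical baseline: store every 10-letter window as a String.
class SubstringSetBaseline {
    static List<String> findRepeated(String s) {
        Set<String> seen = new HashSet<>();
        // LinkedHashSet keeps repeats in the order they were first detected.
        Set<String> repeated = new LinkedHashSet<>();
        for (int i = 0; i + 10 <= s.length(); i++) {
            String window = s.substring(i, i + 10);
            // add() returns false when the window was already in the set.
            if (!seen.add(window)) {
                repeated.add(window);
            }
        }
        return new ArrayList<>(repeated);
    }

    public static void main(String[] args) {
        // prints [AAAAACCCCC, CCCCCAAAAA]
        System.out.println(findRepeated("AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"));
    }
}
```

Every window is kept as a full String here, which is the per-substring cost that the integer encoding avoids.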

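If we instead represent each 10-letter substring as an integer, one 4-byte int per substring cuts this down to about 4n bytes. This works because each of A, C, G, and T fits in 2 bits, so 10 letters fit in 20 bits.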
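To make the encoding concrete, here is a small sketch that packs one 10-letter window into an int, 2 bits per letter, using the same A=0, C=1, G=2, T=3 mapping as the solution. The class name `DnaEncoding` and the helpers `code` and `encode` are assumptions introduced for illustration.

```java
// Hypothetical helper: pack a DNA window into an int, 2 bits per nucleotide.
class DnaEncoding {
    static int code(char c) {
        switch (c) {
            case 'A': return 0;
            case 'C': return 1;
            case 'G': return 2;
            case 'T': return 3;
            default: throw new IllegalArgumentException("not a nucleotide: " + c);
        }
    }

    static int encode(String window) {
        int hash = 0;
        for (int i = 0; i < window.length(); i++) {
            // Shift left by 2 to make room, then append the 2-bit code.
            hash = (hash << 2) | code(window.charAt(i));
        }
        return hash;
    }

    public static void main(String[] args) {
        // AAAAA -> ten 0 bits, CCCCC -> 01 01 01 01 01, i.e. binary 0101010101
        System.out.println(encode("AAAAACCCCC")); // prints 341
    }
}
```

Two different 10-letter windows always get different 20-bit values, so comparing the integers is equivalent to comparing the substrings.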

import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class Solution {
    public List<String> findRepeatedDnaSequences(String s) {
        List<String> result = new ArrayList<>();

        int len = s.length();
        if (len < 10) {
            return result;
        }

        // Each of A, C, G, T fits in 2 bits.
        Map<Character, Integer> map = new HashMap<>();
        map.put('A', 0);
        map.put('C', 1);
        map.put('G', 2);
        map.put('T', 3);

        Set<Integer> temp = new HashSet<>();   // hashes of windows seen so far
        Set<Integer> added = new HashSet<>();  // hashes already put into result

        int hash = 0;
        for (int i = 0; i < len; i++) {
            // Shift left by 2 bits and append the code of the current letter.
            hash = (hash << 2) + map.get(s.charAt(i));

            if (i >= 9) {
                // Keep only the low 20 bits, i.e. the last 10 letters.
                hash = hash & ((1 << 20) - 1);

                if (temp.contains(hash) && !added.contains(hash)) {
                    result.add(s.substring(i - 9, i + 1));
                    added.add(hash); // report each repeated sequence only once
                } else {
                    temp.add(hash);
                }
            }
        }

        return result;
    }
}
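The masking step in the loop is what turns the hash into a sliding window: shifting in a new letter pushes the oldest letter above bit 20, and the mask drops it. A minimal sketch of that single step, with a hypothetical class `RollingWindow` and helper `roll` that are not part of the original solution:

```java
// Hypothetical illustration of one sliding-window update.
class RollingWindow {
    static final int MASK = (1 << 20) - 1; // low 20 bits = 10 nucleotides

    static int code(char c) {
        switch (c) {
            case 'A': return 0;
            case 'C': return 1;
            case 'G': return 2;
            case 'T': return 3;
            default: throw new IllegalArgumentException("not a nucleotide: " + c);
        }
    }

    // Append the next letter and discard the oldest one.
    static int roll(int hash, char next) {
        return ((hash << 2) | code(next)) & MASK;
    }

    public static void main(String[] args) {
        int allT = MASK; // encodes "TTTTTTTTTT"
        // Window becomes "TTTTTTTTTA": nine T codes followed by 00.
        System.out.println(roll(allT, 'A')); // prints 1048572
    }
}
```

Because each update is a shift, an OR, and an AND, the hash for every window is computed in constant time instead of re-reading 10 characters.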

Original article: http://www.cnblogs.com/ireneyanglan/p/4809078.html
