标签:
题目:
All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.
Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.
For example,
Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT", Return: ["AAAAACCCCC", "CCCCCAAAAA"].
思路:
这个题上来打算用hashmap把所有的子序列都放进去,找相同的。是不是有点天真了。。
查了一下,可以用bit mask来做,恍然大悟。bit manipulation总是可以让代码变得很简洁。
直接看代码吧。也没什么巧妙地地方,只是用两个bit代表一个字母。
A:00
C:01
G: 10
T: 11
1 public List<String> findRepeatedDnaSequences(String s) { 2 List<String> res = new ArrayList<String>(); 3 if (s.length() <= 10) { 4 return res; 5 } 6 HashMap<Character, Integer> map = new HashMap<Character, Integer>(); 7 map.put(‘A‘, 0); 8 map.put(‘C‘, 1); 9 map.put(‘G‘, 2); 10 map.put(‘T‘, 3); 11 12 int mask = 0; 13 14 Set<Integer> subSeq = new HashSet<Integer>(); 15 Set<Integer> addedSubSeq = new HashSet<Integer>(); 16 17 for (int i = 0; i < s.length(); i++) { 18 if (i < 9) { 19 mask = mask << 2; 20 mask += map.get(s.charAt(i)); 21 }else{ 22 mask = mask << 2; 23 mask += map.get(s.charAt(i)); 24 25 //we only need 20 bits for 10 letter 26 mask = mask << 12; 27 mask = mask >>> 12; 28 if (subSeq.contains(mask) && !addedSubSeq.contains(mask)) { 29 res.add(s.substring(i - 9, i + 1)); 30 addedSubSeq.add(mask); 31 } else { 32 subSeq.add(mask); 33 } 34 } 35 } 36 return res; 37 }
标签:
原文地址:http://www.cnblogs.com/gonuts/p/4475203.html