标签:
All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.
Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.
For example,
Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT", Return: ["AAAAACCCCC", "CCCCCAAAAA"].
滑动窗口,每次向后移动一个字符,因为直接存string会超出内存限制,所以转换为int
A -> 0
B -> 1
C -> 2
D -> 3
时间复杂度O(n),空间复杂度O(4^10)
public class Solution {
public List<String> findRepeatedDnaSequences(String s) {
Map<Integer, Integer> map = new HashMap<>();
List<String> result = new ArrayList<>();
for (int i = 0; i < s.length() - 9; i++) {
int subStr = convertToInt(s, i, i + 10);
if (map.containsKey(subStr)) {
if (map.get(subStr) == 1) {
result.add(s.substring(i, i + 10));
map.put(subStr, map.get(subStr) + 1);
}
} else {
map.put(subStr, 1);
}
}
return result;
}
private int convertToInt(String s, int start, int end) {
int res = 0;
while (start < end) {
char c = s.charAt(start);
int v = 0;
switch(c) {
case ‘A‘: v = 0; break;
case ‘C‘: v = 1; break;
case ‘G‘: v = 2; break;
case ‘T‘: v = 3; break;
}
res = res << 2 | v;
start++;
}
return res;
}
}
标签:
原文地址:http://www.cnblogs.com/shini/p/4409175.html