All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.
Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.
For example,
Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT", Return: ["AAAAACCCCC", "CCCCCAAAAA"].
Problem: given a string representing a DNA sequence, find every substring of length 10 that occurs more than once.
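The example in the problem only shows repeated substrings that do not overlap. Overlapping occurrences must be counted as well. For instance, the 11-character string "AAAAAAAAAAA" contains the 10-character substring "AAAAAAAAAA" twice, starting at positions 0 and 1. The problem statement does not make this clear.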
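Here is a small sketch of the overlapping case. It slides the window forward one character at a time, so windows that share characters are each counted. The helper name `countWindows` is hypothetical and not from the original post. The window length defaults to 10, matching the problem.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical helper: count every window of length `len` in `s`.
// The window advances one character at a time, so overlapping
// windows are counted separately.
std::unordered_map<std::string, int> countWindows(const std::string& s,
                                                  std::size_t len = 10) {
    std::unordered_map<std::string, int> cnt;
    for (std::size_t i = 0; i + len <= s.size(); ++i) {
        ++cnt[s.substr(i, len)];
    }
    return cnt;
}
```

Calling `countWindows(std::string(11, 'A'))` gives a count of 2 for "AAAAAAAAAA". The windows starting at positions 0 and 1 both match.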
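With the problem clarified, the implementation is straightforward. Slide a window of length 10 across the string and count every window in a hash map. Then report each window whose count is greater than 1: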
```cpp
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>
using namespace std;

/**
 * Repeated substrings may overlap.
 */
vector<string> findRepeatedDnaSequences(string s) {
    unordered_set<string> res;          // distinct sequences that repeat
    unordered_map<string, int> ss_cnt;  // occurrence count of each 10-letter window

    const int len = 10;

    // Count every window of length 10, including overlapping ones.
    for (int i = 0; i + len <= (int)s.size(); i++) {
        string str = s.substr(i, len);
        ss_cnt[str]++;
    }

    // A window is repeated if it occurs more than once. The set keeps
    // each repeated sequence only once, however often it appears.
    for (int i = 0; i + len <= (int)s.size(); i++) {
        string cur = s.substr(i, len);
        if (ss_cnt[cur] > 1) {
            res.insert(cur);
        }
    }

    // Copy the set into the vector that the function must return.
    return vector<string>(res.begin(), res.end());
}
```
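The solution above uses each 10-letter substring as a `std::string` key. A common alternative, not part of the original post, packs each window into a 20-bit integer at 2 bits per nucleotide. The window is then updated with a shift as it slides. The sketch below assumes the input contains only A, C, G and T, and uses an assumed mapping of A=0, C=1, G=2, T=3. The name `findRepeatedDnaSequencesBits` is hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical variant: encode each 10-letter window as a 20-bit integer
// (2 bits per nucleotide) instead of hashing whole substrings.
std::vector<std::string> findRepeatedDnaSequencesBits(const std::string& s) {
    const std::size_t len = 10;
    const unsigned mask = (1u << (2 * len)) - 1;  // keep only the last 20 bits
    std::vector<std::string> result;
    if (s.size() < len) return result;

    // Assumed mapping; input is assumed to contain only A, C, G, T.
    auto code = [](char c) -> unsigned {
        switch (c) {
            case 'A': return 0;
            case 'C': return 1;
            case 'G': return 2;
            default:  return 3;  // 'T'
        }
    };

    std::unordered_map<unsigned, int> seen;  // window code -> occurrence count
    unsigned window = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        // Shift in the next nucleotide and drop the oldest one.
        window = ((window << 2) | code(s[i])) & mask;
        if (i + 1 < len) continue;  // the first full window ends at i = len - 1
        // Report each sequence once: when its count goes from 1 to 2.
        if (++seen[window] == 2) {
            result.push_back(s.substr(i + 1 - len, len));
        }
    }
    return result;
}
```

Reporting a window when its count reaches exactly 2 removes the need for a separate deduplication set. As a side effect, results come out in the order of each sequence's second occurrence. For the example input, the output is ["AAAAACCCCC", "CCCCCAAAAA"].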
[LeetCode] 187. Repeated DNA Sequences: Solution Approach
Original post: http://www.cnblogs.com/TonyYPZhang/p/5140863.html