Repeated DNA Sequences
Problem:
All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.
Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.
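Approach:
Encode each nucleotide in 2 bits (A=0, C=1, G=2, T=3) and use bit operations to pack each 10-letter substring into a 20-bit integer hash; for example, "ACGT" packs to binary 00011011. Because the encoding is exact, two substrings share a hash only if they are identical, so a set of integers is enough to detect repeats.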
My code:
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class Solution {
    public List<String> findRepeatedDnaSequences(String s) {
        List<String> rst = new ArrayList<String>();
        if (s == null || s.length() < 10)
            return rst;
        int len = s.length();
        // Hashes of every 10-letter window seen so far.
        Set<Integer> set = new HashSet<Integer>();
        for (int i = 0; i <= len - 10; i++) {
            String substr = s.substring(i, i + 10);
            Integer key = getHash(substr);
            if (set.contains(key)) {
                // Seen before: report it once only.
                if (!rst.contains(substr))
                    rst.add(substr);
            } else {
                set.add(key);
            }
        }
        return rst;
    }

    // Map a nucleotide to its 2-bit code; anything else is treated as 'T'.
    public int getCode(char c) {
        switch (c) {
            case 'A': return 0;
            case 'C': return 1;
            case 'G': return 2;
            default:  return 3;
        }
    }

    // Pack the string into an int, 2 bits per letter (20 bits for 10 letters).
    public Integer getHash(String s) {
        int hash = 0;
        for (int i = 0; i < s.length(); i++) {
            hash = hash << 2 | getCode(s.charAt(i));
        }
        return hash;
    }
}
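The code above recomputes the hash of every window from scratch. As a possible variation (a sketch, not part of the original solution), the hash can be rolled forward one letter at a time: shift in the new 2-bit code and mask off everything above the low 20 bits. The names RollingDna, findRepeated, LEN and MASK are made up for this illustration, and the second set replaces the rst.contains check under the assumption that the input contains only A, C, G and T.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class RollingDna {
    private static final int LEN = 10;
    // Low 20 bits set: keeps exactly the last 10 two-bit codes.
    private static final int MASK = (1 << (2 * LEN)) - 1;

    // Same 2-bit mapping as getCode in the original solution.
    public static int code(char c) {
        switch (c) {
            case 'A': return 0;
            case 'C': return 1;
            case 'G': return 2;
            default:  return 3;
        }
    }

    public static List<String> findRepeated(String s) {
        List<String> result = new ArrayList<>();
        if (s == null || s.length() < LEN) {
            return result;
        }
        Set<Integer> seen = new HashSet<>();     // every window hash so far
        Set<Integer> reported = new HashSet<>(); // hashes already added to result
        int hash = 0;
        for (int i = 0; i < s.length(); i++) {
            // Shift in the new letter and drop the letter that left the window.
            hash = ((hash << 2) | code(s.charAt(i))) & MASK;
            if (i < LEN - 1) {
                continue; // window not full yet
            }
            // add() returns false when the element was already present.
            if (!seen.add(hash) && reported.add(hash)) {
                result.add(s.substring(i - LEN + 1, i + 1));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [AAAAACCCCC, CCCCCAAAAA]
        System.out.println(findRepeated("AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"));
    }
}
```

With this variant each window costs a constant number of bit operations, and a substring is built only when a repeat is actually reported.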
Takeaways:
Original post: http://www.cnblogs.com/sunshisonghit/p/4355337.html