All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.
Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.
For example,
Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT",
Return: ["AAAAACCCCC", "CCCCCAAAAA"].
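Approach: a straightforward solution stores every length-10 substring together with its occurrence count in a map, but this consumes a lot of space.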
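For comparison, the map-of-substrings baseline described above might look like the following. This is a minimal sketch rather than code from the original post; the class name `NaiveDna` and method name `findRepeated` are made up for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical baseline: key the map by the substring itself.
class NaiveDna {
    static List<String> findRepeated(String s) {
        Map<String, Integer> counts = new HashMap<>();
        List<String> out = new ArrayList<>();
        for (int i = 0; i + 10 <= s.length(); i++) {
            String sub = s.substring(i, i + 10);
            // merge returns the updated count for this substring
            int c = counts.merge(sub, 1, Integer::sum);
            // report a substring exactly once, on its second occurrence
            if (c == 2) {
                out.add(sub);
            }
        }
        return out;
    }
}
```

Every key here is a separate 10-character String object, which is the space cost the encoding below tries to avoid.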
Since only four characters are possible (A, C, G, T), each character can be encoded in 2 bits. A 10-character substring then needs only 20 bits, so it fits in a single int.
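A string of length n has n-9 substrings of length 10, so at most n-9 ints are needed to represent all of them. In the worst case, 20 bits allow 2^20 = 1024*1024 distinct combinations; at 4 bytes per int, that is about 4 MB of extra space for the codes themselves (the boxing and per-entry overhead of a HashMap comes on top of that).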
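The 2-bit encoding can be looked at in isolation. The sketch below assumes the mapping A=0, C=1, G=2, T=3 and places the first letter of the window in the highest 2 of the 20 bits; the class name `DnaEncoding` and its methods are illustrative names, not part of the original post.

```java
// Hypothetical helper: pack a 10-letter DNA window into 20 bits.
class DnaEncoding {
    // Map a nucleotide to its 2-bit code: A=00, C=01, G=10, T=11.
    static int code(char c) {
        switch (c) {
            case 'A': return 0;
            case 'C': return 1;
            case 'G': return 2;
            case 'T': return 3;
            default: throw new IllegalArgumentException("not a nucleotide: " + c);
        }
    }

    // Encode the 10 characters starting at index start.
    // Each step shifts the previous letters left by 2 bits and appends the new code,
    // so the first letter ends up in bits 18-19 and the last in bits 0-1.
    static int encode(String s, int start) {
        int result = 0;
        for (int k = 0; k < 10; k++) {
            result = (result << 2) | code(s.charAt(start + k));
        }
        return result;
    }
}
```

With this mapping, "AAAAAAAAAA" encodes to 0 and "TTTTTTTTTT" to 2^20 - 1, the two ends of the 20-bit range.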
The code is as follows:
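    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    public class Solution {
        public List<String> findRepeatedDnaSequences(String s) {
            List<String> list = new ArrayList<String>();
            if (s.length() < 10) return list;
            // key: 20-bit code of a window; value: 0 = seen once, 1 = already reported
            Map<Integer, Integer> map = new HashMap<Integer, Integer>();
            for (int i = 10; i <= s.length(); i++) {
                // encode the window s[i-10, i), 2 bits per character
                int result = 0;
                for (int j = i - 10, k = 0; j < i; j++, k++) {
                    char c = s.charAt(j);
                    int num = 0;
                    switch (c) {
                        case 'A': num = 0; break;
                        case 'C': num = 1; break;
                        case 'G': num = 2; break;
                        case 'T': num = 3; break;
                    }
                    // the k-th character occupies bits 2*(9-k) and 2*(9-k)+1
                    result += (num << 2 * (9 - k));
                }
                if (map.containsKey(result) && map.get(result) == 0) {
                    // second occurrence: report the substring once
                    list.add(s.substring(i - 10, i));
                    map.put(result, 1);
                } else if (!map.containsKey(result)) {
                    map.put(result, 0);
                }
            }
            return list;
        }
    }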
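The solution above re-encodes all 10 characters for every window. One possible refinement, not from the original post, is to update the 20-bit code incrementally: shift in the new character and mask off the one that left the window. The sketch below assumes the input contains only A, C, G and T; the class name `RollingDna` and method name `findRepeated` are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical variant: sliding 20-bit window instead of an inner loop.
class RollingDna {
    static List<String> findRepeated(String s) {
        List<String> out = new ArrayList<>();
        Set<Integer> seen = new HashSet<>();      // codes seen at least once
        Set<Integer> reported = new HashSet<>();  // codes already added to out
        final int mask = (1 << 20) - 1;           // keeps only the low 20 bits
        int window = 0;
        for (int i = 0; i < s.length(); i++) {
            int num;
            switch (s.charAt(i)) {
                case 'A': num = 0; break;
                case 'C': num = 1; break;
                case 'G': num = 2; break;
                default:  num = 3; break; // assumption: anything else is 'T'
            }
            // append the new character and drop the one that fell out of the window
            window = ((window << 2) | num) & mask;
            if (i < 9) continue; // the first full window ends at index 9
            // add() returns false if the element was already present
            if (!seen.add(window) && reported.add(window)) {
                out.add(s.substring(i - 9, i + 1));
            }
        }
        return out;
    }
}
```

This keeps the same 2-bit encoding but does constant work per character; the trade-off is a second set to track which sequences have already been reported.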
LeetCode-187 Repeated DNA Sequences
Original article: http://www.cnblogs.com/linxiong/p/4442998.html