LeetCode-187 Repeated DNA Sequences

Posted: 2015-04-21 00:25:31

All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.

Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.

For example,

Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT",

Return: ["AAAAACCCCC", "CCCCCAAAAA"].

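Approach: store every length-10 substring of the string, together with its occurrence count, in a map. This works, but keeping the substrings themselves as keys consumes a lot of space.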

Since only four characters are possible (A, C, G, T), each character can be encoded in 2 bits. A 10-character substring then needs only 20 bits, so it fits in a single int.
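The 2-bit encoding described above can be sketched as a small helper. The class name `DnaCodec` and its methods are hypothetical and not part of the original post. This sketch also assumes that rejecting non-nucleotide characters is acceptable, whereas the original solution silently treats them as 0.

```java
// Hypothetical helper for illustration; not from the original post.
final class DnaCodec {
    /** Maps a nucleotide to its 2-bit code: A=0, C=1, G=2, T=3. */
    static int code(char c) {
        switch (c) {
            case 'A': return 0;
            case 'C': return 1;
            case 'G': return 2;
            case 'T': return 3;
            default:
                // Assumption: invalid input is an error here.
                throw new IllegalArgumentException("not a nucleotide: " + c);
        }
    }

    /** Packs exactly 10 nucleotides into the low 20 bits of an int, first letter in the highest bits. */
    static int encode(String tenLetters) {
        if (tenLetters.length() != 10) {
            throw new IllegalArgumentException("expected exactly 10 letters");
        }
        int packed = 0;
        for (int i = 0; i < 10; i++) {
            // Shift the previous letters up by 2 bits and append the new code.
            packed = (packed << 2) | code(tenLetters.charAt(i));
        }
        return packed;
    }
}
```

For instance, `DnaCodec.encode("AAAAACCCCC")` yields 341, because five A's contribute zero bits and five C's contribute the bit pattern 01 01 01 01 01.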

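A string of total length n has n-9 substrings of length 10, so at most n-9 ints are needed to represent all of them. In the worst case, 20 bits allow 2^20 = 1024*1024 distinct combinations; at 4 bytes per int, the encoded keys take at most 4 MB of extra space (not counting the overhead of the map itself).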
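The bound above can be checked with a few lines of arithmetic. This is only a sketch: `SpaceBound` and `worstCaseKeyBytes` are hypothetical names, and the figure covers the raw 4-byte keys only, not the boxing and entry overhead of a `HashMap<Integer, Integer>`.

```java
// Hypothetical helper for illustration; not from the original post.
final class SpaceBound {
    /** Upper bound, in bytes, on the raw int keys needed for a string of length n. */
    static long worstCaseKeyBytes(long n) {
        if (n < 10) {
            return 0; // no 10-letter window exists
        }
        long windows = n - 9;                        // number of 10-letter substrings
        long distinct = Math.min(windows, 1L << 20); // at most 2^20 distinct 20-bit codes
        return distinct * Integer.BYTES;             // 4 bytes per int
    }
}
```

For the 32-character example string this gives 23 windows and 92 bytes; for very long inputs it caps at 4,194,304 bytes, the 4 MB figure quoted above.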

The code is as follows:

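// Requires java.util.ArrayList, java.util.HashMap, java.util.List, java.util.Map.
public List<String> findRepeatedDnaSequences(String s) {
    List<String> list = new ArrayList<String>();
    if (s.length() < 10) return list;
    // Key: 20-bit encoding of a 10-letter window.
    // Value: 0 = seen once so far, 1 = already added to the result.
    Map<Integer, Integer> map = new HashMap<Integer, Integer>();
    for (int i = 10; i <= s.length(); i++) {
        int result = 0;
        // Encode the window s[i-10, i), 2 bits per letter, first letter in the highest bits.
        for (int j = i - 10, k = 0; j < i; j++, k++) {
            char c = s.charAt(j);
            int num = 0;
            switch (c) {
                case 'A': num = 0; break;
                case 'C': num = 1; break;
                case 'G': num = 2; break;
                case 'T': num = 3; break;
            }
            result += (num << 2 * (9 - k));
        }
        if (map.containsKey(result) && map.get(result) == 0) {
            // Second occurrence: report the substring exactly once.
            list.add(s.substring(i - 10, i));
            map.put(result, 1);
        } else if (!map.containsKey(result)) {
            map.put(result, 0);
        }
    }
    return list;
}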
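The solution above re-encodes all 10 letters for every window. As a possible variation (not the original author's method), the 20-bit code can be updated incrementally: shift in the new letter and mask off the letter that left the window. The sketch below assumes two hash sets in place of the 0/1 map; the names `RollingDna` and `findRepeated` are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical variation for illustration; not from the original post.
final class RollingDna {
    static List<String> findRepeated(String s) {
        List<String> out = new ArrayList<>();
        Set<Integer> seen = new HashSet<>();     // codes seen at least once
        Set<Integer> reported = new HashSet<>(); // codes already added to the result
        final int mask = (1 << 20) - 1;          // keeps only the low 20 bits
        int code = 0;
        for (int i = 0; i < s.length(); i++) {
            int num;
            switch (s.charAt(i)) {
                case 'C': num = 1; break;
                case 'G': num = 2; break;
                case 'T': num = 3; break;
                default:  num = 0; // 'A' (and, as in the original code, any other character)
            }
            // Append the new letter; the mask drops the letter that slid out of the window.
            code = ((code << 2) | num) & mask;
            if (i < 9) {
                continue; // the first full window ends at index 9
            }
            // Report a window the first time it is seen again.
            if (!seen.add(code) && reported.add(code)) {
                out.add(s.substring(i - 9, i + 1));
            }
        }
        return out;
    }
}
```

On the example input "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT" this returns ["AAAAACCCCC", "CCCCCAAAAA"], matching the expected answer, while doing constant work per character rather than re-scanning 10 letters per window.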


Original article: http://www.cnblogs.com/linxiong/p/4442998.html
