LeetCode 187: Repeated DNA Sequences

Date: 2015-02-09 08:15:56

Total Accepted: 1161 Total Submissions: 6887

All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.

Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.

For example,

Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT",

Return: ["AAAAACCCCC", "CCCCCAAAAA"].

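[Analysis]

A plain HashMap approach (keyed on the substrings themselves) exceeds the memory limit.

Because there are only four letters, we can build our own hash key: each incoming character maps to two bits. Ten characters fill exactly 20 bits; once the key grows past 20 bits, keep only the low 20 bits.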
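
The 2-bit encoding described above can be sketched in isolation. This is a minimal illustration rather than the original author's code: the class name `DnaKeySketch` and the helpers `code`, `roll`, and `encode` are hypothetical names introduced here, and the input is assumed to contain only A, C, G, and T.

```java
// Hypothetical helper class (not part of the original solution) that isolates
// the 2-bits-per-character key described in the analysis.
final class DnaKeySketch {
    // Low 20 bits set: room for exactly 10 characters at 2 bits each.
    private static final int MASK = (1 << 20) - 1;

    // Map a nucleotide to its 2-bit code (same mapping as the solution: A=0, C=1, G=2, T=3).
    static int code(char c) {
        switch (c) {
            case 'A': return 0;
            case 'C': return 1;
            case 'G': return 2;
            case 'T': return 3;
            default: throw new IllegalArgumentException("not a nucleotide: " + c);
        }
    }

    // Shift the next character into the key and drop anything above 20 bits,
    // so the key always describes the most recent 10 characters at most.
    static int roll(int key, char next) {
        return ((key << 2) + code(next)) & MASK;
    }

    // Encode a whole window by rolling its characters in one at a time.
    static int encode(String window) {
        int key = 0;
        for (int i = 0; i < window.length(); i++) {
            key = roll(key, window.charAt(i));
        }
        return key;
    }
}
```

Rolling one more character into the key of a 10-letter window gives the key of the next window, because the mask discards the two bits of the character that slid out. Two different 10-letter windows never share a key, so the key can stand in for the substring in a set.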

[Notes]

1. Operator precedence in (hash<<2) + map.get(c): the << must be parenthesized, because + binds more tightly than << in Java.
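
A small sketch of why the parentheses matter. The class and method names are hypothetical and exist only for illustration. In Java, additive operators bind more tightly than shift operators, so the unparenthesized form shifts by `2 + code` instead of adding `code` to the shifted value.

```java
// Hypothetical demo class (not part of the original solution).
final class ShiftPrecedenceSketch {
    // Intended behavior: shift left by two bits, then add the 2-bit code.
    static int withParens(int hash, int code) {
        return (hash << 2) + code;
    }

    // Without parentheses this parses as hash << (2 + code).
    static int withoutParens(int hash, int code) {
        return hash << 2 + code;
    }
}
```

With `hash = 1` and `code = 3`, the parenthesized form yields 7, while the unparenthesized form yields 32, that is `1 << 5`.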


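import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class Solution {
    public List<String> findRepeatedDnaSequences(String s) {
        List<String> res = new ArrayList<String>();
        // Fewer than 11 characters means at most one 10-letter window, so nothing can repeat.
        if(s==null || s.length() < 11) return res;
        int hash = 0;
        
        // Two-bit code for each nucleotide.
        Map<Character, Integer> map = new HashMap<Character, Integer>();
        map.put('A', 0);
        map.put('C', 1);
        map.put('G', 2);
        map.put('T', 3);
        
        // set: keys of windows seen so far; unique: keys already added to res.
        Set<Integer> set = new HashSet<Integer>();
        Set<Integer> unique = new HashSet<Integer>();
        
        for(int i=0; i<s.length(); i++) {
            char c = s.charAt(i);
            if(i<9) {
                // Still filling the first window (9 characters so far).
                hash = (hash<<2) + map.get(c);
            } else {
                // Shift in the new character and keep only the low 20 bits (10 characters).
                hash = (hash<<2) + map.get(c);
                hash &= (1<<20) - 1;
                if( set.contains(hash) && !unique.contains(hash)) {
                    // Seen before and not yet reported: record the window ending at i.
                    res.add(s.substring(i-9, i+1));
                    unique.add(hash);
                } else {
                    set.add(hash);
                }
            }
        }
        return res;
    }
}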


Original article: http://blog.csdn.net/xudli/article/details/43666725
