Repeated DNA Sequences

Posted: 2015-02-25 15:22:24

All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.

Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.

For example,

Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT",

Return:
["AAAAACCCCC", "CCCCCAAAAA"].
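For comparison, here is a minimal sketch of the most direct reading of the task, keyed on the 10-letter substrings themselves. This is the string-keyed map that the next paragraph reports as exceeding the time limit. The class name NaiveRepeatedDna and the use of Map.merge are illustrative choices, not the author's code. Each step allocates and hashes a fresh 10-character string. This sketch reports a sequence the moment its count reaches 2, so results come out in order of second occurrence.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class NaiveRepeatedDna {
    // Baseline sketch (hypothetical class): count every 10-letter window,
    // using the substring itself as the map key.
    public static List<String> findRepeatedDnaSequences(String s) {
        Map<String, Integer> counts = new HashMap<>();
        List<String> result = new ArrayList<>();
        for (int i = 0; i + 10 <= s.length(); i++) {
            String window = s.substring(i, i + 10);           // allocates a new 10-char string per step
            int seen = counts.merge(window, 1, Integer::sum); // updated count for this window
            if (seen == 2) {
                result.add(window); // report each repeated window once, at its second occurrence
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints [AAAAACCCCC, CCCCCAAAAA]
        System.out.println(findRepeatedDnaSequences("AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"));
    }
}
```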

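The general idea is simple:

- Use a HashMap to store each 10-letter DNA substring together with its number of occurrences.
- At the end, put every substring that occurs more than once into the result list.

The main problem is the map key. Using the substring itself as the key exceeded the time limit, so each 10-letter sequence has to be hashed into an int instead, with A->00, C->01, G->10, T->11. A 10-letter DNA sequence then maps to a 20-bit binary number, which fits easily in an int and serves as the key. Sliding the window one letter to the right then takes only a mask, a 2-bit left shift and an OR with the new letter's code, instead of building a new 10-character string. The code is as follows: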
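import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class Solution {
    
    // Convert a nucleotide letter to its 2-bit code: A=00, C=01, G=10, T=11
    public int toInt(char c) {
        if(c=='A') return 0;
        if(c=='C') return 1;
        if(c=='G') return 2;
        else return 3;
    }
    
    // Convert a 20-bit key back into its 10-letter DNA sequence
    public String tostring(int n) {
        StringBuilder sb = new StringBuilder();
        for(int i=0;i<10;i++) {
            char c = 'T';
            int temp = n%4;      // the lowest 2 bits hold the last letter not yet decoded
            n = n>>2;
            if(temp==0) c = 'A';
            if(temp==1) c = 'C';
            if(temp==2) c = 'G';
            sb.insert(0,c);      // letters come out last-to-first, so prepend
        }
        return sb.toString();
    }
    
    public List<String> findRepeatedDnaSequences(String s) {
        List<String> re = new ArrayList<String>();
        Map<Integer,Integer> map = new HashMap<Integer,Integer>();  // 20-bit key -> occurrence count
        int size = s.length();
        if(size<=10) return re;  // at most one 10-letter window, so nothing can repeat
        int tmp = 0;
        // Encode the first window s[0..9]
        for(int i=0;i<10;i++) {
            tmp = tmp<<2;
            tmp = tmp|toInt(s.charAt(i));
        }
        map.put(tmp,1);
        for(int j=10;j<size;j++) {
            // First clear the highest 2 bits (the letter leaving the window) with 0x3ffff,
            // then shift left by 2 and append the new letter's code
            tmp = ((tmp&0x3ffff)<<2)|toInt(s.charAt(j));
            if(map.containsKey(tmp)) {
                map.put(tmp,map.get(tmp)+1);
            }
            else {
                map.put(tmp,1);
            }
        }
        
        // Decode every key that was seen more than once
        Set<Integer> keys = map.keySet();
        for(Integer key:keys) {
            if(map.get(key)>1) re.add(tostring(key));
        }
        return re;
        
    }
}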
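The rolling key can also be written the other way round. Shift first, then keep only the low 20 bits with the mask 0xFFFFF. This is equivalent to the post's ((tmp&0x3ffff)<<2).

Every key is below 2^20 = 1,048,576, so the HashMap can be replaced by a 1 MB byte array indexed directly by the key. The sketch below does that, assuming the input contains only A, C, G and T. This direct-address table is a swapped-in alternative, not the author's code, and the names PackedRepeatedDna, code and encode are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class PackedRepeatedDna {
    private static final int WINDOW = 10;
    private static final int MASK = (1 << (2 * WINDOW)) - 1; // low 20 bits: 0xFFFFF

    // Same 2-bit code as the post: A=00, C=01, G=10, T=11.
    public static int code(char c) {
        switch (c) {
            case 'A': return 0;
            case 'C': return 1;
            case 'G': return 2;
            default:  return 3; // assumes the input contains only A, C, G, T
        }
    }

    // Packs a 10-letter string into a 20-bit int, first letter in the highest bits.
    public static int encode(String tenMer) {
        int key = 0;
        for (int i = 0; i < WINDOW; i++) {
            key = (key << 2) | code(tenMer.charAt(i));
        }
        return key;
    }

    // Direct-address variant: a byte array indexed by the 20-bit key replaces the HashMap.
    public static List<String> findRepeatedDnaSequences(String s) {
        List<String> result = new ArrayList<>();
        if (s.length() <= WINDOW) return result;
        byte[] seen = new byte[1 << (2 * WINDOW)]; // 0 = unseen, 1 = seen once, 2 = already reported
        int key = 0;
        for (int i = 0; i < s.length(); i++) {
            // Shift in the new letter, then keep only the low 20 bits (the last 10 letters).
            key = ((key << 2) | code(s.charAt(i))) & MASK;
            if (i < WINDOW - 1) continue; // the first full window ends at index 9
            if (seen[key] == 1) {
                result.add(s.substring(i - WINDOW + 1, i + 1)); // report at the second occurrence
                seen[key] = 2;
            } else if (seen[key] == 0) {
                seen[key] = 1;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(encode("AAAAACCCCC")); // 341
        System.out.println(findRepeatedDnaSequences("AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"));
    }
}
```

For the example string this returns [AAAAACCCCC, CCCCCAAAAA].

As a worked instance of the encoding, "AAAAACCCCC" packs to binary 00 00 00 00 00 01 01 01 01 01, that is 1 + 4 + 16 + 64 + 256 = 341.

The three-state byte array reports each repeated sequence exactly once, in order of its second occurrence, with no final pass over a map.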

Original post: http://www.cnblogs.com/mrpod2g/p/4299559.html
