Repeated DNA Sequences

Date: 2015-02-25 15:22:24

All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.

Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.

For example,

Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT",

Return:
["AAAAACCCCC", "CCCCCAAAAA"].

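The basic idea is simple: use a hash map to store each 10-character DNA substring together with the number of times it occurs, then collect every substring that occurs more than once into a list. The main catch is that using the substrings themselves as map keys runs into a Time Limit Exceeded error, so each DNA substring has to be hashed into an int instead, with A->00, C->01, G->10, T->11. A 10-character DNA sequence then maps to a 20-bit binary number, and that number can serve as the key. The code is as follows: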

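import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class Solution {
    
    // Convert a nucleotide to its 2-bit code: A->0, C->1, G->2, T->3
    public int toInt(char c) {
        if(c=='A') return 0;
        if(c=='C') return 1;
        if(c=='G') return 2;
        else return 3;
    }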
    
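    // Convert a 20-bit code back into its 10-character DNA sequence
    public String tostring(int n) {
        StringBuilder sb = new StringBuilder();
        for(int i=0;i<10;i++) {
            char c = 'T';
            int temp = n%4; // lowest 2 bits hold the last remaining nucleotide
            n = n>>2;
            if(temp==0) c = 'A';
            if(temp==1) c = 'C';
            if(temp==2) c = 'G';
            sb.insert(0,c); // prepend, since decoding runs from the last character to the first
        }
        return sb.toString();
    }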
    
    public List<String> findRepeatedDnaSequences(String s) {
        List<String> re = new ArrayList<String>();
        Map<Integer,Integer> map = new HashMap<Integer,Integer>();
        int size = s.length();
        if(size<=10) return re;
        int tmp = 0;
        for(int i=0;i<10;i++) {
            tmp = tmp<<2;
            tmp = tmp|toInt(s.charAt(i));
        }
        map.put(tmp,1);
        for(int j=10;j<size;j++) {
            tmp = ((tmp&0x3ffff)<<2)|toInt(s.charAt(j)); // clear the top 2 of the 20 bits, shift left by 2, then append the new nucleotide
            if(map.containsKey(tmp)) {
                map.put(tmp,map.get(tmp)+1);
            }
            else {
                map.put(tmp,1);
            }
        }
        
        Set<Integer> keys = map.keySet();
        for(Integer key:keys) {
            if(map.get(key)>1) re.add(tostring(key));
        }
        return re;
        
    }
}
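
To make the 20-bit encoding concrete, here is a minimal sketch that packs one 10-character window into an int and then slides the window one character at a time. It is not the solution above: the class name `DnaEncodingSketch` and the helpers `code`, `encode`, `roll` and `findRepeated` are hypothetical names chosen for illustration, and `findRepeated` swaps the count map for two `HashSet`s (one for keys seen, one for keys already reported), assuming the input contains only A, C, G and T.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DnaEncodingSketch {

    // Hypothetical helper: 2-bit code per nucleotide (A->00, C->01, G->10, T->11).
    // Assumption: any character other than A, C, G is treated as T.
    static int code(char c) {
        switch (c) {
            case 'A': return 0;
            case 'C': return 1;
            case 'G': return 2;
            default:  return 3;
        }
    }

    // Pack the 10 characters starting at index start into a 20-bit integer.
    static int encode(String s, int start) {
        int key = 0;
        for (int i = start; i < start + 10; i++) {
            key = (key << 2) | code(s.charAt(i));
        }
        return key;
    }

    // Slide the window by one character: keep the low 18 bits (dropping the
    // oldest nucleotide), shift left by 2, and append the new nucleotide.
    static int roll(int key, char next) {
        return ((key & 0x3ffff) << 2) | code(next);
    }

    // Two-set variant: report a window the first time it is seen again.
    static List<String> findRepeated(String s) {
        List<String> result = new ArrayList<>();
        if (s.length() <= 10) return result;
        Set<Integer> seen = new HashSet<>();
        Set<Integer> reported = new HashSet<>();
        int key = encode(s, 0);
        seen.add(key);
        for (int j = 10; j < s.length(); j++) {
            key = roll(key, s.charAt(j));
            // Set.add returns false when the element is already present.
            if (!seen.add(key) && reported.add(key)) {
                result.add(s.substring(j - 9, j + 1)); // window ending at index j
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(encode("AAAAACCCCC", 0)); // 0b0101010101 = 341
        System.out.println(findRepeated("AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"));
    }
}
```

For "AAAAACCCCC" the five leading A's contribute only zero bits and the five C's contribute 01 each, so the key is binary 0101010101, i.e. 341. Because the sketch takes the result substring directly from the input, it does not need a decoding step like `tostring`, at the cost of keeping a second set.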

Original article: http://www.cnblogs.com/mrpod2g/p/4299559.html
