0187. Repeated DNA Sequences (M)

Posted: 2020-06-16 10:26:45

Problem

All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.

Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.

Example:

Input: s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"

Output: ["AAAAACCCCC", "CCCCCAAAAA"]

Summary

Given a string s, find every substring of length 10 that occurs at least twice in s.

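Approach

The most direct approach uses two HashSets: one stores the substrings already seen during the scan, and the other stores the substrings that belong in the answer.

This can be optimized with bit manipulation. Encode 'A', 'C', 'G', and 'T' as the 2-bit values 00, 01, 10, and 11, so that a string of length 10 is represented by a 20-bit integer. To obtain each new substring, shift the previous integer left by 2 bits, keep only the low 20 bits, and write the code of the newly added character into the lowest two bits, much like a sliding window. The remaining steps are the same as in the HashSet method.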
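The encoding described above can be sketched in isolation before it is wired into the full solution. This is a minimal illustration, not the author's code: the class name `DnaEncoding` and the helpers `code`, `encode`, and `slide` are hypothetical names introduced here.

```java
class DnaEncoding {
    // 2-bit codes from the text: A=00, C=01, G=10, T=11.
    static int code(char c) {
        switch (c) {
            case 'A': return 0;
            case 'C': return 1;
            case 'G': return 2;
            case 'T': return 3;
            default: throw new IllegalArgumentException("not a nucleotide: " + c);
        }
    }

    // Pack a 10-letter string into the low 20 bits of an int.
    static int encode(String tenLetters) {
        int bits = 0;
        for (int i = 0; i < tenLetters.length(); i++) {
            bits = (bits << 2) | code(tenLetters.charAt(i));
        }
        return bits;
    }

    // Slide the window by one letter: shift left by 2, drop everything
    // above the low 20 bits, and append the code of the new letter.
    static int slide(int bits, char next) {
        return ((bits << 2) & 0xFFFFF) | code(next);
    }

    public static void main(String[] args) {
        String s = "AAAAACCCCCA";
        int first = encode(s.substring(0, 10));
        int second = slide(first, s.charAt(10));
        // The slid value should match a fresh encoding of the next window.
        System.out.println(second == encode(s.substring(1, 11))); // prints true
    }
}
```

Because each window maps to a distinct 20-bit value, comparing two windows reduces to comparing two ints, which is what lets the set of seen windows hold integers instead of strings.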

Implementation

Java

HashSet

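import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class Solution {
    public List<String> findRepeatedDnaSequences(String s) {
        Set<String> one = new HashSet<>();  // substrings seen at least once
        Set<String> two = new HashSet<>();  // substrings seen at least twice (the answer)

        // When s has fewer than 10 characters the loop body never runs.
        for (int i = 0; i < s.length() - 9; i++) {
            String t = s.substring(i, i + 10);
            if (two.contains(t)) {
                continue;
            } else if (one.contains(t)) {
                two.add(t);
            } else {
                one.add(t);
            }
        }

        return new ArrayList<>(two);
    }
}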
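As a quick check of the two-set idea against the example from the problem statement, here is a compact variant. It is a sketch rather than the author's solution: `RepeatedDnaDemo` and `findRepeated` are hypothetical names, it relies on `Set.add` returning `false` for an element that is already present, and it sorts the result only to make the printed output deterministic.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class RepeatedDnaDemo {
    static List<String> findRepeated(String s) {
        Set<String> seen = new HashSet<>();      // every window encountered so far
        Set<String> repeated = new HashSet<>();  // windows encountered more than once

        for (int i = 0; i + 10 <= s.length(); i++) {
            String window = s.substring(i, i + 10);
            // add() returns false when the window was already in the set.
            if (!seen.add(window)) {
                repeated.add(window);
            }
        }

        List<String> result = new ArrayList<>(repeated);
        Collections.sort(result);  // not required by the problem; for stable output only
        return result;
    }

    public static void main(String[] args) {
        // prints [AAAAACCCCC, CCCCCAAAAA]
        System.out.println(findRepeated("AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"));
    }
}
```

The result matches the expected output in the example; the problem itself does not require any particular order.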

Bit-manipulation optimization

import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class Solution {
    public List<String> findRepeatedDnaSequences(String s) {
        if (s.length() < 10) {
            return new ArrayList<>();
        }

        Set<String> two = new HashSet<>();
        Set<Integer> one = new HashSet<>();  // the key type is now an integer
        int[] hash = new int[26];
        hash['A' - 'A'] = 0;
        hash['C' - 'A'] = 1;
        hash['G' - 'A'] = 2;
        hash['T' - 'A'] = 3;

        int cur = 0;
        // Encode the first 9 characters as the initial window
        for (int i = 0; i < 9; i++) {
            cur = (cur << 2) | hash[s.charAt(i) - 'A'];
        }

        for (int i = 9; i < s.length(); i++) {
            // Shift in the next character; only the low 20 bits are kept
            cur = ((cur << 2) & 0xfffff) | hash[s.charAt(i) - 'A'];
            if (one.contains(cur)) {
                two.add(s.substring(i - 9, i + 1));
            } else {
                one.add(cur);
            }
        }

        return new ArrayList<>(two);
    }
}


Source: https://www.cnblogs.com/mapoos/p/13139514.html
