[LeetCode] Repeated DNA Sequences: Solution Report

Date: 2015-03-14 12:30:04

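Preface

Lately my odds of getting Accepted (AC) on the first submission on LeetCode have been dropping, so I am writing down every problem I fail to AC on the first try and sharing how I solved it.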

Problem

All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: “ACGAATTCCG”. When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.

Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.

For example,

Given s = “AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT”,
Return:
[“AAAAACCCCC”, “CCCCCAAAAA”].

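Naive approach

When I saw this problem, my first idea was:

Loop over the string with a for loop, take every run of 10 characters as a substring, and store it in a Hashtable, HashMap, or HashSet.
Each time a substring is about to be stored, check whether the hash structure already contains it; if it does, save the substring to a List, which is returned as the result.

For efficiency I chose HashSet. Hashtable is synchronized, which is bound to cost some performance, and HashMap wastes storage on a value per key that this problem does not need.

With the idea in place, the code is easy to write: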

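    public static List<String> findRepeatedDnaSequences(String s) {
        List<String> resList = new ArrayList<String>();
        // A string of 10 or fewer characters cannot contain a repeated 10-letter substring.
        if (s == null || s.length() <= 10) {
            return resList;
        }

        // Every 10-letter substring seen so far.
        Set<String> sets = new HashSet<String>();
        for (int i = 0; i <= s.length() - 10; i++) {
            String key = s.substring(i, i + 10);
            // Seen before and not yet reported: record it once.
            if (sets.contains(key) && !resList.contains(key)) {
                resList.add(key);
            } else {
                sets.add(key);
            }
        }

        return resList;
    }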

But the result was disappointing: the submission still timed out.
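
The duplicate check in the loop above can also be written with the return value of Set.add, which is false when the element was already present. The sketch below is an illustration only, not the code that was submitted: the class name NaiveTwoSets, the method name findRepeated, and the second set reported are hypothetical names introduced here. It keeps the String keys of the naive approach and swaps the linear resList.contains scan for a set lookup; the source does not establish whether that change alone would get past the time limit.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical variant of the naive approach: still String keys, but two sets.
class NaiveTwoSets {
    static List<String> findRepeated(String s) {
        List<String> resList = new ArrayList<>();
        if (s == null || s.length() <= 10) {
            return resList;
        }

        Set<String> seen = new HashSet<>();      // every window encountered so far
        Set<String> reported = new HashSet<>();  // windows already added to resList
        for (int i = 0; i <= s.length() - 10; i++) {
            String key = s.substring(i, i + 10);
            // seen.add(key) is false on a repeat; reported.add(key) is true
            // only the first time that repeat is recorded.
            if (!seen.add(key) && reported.add(key)) {
                resList.add(key);
            }
        }
        return resList;
    }

    public static void main(String[] args) {
        // prints [AAAAACCCCC, CCCCCAAAAA]
        System.out.println(findRepeated("AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"));
    }
}
```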

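One nice thing about doing problems on LeetCode is that you can join the discussion. Whenever I cannot come up with a solution right away, I read through what others have posted and treat it as English practice. The discussion for this problem led to the binary approach that follows.

Binary approach

Reading the Discuss threads, I found people saying that the naive method above times out because storing strings wastes too much space and time, so the substrings can be stored as integers instead; this is the binary method. The idea is very simple. There are only four letters: A, C, G, T. We convert to an integer as follows:

A = 00, C = 01, G = 10, T = 11.
Start with int key = 0; then for each character ch, key = key << 2 | code(ch), where ch is one of A, C, G, T. A 10-letter sequence needs 20 bits, so it fits in a 32-bit int.

This makes it easy to convert a string to an integer. If the formula above is unclear, the conversion code speaks for itself: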

    private static int hashCode(String str) {
        int hash = 0;

        for (int i = 0; i < str.length(); i ++) {
            hash = hash << 2 | mapInteger(str.charAt(i));
        }

        return hash;
    }

    private static int mapInteger(char ch) {
        switch (ch) {
        case 'A':
            // 00
            return 0;
        case 'C':
            // 01
            return 1;
        case 'G':
            // 10
            return 2;
        case 'T':
            // 11
            return 3;
        default :
            return 0;
        }
    }
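
As a quick worked check of the formula, "ACGT" packs to 00 01 10 11 in binary, which is 27. The sketch below reproduces that with the same shift-and-or loop. The class name EncodeDemo and the method names code and encode are made up for this illustration, and unlike mapInteger above it rejects unknown characters instead of mapping them to 0.

```java
// Hypothetical demo class: packs nucleotides into an int, two bits each.
class EncodeDemo {
    // A = 00, C = 01, G = 10, T = 11.
    static int code(char ch) {
        switch (ch) {
            case 'A': return 0;
            case 'C': return 1;
            case 'G': return 2;
            case 'T': return 3;
            default: throw new IllegalArgumentException("not a nucleotide: " + ch);
        }
    }

    // key = key << 2 | code(ch), applied once per character.
    static int encode(String str) {
        int key = 0;
        for (int i = 0; i < str.length(); i++) {
            key = (key << 2) | code(str.charAt(i));
        }
        return key;
    }

    public static void main(String[] args) {
        // Steps: 0 -> 0 (A) -> 1 (C) -> 6 (G) -> 27 (T)
        System.out.println(encode("ACGT"));        // prints 27
        // Ten T's set all 20 bits: 2^20 - 1
        System.out.println(encode("TTTTTTTTTT"));  // prints 1048575
    }
}
```

The second call shows the largest key a 10-letter sequence can produce, which is why an int is wide enough.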

Once we know how to turn str into an integer, the rest is the same as the naive approach; the only difference is that the HashSet used to store String and now stores Integer.

AC

import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class Solution {
    public static List<String> findRepeatedDnaSequences(String s) {
        List<String> resList = new ArrayList<String>();
        if (s == null || s.length() <= 10) {
            return resList;
        }

        // Integer keys of every 10-letter window seen so far.
        Set<Integer> set = new HashSet<Integer>();
        for (int i = 0; i <= s.length() - 10; i ++) {
            String substr = s.substring(i, i + 10);
            int key = hashCode(substr);

            if (set.contains(key) && !resList.contains(substr)) {
                resList.add(substr);
            } else {
                set.add(key);
            }
        }

        return resList;
    }

    // Packs the string into an int, two bits per character.
    private static int hashCode(String str) {
        int hash = 0;

        for (int i = 0; i < str.length(); i ++) {
            hash = hash << 2 | mapInteger(str.charAt(i));
        }

        return hash;
    }

    private static int mapInteger(char ch) {
        switch (ch) {
        case 'A':
            // 00
            return 0;
        case 'C':
            // 01
            return 1;
        case 'G':
            // 10
            return 2;
        case 'T':
            // 11
            return 3;
        default :
            return 0;
        }
    }
}
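
Because each window differs from the previous one by a single character, the integer key can also be updated in place instead of being recomputed from a fresh substring. The following is a sketch of that rolling variant, not the accepted code above. The names RollingDnaSketch, WINDOW, MASK, seen, and reported are assumptions made for the example, and it throws on characters other than A, C, G, T.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical rolling-key variant of the binary approach.
class RollingDnaSketch {
    private static final int WINDOW = 10;
    // Low 20 bits set: keeps exactly WINDOW two-bit codes.
    private static final int MASK = (1 << (2 * WINDOW)) - 1;

    static int code(char ch) {
        switch (ch) {
            case 'A': return 0;
            case 'C': return 1;
            case 'G': return 2;
            case 'T': return 3;
            default: throw new IllegalArgumentException("not a nucleotide: " + ch);
        }
    }

    static List<String> findRepeated(String s) {
        List<String> result = new ArrayList<>();
        if (s == null || s.length() <= WINDOW) {
            return result;
        }

        Set<Integer> seen = new HashSet<>();      // keys of all windows so far
        Set<Integer> reported = new HashSet<>();  // keys already added to result
        int key = 0;
        for (int i = 0; i < s.length(); i++) {
            // Shift in the new character; the mask drops the one that left the window.
            key = ((key << 2) | code(s.charAt(i))) & MASK;
            if (i < WINDOW - 1) {
                continue; // the first full window ends at index WINDOW - 1
            }
            if (!seen.add(key) && reported.add(key)) {
                // A substring is built only when a repeat is first reported.
                result.add(s.substring(i - WINDOW + 1, i + 1));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [AAAAACCCCC, CCCCCAAAAA]
        System.out.println(findRepeated("AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"));
    }
}
```

The 20-bit key identifies a 10-letter window uniquely, so equal keys always mean equal substrings and no collision check is needed.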

Tags: leetcode

Original article: http://blog.csdn.net/wzy_1988/article/details/44224749
