
LeetCode-Repeated DNA Sequences (reducing memory with a bitmap)

Date: 2015-05-06 22:51:37


Repeated DNA Sequences

All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.

Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.

For example,

Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT",

Return:
["AAAAACCCCC", "CCCCCAAAAA"].
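The expected output above can be cross-checked with a naive comparison of every pair of 10-letter windows. This is only a minimal reference sketch, not the approach used in this post; the helper names count_occurrences and naive_repeated are hypothetical, and it reports start indices rather than copied strings.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SEQ_LEN 10  /* window length from the problem statement */

/* Hypothetical helper: number of positions in s where the SEQ_LEN-letter
   window starting at s[pos] occurs (the window itself counts once). */
int count_occurrences(const char *s, size_t pos) {
    size_t n = strlen(s);
    assert(pos + SEQ_LEN <= n);
    int count = 0;
    for (size_t j = 0; j + SEQ_LEN <= n; ++j)
        if (memcmp(s + pos, s + j, SEQ_LEN) == 0)
            ++count;
    return count;
}

/* Hypothetical helper: stores the start index of the first occurrence of
   each repeated window in starts[] (at most cap entries) and returns how
   many were stored. Quadratic time, intended only for checking small inputs. */
size_t naive_repeated(const char *s, size_t *starts, size_t cap) {
    size_t n = strlen(s), found = 0;
    for (size_t i = 0; i + SEQ_LEN <= n && found < cap; ++i) {
        int first = 1;
        /* Skip windows that already appeared at an earlier position. */
        for (size_t j = 0; j < i; ++j)
            if (memcmp(s + i, s + j, SEQ_LEN) == 0) { first = 0; break; }
        if (first && count_occurrences(s, i) > 1)
            starts[found++] = i;
    }
    return found;
}
```

For the example string, the two repeated windows start at indices 0 ("AAAAACCCCC") and 5 ("CCCCCAAAAA").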

 

 
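A bitmap can be used to reduce memory usage. The code is as follows:
#include <stdlib.h>
#include <string.h>

/* Each nucleotide takes 2 bits, so a 10-letter sequence is a 20-bit key.
   One bit per possible key: 2^20 bits = 128 KiB per bitmap. */
unsigned int map_exist[1024 * 1024 / 32];   /* keys already added to the result */
unsigned int map_pattern[1024 * 1024 / 32]; /* keys seen at least once */

#define set(map,x) \
    (map[(x) >> 5] |= (1u << ((x) & 0x1F)))

#define test(map,x) \
    (map[(x) >> 5] & (1u << ((x) & 0x1F)))

int dnamap[26];

char** findRepeatedDnaSequences(char* s, int* returnSize) {
    *returnSize = 0;
    if (s == NULL) return NULL;
    int len = strlen(s);
    if (len <= 10) return NULL;  /* a repeat needs at least 11 letters */

    memset(map_exist, 0, sizeof(map_exist));
    memset(map_pattern, 0, sizeof(map_pattern));

    /* 2-bit code for each nucleotide */
    dnamap['A' - 'A'] = 0;  dnamap['C' - 'A'] = 1;
    dnamap['G' - 'A'] = 2;  dnamap['T' - 'A'] = 3;

    char ** ret = malloc(sizeof(char*));
    int curr = 0;   /* number of results stored */
    int size = 1;   /* capacity of ret */
    int key = 0;    /* must start at 0 before bits are shifted in */
    int i = 0;

    /* Preload the first 9 letters of the first window. */
    while (i < 9)
        key = (key << 2) | dnamap[s[i++] - 'A'];
    while (i < len){
        /* Shift in the next letter and keep only the low 20 bits. */
        key = ((key << 2) & 0xFFFFF) | dnamap[s[i++] - 'A'];
        if (test(map_pattern, key)){
            /* Seen before: report it only the first time it repeats. */
            if (!test(map_exist, key)){
                set(map_exist, key);
                if (curr == size){
                    size *= 2;
                    ret = realloc(ret, sizeof(char*)* size);
                }
                ret[curr] = malloc(sizeof(char)* 11);
                memcpy(ret[curr], &s[i-10], 10);
                ret[curr][10] = '\0';
                ++curr;
            }

        }
        else{
            set(map_pattern, key);
        }
    }

    /* realloc with size 0 is not portable, so handle the empty result here. */
    if (curr == 0){
        free(ret);
        return NULL;
    }
    ret = realloc(ret, sizeof(char*)* curr);
    *returnSize = curr;
    return ret;
}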

This algorithm takes about 6 ms, which is very fast.
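The memory saving comes from the key encoding: each nucleotide fits in 2 bits, so a 10-letter sequence becomes a 20-bit number, and a bitmap with one bit per possible key needs only 2^20 bits (128 KiB). The following is a minimal sketch of that encoding in isolation. The helper names nucleotide_code, encode10, slide and bitmap_bytes are hypothetical and do not appear in the code above, and the sketch assumes the input contains only A, C, G and T.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: 2-bit code for one nucleotide, or -1 if invalid. */
int nucleotide_code(char c) {
    switch (c) {
    case 'A': return 0;
    case 'C': return 1;
    case 'G': return 2;
    case 'T': return 3;
    default:  return -1;
    }
}

/* Hypothetical helper: packs the 10 letters starting at seq into a 20-bit
   key, first letter in the highest bits. */
uint32_t encode10(const char *seq) {
    uint32_t key = 0;
    for (int i = 0; i < 10; ++i) {
        int code = nucleotide_code(seq[i]);
        assert(code >= 0);  /* assumption: input is only A, C, G, T */
        key = (key << 2) | (uint32_t)code;
    }
    return key;
}

/* Hypothetical helper: moves the window one letter to the right by dropping
   the oldest letter (mask to 20 bits) and appending the new one. */
uint32_t slide(uint32_t key, char next) {
    int code = nucleotide_code(next);
    assert(code >= 0);
    return ((key << 2) & 0xFFFFFu) | (uint32_t)code;
}

/* Bytes needed for one bitmap holding one bit per possible 20-bit key. */
size_t bitmap_bytes(void) {
    return ((size_t)1 << 20) / 8;
}
```

Sliding the key costs a shift, a mask and an OR per letter, so each window is processed in constant time instead of re-reading all 10 letters.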

 


Original post: http://www.cnblogs.com/jimmysue/p/4483357.html
