Tags:
All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.
Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.
For example,
Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT", Return: ["AAAAACCCCC", "CCCCCAAAAA"].
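The solution below rests on one observation: with only four letters, each nucleotide fits in 2 bits. A 10-letter sequence therefore fits in a 20-bit integer, and there are at most 4^10 = 2^20 distinct sequences. The following is a minimal sketch of that encoding, using the same A=0, C=1, G=2, T=3 mapping as the solution. The helper names `nucleotide_code` and `encode10` are hypothetical and do not come from the original.

```c
#include <assert.h>

/* Hypothetical helper: map one nucleotide to its 2-bit code,
   or -1 for any other character. */
static int nucleotide_code(char c) {
    switch (c) {
    case 'A': return 0;
    case 'C': return 1;
    case 'G': return 2;
    case 'T': return 3;
    default:  return -1;
    }
}

/* Hypothetical helper: pack the first 10 letters of seq into a 20-bit key,
   with the first letter in the highest 2 bits.
   seq must hold at least 10 valid letters. */
static unsigned int encode10(const char *seq) {
    unsigned int key = 0;
    for (int i = 0; i < 10; i++) {
        int code = nucleotide_code(seq[i]);
        assert(code >= 0);            /* only A, C, G, T are valid */
        key = (key << 2) | (unsigned int)code;
    }
    return key;                       /* always < (1u << 20) */
}
```

For the two answers in the example, `encode10("AAAAACCCCC")` is 0x155 because the five C's fill the low 10 bits with `01` pairs. `encode10("CCCCCAAAAA")` is 0x55400, the same pattern shifted up by 10 bits. Each key can therefore serve directly as a bit index into a 2^20-bit table.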
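#include <stdlib.h>
#include <string.h>

/* Each 10-letter sequence is encoded in 20 bits (2 bits per nucleotide),
   so there are 4^10 = 1024 * 1024 possible keys, one bit per key. */
#define MAP_WORDS (1024 * 1024 / 32)
static unsigned int map_exist[MAP_WORDS];   /* key already added to the result */
static unsigned int map_pattern[MAP_WORDS]; /* key seen at least once */

/* Unsigned shifts avoid undefined behavior when setting bit 31. */
#define set(map, x)  ((map)[(x) >> 5] |= (1u << ((x) & 0x1F)))
#define test(map, x) ((map)[(x) >> 5] & (1u << ((x) & 0x1F)))

static int dnamap[26];

char** findRepeatedDnaSequences(char* s, int* returnSize) {
    *returnSize = 0;
    if (s == NULL)
        return NULL;
    int len = strlen(s);
    if (len <= 10)
        return NULL;

    memset(map_exist, 0, sizeof(map_exist));
    memset(map_pattern, 0, sizeof(map_pattern));

    /* A=00, C=01, G=10, T=11 */
    dnamap['A' - 'A'] = 0;
    dnamap['C' - 'A'] = 1;
    dnamap['G' - 'A'] = 2;
    dnamap['T' - 'A'] = 3;

    char** ret = malloc(sizeof(char*));
    int curr = 0;
    int size = 1;
    unsigned int key = 0;   /* must start at 0 before bits are shifted in */
    int i = 0;

    /* Preload the first 9 letters. */
    while (i < 9)
        key = (key << 2) | dnamap[s[i++] - 'A'];

    while (i < len) {
        /* Slide the window: shift in the next letter, keep the low 20 bits. */
        key = ((key << 2) & 0xFFFFF) | dnamap[s[i++] - 'A'];
        if (test(map_pattern, key)) {
            /* Seen before: add it to the result only once. */
            if (!test(map_exist, key)) {
                set(map_exist, key);
                if (curr == size) {
                    size *= 2;
                    ret = realloc(ret, sizeof(char*) * size);
                }
                ret[curr] = malloc(sizeof(char) * 11);
                memcpy(ret[curr], &s[i - 10], 10);   /* window is s[i-10 .. i-1] */
                ret[curr][10] = '\0';
                ++curr;
            }
        } else {
            set(map_pattern, key);
        }
    }

    *returnSize = curr;
    if (curr == 0) {
        free(ret);   /* realloc(ret, 0) is implementation-defined */
        return NULL;
    }
    ret = realloc(ret, sizeof(char*) * curr);   /* shrink to fit */
    return ret;
}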
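The bitmaps keep memory low because each of the 2^20 possible keys needs only one bit per map. A quick sketch of that arithmetic follows. It compares one bitmap packed into machine words, as the solution's `map_exist` / `map_pattern` arrays are, with a hypothetical one-`char`-per-key counter table. That table is a common alternative and is not from the original. The function names are illustrative only.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define NUM_KEYS (1u << 20)   /* 4^10 possible 10-letter sequences */

/* Bytes for one bitmap with one bit per key, packed into unsigned ints. */
static size_t bitmap_bytes(void) {
    size_t bits_per_word = sizeof(unsigned int) * CHAR_BIT;
    assert(NUM_KEYS % bits_per_word == 0);   /* keys split evenly into words */
    return (NUM_KEYS / bits_per_word) * sizeof(unsigned int);
}

/* Bytes for a hypothetical table with one char counter per key. */
static size_t byte_table_bytes(void) {
    return NUM_KEYS * sizeof(char);
}
```

With 8-bit bytes, each bitmap is 128 KiB. The two bitmaps together come to 256 KiB, a quarter of the 1 MiB a byte-per-key table would need. This also keeps the two `memset` calls at the start of each run small.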
This solution runs in about 6 ms, which is very fast.
LeetCode-Repeated DNA Sequences (using bitmaps to reduce memory)
Original article: http://www.cnblogs.com/jimmysue/p/4483357.html