Thursday, April 23, 2015

[LeetCode] Repeated DNA Sequences

All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.
Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.
For example,
Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT",

Return:
["AAAAACCCCC", "CCCCCAAAAA"].

Solution: We use HashMap, but if we directly use strings of length 10 as keys, LeetCode OJ will throw the error "Memory Limit Exceeded". The main trick is to store an integer representation of a string of length 10. Since we only have 'A', 'C', 'G', 'T', we can use 2 bits to represent each of them, and a string of length 10 can be represented using 20 bits, within 32 bits. We use bit operation | to calculate such an integer representation. 
Alternatively, one may also use a hash table instead of a hash map. An array of size 2^20 is sufficient, since a string of length 10 can be represented using 20 bits.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
public class Solution {
    public int mapStringToInt(String str) {
        int result = 0;
        for (int i = 0; i < str.length(); i++) {
            switch (str.charAt(i)) {
                case 'A': result |= 0; break;
                case 'C': result |= 1; break;
                case 'G': result |= 2; break;
                case 'T': result |= 3; break;
                default: 
            }
            result <<= 2;
        }
        return result;
    }
    
    public List<String> findRepeatedDnaSequences(String s) {
        ArrayList<String> list = new ArrayList<String>();
        HashMap<Integer, Integer> map = new HashMap<Integer, Integer>();
        for (int i = 0; i < s.length() - 9; i++) {
            String str = s.substring(i, i + 10);
            int val = mapStringToInt(str);
            if (map.containsKey(val)) {
                if (map.get(val) == 1) {
                    list.add(str);
                    map.put(val, 0);
                }
            } else {
                map.put(val, 1);
            }
        }
        return list;
    }
}

No comments:

Post a Comment