Description
The DNA sequence is composed of a series of nucleotides abbreviated as 'A'
, 'C'
, 'G'
, and 'T'
.
- For example,
"ACGAATTCCG"
is a DNA sequence.
When studying DNA, it is useful to identify repeated sequences within the DNA.
Given a string s
that represents a DNA sequence, return all the 10
-letter-long sequences (substrings) that occur more than once in a DNA molecule. You may return the answer in any order.
Examples:
Example 1:
Input: s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"
Output: ["AAAAACCCCC","CCCCCAAAAA"]
Example 2:
Input: s = "AAAAAAAAAAAAA"
Output: ["AAAAAAAAAA"]
Solution in Python
Python
from typing import List
from collections import defaultdict
class Solution:
def findRepeatedDnaSequences(self, s: str) -> List[str]:
# Initialize a dictionary to count occurrences of each 10-letter sequence
sequence_count = defaultdict(int)
# List to store the sequences that are repeated
repeated_sequences = []
# Iterate through the string, extracting each 10-letter sequence
for i in range(len(s) - 9):
# Extract the current 10-letter sequence
sequence = s[i:i+10]
# Increase the count of this sequence in the dictionary
sequence_count[sequence] += 1
# If the count becomes 2, it means this sequence has been seen more than once
# Add it to the result list (only add once)
if sequence_count[sequence] == 2:
repeated_sequences.append(sequence)
# Return the list of repeated sequences
return repeated_sequences
Explanation:
- Imports and Initialization:
from typing import List
: Import the List type hint from thetyping
module.from collections import defaultdict
: Importdefaultdict
from thecollections
module to simplify counting occurrences.
- Class and Method Definition:
class Solution:
: Define a class namedSolution
.def findRepeatedDnaSequences(self, s: str) -> List[str]:
: Define a methodfindRepeatedDnaSequences
that takes a strings
and returns a list of strings.
- Counting Sequences:
sequence_count = defaultdict(int)
: Initialize a defaultdict to store and count each 10-letter sequence.repeated_sequences = []
: Initialize an empty list to store sequences that occur more than once.
- Iterating Through the String:
for i in range(len(s) - 9):
: Iterate over the strings
, stopping atlen(s) - 9
to ensure we can extract a 10-letter sequence.sequence = s[i:i+10]
: Extract the current 10-letter sequence starting at indexi
.
- Updating the Count and Checking for Repeats:
sequence_count[sequence] += 1
: Increase the count of this sequence in the dictionary.if sequence_count[sequence] == 2:
: If the count becomes 2, it means this sequence has been seen more than once, so we add it to the result list.
- Return the Result:
return repeated_sequences
: Return the list of sequences that occurred more than once.
Time Complexity:
- The time complexity is O(n), where n is the length of the string
s
. This is because we iterate over the string once and perform constant-time operations for each 10-letter sequence.
Space Complexity:
- The space complexity is O(n) in the worst case, where every possible 10-letter sequence is unique and stored in the dictionary.
This code efficiently finds all 10-letter-long sequences that occur more than once in the given DNA sequence string s
.
Solution in Javascript
JavaScript
/**
* @param {string} s
* @return {string[]}
*/
var findRepeatedDnaSequences = function(s) {
// Dictionary to keep track of occurrences of each 10-letter sequence
const sequenceCount = new Map();
// Array to store sequences that are repeated
const repeatedSequences = [];
// Loop through the string and extract each 10-letter sequence
for (let i = 0; i <= s.length - 10; i++) {
// Extract the current 10-letter sequence
const sequence = s.substring(i, i + 10);
// If the sequence already exists in the map, increment its count
if (sequenceCount.has(sequence)) {
sequenceCount.set(sequence, sequenceCount.get(sequence) + 1);
} else {
// If the sequence is new, add it to the map with a count of 1
sequenceCount.set(sequence, 1);
}
// If the count becomes 2, it means this sequence has been seen more than once
// We only add it to the result array the first time this condition is true
if (sequenceCount.get(sequence) === 2) {
repeatedSequences.push(sequence);
}
}
// Return the array of repeated sequences
return repeatedSequences;
};
Solution in Java
Java
class Solution {
public List<String> findRepeatedDnaSequences(String s) {
// Map to keep track of occurrences of each 10-letter sequence
Map<String, Integer> sequenceCount = new HashMap<>();
// List to store sequences that are repeated
List<String> repeatedSequences = new ArrayList<>();
// Iterate through the string and extract each 10-letter sequence
for (int i = 0; i <= s.length() - 10; i++) {
// Extract the current 10-letter sequence
String sequence = s.substring(i, i + 10);
// If the sequence already exists in the map, increment its count
sequenceCount.put(sequence, sequenceCount.getOrDefault(sequence, 0) + 1);
// If the count becomes 2, it means this sequence has been seen more than once
// Add it to the result list (only add once when count is exactly 2)
if (sequenceCount.get(sequence) == 2) {
repeatedSequences.add(sequence);
}
}
// Return the list of repeated sequences
return repeatedSequences;
}
}
Solution in C#
C#
using System;
using System.Collections.Generic;
public class Solution {
public IList<string> FindRepeatedDnaSequences(string s) {
// Dictionary to keep track of occurrences of each 10-letter sequence
Dictionary<string, int> sequenceCount = new Dictionary<string, int>();
// List to store sequences that are repeated
IList<string> repeatedSequences = new List<string>();
// Iterate through the string and extract each 10-letter sequence
for (int i = 0; i <= s.Length - 10; i++) {
// Extract the current 10-letter sequence
string sequence = s.Substring(i, 10);
// If the sequence already exists in the dictionary, increment its count
if (sequenceCount.ContainsKey(sequence)) {
sequenceCount[sequence]++;
} else {
// If the sequence is new, add it to the dictionary with a count of 1
sequenceCount[sequence] = 1;
}
// If the count becomes 2, it means this sequence has been seen more than once
// Add it to the result list (only add once)
if (sequenceCount[sequence] == 2) {
repeatedSequences.Add(sequence);
}
}
// Return the list of repeated sequences
return repeatedSequences;
}
}