Counting Kmers With Reverse Complement?
13.0 years ago
Bob ▴ 40

What is a good tool to count the number of k-mers (with k being > 30), where we say that the reverse complement of a k-mer is the same as the k-mer itself? In other words, I would want ACCT and AGGT to count as the same k-mer. Does tallymer do this?

Thanks.

13.0 years ago

I like Jellyfish for k-mer counting. It works well on large genomes, and it has a -C option that counts both strands.


Well, I do want to count both strands, but more importantly, I want to make sure that ACCT and AGGT are counted as one k-mer and not as two. It doesn't appear that Jellyfish does that, but I could be mistaken.


@Bob. If you use the -C flag, then one occurrence of ACCT and one occurrence of AGGT would count as two occurrences of a single ACCT/AGGT k-mer. The output file will show only one of ACCT or AGGT with the combined count, but I'm not sure how it decides which one to report.
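
To illustrate the merging described here, below is a minimal Python sketch of strand-independent counting. It is not Jellyfish's code. The function names are made up for this example, and choosing the lexicographically smaller of a k-mer and its reverse complement as the representative is an assumption of the sketch (it is one common convention for a "canonical" k-mer). It also assumes the sequence contains only A, C, G and T.

```python
from collections import Counter

# Translation table for complementing DNA bases (assumes only A, C, G, T).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def canonical(kmer):
    """Pick one representative for a k-mer and its reverse complement.

    Assumption of this sketch: the lexicographically smaller string wins.
    """
    rc = reverse_complement(kmer)
    return kmer if kmer <= rc else rc


def count_canonical_kmers(seq, k):
    """Count k-mers in seq, merging each k-mer with its reverse complement."""
    counts = Counter()
    for i in range(len(seq) - k + 1):
        counts[canonical(seq[i:i + k])] += 1
    return counts


counts = count_canonical_kmers("ACCTAGGT", 4)
for kmer, n in sorted(counts.items()):
    print(kmer, n)
```

In the toy sequence ACCTAGGT, both ACCT and AGGT occur once on the same strand, and they end up under the single key ACCT with a count of 2, which is the behaviour asked for in the question.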

13.0 years ago
Gww ★ 2.7k

I am assuming you want a k-mer and its reverse complement on the opposite strand to be merged into a single count. In other words, if you had the sequence ACCTG, you would have four 4-mers across the two strands:

ACCT
CCTG
CAGG (reverse complement)
AGGT (reverse complement)

Resulting in the counts:

ACCT --> 2
CCTG --> 2

Since merging amounts to reverse complementing each k-mer of the reverse strand back into its forward-strand form, you can just count all the k-mers on the forward strand and then double the counts.

Forward strand (ATGGCC):

ATGG
TGGC
GGCC

Reverse complement: GGCCAT, and each kmer is then reverse complemented again:

GGCC --> GGCC
GCCA --> TGGC
CCAT --> ATGG

As you can see, you get exactly the same k-mers as on the forward strand.
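
The doubling argument can be checked with a small Python sketch. The helper names are hypothetical, and storing each k-mer under `min(kmer, revcomp(kmer))` is an assumption of this example rather than something stated in the thread, so the reported representative may differ from the forward-strand k-mer (CCTG is stored as CAGG, for instance). One caveat: if a k-mer and its reverse complement both occur on the forward strand, they still need to be merged before doubling, which is why the forward-only count below also uses the merged key.

```python
from collections import Counter

# Complement table; assumes the sequence contains only A, C, G, T.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def strand_kmers(seq, k):
    """List every k-mer of one strand, left to right."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def merged_counts(seq, k, both_strands):
    """Count k-mers, storing each under min(kmer, revcomp(kmer))."""
    kmers = strand_kmers(seq, k)
    if both_strands:
        kmers += strand_kmers(revcomp(seq), k)
    return Counter(min(km, revcomp(km)) for km in kmers)


forward_only = merged_counts("ATGGCC", 4, both_strands=False)
both = merged_counts("ATGGCC", 4, both_strands=True)
doubled = Counter({km: 2 * n for km, n in forward_only.items()})

# Compare counting both strands against doubling the forward-strand counts.
print(both == doubled)
```

For a single pass over a large genome, this means the reverse strand never has to be materialised: counting merged k-mers on the forward strand alone carries the same information.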
