In plain English, with a background explanation, can someone explain what that -C option does? It says something about canonical, but I'm finding this too ambiguous. What does that even mean? What does "both strands" mean? Is it doing a k-mer analysis on a complement strand generated from the raw reads?
Perhaps from a biological perspective "Canonical" is misleading, but it's definitely the right terminology from a math / computer-science perspective. Thanks for sharing your perspective, though - I was not aware that it caused any confusion.
Ah is that where it comes from! :) Sorry, I'm a comic-book nerd who was never very good at maths, so maybe canonical was only non-obvious to me.
But the --both-strands flag makes it very clear, and I think the man page makes it clear too. I think it's only the jellyfish count --help output which is confusing, because of its description of -C. Maybe something like:
Rev.comp. mer if result is alphabetically sooner. Better reflects strand ambiguity.
It's a tough one to cram into < 100 chars though! Maybe just put 'see man page', haha :)
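To make "alphabetically sooner" concrete, here is a minimal Python sketch of what I understand the canonical form of a k-mer to be. The helper names revcomp() and canonical() are my own for illustration, and this is not a claim about how Jellyfish implements it internally.

```python
# Illustrative only: revcomp() and canonical() are hypothetical helper names,
# not part of Jellyfish. Assumes uppercase A/C/G/T input.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(mer: str) -> str:
    """Reverse complement: complement every base, then reverse the string."""
    return mer.translate(_COMPLEMENT)[::-1]

def canonical(mer: str) -> str:
    """Return whichever of mer and its reverse complement sorts first."""
    return min(mer, revcomp(mer))

print(canonical("TTTT"))  # AAAA
print(canonical("AAAA"))  # AAAA
print(canonical("GATC"))  # GATC (it is its own reverse complement)
```

So a k-mer and its reverse complement always end up under the same key, whichever strand the read happened to come from.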
Wouldn't not counting a k-mer whose reverse complement is already in the Jellyfish hash table mean that you would inadvertently skip sequences? What if a section of the genome "just so happens" to have a sequence that matches the reverse complement of a k-mer already in the hash table?
Yes, there are lots of regions in the genome which are just TTTTTTTTTTTTTTTTTT for example, which will always get turned into AAA... But the idea was that in situations where you don't know the strand (single-end sequencing which maps to both strands just as well, or reads which haven't been or can't be mapped to a genome for some reason) it makes no sense to treat AAA... any differently from TTT..., so combining them into one makes sense.

But again, in situations where you can map the reads and then discern the strand, it would make more sense to reverse-complement what you know to be on the reverse strand and drop the few reads you are unsure of. But not everyone has this luxury...
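A toy Python sketch of that merging, for anyone who finds code easier to follow. It is only an illustration of the idea, not Jellyfish's implementation: count_kmers() and its canonical parameter are made-up names, and the two reads are invented so that the second is the reverse complement of the first.

```python
from collections import Counter

# Illustrative only: these helpers are hypothetical, not Jellyfish code.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq):
    """Reverse complement of an uppercase A/C/G/T string."""
    return seq.translate(_COMPLEMENT)[::-1]

def count_kmers(reads, k, canonical=False):
    """Count every k-mer in every read, optionally under its canonical form."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            mer = read[i:i + k]
            if canonical:
                # Store the k-mer under whichever orientation sorts first.
                mer = min(mer, revcomp(mer))
            counts[mer] += 1
    return counts

# Made-up reads: the second is the reverse complement of the first.
reads = ["AAAAC", "GTTTT"]
stranded = count_kmers(reads, 4)
merged = count_kmers(reads, 4, canonical=True)

print(dict(stranded))  # {'AAAA': 1, 'AAAC': 1, 'GTTT': 1, 'TTTT': 1}
print(dict(merged))    # {'AAAA': 2, 'AAAC': 2}
```

In this sketch every k-mer occurrence is still counted, so the total is 4 either way. The canonical version just files TTTT under AAAA and GTTT under AAAC instead of keeping separate entries.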