Saima, 4.7 years ago:
I have used the solution posted here in the past for counting the abundance of unique sequences in multiple FASTA files. Is there a better (memory-efficient and fast) tool for doing this counting on large inputs (>100 million reads)? I don't have a reference genome for my samples, so I am looking for an alignment-free approach for counting the abundance of such a large number of short reads (<100 nt). Any suggestions would be much appreciated!
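For context, a minimal sketch of the brute-force approach in Python, assuming exact matching of full-length reads and a plain (possibly multi-line) FASTA input; the file name is hypothetical. With >100M reads this needs one dict entry per *unique* sequence, so whether it fits in RAM depends on how many distinct sequences there are:

```python
#!/usr/bin/env python3
# Sketch: count exact occurrences of every unique sequence in a FASTA file.
from collections import Counter

def read_fasta_seqs(path):
    """Yield one full sequence string per FASTA record."""
    seq_parts = []
    with open(path) as fh:
        for line in fh:
            line = line.rstrip()
            if line.startswith(">"):
                if seq_parts:
                    yield "".join(seq_parts)
                    seq_parts = []
            else:
                seq_parts.append(line)
        if seq_parts:
            yield "".join(seq_parts)

counts = Counter(read_fasta_seqs("reads.fasta"))  # hypothetical file name
for seq, n in counts.most_common(10):
    print(f"{seq}\t{n}")
```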
Do you have a transcriptome? I think this is the bare minimum you'll need. Also, what is the organism?
These are sRNA-seq data, so I don't want to just map to the transcriptome. They are from a plant without a published genome; I am using a list of unique sequences and counting their abundance in different tissue samples.
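A sketch of what that per-tissue counting could look like, under the same exact-match assumption as above; the query list and sample file names are hypothetical:

```python
#!/usr/bin/env python3
# Sketch: build a count matrix of query sequences x tissue samples.
from collections import Counter

def read_fasta_seqs(path):
    """Yield one full sequence string per FASTA record (same reader as above)."""
    seq_parts = []
    with open(path) as fh:
        for line in fh:
            line = line.rstrip()
            if line.startswith(">"):
                if seq_parts:
                    yield "".join(seq_parts)
                    seq_parts = []
            else:
                seq_parts.append(line)
        if seq_parts:
            yield "".join(seq_parts)

# queries.txt is assumed to hold one sequence per line (hypothetical file).
with open("queries.txt") as fh:
    queries = [line.strip() for line in fh if line.strip()]

samples = ["leaf.fasta", "root.fasta", "flower.fasta"]  # hypothetical names
counts_per_sample = {s: Counter(read_fasta_seqs(s)) for s in samples}

print("sequence\t" + "\t".join(samples))
for q in queries:
    row = [str(counts_per_sample[s].get(q, 0)) for s in samples]
    print(q + "\t" + "\t".join(row))
```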
Did you try assembling the sRNAs? I assume it won't do much, but you will have a sort of reference you can map against with Salmon.
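Roughly what that Salmon workflow could look like, driven from Python; the file and directory names are hypothetical, salmon is assumed to be on PATH, and for reads under 100 nt a smaller k-mer size than Salmon's default of 31 is generally advised (check the Salmon docs for your version):

```python
#!/usr/bin/env python3
# Sketch: index an sRNA assembly and quantify one sample with Salmon.
import subprocess

# Build an index from the assembled contigs; -k must be odd and <= 31,
# and a smaller k suits short reads.
subprocess.run(
    ["salmon", "index", "-t", "assembly.fasta", "-i", "salmon_index", "-k", "21"],
    check=True,
)

# Quantify a single-end sample against that index (-l A auto-detects library type).
subprocess.run(
    ["salmon", "quant", "-i", "salmon_index", "-l", "A",
     "-r", "leaf_reads.fastq", "-o", "leaf_quant"],
    check=True,
)
```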
Thanks, that's a good suggestion. I am also working on assembling the reads in addition to direct counting.