Hello,
I have around 12,000 smallRNA reads (sequences) for which I need to find the total mapped reads for EACH individual sequence. They are in a collapsed fasta file ensuring the sequences appear only once in the entire file (trimming and quiality filtering etc have been already done ). I want to know the total number of reads ( counts) for each sequence, meaning if sequence A maps to multiple places- I will need the output for the total number of mapped reads for sequence A and so on.... Eventually I want to generate an expression profile for these sequences ( may be the top 50 ). I had previously used miRdeep2 tool and the quantifier.pl script for that suite was very helpful at that time. However these are not miRNAs and so miRdeep2 cannot be used here.
I can map these to the reference genome and generate a BAM file. But will samtools be of help in this situation ? I am a bit confused. Any help is most appreciated.
Thanks !
Your post is a bit hard to understand (for me). Are you asking to find out how many places do those smallRNA reads (each read = # of places) map to a reference genome?
This sounds like a normal count based workflow one would use (align (allow unlimited multi-mapping?) to a reference and then count using featureCounts).
Yes @genomax. you are correct. I want to know for every sequence how many different places it can map to and then report the total number of mapped hits for that sequence. This eventually translates into total number of mapped reads for that sequence.
Please use
ADD COMMENT/ADD REPLY
when responding to existing posts to keep threads logically organized.How long are these fasta sequences? If they are long (>500 bp), you may be able to use BBMap (
mapPacBio.sh
) suite to do your mapping (useambig=all
to get all possible locations the reads can map to, adjust other options to adjust alignment stringency as needed). Then you could count how many times a read appears in your BAM file to get those counts.Oops- apologies for posting incorrectly. Thanks for the suggestion. The sequences in the fasta file are between 22-30 nucleotides in length. I am going to try out featureCounts and see how that goes.
Thanks !