Hi every body,
I am trying to count the number of reads aligned to rRNA. The genome that I am working on it (Salmonella) has seven copies of rRNA genes. So it is normal that reads that relate to rRNA are aligned to multiple loci on the genome. I just want to know the percentage of reads that are aligned to rRNA so I have to count each of these reads only for one time whereas tools like HTSeq that are regularly used for counting reads are not able to do that. For example when I tell HTSeq to count non uniquely aligned reads, It counts them for multiple times and the final total number of reads are more than the total reads that I have in my sample.
Does any one know any appropriate tool for doing this? Thanks a lot
Thanks for your response. Could you please be more specific. which tools I can use to do this type of task and what sort of commands are useful? How can I tell HTSeq to count each read only for one time? Best
You don't tell it to htseq, you filter the SAM/BAM file that you provide to it according to your needs. To filter it, best thing is to use
samtools view
: through the links I provided you, you can get to know which bitwise flags are the ones representing the reads that you want to keep or to remove, and you can then build your samtools view command accordingly. For example:Imagine you want to keep only proper pairs (that is, read pairs that map in correct orientation and within insert size). Through the "SAM Flags" link you can understand which flags represent that (the flag in this case is 0x2). Samtools view has an argument which is
-f
which tells the program to include in the output file only the reads with the bitwise flag you specify afterwards. So if you run:samtools view -h -b -f 0x2 file.bam > out.bam
you will have in your output only the reads with the 0x2 bitwise flag, i.e. the proper pairs.In case of what you want, I have no idea of which flag you have to use. You should spend some time reading through!