Question

How to count the number of reads aligned to rRNA

1

Entering edit mode

5.0 years ago

Javad ▴ 150

Dear all,

I am working on an organism that has 10 copies of rRNA genes on its genome. I would like to see the percentage of reads that are aligned to ribosomal RNA.

After alignment, rRNA reads align to every and each of these genes and if I perform the counting for reads that are not uniquely aligned to the genome, each one of these reads are counted 10 times. How can I count each read only once?

Any idea is highly appreciated.

Thanks

RNA-Seq • 2.9k views

ADD COMMENT • link 5.0 years ago by Javad ▴ 150

0

Entering edit mode

How can I count each read only once?

Extract reads mapped to the rRNA. Do a grep "^@SEQUENCER_ID" | sort | unique to get the unique Fastq ID's. Find the % comparing to total number of reads.

Not sure what aligner you used but if you want to align a multi-mapping read only once then you can use bbmap.sh option ambig=random. This will select one top scoring site randomly allowing you to count a read only one time.

ambiguous=best          (ambig) Set behavior on ambiguously-mapped reads (with
                        multiple top-scoring mapping locations).
                        best    (use the first best site)
                        toss    (consider unmapped)
                        random  (select one top-scoring site randomly)
                        all     (retain all top-scoring sites)

ADD REPLY • link 5.0 years ago by GenoMax 152k

score 2 · Accepted Answer · 2020-07-28

Hi, I am not sure whether this is the option that works best for you. But, I suppose you can get what you want using Tagdust2. Basically you provide the sequences of rRNA genes (I usually generate this using BioMart). I have used this in the past to remove rRNA reads before genome alignment. Run Tagdust2 and then take a look at the TagDust2 output summary file. If you do this following math you will get the number you want.

rRNA reads = total input reads - successfully extracted

When you do this, these rRNA reads will be saved on to a separate .fq file which you can use for any other downstream analyses you want to perform. May be an advantage for you.

To run Tagdust2:

tagdust -t 8 -1 R:N -o /path/to/output/ -ref /path/to/rRNA.txt -fe 2 /path/to/input.fastq

I hope this will help!