Dear All,
I want to know the percentage of reads matching rRNA genes. It was suggested that I download the rRNA genes from Ensembl, so I built index files from the rRNA FASTA file and then used bowtie2 to align my reads (in FASTQ format) against that index. Below are the command I ran and the bowtie2 output I received.
Frank$ bowtie2 -N 0 -L 15 -x rRNA_genes -1 Project/Sample/DH558-1_GTGGCC_L005_R1.all.fastq.gz -2 Project/Sample/DH558-1_GTGGCC_L005_R2.all.fastq.gz -S Project/DH558-1.sam
66113117 reads; of these:
  66113117 (100.00%) were paired; of these:
    66061268 (99.92%) aligned concordantly 0 times
    58 (0.00%) aligned concordantly exactly 1 time
    51791 (0.08%) aligned concordantly >1 times
    ----
    66061268 pairs aligned concordantly 0 times; of these:
      13 (0.00%) aligned discordantly 1 time
    ----
    66061255 pairs aligned 0 times concordantly or discordantly; of these:
      132122510 mates make up the pairs; of these:
        132068318 (99.96%) aligned 0 times
        13629 (0.01%) aligned exactly 1 time
        40563 (0.03%) aligned >1 times
0.12% overall alignment rate
What does "aligned concordantly 0, 1, >1 times" mean?
Also, does the "0.12% overall alignment rate" refer to the percentage of reads mapping to rRNA genes?
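For reference, the 0.12% figure can be reproduced from the counts in the summary above. The sketch below assumes the overall rate counts individual mates: both mates of every concordantly or discordantly aligned pair, plus the mates that aligned on their own, divided by twice the number of pairs. The function and variable names are made up for illustration.

```python
# Counts copied from the bowtie2 summary above.
total_pairs = 66113117
concordant_once = 58
concordant_multi = 51791
discordant_once = 13
mates_once = 13629
mates_multi = 40563


def count_aligned_mates(conc_1, conc_multi, disc_1, mate_1, mate_multi):
    """Number of individual mates that aligned at least once.

    Assumption: each aligned pair contributes two mates, and the
    leftover mates are the ones that aligned without their partner.
    """
    return 2 * (conc_1 + conc_multi + disc_1) + mate_1 + mate_multi


aligned_mates = count_aligned_mates(
    concordant_once, concordant_multi, discordant_once, mates_once, mates_multi
)
rate = aligned_mates / (2 * total_pairs)

print(aligned_mates)    # 157916
print(f"{rate:.2%}")    # 0.12%
```

So roughly 158 thousand of the 132 million mates aligned to the rRNA index, which is where the 0.12% comes from.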
Thanks, Devon, for confirming it. I believe the same approach can be applied to find the percentage of reads associated with any gene X. Instead of the rRNA gene FASTA sequence, I would take the FASTA sequence of my gene of interest, build an index from it, and then align the reads against it.
It would be better to align against the whole genome and then count the alignments to your gene. By restricting what you align to, you increase the false-positive alignment rate, because reads that really come from a similar sequence elsewhere in the genome have no better place to align (though the increase may be insignificant in many cases).
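As a rough illustration of the "align to the whole genome, then count" suggestion, here is a minimal sketch that counts primary alignments overlapping a gene interval in SAM-formatted lines. The gene coordinates, read names, and records are toy values invented for the example, and the helper names are hypothetical. In practice the same count is usually obtained with an existing tool, for example a region query with samtools view on a sorted, indexed BAM file.

```python
import re

# CIGAR operations as defined in the SAM format; only M, D, N, = and X
# consume reference bases.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")


def reference_end(pos, cigar):
    """1-based inclusive end coordinate of an alignment starting at pos."""
    ref_len = sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in REF_CONSUMING)
    return pos + ref_len - 1


def count_gene_alignments(sam_lines, chrom, start, end):
    """Return (alignments overlapping the gene, all primary mapped alignments)."""
    in_gene = 0
    total_mapped = 0
    for line in sam_lines:
        if line.startswith("@"):          # header line
            continue
        fields = line.rstrip("\n").split("\t")
        flag, rname, pos, cigar = int(fields[1]), fields[2], int(fields[3]), fields[5]
        # Skip unmapped (0x4), secondary (0x100) and supplementary (0x800) records.
        if flag & 0x4 or flag & 0x100 or flag & 0x800:
            continue
        total_mapped += 1
        if rname == chrom and pos <= end and reference_end(pos, cigar) >= start:
            in_gene += 1
    return in_gene, total_mapped


# Toy records (not real data): a hypothetical gene at chr1:120-300.
toy_sam = [
    "@SQ\tSN:chr1\tLN:1000",
    "r1\t0\tchr1\t100\t42\t50M\t*\t0\t0\t*\t*",          # covers 100-149, overlaps
    "r2\t16\tchr1\t400\t42\t50M\t*\t0\t0\t*\t*",         # covers 400-449, outside
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",                  # unmapped, skipped
    "r4\t0\tchr1\t90\t42\t20M100N20M\t*\t0\t0\t*\t*",    # spliced, covers 90-229
    "r5\t256\tchr1\t150\t0\t50M\t*\t0\t0\t*\t*",         # secondary, skipped
]

in_gene, total_mapped = count_gene_alignments(toy_sam, "chr1", 120, 300)
print(in_gene, total_mapped)  # 2 3
```

Whether to divide the gene count by the mapped alignments or by all sequenced reads depends on the percentage you want to report; the sketch returns the mapped total only as one possible denominator.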
Thanks, Devon. Yes, that needs to be considered.