How to get percentage of reads mapped to each Reference Genome
7.9 years ago

I am stuck on a Bowtie command. I have Bowtie output as mapped and unmapped reads. Since I have paired reads, I used ./bowtie2 -x bt2_base -q -1 Read1.fasta -q -2 Reads2.fasta --un-conc unmapped mapped. I have three reference genomes, R1, R2 and R3, for which I created the index file and then ran the command. The output contains the actual reads, split into mapped and unmapped. I also want to know how many of the Read1 reads mapped to R1, R2 and R3 respectively. Can anyone help me with this, please?

Regards, Bandana

RNA-Seq sequencing alignment • 4.2k views
7.9 years ago
GenoMax 147k

You could run qualimap on your BAM file to get detailed stats which (I think) should include the % of reads aligned to each reference.

BBMap can help when you are aligning to multiple references. Add the option scafstats=file.stat (in addition to the other normal bbmap.sh options) to get the fraction of reads aligned to each reference.

As per the bowtie manual, with the --un option I am getting fastq output containing the reads that did not match the reference genome. I am not getting a BAM file.

Well, the unmapped reads didn't map to any reference, so that part's easy. It's the mapped reads you need to worry about, which should come out as a sam or bam file...

I am very confused at this point. I used the bowtie option to get the unaligned reads: the manual says that --un writes all reads that could not be aligned to a file named <filename>, and --al writes all reads for which at least one alignment was reported to a file named <filename>. Since I am using three reference genomes at a time, I am not able to get the individual percentage of reads aligning to Reference 1, Reference 2 and Reference 3.

Can you please help me with this?

I don't think you get a BAM file of unaligned reads with the --un option. You can only specify whether the resulting fastq should be compressed (--un-gz, --un-bz2, --un-lz4) or uncompressed.

I think you need to specify -S before the output file name if you want sam output.
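Once -S gives you a SAM file, the per-reference counts can be tallied directly from it: column 3 (RNAME) holds the name of the reference sequence, and bit 0x4 of the FLAG column marks unmapped reads. A minimal Python sketch, assuming the three references were built into one combined index and each shows up as a single sequence name such as R1, R2 or R3 (the function name is made up for illustration):

```python
from collections import Counter


def count_reads_per_reference(sam_lines):
    """Count primary alignments per reference name in SAM-formatted lines.

    Unmapped reads are collected under the key "*".
    Assumption: one reference genome == one sequence name in the SAM file.
    """
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line (@HD, @SQ, @PG, ...)
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        flag = int(fields[1])
        if flag & 0x900:
            continue  # skip secondary (0x100) and supplementary (0x800) records
        if flag & 0x4:
            counts["*"] += 1  # read is unmapped
        else:
            counts[fields[2]] += 1  # RNAME: reference the read aligned to
    return counts
```

Something like count_reads_per_reference(open("output.sam")) would then return the counts, and dividing each by the sum of all counts gives the percentages. Each mate of a pair is counted as its own record here, and a genome made of several contigs would need its contig counts added together.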

For paired-end reads, --un-conc will give you all the reads that didn't align concordantly, in the original fastq format; --al-conc will give you all the reads that aligned concordantly, in the original fastq format; and -S output.sam will give you an output file in SAM format containing all the reads, aligned and unaligned. For the aligned reads, the SAM file will show which reference sequence the read aligned to, and you can get more stats about how many reads aligned to each reference with samtools idxstats.

Neither --un-conc nor --al-conc is required if you don't want to get those reads in fastq format.
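The idxstats output is one tab-separated line per reference sequence (name, sequence length, mapped read count, unmapped read count), plus a final * line for reads with no assigned reference, so the percentages are a small calculation away. A rough sketch in Python, assuming you have captured the text printed by samtools idxstats for a sorted, indexed BAM made from the SAM output; the function name is hypothetical:

```python
def percent_mapped_per_reference(idxstats_text):
    """Turn `samtools idxstats` text into {reference_name: percent mapped}.

    The denominator is every read record in the file, mapped or unmapped,
    so the percentages add up to less than 100 when some reads are unmapped.
    """
    mapped = {}
    total = 0
    for line in idxstats_text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        name, _length, n_mapped, n_unmapped = line.split("\t")
        n_mapped, n_unmapped = int(n_mapped), int(n_unmapped)
        total += n_mapped + n_unmapped
        if name != "*":  # "*" collects reads without a reference
            mapped[name] = n_mapped
    if total == 0:
        return {name: 0.0 for name in mapped}
    return {name: 100.0 * count / total for name, count in mapped.items()}
```

With paired-end data each mate is counted as a separate record, so these are percentages of reads rather than of pairs. If a reference genome consists of several sequences, add up the lines that belong to it before dividing.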
