Question

The Reference Genome Of The Reading In Bam File

0

Entering edit mode

11.0 years ago

shl198 ▴ 440

Dear all, I have a question about analyzing BAM files with R. In BAM file, it includes all the reads. Are the reads already mapped to the reference genome? Eg, I want to find how many reads in the BAM are mapped to a reference genome. I used the readBamGappedAlignments to get the reads. And it shows a number of GappedAlignments. Then I used makeTranscriptDbFromUCSC to get a reference and use countOverlaps to count the overlaps, it also gives a series numbers of mapped reads. The number I get using countOverlaps is smaller than using readBamGappedAlignment. Which of these two results should be the number of mapped reads?

r bam alignment • 6.6k views

ADD COMMENT • link updated 11.0 years ago by Ashutosh Pandey 12k • written 11.0 years ago by shl198 ▴ 440

score 1 · Answer 1 · 2013-11-08

1

Entering edit mode

11.0 years ago

Ashutosh Pandey 12k

Are the reads already mapped to the reference genome?

Yes. Most of the reads should already be mapped. Generally aligners or mappers only output mapped reads in bam files. But there are a few aligners that also output the unaligned or unmapped reads but there is enough information in bam file so that you can tell if a read is aligned or not.

I am not a R expert but the solution to your problem is straightforward. Install samtools and try these commands:

1) samtools view -c input.bam will give counts of all the reads in the bam file.

2) samtools view -c -F 4 input.bam will give counts of all the mapped reads to reference genome.

3) samtools view -c -f 4 input.bam will give counts of all the unmapped reads.

ADD COMMENT • link 11.0 years ago by Ashutosh Pandey 12k

0

Entering edit mode

Thanks for your answer. So in general case, if I get a BAM file, is it possible to know the reference genome from which the aligned reads got? I count overlaps of the reads to the Human genome hg19, the number is much smaller than the number of reads in BAM file. Can I say the reference genome of BAM file is not Human genome hg19?

ADD REPLY • link 11.0 years ago by shl198 ▴ 440

1

Entering edit mode

I have never heard a situation like this before. I mean you should have some idea about the reference genome that was used for mapping. An easy would be to use this command:

samtools view -H input.bam

This will output the header. Look carefully and you should be able to see which reference genome was used. You can also count the chromosomes.