Dear all, I have a question about analyzing BAM files with R. In BAM file, it includes all the reads. Are the reads already mapped to the reference genome? Eg, I want to find how many reads in the BAM are mapped to a reference genome. I used the readBamGappedAlignments to get the reads. And it shows a number of GappedAlignments. Then I used makeTranscriptDbFromUCSC to get a reference and use countOverlaps to count the overlaps, it also gives a series numbers of mapped reads. The number I get using countOverlaps is smaller than using readBamGappedAlignment. Which of these two results should be the number of mapped reads?
Thanks for your answer. So in general case, if I get a BAM file, is it possible to know the reference genome from which the aligned reads got? I count overlaps of the reads to the Human genome hg19, the number is much smaller than the number of reads in BAM file. Can I say the reference genome of BAM file is not Human genome hg19?
I have never heard a situation like this before. I mean you should have some idea about the reference genome that was used for mapping. An easy would be to use this command:
samtools view -H input.bam
This will output the header. Look carefully and you should be able to see which reference genome was used. You can also count the chromosomes.
OK. Now I figure out how to do that. Thank you very much.