Question

How Reads Have Been Mapped In My Bam File?

0

Entering edit mode

13.1 years ago

Pascal ★ 1.5k

Working with 1000 Genome Project bam files (NA12878, pilot2), I'd need to know how the reads have been mapped and in particular if there are "unique mapping" for all these reads or if reads have multiple possible mapping.

bam alignment • 3.9k views

ADD COMMENT • link updated 13.1 years ago by Vikas Bansal ★ 2.4k • written 13.1 years ago by Pascal ★ 1.5k

score 1 · Answer 1 · 2012-03-15

1

Entering edit mode

13.1 years ago

toni ★ 2.2k

Are there any @PG lines in your BAM file header?

samtools view -H file.bam | grep "^@PG"

This displays only the @PG lines of the header; if any.

I have downloaded NA12878 once from the Broad and it seems that the the answer is 'bwa 0.5.7' GATK realigner seems to have been used as well.

And if it was indeed mapped with BWA you can check for the 'X0' and 'X1' tag in the alignment (if present, number of best hits ans suboptimal hits respectively)

ADD COMMENT • link 13.1 years ago by toni ★ 2.2k

0

Entering edit mode

No @PG in header. I had verified this point already. Thanks anyway.

ADD REPLY • link 13.1 years ago by Pascal ★ 1.5k

0

Entering edit mode

Regarding the X0, X1, this is a very good idea, but unfortunately it does not seem to be present (I assume, according to SAM format specifications that "X0" or "X1" might be present): samtools view NA12878.chrom1.SLX.maq.SRP000032.2009_07.bam | grep X0

ADD REPLY • link 13.1 years ago by Pascal ★ 1.5k

score 1 · Answer 2 · 2012-03-15

1

Entering edit mode

13.1 years ago

Vikas Bansal ★ 2.4k

I think NA12878 was sequenced using Illumina and mapped by bwa with following commands->

bwa index -a bwtsw $ref (where $ref is the reference fasta file)
bwa aln -q 15 $ref $fq > $sai
bwa samse $ref $sai $fq > $bam

You can find the information in section B Alignment Process, here.

ADD COMMENT • link 13.1 years ago by Vikas Bansal ★ 2.4k