How Reads Have Been Mapped In My Bam File?
2
0
Entering edit mode
12.7 years ago
Pascal ★ 1.5k

Working with 1000 Genome Project bam files (NA12878, pilot2), I'd need to know how the reads have been mapped and in particular if there are "unique mapping" for all these reads or if reads have multiple possible mapping.

bam alignment • 3.7k views
ADD COMMENT
1
Entering edit mode
12.7 years ago
toni ★ 2.2k

Are there any @PG lines in your BAM file header?

samtools view -H file.bam | grep "^@PG"

This displays only the @PG lines of the header; if any.

I have downloaded NA12878 once from the Broad and it seems that the the answer is 'bwa 0.5.7' GATK realigner seems to have been used as well.

And if it was indeed mapped with BWA you can check for the 'X0' and 'X1' tag in the alignment (if present, number of best hits ans suboptimal hits respectively)

ADD COMMENT
0
Entering edit mode

No @PG in header. I had verified this point already. Thanks anyway.

ADD REPLY
0
Entering edit mode

Regarding the X0, X1, this is a very good idea, but unfortunately it does not seem to be present (I assume, according to SAM format specifications that "X0" or "X1" might be present): samtools view NA12878.chrom1.SLX.maq.SRP000032.2009_07.bam | grep X0

ADD REPLY
1
Entering edit mode
12.7 years ago
Vikas Bansal ★ 2.4k

I think NA12878 was sequenced using Illumina and mapped by bwa with following commands->

bwa index -a bwtsw $ref (where $ref is the reference fasta file)
bwa aln -q 15 $ref $fq > $sai
bwa samse $ref $sai $fq > $bam

You can find the information in section B Alignment Process, here.

ADD COMMENT

Login before adding your answer.

Traffic: 2497 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6