Question

Find reference base for a position in a given read from a BAM file

0

Entering edit mode

9.8 years ago

Andrew ▴ 60

If I take a random read from a BAM file, and want to know what the reference base would be at position 1 of that read, is there a way to do this? I was thinking I could do something with the POS column of the BAM file and then add to it the position of the base in the read I am looking at to get the position on the reference sequence.

Maybe a different way to say this is does there exist a way to get the nth base from a specific chromosome in a reference sequence (Fasta).

BAM • 6.2k views

ADD COMMENT • link updated 2.4 years ago by Ram 45k • written 9.8 years ago by Andrew ▴ 60

Ram · Answer 1 · 2015-07-09

4

Entering edit mode

9.8 years ago

Matt Shirley 10k

There is another way to get the reference base from just the BAM file, by using the MD tag: https://github.com/vsbuffalo/devnotes/wiki/The-MD-Tag-in-BAM-Files. I've implemented this as the .parse_md method in the simplesam Sam class.

ADD COMMENT • link updated 2.4 years ago by Ram 45k • written 9.8 years ago by Matt Shirley 10k

0

Entering edit mode

Thanks for the explanation of the MD flag and the detailed examples Matt :)

It blows my mind that the MD tag doesn't include a way to encode insertions... but I can't say it surprises me :P

Your parser, if his data contains the MD flag, is the way to go I think!

ADD REPLY • link updated 2.4 years ago by Ram 45k • written 9.8 years ago by John 13k

Ram · Answer 2 · 2015-07-09

1

Entering edit mode

9.8 years ago

Andrew ▴ 60

Found the solution: http://seqanswers.com/forums/showthread.php?t=17315

In case anyone finds this in the future just do the following

samtools faidx <file.fa> chr1:500-500

This will get you the 500th base in the given fasta file.

ADD COMMENT • link updated 2.4 years ago by Ram 45k • written 9.8 years ago by Andrew ▴ 60

2

Entering edit mode

Take care that if the read aligns with indels then the n-th position in the read will not correspond to the the POS+n position in the reference.

ADD REPLY • link updated 2.4 years ago by Ram 45k • written 9.8 years ago by dariober 15k