Hello Biostars,
I'm working on a phylogenetics software package, and part of what I need to do is extract a 1:1 mapping information for reads mapped against a reference genome. Based on the info contained within a SAM/BAM file, I know that the data that I need is wrapped up in there somewhere, likely in the CIGAR string but basically what I need is:
1) Map FASTA (not FASTQ in this case) to reference genome (easy, done) 2) Given a reference location, return the name and position of the mapped fragment
So if a 500bp fragment (Fragment X) maps to Chr10, and I query Chr10:8,202,291, I would like to return the corresponding position in Fragment X (ie. Chr10:8,202,291 corresponds to Fragment X:209). In the event that the position falls within an indel, I am fine to just purge that site for now.
Any ideas? I'm very capable of testing out different things, but I'm struggling to find the exact tool for the job.
Best and thanks.
EDIT: Actually to be more precise, I would like to give the position number of the fragment and get the reference position:
Example:
Fragment X:209 maps to Chr10:8,202,291
EDIT2: I'm looking into the MD tag now, but any advice would be appreciated.
this is not clear to me : what's the difference with a simple
samtools view ' Chr10:8,202,291' | cut -f 1
?There is no need to remove a thread after someone tries to help you solving the problem.
depending on how literal you mean by "extract", you should be able to load the sam/bam in IGV and see some of the reads & exactly where they map to, it shows the reference alignment at the bottom and if you mouse over the reads a tool-tip should give you more information