Extract base information from aligned reads in a sam file
1
0
Entering edit mode
4.3 years ago
yliueagle ▴ 290

I generated a sam (bam) file by aligning fastq to reference using BWA. Now I would like to retrieve the the basepair at a given chromosome position, for example, chr12:11010100:11010100, for all reads aligned to this region, how can I do that? Thanks! (My expected workflow would be: (1) extract reads from sam that cover the querying position; (2) get the basepair from these reads at this position)

sam basepair reads • 1.7k views
ADD COMMENT
0
Entering edit mode

Look into using igvtools which is part of IGV. I assume you are looking to get counts of basepairs (alleles) at a particular location.

ADD REPLY
0
Entering edit mode

Not only counts, but the basepair at that position for each read, with read name included in the output. (Of course the counts can be generated from this table as a downstream step)

ADD REPLY
1
Entering edit mode

Due to the fact that NGS data is very 'messy' (désordonnée), every base position will have multiple bases. You may want to output the frequencies of these via: A: Calculate The Frequency Of Nucleotides At Each Position In An Mpileup File

ADD REPLY
0
Entering edit mode

Is this good enough? samtools view <your bam> chr:start-end > reads.sam

ADD REPLY
0
Entering edit mode

That will get the reads but not specific base at that position. Not sure what is the best way to do that.

ADD REPLY
0
Entering edit mode

Thank you. This do obtain the reads but I also need to retrieve the basepair information from reads.sam at chr:start-end

ADD REPLY
2
Entering edit mode
4.2 years ago
yliueagle ▴ 290

I found http://lindenb.github.io/jvarkit/Sam2Tsv.html can do the job. Thanks all!

ADD COMMENT

Login before adding your answer.

Traffic: 1908 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6