Question

create a track with mismatch density from a SAM/bam file

0

Entering edit mode

7.2 years ago

Stephane Plaisance ▴ 460

I want to identify regions where a larger than average number of apparent varioants suggest an assembly error or a read alignment error.

Any idea how I could parse a SAM/BAM and count differences from the reference for each read which I then save in BED format?

I guess the solution in in the cigar but if someone has solved this before and could share some code, I would be very grateful.

Best S

relates to Getting number of mismatches from bam record where no solution was posted so far

sam bam mismatch • 2.4k views

ADD COMMENT • link 7.2 years ago by Stephane Plaisance ▴ 460

0

Entering edit mode

how I could parse a SAM/BAM and count differences .... relates to "Getting number of mismatches from bam record " where no solution was posted so far

what about Zev's solution: the NM tag ?

ADD REPLY • link 7.2 years ago by Pierre Lindenbaum 166k

score 0 · Answer 1 · 2018-02-16

Brilliant; I finally learned what NM was and it is exactly what I needed. Thanks Pierre

Hoping that the read is always right to the start position, I made of it the next command. Does this make sense?

samtools view -f 0x2 mappings.bam   | gawk -v rlen=${read_len} 'BEGIN{FS="\t"; OFS="\t"}{($16~/NM/)?split($16,nm,":"):split($17,nm,":"); print $3,$4,$4+rlen,"NM",nm[3]}' > NM_counts.bed

The NM seems to move around in my BAM between 16 and 17th columns, hence some edits To be then merged / grouped-by and summarised

Best Stephane

score 0 · Answer 2 · 2018-02-16

NM is not always in the $12 , the END position of a read on the REF is not START+read_len (indels, clipping......)

here is a solution using bioalcidaejdk: http://lindenb.github.io/jvarkit/BioAlcidaeJdk.html

 java -jar dist/bioalcidaejdk.jar -e 'stream().filter(R->!R.getReadUnmappedFlag() && R.hasAttribute("NM")).forEach(R->println(R.getContig()+"\t"+(R.getStart()-1)+"\t"+R.getEnd()+"\t"+R.getIntegerAttribute("NM")));' in.bam

rotavirus   5   75  7
rotavirus   5   75  3
rotavirus   5   75  2
rotavirus   6   74  6
rotavirus   6   76  6
rotavirus   6   71  5
rotavirus   6   76  7
rotavirus   6   76  6
rotavirus   6   76  6
rotavirus   6   76  7
rotavirus   6   76  5

score 0 · Answer 3 · 2018-02-16

0

Entering edit mode

7.2 years ago

Stephane Plaisance ▴ 460

meanwhile rediscovered bedtools bamtobed which seems to do precisely what I want (-ed) when I can compile the java above, I will compare. S

ADD COMMENT • link 7.2 years ago by Stephane Plaisance ▴ 460