Question

How to extract sequence message from SAM?

0

Entering edit mode

8.0 years ago

Jeason Rad ▴ 30

Hi, all. I have a PE-alignment SAM file, how can I extract the pos/start/end/strand/sequence from each flagment? I used this command-line to extract pos/start/end/strand,

bedtools bamtobed -i reads.bam -bedpe | awk -v OFS="\t" '{if($9=="+"){print $1,$2,$6}else if($9=="-"){print $1,$2,$6}}' > fragments.bed

but how to put the sequence message into it?

SAM BED sequencing • 1.5k views

ADD COMMENT • link updated 6.5 years ago by Biostar 20 • written 8.0 years ago by Jeason Rad ▴ 30

0

Entering edit mode

cut -f 10 file.sam

can be used to extract sequence, but not suit for pair-end sequence.

chr16   85790994        85791072        +
chr8    17447157        17447198        -
chr1    39340268        39340412        +
chr2    227662155       227662242       +
chr12   93416078        93416128        +

This is the bed file from bamtobed. How can I extract the sequence between col2 & col3 from my BAM file? Any suggestions will be grateful！

ADD REPLY • link 8.0 years ago by Jeason Rad ▴ 30

score 0 · Answer 1 · 2017-05-06

pipe your output of bamtobed into a loop with samtools faidx:

 echo -e "ref\t6\t22\t+\nref\t8\t18\t+" |  | while IFS=$'\t' read -a B; do echo -ne "${B[0]}\t${B[1]}\t${B[2]}\t${B[3]}\t" &&  samtools faidx reference.fa "${B[0]}:${B[1]}-${B[2]}" | grep -v ">" | tr -d "\n" && echo ; done

note: for minus strand, I haven't rev-complemented the sequence. Furthermore , I'm lazy , as the input is BED, there will be +1 shift for the first base of the DNA.