Any tools for extracting aligned part of genome from sam or bam file
1
1
Entering edit mode
9.5 years ago
mheyda ▴ 20

Does anyone know any tool which would be able to parse sam or bam file in order to give the aligned part of genome per read and finally returns a new fastq file similar to the input but including variant and removing errors in read?

bam sam bwa alignment • 2.0k views
ADD COMMENT
0
Entering edit mode

Why exactly do you want to do this? If you want to do assembly, then why do you only care about aligned regions? Similarly, why would you do variant calling before-hand?

ADD REPLY
0
Entering edit mode

I don't want to do assembly at this moment, I want to see errors in reads. actually I am working on error correction method for reads, so I want to have the correct read in advance to see how can I correct them.

ADD REPLY
0
Entering edit mode

Ah, your tags were somewhat poorly chosen then. I suspect you'll need to write a tool to do this. I suspect you'll end up needing to rewrite part of the htslib mpileup functionality, simply to allow handling alignments from the stack prior to removing them.

ADD REPLY
1
Entering edit mode
9.5 years ago
thackl ★ 3.0k

I can think of a dirty way to achieve what you want. It uses my Sam::Parser from proovread (pacbio correction software). The code below is a draft. It prints a FASTA record corresponding to the aligned part of each read. It cannot handle reverse-complement mappings and probably has some other flaws. But it's an idea. Let me know, if you want to give it a try and I could help you to further improve the snippet.

git clone https://github.com/BioInf-Wuerzburg/proovread.git
export PERL5LIB=proovread/lib/:$PERL5LIB;

BAM=genome.bam
REF=genome.fa

samtools faidx $REF;
samtools view $BAM |
perl -MSam::Parser -e '
  my $sp=Sam::Parser->new();
  while(my $aln=$sp->next_aln){ 
    next if $aln->is_unmapped();
    print $aln->rname,":",$aln->pos,"-",(length $aln->seq_aligned),"\n"
}' |
xargs samtools faidx $REF > ref-reads.fa
ADD COMMENT
1
Entering edit mode

Thanks a lot, I have started writing similar code in python and I should also consider reverse complement thanks by the way

ADD REPLY
1
Entering edit mode

Sure, I'm a perl guy, as you can see :). Most of it, you could just do on command line. The only ugly thing is, that you need to parse the cigar string in order to get the reads real length and to account for clipped stuff. I'd be interested to see your solution.

ADD REPLY

Login before adding your answer.

Traffic: 1727 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6