Does anyone know any tool which would be able to parse sam or bam file in order to give the aligned part of genome per read and finally returns a new fastq file similar to the input but including variant and removing errors in read?
Does anyone know any tool which would be able to parse sam or bam file in order to give the aligned part of genome per read and finally returns a new fastq file similar to the input but including variant and removing errors in read?
I can think of a dirty way to achieve what you want. It uses my Sam::Parser
from proovread (pacbio correction software). The code below is a draft. It prints a FASTA record corresponding to the aligned part of each read. It cannot handle reverse-complement mappings and probably has some other flaws. But it's an idea. Let me know, if you want to give it a try and I could help you to further improve the snippet.
git clone https://github.com/BioInf-Wuerzburg/proovread.git
export PERL5LIB=proovread/lib/:$PERL5LIB;
BAM=genome.bam
REF=genome.fa
samtools faidx $REF;
samtools view $BAM |
perl -MSam::Parser -e '
my $sp=Sam::Parser->new();
while(my $aln=$sp->next_aln){
next if $aln->is_unmapped();
print $aln->rname,":",$aln->pos,"-",(length $aln->seq_aligned),"\n"
}' |
xargs samtools faidx $REF > ref-reads.fa
Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Why exactly do you want to do this? If you want to do assembly, then why do you only care about aligned regions? Similarly, why would you do variant calling before-hand?
I don't want to do assembly at this moment, I want to see errors in reads. actually I am working on error correction method for reads, so I want to have the correct read in advance to see how can I correct them.
Ah, your tags were somewhat poorly chosen then. I suspect you'll need to write a tool to do this. I suspect you'll end up needing to rewrite part of the htslib mpileup functionality, simply to allow handling alignments from the stack prior to removing them.