3.3 years ago
Vasiliy Krestov
I have a .sam file with reads mapped to a plasmid and a bacterial chromosome. I mapped the reads with bowtie and allowed multi-mapping.
How can I exclude from the .sam file the reads that were aligned to both the plasmid and the chromosome? I want to do this to avoid biases caused by differing plasmid copy numbers.
This would likely need a custom program. You will need to name-sort your BAM file and then walk through it to find reads that aligned to both the chromosome and the plasmid (I assume the plasmid was a separate entry in the reference) and drop those reads/lines.
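A minimal sketch of such a program in Python, working directly on SAM text. Instead of walking a name-sorted file, it loads all lines into memory and makes two passes, so the input order does not matter. The function name filter_cross_mappers and the reference name pPlasmid are hypothetical; substitute the plasmid sequence name(s) from your own reference. Every reference not listed as a plasmid is treated as chromosome.

```python
from collections import defaultdict


def filter_cross_mappers(sam_lines, plasmid_names):
    """Drop reads that have alignments on both plasmid and chromosome.

    sam_lines: iterable of SAM text lines (header lines start with '@').
    plasmid_names: set of reference names (SAM column 3) that are plasmids;
    any other reference name is assumed to be chromosome.
    """
    lines = list(sam_lines)  # assumption: the file fits in memory

    # Pass 1: record which replicon types each read name (column 1) hits.
    hits = defaultdict(set)
    for line in lines:
        if line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        qname, flag, rname = fields[0], int(fields[1]), fields[2]
        if flag & 0x4 or rname == "*":
            continue  # unmapped record, nothing to count
        kind = "plasmid" if rname in plasmid_names else "chromosome"
        hits[qname].add(kind)

    # Pass 2: keep headers and every read seen on at most one replicon type.
    kept = []
    for line in lines:
        if line.startswith("@"):
            kept.append(line)
            continue
        qname = line.split("\t", 1)[0]
        if len(hits.get(qname, ())) < 2:
            kept.append(line)
    return kept
```

To use it, read the lines of the .sam file, call filter_cross_mappers(lines, {"pPlasmid"}), and write the returned lines to a new file. Multi-mappers that stay within the chromosome, or within the plasmid, are retained. Note that paired-end mates share a read name, so a pair with one mate on each replicon would also be dropped.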
A cruder way would be to isolate columns 1 and 3 (read name and reference name) and then sort | uniq them. Then keep only the reads whose name occurs once in that list, i.e. reads that hit a single reference sequence.
For bowtie it is pretty simple, as you can tune the alignment parameters so that multimappers are not reported in the SAM file; see http://bowtie-bio.sourceforge.net/manual.shtml#bowtie-options-m
Alternatively, multimappers have low MAPQ scores, so filtering based on MAPQ (say 10 and above) should remove multimappers as well, e.g.
samtools view -h -q 10 -o out.sam in.sam
I don't think I can use that, because I still want to retain multi-mappers that map only within the chromosome or only within the plasmid. I don't want to exclude all multi-mappers :)