I mapped single-end sequenced GBS reads onto two similar genomes. I have two sam files one for each genome. I'm interested to filter sam/bam files based on these conditions:
1. reads unique to a genome/reads mapped only on one genome
2. reads are not unique/mapped on both genomes.
Are there any existing implementations for this? I want to know before I go ahead and implement this in pysam.
Were the reads mapped to each genome separately? I don't think there is a tool that does what you want, but BBSplit from BBTools comes close - you would have to do the mapping again
If you end up coding your own solution, I think it would be wise to sort the sam / bam by name, so parsing both files simultaneously is simpler and uses less memory.
This is going to be over-stringent. Some reads will map perfectly to one genome, and map with discrepancies to the other. You probably want to count those for the first genome, not throw them away becasue the aligner forced them to align to the second genome.
Better to combine the genomes, align to that, and then you can fish out reads that aligned uniquely.
Compare BAM files ?
Were the reads mapped to each genome separately? I don't think there is a tool that does what you want, but BBSplit from BBTools comes close - you would have to do the mapping again
If you end up coding your own solution, I think it would be wise to sort the sam / bam by name, so parsing both files simultaneously is simpler and uses less memory.
Thanks for the suggestion! Yes, they were mapped separately. I will look into BBTools.