I have aligned around 3 varieties against 5 different reference genomes, so I now have several BAM files. Next, I want to see whether there are different SNPs between the varieties and the references (in the same region, of course), or whether there are regions with no reads or with exactly the same reads. In other words, having the BAM file results as a table might be a good option. I don't think variant calling (.vcf) is useful in this case.
Does anybody know how to transform a BAM file into a parsable table?
If you want the coordinates for the "same" region across the references, you can perform a genomic multiple alignment. You could use this tool: https://www.biorxiv.org/content/10.1101/730531v1
To be sure of the context, where are your 5 references from? Different species?
To avoid a multiple alignment at the genome level (which would be quite difficult to manage and to use later in your analysis), I would propose the following strategy.
Choose one reference as the anchor for everything; let's call it THE reference.
Align your other references against THE reference and extract per-position variations. You could use the combination of samtools mpileup and bcftools without any filter (see samtools mpileup VCF output).
Finally, you can merge 2 and 3 to compare variants from the references and the varieties against THE reference, and between them, with a multi-sample VCF analysis.
Of note, you will need an alignment tool that works with "assembly to assembly" alignment; you could use this: https://github.com/lh3/minimap2
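On the original question of getting a parsable table: one possible route is to run samtools mpileup on each BAM and convert its text output into per-position base counts. The sketch below is only an illustration and makes several assumptions: the input is the standard six-column pileup text (chromosome, position, reference base, depth, read bases, qualities) produced with a reference FASTA supplied, the function names `count_bases` and `pileup_to_rows` are made up for this example, and the sample lines in the demo are invented. Indel sequences are skipped rather than tallied, so this counts substitutions and deleted-base placeholders only.

```python
import csv
import sys
from collections import Counter

COLUMNS = ["chrom", "pos", "ref", "depth", "A", "C", "G", "T", "N", "del"]


def count_bases(ref_base, read_bases):
    """Count base calls in the read-bases column of one pileup line."""
    counts = Counter()
    i, n = 0, len(read_bases)
    while i < n:
        c = read_bases[i]
        if c == "^":
            # '^' marks a read start and is followed by one mapping-quality char
            i += 2
            continue
        if c in "+-":
            # indel: sign, length in digits, then that many inserted/deleted bases
            j = i + 1
            while j < n and read_bases[j].isdigit():
                j += 1
            i = j + int(read_bases[i + 1:j])
            continue
        if c in ".,":
            # match to the reference on the forward (.) or reverse (,) strand
            counts[ref_base.upper()] += 1
        elif c.upper() in "ACGTN":
            # mismatch; lower case means reverse strand, so fold to upper case
            counts[c.upper()] += 1
        elif c == "*":
            # placeholder for a base deleted in this read
            counts["del"] += 1
        # '$' (read end) and reference skips ('<', '>') are ignored
        i += 1
    return counts


def pileup_to_rows(lines):
    """Yield one dict per pileup line, with the keys listed in COLUMNS."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        chrom, pos, ref, depth, bases = fields[:5]
        depth = int(depth)
        # zero-depth lines carry '*' placeholders, which are not real calls
        counts = count_bases(ref, bases) if depth > 0 else Counter()
        row = {"chrom": chrom, "pos": int(pos), "ref": ref.upper(), "depth": depth}
        for key in COLUMNS[4:]:
            row[key] = counts.get(key, 0)
        yield row


if __name__ == "__main__":
    # invented example lines; in practice, iterate over the mpileup output file
    demo = [
        "chr1\t100\tA\t5\t.,^I.Tt$\tIIIII",
        "chr1\t101\tC\t3\t.+2AG,-1T.\tIII",
        "chr1\t102\tG\t0\t*\t*",
    ]
    writer = csv.DictWriter(sys.stdout, fieldnames=COLUMNS, delimiter="\t")
    writer.writeheader()
    writer.writerows(pileup_to_rows(demo))
```

With one such table per BAM, all aligned to the same reference, positions can be joined on (chrom, pos): rows with depth 0 are regions with no reads, and rows where the dominant base differs between samples are candidate SNPs. Running mpileup with its option to report all positions keeps the zero-coverage rows in the output.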
The 5 genomes are just different varieties of the same species...
So, just to be sure, you proposed two methods:
a. either do a multiple alignment,
b. or follow the step 1-4 strategy, right?
Thanks for your useful suggestion.
Do you have the coordinates for the same region across your references?
Nope... this is the problem: I need to somehow analyse SNPs even if I don't know the coordinates...
These discussions might be helpful: