3.5 years ago
L_to_the_m
Hi,
I have multiple BAM files per sample, and I would like to call SNPs from all of these files so that I can analyze all samples together afterwards. Would merging the BAM files improve the SNP calling? Or does it make a difference to the VCFs produced afterwards?
Hope you can help me.
Best regards!
Pooling of BAM files was performed for the 1000 Genomes Project because the coverage of the whole-genome sequencing was low.
It allowed frequent SNPs to be detected with good accuracy.
Nowadays, with an average coverage of >30x, it does not make much sense to pool BAMs: even rare SNVs can be detected quite confidently (indels and complex regions are harder, but that is another topic).
So the answer is: it depends on your task. Do you want to find common SNPs in low-coverage samples, or do you care about rare variants in well-covered ones?
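To make the coverage argument a bit more concrete, here is a rough back-of-the-envelope sketch. It assumes a heterozygous site where each read independently carries the alternate allele with probability 0.5, and it asks how likely it is to see at least two alt-supporting reads at a given depth. The function name, the threshold of two reads, and the example depths are illustrative assumptions only; real callers also model base quality, mapping quality, and sequencing error.

```python
from math import comb


def prob_at_least_k_alt_reads(depth, k=2, alt_fraction=0.5):
    """Binomial probability of observing >= k alt reads out of `depth` reads.

    Simplified model (an assumption for illustration): every read samples
    the alt allele independently with probability `alt_fraction`, and there
    are no sequencing or mapping errors.
    """
    p_fewer_than_k = sum(
        comb(depth, i) * alt_fraction**i * (1 - alt_fraction) ** (depth - i)
        for i in range(k)
    )
    return 1.0 - p_fewer_than_k


# Hypothetical depths: a low-coverage sample, a medium one, and ~30x.
for depth in (4, 8, 30):
    print(depth, prob_at_least_k_alt_reads(depth))
```

Under this toy model a single sample at low depth misses a noticeable fraction of heterozygous sites, whereas at around 30x the alt allele is almost always seen, which is why pooling matters far less for well-covered samples.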
Thanks for your reply. I want to analyze all samples together later in one PLINK file, so I want SNPs with a high hard-call rate across all samples. I am not sure I understood correctly: what is the best way to do this?
You can perform multi-sample calling using GATK (https://gatk.broadinstitute.org/hc/en-us/articles/360035890431-The-logic-of-joint-calling-for-germline-short-variants), but I doubt you need to physically merge reads from different samples (i.e. merge their BAMs).
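For illustration, here is a minimal sketch of what a joint-calling run could look like, written as a small Python helper that only builds the GATK4 command lines and does not execute them. It assumes the GVCF workflow (HaplotypeCaller with `-ERC GVCF` per sample, then CombineGVCFs, then GenotypeGVCFs). The file names, sample names, and helper function names are hypothetical. It also assumes that all BAMs belonging to one sample carry the same `SM` read-group tag, so that passing them together with repeated `-I` arguments treats them as a single sample without merging them on disk.

```python
import shlex


def haplotypecaller_cmd(reference, bams, out_gvcf):
    """Per-sample GVCF call; several BAMs of the same sample go in via repeated -I."""
    cmd = ["gatk", "HaplotypeCaller", "-R", reference]
    for bam in bams:
        cmd += ["-I", bam]
    cmd += ["-O", out_gvcf, "-ERC", "GVCF"]
    return cmd


def combine_gvcfs_cmd(reference, gvcfs, out_gvcf):
    """Combine the per-sample GVCFs into one cohort GVCF."""
    cmd = ["gatk", "CombineGVCFs", "-R", reference]
    for gvcf in gvcfs:
        cmd += ["-V", gvcf]
    cmd += ["-O", out_gvcf]
    return cmd


def genotype_gvcfs_cmd(reference, gvcf, out_vcf):
    """Joint genotyping across all samples in the cohort GVCF."""
    return ["gatk", "GenotypeGVCFs", "-R", reference, "-V", gvcf, "-O", out_vcf]


def joint_calling_commands(reference, sample_bams, cohort_prefix="cohort"):
    """Return the ordered list of commands for a small cohort."""
    commands, gvcfs = [], []
    for sample, bams in sample_bams.items():
        gvcf = f"{sample}.g.vcf.gz"
        commands.append(haplotypecaller_cmd(reference, bams, gvcf))
        gvcfs.append(gvcf)
    combined = f"{cohort_prefix}.g.vcf.gz"
    commands.append(combine_gvcfs_cmd(reference, gvcfs, combined))
    commands.append(genotype_gvcfs_cmd(reference, combined, f"{cohort_prefix}.vcf.gz"))
    return commands


# Hypothetical input: two samples, each with two BAM files.
example = {
    "sampleA": ["sampleA_lane1.bam", "sampleA_lane2.bam"],
    "sampleB": ["sampleB_lane1.bam", "sampleB_lane2.bam"],
}
for command in joint_calling_commands("ref.fasta", example):
    print(shlex.join(command))
```

The resulting multi-sample VCF is the kind of file that can then be loaded into PLINK. For larger cohorts, GenomicsDBImport is usually used in place of CombineGVCFs, so treat this as a starting point rather than a finished pipeline.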
OK, thanks. The link you shared does indeed describe the advantages of joint genotype calling. That makes much more sense than just merging BAMs. Thanks a lot!