Entering edit mode
5.0 years ago
laura.bertola
•
0
Hi all,
I have bam files for 10 individuals mapped onto a reference genome. I'd like to extract SNPs and their flanking regions (50 bp upstream and 50 bp downstream) based on a coverage threshold. F.e. minimum coverage of 20 for all 10 bam files combined. I'd also prefer to have not more than 1 polymorhpic site in the flanking regions.
Any suggestions on how to approach this? I assume the first step would be to merge the bam files?
Thanks!
One way to do this would be to use
samtools mpileup
supplying a list of SNP regions as an option, then filter the output based on your minimum coverage requirement. Related: A: Mpileup With A Region FileThanks for the suggestion! If I understand correctly, I'd use samtools mpileup to generate a list of SNPs first, then convert this to a list of regions and repeat the mpileup step? This makes sense, although I'd thought there may be a more straight forward way of achieving this.
Most SNP callers like GATK, VarScan, BCFtools, will provide a coverage for each SNP that you can filter on. After filtering for SNPs that satisfy your threshold, you can use
bedtools slop
to expand them by 50 bases and supply these regions tosamtools mpileup
.Thanks! I've been looking into this and I think it is working!