Extract flanking regions from bam files based on coverage threshold
0
0
Entering edit mode
5.0 years ago

Hi all,

I have bam files for 10 individuals mapped onto a reference genome. I'd like to extract SNPs and their flanking regions (50 bp upstream and 50 bp downstream) based on a coverage threshold. F.e. minimum coverage of 20 for all 10 bam files combined. I'd also prefer to have not more than 1 polymorhpic site in the flanking regions.

Any suggestions on how to approach this? I assume the first step would be to merge the bam files?

Thanks!

snp flanking region coverage bam mapping • 1.4k views
ADD COMMENT
0
Entering edit mode

One way to do this would be to use samtools mpileup supplying a list of SNP regions as an option, then filter the output based on your minimum coverage requirement. Related: A: Mpileup With A Region File

ADD REPLY
0
Entering edit mode

Thanks for the suggestion! If I understand correctly, I'd use samtools mpileup to generate a list of SNPs first, then convert this to a list of regions and repeat the mpileup step? This makes sense, although I'd thought there may be a more straight forward way of achieving this.

ADD REPLY
0
Entering edit mode

Most SNP callers like GATK, VarScan, BCFtools, will provide a coverage for each SNP that you can filter on. After filtering for SNPs that satisfy your threshold, you can use bedtools slop to expand them by 50 bases and supply these regions to samtools mpileup.

ADD REPLY
0
Entering edit mode

Thanks! I've been looking into this and I think it is working!

ADD REPLY

Login before adding your answer.

Traffic: 1509 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6