How to identify disease variants from a BAM file?
6.0 years ago
masaver

Hi there, Biostars!

Given a BAM file (from a set of analyzed ATAC-seq data) and a list of disease variants (in VCF format), how can I determine whether any of these disease variants is present in my BAM file?

Many thanks in advance,

Best,
-Mathias

disease-variants VCF bam ATAC-seq
6.0 years ago

1. Create a target file

$ bcftools query -f'%CHROM\t%POS\t%REF,%ALT\n' file.vcf | bgzip -c > als.tsv.gz && tabix -s1 -b2 -e2 als.tsv.gz

2. Call variants, forcing bcftools to genotype the alleles given in the target file.

$ bcftools mpileup -R als.tsv.gz -f reference.fasta input.bam | bcftools call -C alleles -T als.tsv.gz -m -A - > output.vcf

3. Exclude sites genotyped as homozygous reference (optional)

$ bcftools view -e 'GT="ref"' -o variant_sites.vcf output.vcf
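To see at a glance which of the target alleles the sample actually carries, the per-site genotypes in output.vcf can be tabulated. A minimal sketch, assuming an uncompressed single-sample VCF; `report_genotypes` is a hypothetical helper, not part of bcftools. Within bcftools itself, `bcftools query -f '%CHROM\t%POS\t%REF\t%ALT[\t%GT]\n' output.vcf` prints the same columns without the label.

```shell
# report_genotypes FILE.vcf   (hypothetical helper, not a bcftools command)
# Prints CHROM, POS, REF, ALT, GT and a coarse zygosity label for the first
# sample of an uncompressed VCF. Assumes a single-sample file.
report_genotypes() {
  awk -F'\t' 'BEGIN { OFS = "\t" }
    /^#/ { next }
    {
      # Find GT among the colon-separated FORMAT keys (column 9) and take
      # the matching value from the first sample column (column 10).
      nf = split($9, fmt, ":")
      split($10, val, ":")
      gt = "."
      for (i = 1; i <= nf; i++) if (fmt[i] == "GT" && val[i] != "") gt = val[i]
      # Alleles are separated by "/" (unphased) or "|" (phased).
      na = split(gt, al, "[/|]")
      label = ""
      for (i = 1; i <= na; i++) if (al[i] == ".") label = "missing"
      if (label == "") {
        # Identical alleles: homozygous; otherwise heterozygous (incl. 1/2).
        same = 1
        for (i = 2; i <= na; i++) if (al[i] != al[1]) same = 0
        if (!same) label = "het"
        else label = (al[1] == "0") ? "hom_ref" : "hom_alt"
      }
      print $1, $2, $4, $5, gt, label
    }
  ' "$1"
}
```

$ report_genotypes output.vcf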

fin swimmer

6.0 years ago

You'll first have to call variants from that BAM file. You can limit variant calling to the loci of interest; if you are using GATK, you can use the -L parameter for this.
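GATK's -L accepts interval files such as BED as well as explicit intervals. A minimal sketch that turns an uncompressed disease VCF into a BED file covering each REF allele, optionally padded on both sides; `vcf_to_bed` and its padding argument are hypothetical. The coordinate shift is there because BED is 0-based and half-open while VCF POS is 1-based.

```shell
# vcf_to_bed FILE.vcf [PADDING]   (hypothetical helper)
# Writes one BED interval per VCF record covering the REF allele, widened by
# PADDING bases on each side (default 0). Assumes an uncompressed VCF.
vcf_to_bed() {
  awk -F'\t' -v pad="${2:-0}" 'BEGIN { OFS = "\t" }
    /^#/ { next }
    {
      # VCF POS is 1-based; BED start is 0-based, BED end is exclusive.
      start = $2 - 1 - pad
      if (start < 0) start = 0
      end = $2 - 1 + length($4) + pad
      print $1, start, end
    }
  ' "$1"
}
```

Example usage (file names are placeholders):

$ vcf_to_bed disease_variants.vcf 50 > targets.bed
$ gatk HaplotypeCaller -R reference.fasta -I input.bam -L targets.bed -O calls.vcf.gz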

6.0 years ago
Garan

Use a variant caller such as GATK HaplotypeCaller, Platypus, SAMtools, FreeBayes, etc. This will (usually) give you a VCF with the single nucleotide variants (SNVs) and InDels (short insertion/deletion events). If you also want larger variants such as copy number variants (CNVs) and structural variants (SVs), you'll have to look at a different set of calling algorithms (XHMM, ExomeDepth, etc.); these are much harder to call if you only have a single sample to work from. You may well not get VCF output from these tools, and although VCF does, to some extent, support large variants like these, the annotation in the next step may throw an error on them.
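To make the SNV / InDel / large-variant distinction concrete: the class of a VCF record can usually be read off its REF and ALT fields, and large SVs/CNVs are typically written as symbolic alleles such as <DEL> or as breakends. A rough sketch, assuming an uncompressed VCF and doing no normalization, that counts ALT alleles by class, for example to see what a caller's output contains before annotation; `classify_variants` is a hypothetical helper.

```shell
# classify_variants FILE.vcf   (hypothetical helper)
# Counts ALT alleles by class using only REF/ALT lengths; no normalization.
classify_variants() {
  awk -F'\t' '
    /^#/ { next }
    {
      n = split($5, alts, ",")
      for (i = 1; i <= n; i++) {
        a = alts[i]
        if (substr(a, 1, 1) == "<" || index(a, "[") || index(a, "]"))
          type = "symbolic"        # e.g. <DEL>, <DUP>, <CNV> or a breakend
        else if (a == "." || a == "*")
          type = "no_alt"          # no ALT allele, or a spanning deletion
        else if (length($4) == 1 && length(a) == 1)
          type = "SNV"
        else if (length($4) != length(a))
          type = "indel"
        else
          type = "MNV"             # same-length multi-base substitution
        count[type]++
      }
    }
    END { for (t in count) print t "\t" count[t] }
  ' "$1" | sort
}
```

$ classify_variants calls.vcf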

Annotate your VCF (Alamut, SnpEff, VEP). This will probably give you a large text file with one or more rows per variant (depending on how many transcripts each variant overlaps). Filter the annotation file based on whatever criteria you have for pathogenic variants (population frequency, variant effect, previous evidence, etc.). Family history and additional sequencing can be invaluable.
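A rough sketch of such a filter, assuming a tab-delimited annotation table with a header row. The column names AF (population allele frequency) and IMPACT are assumptions; rename them to whatever your annotator writes (VEP and SnpEff both use HIGH, MODERATE, LOW and MODIFIER impact classes). `filter_annotation` is hypothetical and treats a missing frequency as rare.

```shell
# filter_annotation FILE.tsv MAX_AF   (hypothetical helper)
# Keeps rows with AF <= MAX_AF (or no AF) and IMPACT of HIGH or MODERATE.
# The column names AF and IMPACT are assumptions; adjust to your annotator.
filter_annotation() {
  awk -F'\t' -v max_af="$2" '
    # Header row: remember where each named column is, and echo it.
    NR == 1 {
      for (i = 1; i <= NF; i++) col[$i] = i
      if (!("AF" in col) || !("IMPACT" in col)) {
        print "missing AF or IMPACT column" > "/dev/stderr"
        exit 1
      }
      print
      next
    }
    {
      af = $(col["AF"])
      # Treat an empty or "-" frequency as rare (not seen in the population).
      rare = (af == "" || af == "-" || af + 0 <= max_af)
      impact = $(col["IMPACT"])
      if (rare && (impact == "HIGH" || impact == "MODERATE")) print
    }
  ' "$1"
}
```

$ filter_annotation annotated.tsv 0.01 > candidates.tsv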

Since you already have a list of disease variants, it's just a matter of cross-checking the VCF from the first step against the given disease VCF (as finswimmer and WouterDeCoster have described).
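The usual tool for this cross-check is `bcftools isec` (for example `bcftools isec -p isec_dir -n=2 -w1 calls.vcf.gz disease.vcf.gz` on bgzipped, indexed files). A dependency-free sketch of the same idea with awk, assuming uncompressed VCFs, identical chromosome naming in both files (chr1 vs 1), both files normalized the same way (e.g. with bcftools norm) so that indels are written identically, and a calls VCF that only contains sites where a variant was called (e.g. after the filtering step in finswimmer's answer). `shared_variants` is hypothetical; it prints the records of the calls VCF whose CHROM, POS, REF and at least one ALT allele match a disease variant.

```shell
# shared_variants CALLS.vcf DISEASE.vcf   (hypothetical helper)
# Prints the header and the records of CALLS.vcf that share CHROM, POS, REF
# and at least one ALT allele with a record in DISEASE.vcf.
shared_variants() {
  awk -F'\t' '
    # Pass 1 (disease list): remember every CHROM/POS/REF/ALT combination,
    # splitting multi-allelic ALT fields such as "T,G" into separate keys.
    NR == FNR {
      if ($0 !~ /^#/) {
        n = split($5, alts, ",")
        for (i = 1; i <= n; i++) seen[$1, $2, $4, alts[i]] = 1
      }
      next
    }
    # Pass 2 (calls): keep header lines so the output is still a VCF.
    /^#/ { print; next }
    # Print a call if any of its ALT alleles matches a disease allele.
    {
      n = split($5, alts, ",")
      for (i = 1; i <= n; i++)
        if (($1, $2, $4, alts[i]) in seen) { print; break }
    }
  ' "$2" "$1"
}
```

$ shared_variants calls.vcf disease_variants.vcf > disease_variants_found.vcf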
