Hi there BioStars!
Given a BAM file (from a set of analyzed ATAC-seq data) and a list of disease variants (in VCF format), how can I determine whether any "disease variant" is present in my BAM file?
Many thanks in advance,
Best,
-Mathias
1. Create a target file
$ bcftools query -f'%CHROM\t%POS\t%REF,%ALT\n' file.vcf | bgzip -c > als.tsv.gz && tabix -s1 -b2 -e2 als.tsv.gz
2. Do variant calling, forcing bcftools to report genotypes for the variants given in the target file.
$ bcftools mpileup -R als.tsv.gz -f reference.fasta input.bam | bcftools call -C alleles -T als.tsv.gz -m -A - > output.vcf
3. Filter variant sites (optional)
$ bcftools view -e 'GT="ref"' -o variant_sites.vcf output.vcf
fin swimmer
You'll first have to do variant calling on that BAM file. You can limit the variant calling to the loci of interest; if you are using GATK, you can use the -L parameter for this.
Use a variant caller such as GATK HaplotypeCaller, Platypus, SAMtools, FreeBayes, etc. This will (usually) give you a VCF with the single nucleotide variants (SNVs) and indels (short insertion/deletion events). If you also want larger variants such as copy number variants (CNVs) and structural variants (SVs), you'll have to look at a different set of calling algorithms (XHMM, ExomeDepth, etc.). These variants are much harder to call if you only have a single sample to work from, and you may well not get VCF output from these tools. VCF does more or less support large variants like these, but the annotation step that follows may throw an error on them.
Annotate your VCF (Alamut, SnpEff, VEP). This will probably give you a large text file with one or more rows per variant (depending on the number of transcripts). Filter the annotation file based on whatever criteria you have for pathogenic variants (population frequency, variant effect, previous evidence, etc.). Family history and additional sequencing can be invaluable.
Since you already have a list of disease variants, it's just a matter of cross-checking the VCF from the first step against the given disease VCF (finswimmer and WouterDeCoster have given examples of this).
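As a rough illustration of that cross-check, here is a minimal sketch using only awk. The function name match_variants and the file names in the usage comment are hypothetical, not from this thread. It assumes both VCFs are uncompressed, use the same reference build and chromosome naming (e.g. both "chr1" or both "1"), and have one ALT allele per record; multi-allelic records would need to be split first (for example with bcftools norm).

```shell
# Hypothetical helper: print the records of a called VCF whose
# CHROM, POS, REF and ALT also occur in a disease VCF.
# Usage (placeholder file names):
#   match_variants disease.vcf calls.vcf > matched.tsv
match_variants() {
    disease_vcf=$1
    called_vcf=$2
    awk -F'\t' '
        /^#/ { next }                        # skip header lines in both files
        { key = $1 FS $2 FS $4 FS $5 }       # build a CHROM/POS/REF/ALT key
        NR == FNR { known[key] = 1; next }   # first file: remember each disease variant
        key in known                         # second file: print records whose key was seen
    ' "$disease_vcf" "$called_vcf"
}
```

The output has no VCF header, so treat it as a plain tab-separated list of hits. Matching on REF and ALT as well as position avoids reporting a different substitution at the same site as a disease variant.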