How to overlap patient VCF with ClinVar database annotation using bedtools?
12 months ago
Muffin Man ▴ 10

Hello,

I'm helping a colleague who wants to add ClinVar's clinical significance annotation to the patient VCF samples she analysed. Specifically, when a variant is present in both the patient VCF and ClinVar, we want the ClinVar annotation carried over into the output VCF/BED file. We tried solutions from Google Bard and ChatGPT, which mainly rely on bedtools intersect, but ran into problems. I think this should be possible and not too complicated, but we aren't bash experts. Could anyone suggest code that would work for this in a bash terminal? It should be a simple case of finding the overlap and carrying the corresponding annotation from the ClinVar VCF over to the patient VCF. Our files are in VCF 4.2 format.

Here's the link for ClinVar HG19 - wget ftp://ftp.ncbi.nlm.nih.gov/pub/clinvar/vcf_GRCh37/clinvar.vcf.gz

Thank you

-C.J

bedtools bash ClinVar overlap VCF • 1.6k views

Consider looking into tools for "annotating" a VCF or for variant effect prediction. Doing a bedtools intersect of your patient VCF with the ClinVar VCF isn't necessarily a bad approach, but there are purpose-built tools that will likely give better results. Example thread: "Is there a way to annotate existing VCF file with known disease-causing mutations?"

Example tools: VEP, ANNOVAR, SnpEff, etc.
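
To illustrate why allele-aware annotation tends to give better results than a purely positional overlap: bedtools intersect compares coordinates, so a patient variant at the same position as a ClinVar record but with a different ALT allele would still be reported as a hit. Below is a minimal awk sketch (not how VEP or bcftools work internally) that only carries CLNSIG over when CHROM, POS, REF and ALT all agree. The function name annotate_by_allele and the demo variants are made up for illustration, and it assumes uncompressed VCFs with one ALT allele per line.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Carry CLNSIG from a ClinVar-style VCF onto patient records that match on
# CHROM, POS, REF and ALT. Assumes plain-text, single-ALT records (a sketch,
# not a replacement for bcftools annotate).
annotate_by_allele() {
    local clinvar=$1 patient=$2
    awk -F'\t' -v OFS='\t' '
        FNR == NR {                          # first file: ClinVar
            if ($0 ~ /^#/) next
            sig = "."
            n = split($8, kv, ";")
            for (i = 1; i <= n; i++)
                if (kv[i] ~ /^CLNSIG=/) sig = substr(kv[i], 8)
            clnsig[$1, $2, $4, $5] = sig
            next
        }
        # second file: declare the new tag just before the column header
        /^#CHROM/ { print "##INFO=<ID=CLNSIG,Number=.,Type=String,Description=\"ClinVar clinical significance (copied by sketch)\">" }
        /^#/ { print; next }
        {
            if (($1, $2, $4, $5) in clnsig)
                $8 = ($8 == "." ? "" : $8 ";") "CLNSIG=" clnsig[$1, $2, $4, $5]
            print
        }' "$clinvar" "$patient"
}

# Hypothetical demo data so the function can be tried without real files.
tmp=$(mktemp -d)
clinvar="$tmp/clinvar_demo.vcf"
patient="$tmp/patient_demo.vcf"
header='##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'

printf "$header" > "$clinvar"
printf '1\t100\t.\tA\tG\t.\t.\tALLELEID=1;CLNSIG=Pathogenic\n' >> "$clinvar"
printf '1\t200\t.\tC\tT\t.\t.\tCLNSIG=Benign\n' >> "$clinvar"

printf "$header" > "$patient"
printf '1\t100\t.\tA\tG\t50\tPASS\tDP=10\n' >> "$patient"   # same allele: annotated
printf '1\t200\t.\tC\tA\t50\tPASS\tDP=12\n' >> "$patient"   # same position, other ALT: untouched
printf '1\t300\t.\tG\tT\t50\tPASS\t.\n' >> "$patient"       # not in ClinVar: untouched

annotate_by_allele "$clinvar" "$patient"
```

In the demo, only the record at position 100 gains a CLNSIG tag; the one at position 200 overlaps a ClinVar position but carries a different ALT allele, which is exactly the case a coordinate-only intersect would get wrong.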


In my experience, ChatGPT will give you a better starting point with a more vague initial idea (compared to Google search) but is in no way a place for inexperienced people to start doing serious sensitive work. Please do not use ChatGPT unless you can verify everything it says/recommends. Use it to refresh your memory or give you minor places to start, but DO NOT rely on it.

For example, with my experience, I was able to frame the right question for ChatGPT: "How can I annotate a VCF using data from the ClinVar VCF?" And here is what it says:

To annotate a VCF file using data from the ClinVar VCF, you can use tools like bcftools and vcfanno. Here, I'll provide a basic example using bcftools and vcfanno. Note that you need to have these tools installed on your system.

1. Download the ClinVar VCF file:
You can download the ClinVar VCF file from the ClinVar FTP site. For example, you can use the following command to download the VCF file:

   wget ftp://ftp.ncbi.nlm.nih.gov/pub/clinvar/vcf_GRCh38/clinvar.vcf.gz

2. Download the VCF file you want to annotate:

Download the VCF file you want to annotate, or use your own VCF file.

3. Install bcftools and vcfanno:

You can install these tools using package managers like apt-get (for Debian/Ubuntu) or brew (for macOS). For example:

   # For Debian/Ubuntu
   sudo apt-get install bcftools

   # For macOS
   brew install bcftools

You can install vcfanno from the GitHub repository:

   go get -u github.com/brentp/vcfanno

4. Create an annotation configuration file:

Create a configuration file (e.g., clinvar_annotate.conf) specifying the annotations you want to add from ClinVar. Here's a basic example:

   [clinvar_annotate]
   file = "clinvar.vcf.gz"
   fields = "CLNSIG,CLNDBN,CLNACC"
   ops = "first,first,first"
   names = "ClinVar_Significance,ClinVar_Disease_Name,ClinVar_Accession"

5. Run vcfanno:

Use the configuration file to annotate your VCF file. For example:

   vcfanno -p 8 clinvar_annotate.conf your_input.vcf > your_output_annotated.vcf

Adjust the file names and paths accordingly.

Now, your_output_annotated.vcf will contain additional annotations from ClinVar.

Please note that this is a basic example, and you might want to customize the configuration file based on your specific requirements. Additionally, ensure that you have the necessary permissions to access and use the ClinVar data.
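
A few things in the quoted answer look off and are worth checking before running it. As far as I know, vcfanno expects a TOML file made of [[annotation]] tables with list-valued fields, ops and names; recent ClinVar VCFs name the disease field CLNDN and have no CLNACC tag; and the download above is GRCh38, whereas the question uses GRCh37 (the build has to match the patient VCF). Here is a hedged sketch of a config under those assumptions. The file names are placeholders, and the tag names should be confirmed against the ##INFO lines of the ClinVar header actually downloaded.

```shell
#!/usr/bin/env bash
set -euo pipefail

conf=clinvar_annotate.toml   # hypothetical config file name

# vcfanno config: one [[annotation]] table per annotation source.
# "self" keeps the value and type as declared in the ClinVar header.
cat > "$conf" <<'EOF'
[[annotation]]
file = "clinvar.vcf.gz"
fields = ["CLNSIG", "CLNDN"]
ops = ["self", "self"]
names = ["ClinVar_Significance", "ClinVar_Disease_Name"]
EOF

# Only attempt the run if vcfanno and both inputs are present.
# clinvar.vcf.gz is assumed to be bgzipped with a tabix index next to it.
if command -v vcfanno >/dev/null 2>&1 && [ -f clinvar.vcf.gz ] && [ -f your_input.vcf.gz ]; then
    vcfanno -p 4 "$conf" your_input.vcf.gz > your_output_annotated.vcf
else
    echo "vcfanno or input files not found; wrote $conf only" >&2
fi
```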

And when I asked it if it could show me an example using bcftools, it gave me this pretty good code (pretty close to Pierre's excellent solution that covers all bases). You may want to limit which INFO attributes are carried over in the -c parameter to bcftools annotate and use --pair-logic like he does:

Certainly! bcftools provides a built-in way to annotate VCF files using external databases. In this case, we can use the ClinVar VCF file to annotate another VCF file. Here's a step-by-step example:

1. Download ClinVar VCF:

Download the ClinVar VCF file (GRCh38 version) from the ClinVar FTP site:

   wget ftp://ftp.ncbi.nlm.nih.gov/pub/clinvar/vcf_GRCh38/clinvar.vcf.gz

2. Download the VCF file to be annotated:

Download the VCF file you want to annotate. For example, if you have your own VCF file named your_input.vcf, use that file.

3. Index the ClinVar VCF:

   bcftools index clinvar.vcf.gz

4. Annotate your VCF using ClinVar:

   bcftools annotate -a clinvar.vcf.gz -c INFO your_input.vcf -o your_output_annotated.vcf

This command uses bcftools annotate to add information from ClinVar to your VCF file. The -a option specifies the annotation file (ClinVar VCF), and -c INFO specifies that the annotations should be added to the INFO field of your VCF. Adjust the file names and paths accordingly.

Now, your_output_annotated.vcf will contain additional annotations from ClinVar.

Keep in mind that this is a basic example, and you may want to explore additional options provided by bcftools for more advanced annotation settings.
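
Since the original question mentions a BED-style output, here is a minimal sketch of turning an annotated VCF into a four-column BED (chrom, start, end, CLNSIG), keeping only records that carry a CLNSIG tag. The function name clnsig_bed and the demo records are hypothetical. BED is 0-based and half-open, so start is POS-1 and end is POS-1 plus the length of REF. With bcftools installed, bcftools query -f '%CHROM\t%POS\t%REF\t%ALT\t%INFO/CLNSIG\n' would give a similar table more robustly.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Print chrom, 0-based start, end and CLNSIG for records that have a CLNSIG tag.
# Reads plain or gzip/bgzip-compressed VCF.
clnsig_bed() {
    gzip -cdf "$1" | awk -F'\t' -v OFS='\t' '
        /^#/ { next }
        {
            n = split($8, kv, ";")
            for (i = 1; i <= n; i++)
                if (kv[i] ~ /^CLNSIG=/) {
                    print $1, $2 - 1, $2 - 1 + length($4), substr(kv[i], 8)
                    break
                }
        }'
}

# Hypothetical annotated records for a quick try-out.
demo=$(mktemp)
printf '##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n' > "$demo"
printf '1\t100\t.\tA\tG\t.\t.\tDP=10;CLNSIG=Pathogenic\n' >> "$demo"
printf '1\t200\t.\tCTT\tC\t.\t.\tCLNSIG=Benign;GENEINFO=X:1\n' >> "$demo"
printf '1\t300\t.\tG\tT\t.\t.\tDP=7\n' >> "$demo"

clnsig_bed "$demo"
```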

12 months ago
bcftools annotate -a 'clinvar.vcf.gz' -c 'ID,CLNSIG,GENEINFO' --pair-logic some input.vcf.gz

Both VCFs must use the same chromosome notation ('1' vs 'chr1').
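
A quick way to check the notation before annotating, as a minimal sketch: the helper names chrom_style and make_strip_chr_map are made up here, and the demo file is hypothetical. The second helper writes an old-name/new-name map of the kind bcftools annotate --rename-chrs reads. Mapping chrM to MT is an assumption based on ClinVar's VCF using MT for the mitochondrial contig, so check it against your own files.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Report whether a VCF uses 'chr'-prefixed contig names, bare names, or a mix.
# Reads plain or gzip/bgzip-compressed files; only the CHROM column is inspected.
chrom_style() {
    gzip -cdf "$1" | awk -F'\t' '
        /^#/        { next }
        $1 ~ /^chr/ { prefixed++; next }
                    { bare++ }
        END {
            if (prefixed && bare) print "mixed"
            else if (prefixed)    print "chr"
            else if (bare)        print "bare"
            else                  print "empty"
        }'
}

# Write an "old<TAB>new" map that strips a leading 'chr' from each contig name.
# Assumption: chrM should become MT to match ClinVar.
make_strip_chr_map() {
    gzip -cdf "$1" | awk -F'\t' '
        !/^#/ && !seen[$1]++ {
            new = $1
            sub(/^chr/, "", new)
            if (new == "M") new = "MT"
            print $1 "\t" new
        }'
}

# Hypothetical demo file so the helpers can be tried without real data.
demo=$(mktemp)
map=$(mktemp)
printf '##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n' > "$demo"
printf 'chr1\t12345\t.\tA\tG\t.\t.\t.\n' >> "$demo"
printf 'chrM\t100\t.\tC\tT\t.\t.\t.\n' >> "$demo"

chrom_style "$demo"
make_strip_chr_map "$demo" > "$map"
cat "$map"

# With real data, the map could then be applied along these lines:
#   bcftools annotate --rename-chrs "$map" -Oz -o patient.nochr.vcf.gz patient.vcf.gz
```

Running chrom_style on both the patient VCF and the ClinVar VCF and comparing the two answers is usually enough to spot a mismatch before any annotation is attempted.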
