Converting string to numerical in bcftools
16 months ago
pjuge • 0

Hi everyone,

I am using bcftools to filter variants from a VCF file. The variants from this VCF file have been annotated using ANNOVAR.

I would like to filter variants having a CADD score > 20 in a field named "CADD_phred" which has been created and annotated by ANNOVAR.

However, the following command does not work:

bcftools view -i'CADD_phred>20' -Oz -o out.vcf.gz in.vcf.gz

I believe this is because the CADD_phred values are stored as strings rather than as numerical values. For example, I was able to remove the variants without CADD scores (stored as ".") using bcftools view -e'CADD_phred="."'.

Is there any solution to convert string values to numerical values using bcftools?

Thanks!

bcftools • 1.8k views

Hi,

I was wondering whether you were able to fix this, as I have exactly the same issue. In my case the field is declared as a string; I changed it to Float but still wasn't able to filter by CADD>20.

##INFO=<ID=CADD_phred,Number=.,Type=String,Description="CADD_phred annotation provided by ANNOVAR">

Any thoughts?

Thanks

16 months ago
cfos4698 ★ 1.1k

I'm not sure of a simple single command to convert string to integer. However, you could do it through a few steps.

  1. Print the full header from the VCF to a new file using bcftools view -h in.vcf.gz > header.txt
  2. Modify the header, either in a text editor or with something like sed, to change the CADD_phred INFO line to 'Type=Integer'
  3. Reheader the VCF file using bcftools reheader -h header.txt -o new_header.vcf.gz in.vcf.gz
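
A rough sketch of step 2 with sed, in case it helps. The header.txt written here is a toy stand-in for what bcftools view -h would produce (only the CADD_phred line is taken from this thread), new_header.txt is just a name I picked, and Type=Float is an assumption; use Type=Integer if your scores have no decimals. The bcftools steps are left as comments because they need the real file.

```shell
# Toy stand-in for: bcftools view -h in.vcf.gz > header.txt
{
  printf '##fileformat=VCFv4.2\n'
  printf '##INFO=<ID=CADD_phred,Number=.,Type=String,Description="CADD_phred annotation provided by ANNOVAR">\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
} > header.txt

# Step 2: change the type on the CADD_phred INFO line only (Float is an assumption)
sed '/^##INFO=<ID=CADD_phred,/ s/Type=String/Type=Float/' header.txt > new_header.txt

# Show the edited line to confirm the change
grep 'ID=CADD_phred' new_header.txt

# Step 3 and the filter, to be run on the real file:
# bcftools reheader -h new_header.txt -o reheadered.vcf.gz in.vcf.gz
# bcftools view -i 'INFO/CADD_phred>20' -Oz -o out.vcf.gz reheadered.vcf.gz
```

Writing to a second file rather than editing in place keeps the original header around for comparison.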

I assume that will work, but hard to say without seeing your actual VCF file.


Thank you for your reply! I was able to change the header line

##INFO=<ID=CADD_phred,Number=.,Type=String,Description="CADD_phred annotation provided by ANNOVAR">

to

##INFO=<ID=CADD_phred,Number=.,Type=Float,Description="CADD_phred annotation provided by ANNOVAR">

without any effect on the filtering by CADD_phred>20

As I noticed that other fields containing numbers are declared with "Number=1", I also tried changing the line to

##INFO=<ID=CADD_phred,Number=1,Type=Float,Description="CADD_phred annotation provided by ANNOVAR">

without effect...


Make sure that "CADD_phred" is actually a float and not an integer. Also, when filtering with bcftools view, try explicitly indicating that "CADD_phred" is defined as an "INFO" tag: bcftools view -i "INFO/CADD_phred>20" -Oz -o out.vcf.gz in.vcf.gz. If that doesn't work, provide an example truncated VCF file showing the header and at least one variant with a CADD_phred score.


Following the above example, how can I make sure that "CADD_phred" is a float and not an integer?

Thanks


As I said in the previous reply, I/we probably can't help without an example truncated VCF file showing the header and at least one variant with a CADD_phred score. Without this I/we are simply guessing at solutions.

Is the number associated with CADD_phred in the form of 123.45? If so, it's a float, and the header might need Type=Float. Is it in the form of 123? If so, it's an integer, and the header might need Type=Integer. These suggestions might even be completely down the wrong track, but, again, no way to know without basic example data.
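
One way to check this without reading the file by eye is to pull out every CADD_phred value and count what form it takes. This is only a sketch: toy.vcf and its three records are made up for illustration, and the pattern assumes the values sit in the INFO column as CADD_phred=<value>.

```shell
# Made-up records standing in for the body of a real VCF
{
  printf '##fileformat=VCFv4.2\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
  printf 'chr1\t100\t.\tA\tG\t.\t.\tFunc.refGene=exonic;CADD_phred=23.4\n'
  printf 'chr1\t200\t.\tC\tT\t.\t.\tFunc.refGene=exonic;CADD_phred=.\n'
  printf 'chr1\t300\t.\tG\tA\t.\t.\tFunc.refGene=intronic;CADD_phred=7\n'
} > toy.vcf

# Extract the CADD_phred values and tally them by form
grep -v '^#' toy.vcf |
  grep -o 'CADD_phred=[^;[:space:]]*' |
  cut -d= -f2 |
  awk '{
    if ($0 ~ /^-?[0-9]+$/)             kind = "integer"
    else if ($0 ~ /^-?[0-9]*\.[0-9]+$/) kind = "float"
    else if ($0 == ".")                kind = "missing"
    else                               kind = "other"
    count[kind]++
  }
  END { for (k in count) print k, count[k] }' |
  sort > cadd_types.txt

cat cadd_types.txt
```

On a real file, replace the grep -v '^#' toy.vcf step with bcftools view -H in.vcf.gz. If any value shows up as "float", Type=Float is the safer header choice; anything counted as "other" would be worth inspecting before filtering.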


Hey. I just edited the CADD_score of the vcf file in a text editor. It worked. Thanks for commenting.
