Finding unique values of a specific INFO field in a VCF file (dbNSFP, VEP-annotated multisample VCF)
15 months ago
avelarbio46 ▴ 30

Hello everyone! I searched the forum but couldn't find a question like mine.

I have a multisample VCF file annotated with VEP (using the dbNSFP plugin), which I filtered with filter_vep, like so:

/scratch/ensembl-vep-109/ensembl-vep/filter_vep --force_overwrite --input_file {1} --output_file /home/filtering/2/2_{1/.}.vcf --only_matched --filter "(clinvar_clnsig is Pathogenic) or (clinvar_clnsig is Likely_pathogenic)"

After filtering, the VCF files are smaller.

Is there any way of printing the unique values of this INFO field (clinvar_clnsig)?

I tried something like

bcftools query -f "%clinvar_clnsig\n" | sort | uniq > unique.txt

But this does not seem to work. I'm trying to figure out why, because it gives me this error:

no such tag defined in the VCF header: INFO/clinvar_clnsig

But the header contains:

##clinvar_clnsig=(from dbNSFP4.4a_grch38) clinical significance by clinvar Possible values: Benign, Likely_benign, Likely_pathogenic, Pathogenic, drug_response, histocompatibility. A negative score means the score is for the ref allele

And maybe there is a more efficient way of confirming that VEP filtering did work as intended?

bcftools filter vep vcf • 1.1k views

unique is not a linux command. You want uniq.

  1. Read the manual. Seriously.
  2. query -l lists samples. You want query -f
  3. You have not specified the VCF file in the command; I'm guessing that's because the command is meant to be pseudocode.
  4. You can sort -u instead of sort | uniq.
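
For point 4, a quick way to convince yourself the two forms are interchangeable here. This is only a sketch; the three input values are placeholders, not taken from a real file:

```shell
# Placeholder values for illustration only
values='Pathogenic
Likely_pathogenic
Pathogenic'

# Classic two-step form: sort first so duplicates are adjacent, then collapse them
a=$(printf '%s\n' "$values" | sort | uniq)

# Single-step form: sort and drop duplicates in one process
b=$(printf '%s\n' "$values" | sort -u)

[ "$a" = "$b" ] && echo "identical"
```

The `sort | uniq` form is still worth keeping in mind if you want counts, since `uniq -c` has no equivalent in `sort -u`.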

My mistake. I used query -f but wrote it incorrectly in the pseudocode above. I will try sort -u and also add the output of my command to the question.

15 months ago
avelarbio46 ▴ 30

Because my file was VEP-annotated, bcftools works a little differently and needs the +split-vep plugin.
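
As far as I understand it, the original error comes from how the annotation is stored: bcftools query only knows tags declared as ##INFO=<ID=...> header lines, while a line like ##clinvar_clnsig=... is just a free-text description, and the value itself is packed inside the CSQ field. A rough sketch of that distinction, using a made-up two-line header (the field list in the CSQ description is shortened and hypothetical):

```shell
# Made-up header excerpt for illustration; a real VEP header lists many more subfields
header='##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence annotations from Ensembl VEP. Format: Allele|Consequence|SYMBOL|clinvar_clnsig">
##clinvar_clnsig=(from dbNSFP4.4a_grch38) clinical significance by clinvar'

# Tags that bcftools query can address directly: only ##INFO=<ID=...> lines count
info_tags=$(printf '%s\n' "$header" | grep '^##INFO=<ID=' | sed -E 's/^##INFO=<ID=([^,]+),.*/\1/')
echo "INFO tags: $info_tags"

# Subfields packed inside CSQ, read from the "Format:" part of its description
csq_fields=$(printf '%s\n' "$header" | grep '^##INFO=<ID=CSQ' \
    | sed -E 's/.*Format: ([^"]+)".*/\1/' | tr '|' '\n')
echo "$csq_fields"
```

On a real file, replacing the printf with `bcftools view -h input.vcf` should give the same kind of listing, and `bcftools +split-vep -l input.vcf` prints the CSQ subfields directly.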

I ended up using:

bcftools +split-vep -f "%clinvar_clnsig\n" input.vcf | tr "," "\n" | sort -u > unique.txt

+split-vep splits the VEP annotation in the INFO column into its subfields, -f works just like query -f, and tr "," "\n" puts each value from the query on its own line; these are then sorted so that only unique values are shown. I'm not sure this is the most efficient way of verifying that the filtering worked, but it gives the correct answer.
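
To check the filter rather than eyeball the list, the same pipeline can be extended to count each value and flag anything unexpected. This is a minimal sketch, not a tested recipe: the `split_vep_output` function is a hypothetical stand-in with made-up sample lines, and it assumes empty annotations are printed as a single dot.

```shell
# Hypothetical stand-in for:
#   bcftools +split-vep -f "%clinvar_clnsig\n" input.vcf
# The sample lines below are invented for illustration.
split_vep_output() {
    printf '%s\n' \
        'Pathogenic,Pathogenic' \
        'Likely_pathogenic' \
        'Pathogenic,.' \
        'Likely_pathogenic,Likely_pathogenic'
}

# One value per line, then count how often each value occurs
counts=$(split_vep_output | tr ',' '\n' | sort | uniq -c | sort -k1,1nr)
echo "$counts"

# Anything that is neither one of the two filter terms nor the missing-value dot.
# grep exits non-zero when nothing is left, hence the "|| true".
unexpected=$(split_vep_output | tr ',' '\n' | sort -u \
    | grep -v -x -E 'Pathogenic|Likely_pathogenic|\.' || true)
echo "unexpected: ${unexpected:-none}"
```

If `unexpected` comes back empty on the real output, every value that survived the filter is one of the two requested terms (or missing for that particular transcript).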
