Hello everyone! I searched the forum but couldn't find a question like mine.
I have a multisample VCF file annotated with the dbNSFP plugin, which I filtered using VEP, like so:
/scratch/ensembl-vep-109/ensembl-vep/filter_vep --force_overwrite --input_file {1} --output_file /home/filtering/2/2_{1/.}.vcf --only_matched --filter "(clinvar_clnsig is Pathogenic) or (clinvar_clnsig is Likely_pathogenic)"
After filtering, the VCF files are smaller.
Is there any way to print the unique values of this INFO field (clinvar_clnsig)?
I tried something like
bcftools query -f "%clinvar_clnsig\n" | sort | uniq > unique.txt
But this does not seem to work, and I'm trying to figure out why. It gives me:
Error: no such tag defined in the VCF header: INFO/clinvar_clnsig
But the header contains:
##clinvar_clnsig=(from dbNSFP4.4a_grch38) clinical significance by clinvar Possible values: Benign, Likely_benign, Likely_pathogenic, Pathogenic, drug_response, histocompatibility. A negative score means the score is for the ref allele
Also, is there a more efficient way to confirm that the VEP filtering worked as intended?
unique is not a Linux command. You want uniq.
query -l lists samples. You want query -f.
Use sort -u instead of sort | uniq.
My mistake. I used query -f but typed it wrongly in the pseudocode above. I will try sort -u and also add the output of my command to the question.
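On the error itself: the header line quoted in the question starts with ##clinvar_clnsig= and is only a description, not an ##INFO=<ID=clinvar_clnsig,...> definition, which is probably why bcftools query cannot find the tag. With VEP's VCF output the dbNSFP values normally sit inside the pipe-separated INFO/CSQ string, so they have to be pulled out of CSQ first. bcftools ships a split-vep plugin meant for that. Below is a minimal awk-only sketch of the same idea, assuming the annotations are in INFO/CSQ and the header carries the usual "Format: ..." list of subfields. unique_csq_values is a made-up helper name, not part of bcftools or VEP.

```shell
# Sketch: list the unique values of one VEP CSQ subfield in a VCF.
# Assumption: VEP wrote its annotations to INFO/CSQ and the header line
# for CSQ ends with "Format: A|B|C". The function name is hypothetical.
unique_csq_values() {
    field="$1"   # CSQ subfield name, e.g. clinvar_clnsig
    vcf="$2"     # path to an uncompressed VCF
    awk -F'\t' -v field="$field" '
        # Header: find the position of the requested subfield in the CSQ format list.
        /^##INFO=<ID=CSQ,/ {
            fmt = $0
            sub(/.*Format: /, "", fmt)
            sub(/".*/, "", fmt)
            n = split(fmt, names, "[|]")
            for (i = 1; i <= n; i++) if (names[i] == field) col = i
            next
        }
        # Skip every other header line.
        /^#/ { next }
        # Records: take CSQ from the INFO column, then the subfield from each
        # comma-separated consequence entry. Empty values are skipped.
        col {
            m = split($8, info, ";")
            for (i = 1; i <= m; i++) {
                if (info[i] !~ /^CSQ=/) continue
                t = split(substr(info[i], 5), entries, ",")
                for (j = 1; j <= t; j++) {
                    split(entries[j], vals, "[|]")
                    if (vals[col] != "") print vals[col]
                }
            }
        }
    ' "$vcf" | sort -u
}
```

Calling it as unique_csq_values clinvar_clnsig filtered.vcf > unique.txt (filtered.vcf being a placeholder for one of the filtered files) should then list only Pathogenic and Likely_pathogenic if the filter did what was intended. Anything else in the output would be worth a closer look.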