Hi Team,
As I'm a newbie my question might be very lame, so please bear with me ...
I was told that if we find a multi-allelic entry in our VCF file, then it means the file is not normalized, and thus not left-aligned? (Multiple values under the ALT column; in our case T,A and A,T.)
I found the entries below (with some values modified/removed to anonymize the data) in the VCF files I downloaded from UKBB.
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT SAMPLE_ID
chrMM 5060XXXX chrMM_5060XXXX_G_T;chrMM_5060XXXX_G_A G T,A 61 PASS AF=0.04752,2e-06;AQ=61,38;AC=0,0;AN=2 GT:DP:AD:GQ:PL:RNC 0/0:14:14,0,0:42:0,...
chrMM 5060YYYY chrMM_5060YYYY_G_A;chrMM_5060YYYY_G_T G A,T 49 PASS AF=5e-06,2e-06;AQ=49,38;AC=0,0;AN=2 GT:DP:AD:GQ:PL:RNC 0/0:16:16,0,0:48:0,...
Does this mean the VCF file is not normalized ?
I went down this rabbit hole because the VEP tool didn't return any annotation, and "the internet" told me one possible reason could be that the files are not normalized/left-aligned ...
Could someone please suggest an approach I can use to check whether VCF files are normalized or not?
Thanks again team ...
Multi-allelics do not equal non-normalized VCF, although splitting multi-allelics and representing variants in left-aligned, parsimonious coordinates do frequently go together.
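To make that distinction concrete, here is a minimal Python sketch. It is not how vt or bcftools implement anything, and the function names are made up for illustration. Splitting only looks at the commas in ALT, while the normalization check uses a commonly cited criterion (assumed here): the alleles must not all end with the same base, and must not all start with the same base unless the shortest allele has length 1. Symbolic alleles such as <DEL> or * are not handled, and position 100 is a placeholder since the real positions are masked.

```python
def is_multiallelic(alt):
    """True when the ALT column lists more than one allele."""
    return "," in alt


def split_multiallelic(chrom, pos, ref, alt):
    """Turn one multi-allelic record into one biallelic record per ALT allele."""
    return [(chrom, pos, ref, a) for a in alt.split(",")]


def looks_normalized(ref, alts):
    """Heuristic check (an assumption, see lead-in): plain nucleotide alleles only."""
    alleles = [ref] + list(alts)
    ends_differ = len({a[-1] for a in alleles}) > 1
    starts_differ = len({a[0] for a in alleles}) > 1
    shortest_is_one = min(len(a) for a in alleles) == 1
    return ends_differ and (starts_differ or shortest_is_one)


# The record from the question: multi-allelic, yet nothing to trim or shift.
print(is_multiallelic("T,A"))                        # True
print(looks_normalized("G", ["T", "A"]))             # True
print(split_multiallelic("chrMM", 100, "G", "T,A"))

# A biallelic record can still be non-normalized: AA>A shares its last base.
print(looks_normalized("AA", ["A"]))                 # False
```

So a multi-allelic SNV record like G to T,A can pass the check, while a biallelic indel can fail it; the two properties are independent.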
Checking if a VCF is normalized is not an operation worth doing; you'd be better off just running a normalization tool. I'd recommend vt. I'd definitely recommend at least splitting multi-allelic entries before annotation. With VEP, check online using an entry that you think should be annotated and compare the command line shown there with the command you're running to debug the lack of annotation.
While bcftools norm gives the illusion that left aligning variants and splitting multi-allelic entries is part of the same process, it is not. Most SNVs won't be affected by the former process, for example.
I recommend vt because it retains a record of the changes it makes in-file, serving as a log. It adds INFO/OLD_MULTIALLELIC and INFO/OLD_VARIANT entries so you can filter down to variants that were changed by your operations. See https://genome.sph.umich.edu/wiki/Vt#Decompose and https://genome.sph.umich.edu/wiki/Vt#Normalization
Thanks @_r_ram for your enlightening response, much appreciated ...
I tried installing the vt tool but I'm seeing this error: https://github.com/atks/vt/issues/113
I can't use Conda in my environment .. :(
But then what is life without some struggle ? :D
Please don't add answers unless you're answering the top level question. Use Add Comment or Add Reply instead. Now that that's out of the way, what kind of machine are you working on - a local machine such as a laptop/desktop or a HPC cluster? If it's the latter, contact your sysadmin.
Why can you not use the conda workaround? It does not need super user privileges.
Thanks again @_r_ram, I see your point about adding a reply or comment!!! I will be more cautious next time ... it's a cluster and the sysadmin doesn't allow usage of Conda ...
Are you sure of that? Maybe you are confusing conda with Docker, this latter requiring admin privileges to install and it is sometimes frowned upon by sys admins. I don't mean to encourage a sneaky behavior but have you tried installing conda? If so, what errors did you encounter?
Thanks dariober for the comment ... I didn't even attempt to run conda as our admin says ... "We request our users to please not install Anaconda on the clusters" I started the normalization step via bcftools ... it takes care of the decomposition scenario, will surely try out the vt tool when issue is resolved ... it seems like a good weapon to have in one's arsenal
bcftools also does left alignment. If your sysadmin won't allow conda, ask them to install vt. They should be open to doing that.
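For anyone curious what left alignment actually does to a record, below is a rough Python sketch of the usual trim-and-extend procedure. It is not bcftools' or vt's actual code: `normalize_variant` and the toy reference `GCAAAT` are assumptions for illustration, and only a single REF/ALT pair of plain nucleotides is handled.

```python
def normalize_variant(pos, ref, alt, ref_seq):
    """Left-align and trim one biallelic variant.

    pos is 1-based; ref_seq is the chromosome sequence as a string,
    so the base at position p is ref_seq[p - 1].
    """
    if ref == alt:
        raise ValueError("REF and ALT are identical, nothing to normalize")
    while True:
        if ref and alt and ref[-1] == alt[-1]:
            # Shared last base: trim it from both alleles.
            ref, alt = ref[:-1], alt[:-1]
        elif not ref or not alt:
            # An allele became empty: extend both one base to the left.
            if pos == 1:
                raise ValueError("cannot extend left of the first base")
            pos -= 1
            base = ref_seq[pos - 1]
            ref, alt = base + ref, base + alt
        else:
            break
    # Drop superfluous shared leading bases, keeping at least one base per allele.
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt


toy_reference = "GCAAAT"  # hypothetical sequence, positions 1..6

# A deletion inside the run of A's gets shifted to the leftmost position.
print(normalize_variant(4, "AA", "A", toy_reference))  # (2, 'CA', 'C')

# A plain SNV is left untouched.
print(normalize_variant(3, "A", "T", toy_reference))   # (3, 'A', 'T')
```

The second call shows why most SNVs are not affected by left alignment: there is nothing to trim or shift, whether or not the record was multi-allelic to begin with.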
Plus, does your sysadmin have a problem with conda or Anaconda? Anaconda is a bulky package; you can work with miniconda, which is a much slimmer tool. Plus, if your sysadmin is being this much of a pain because they are trying to control what binaries get run on the cluster, that's just them being unreasonable. Ask them why they don't want conda, and whether their problem lies with conda or Anaconda.
I guess it's more about "control" than any other reason ... I will ask if vt can be installed, and also if miniconda can be installed...
But I'm very surprised by the issue with vt ... I checked the current version of the code on GitHub and they are calling make for a file that doesn't exist in the codebase ... but probably I'm missing something, as I wouldn't expect the author of such an awesome tool to commit without building ... or maybe he/she used Conda to test the build ... regardless ... I thank you both for taking the time to respond ... I'm already in love with the Biostar community :)
Have a great one team !!!
Again about conda and your sysadmins... Recently I've become quite a fan of conda and while it has its problems and critics I'm in no way looking back at when I was installing stuff in various /bin/ directories - a total mess when working on several projects across various servers and years!
If I were you, I would investigate further with IT to see if they have a valid reason to refuse conda (not to be confused with Anaconda!) and see if you can resolve it. If they do have a valid reason, I'd like to know what that is. One of the nice things about conda is that, in contrast to Docker, it's all self-contained within the user space, so you shouldn't even be able to annoy other users, and if something goes wrong you can just delete the conda environment and start again.
Everyone is human, and people can make mistakes. But sure, the community would have caught on. What are you referring to when you say there is a make target that doesn't exist? I have a feeling that maybe you're misreading the make file.
The lib/utils.c file doesn't exist in the lib folder ... https://github.com/atks/vt/tree/master/lib/libdeflate/lib
but it is referenced in the Makefile of libdeflate:
LIB_SRC := lib/deflate_decompress.c lib/utils.c \
           $(wildcard lib/*/cpu_features.c)
and we get the following error while building
It could be an untested change, I think. I can't see a different explanation, but if conda is not an option, try switching to an earlier commit and building it. Maybe this one's a stable working commit: https://github.com/atks/vt/tree/88da43649b5a39ddfc00d8a8f4d494fad50d5eec
See this SO answer on how to switch to a custom commit: https://stackoverflow.com/a/7832839/1394178