Hey all,
I have been trying to lift over a particular VCF file from GRCm38 to NCBIm37. I have tried the UCSC LiftOver tool, the Ensembl API, CrossMap and Picard. None of them lifts the file over completely: either they do not work at all, or they reject some of the variants. With Picard LiftoverVcf in particular, the rejected variants are those that have NoTarget in them, and I have no idea why. The reference FASTA file I am using is Mus_musculus.NCBIM37.61.dna.toplevel.fa and the liftover chain file is GRCm38_to_NCBIM37.chain.gz.
The VCF file is from:
ftp://ftp-mouse.sanger.ac.uk/current_snps/strain_specific_vcfs/129S1_SvImJ.mgp.v5.snps.dbSNP142.vcf
Any leads will be helpful. Thanks in advance.
Best,
Susmita
What is 'NoTarget'?
I have no idea. It's something like this:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT 129S1_SvImJ
1 3000023 . C A 109 NoTarget CSQ=A||||intergenic_variant||||||||;DP=6;DP4=0,0,6,0 GT:GQ:DP:MQ0F:GP:PL:AN:MQ:DV:DP4:SP:SGB:PV4:FI 1/1:22:6:0.166667:152,22,0:137,18,0:2:36:6:0,0,6,0:0:-0.616816:.:1
It's the FILTER column, and it should be defined in the VCF header...
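To see how each FILTER value is described in the header and how many records carry it, something like the following could be run on the rejected-variants VCF. This is only a minimal sketch: the function names `open_vcf` and `summarize_filters` and the file name `rejected.vcf` are made up for illustration, and it assumes a standard tab-delimited VCF with `##FILTER=<ID=...,Description="...">` header lines.

```python
import gzip
import re
from collections import Counter


def open_vcf(path):
    """Open a plain or gzip-compressed VCF as text (hypothetical helper)."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "rt")


def summarize_filters(lines):
    """Return (definitions, counts) for the FILTER column of a VCF.

    definitions maps each FILTER ID declared in the header to its description;
    counts tallies how many records carry each FILTER value.
    """
    definitions = {}
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##FILTER="):
            # Header line of the form ##FILTER=<ID=...,Description="...">
            m = re.search(r'ID=([^,>]+).*?Description="([^"]*)"', line)
            if m:
                definitions[m.group(1)] = m.group(2)
        elif line and not line.startswith("#"):
            fields = line.split("\t")
            if len(fields) < 7:
                continue  # skip malformed records
            # FILTER is the 7th column; multiple filters are ';'-separated
            for name in fields[6].split(";"):
                counts[name] += 1
    return definitions, counts


# Example usage (assumes a file called rejected.vcf exists):
# with open_vcf("rejected.vcf") as handle:
#     definitions, counts = summarize_filters(handle)
# for name, n in counts.most_common():
#     print(name, n, definitions.get(name, "(not defined in header)"))
```

If NoTarget turns out to be the only value in the counts, the description printed from the header should say what the tool means by it.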
I just want the VCF to be lifted over!
Searching online would help (keywords for Google: picard liftover notarget): https://github.com/broadinstitute/picard/blob/master/src/main/java/picard/vcf/LiftoverVcf.java
NoTarget is not the main issue. The issue is why the VCF is not getting lifted over completely. Is there any tool that can do it?
The VCF from the link posted in the OP is huge; the gzipped VCF is ~200 MB (on http://crispor.tefor.net/genomes/mm10/orig/). It would help if you could post example records that are not lifted between the builds, along with the headers. In general, there are always discrepancies between builds (VCF): some of the records get merged and some get dropped. However, this percentage is small between consecutive builds.
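A small excerpt is usually enough for a post like this. As a rough sketch, the header plus the first few rejected records could be pulled out as below; the function name `vcf_excerpt` and the file name `rejected.vcf` are placeholders, not anything produced by the tools mentioned above.

```python
import gzip


def vcf_excerpt(lines, max_records=5):
    """Return all header lines plus the first max_records data records."""
    header, records = [], []
    for line in lines:
        if line.startswith("#"):
            # Meta-information (##...) and the #CHROM column header line
            header.append(line.rstrip("\n"))
        elif line.strip():
            if len(records) >= max_records:
                break  # enough records collected; the header comes first in a VCF
            records.append(line.rstrip("\n"))
    return header + records


# Example usage (assumes a file called rejected.vcf, plain or gzipped):
# path = "rejected.vcf"
# opener = gzip.open if path.endswith(".gz") else open
# with opener(path, "rt") as handle:
#     print("\n".join(vcf_excerpt(handle, max_records=5)))
```

The output is short enough to paste into a reply while still showing the FILTER definitions and a few representative records.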
I don't think I can copy that many lines here.