Hi,
I found out that my VCF file gives only the name of the reference, hg38, and that it was generated for a gene panel study.
So I annotated the VCF file with snpEff using this reference sequence. However, I noticed quite a difference between the sequences in my data and general databases like dbSNP, NCBI Gene and UniProt, which prevents me from getting meaningful information about the variants beyond what is in the annotated VCF file.
So I would like to ask about programs or procedures that would let me get meaningful information from the genetic data rather than just making guesses.
I found out that it might be useful to re-annotate the file with LiftOver from the UCSC browser. However, I am not sure whether it is OK to use it for mainly SNP data, or whether it can be used without a known reference sequence and still give results close enough to be meaningful for the re-annotated data.
I am quite desperate, because I finally realised that VCF data from 12.2.2021 cannot be considered valid in 2024. I feel stuck and out of ideas. I would be thankful for any help.
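On comparing the snpEff output with dbSNP, NCBI Gene and UniProt: a mismatch is sometimes only a difference in which transcript was used, so it can help to pull the transcript ID and HGVS strings out of the `ANN` field and compare like with like. The following is a minimal sketch, assuming the file carries the standard 16-subfield `ANN` annotation that snpEff writes by default; the function name `parse_ann` and the short key names are made up for this example.

```python
# Subfield order of the snpEff ANN annotation (keys shortened for this sketch).
ANN_FIELDS = [
    "allele", "annotation", "impact", "gene_name", "gene_id",
    "feature_type", "feature_id", "transcript_biotype", "rank",
    "hgvs_c", "hgvs_p", "cdna_pos", "cds_pos", "aa_pos",
    "distance", "messages",
]


def parse_ann(info):
    """Return one dict per snpEff annotation found in a VCF INFO string."""
    for entry in info.split(";"):
        if entry.startswith("ANN="):
            # Annotations for different transcripts are comma-separated;
            # subfields inside one annotation are pipe-separated.
            records = entry[len("ANN="):].split(",")
            return [dict(zip(ANN_FIELDS, rec.split("|"))) for rec in records]
    return []  # no ANN key in this INFO string


def transcripts_and_hgvs(info):
    """List (transcript ID, HGVS.c, HGVS.p) for every annotation of a record."""
    return [(a.get("feature_id"), a.get("hgvs_c"), a.get("hgvs_p"))
            for a in parse_ann(info)]
```

Calling `transcripts_and_hgvs()` on the INFO column (the eighth tab-separated column) of a record gives the transcript each annotation refers to, which is what needs to match before a database entry can be compared with it.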
Can you explain why? GRCh38 was originally released in December 2013. Since that time there have been only patch releases, which do not change the chromosome coordinates.
Several changes have been released with the patches that affect my data: mainly changes in interpretation, but a few positions moved by a few bp. Using LiftOver seemed a more scientifically robust and stable method. Because I technically don't know the reference, I am not sure what to use, and the cross-references can differ significantly from each other. But if using a cross-reference is the regular way of doing it, I am going to do it using .p14, and thank you.
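On not knowing the reference: the VCF header often says more than the `##reference` line alone. If it contains `##contig` lines with lengths, these can be compared with the known primary chromosome lengths of GRCh37 and GRCh38, which patch releases leave unchanged. Below is a minimal sketch of that check. It assumes the header has `##contig=<ID=...,length=...>` lines without commas inside quoted values; the function names and the three-chromosome lookup table are choices made for this example only.

```python
import gzip
import re

# Primary-assembly chromosome lengths (identical across patch releases).
KNOWN_LENGTHS = {
    "GRCh37/hg19": {"1": 249250621, "2": 243199373, "X": 155270560},
    "GRCh38/hg38": {"1": 248956422, "2": 242193529, "X": 156040895},
}

CONTIG_RE = re.compile(r"^##contig=<(.*)>\s*$")


def read_header(lines):
    """Collect the ##reference value and contig lengths from VCF header lines."""
    reference = None
    contigs = {}
    for line in lines:
        if not line.startswith("##"):
            break  # meta lines end at the #CHROM column header
        if line.startswith("##reference="):
            reference = line.strip().split("=", 1)[1]
        match = CONTIG_RE.match(line)
        if match:
            fields = dict(kv.split("=", 1)
                          for kv in match.group(1).split(",") if "=" in kv)
            if "ID" in fields and "length" in fields:
                name = fields["ID"]
                if name.startswith("chr"):
                    name = name[3:]  # treat "chr1" and "1" as the same contig
                contigs[name] = int(fields["length"])
    return reference, contigs


def guess_assembly(contigs):
    """Return the builds whose known chromosome lengths all match the header."""
    matches = []
    for build, lengths in KNOWN_LENGTHS.items():
        shared = [c for c in lengths if c in contigs]
        if shared and all(contigs[c] == lengths[c] for c in shared):
            matches.append(build)
    return matches


def inspect_vcf(path):
    """Read the header of a plain or gzip/bgzip-compressed VCF file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return read_header(handle)
```

For a file at a hypothetical path, `reference, contigs = inspect_vcf("panel.vcf.gz")` followed by `guess_assembly(contigs)` would report which major build the lengths agree with. An empty result means the header has no usable `##contig` lines, not that the build is something else.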
Your original post seemed to indicate that you only had the VCF, but if you do have the original FASTQ files or even the aligned BAM file (instead of FASTQ), then realigning the data to a reference/annotation package would be the safest option, as noted below.
AFAIK LiftOver only allows moves between major genome releases (not patches).