6.4 years ago
Sudhir Jadhao
Dear All,
I am converting LRG coding-sequence coordinates to hg19 genomic coordinates.
The "Clinical Remap" option of the NCBI Genome Remapping Service gives two types of output (shown below).
Input:
NM_020469.2:c.1046_1048delAGG
Output 1: Download full mapping report
CHROM START END
NC_000009.11 136131070 136131072 (basepairs=>CCT)
Output 2: Download Annotation Data (VCF format)
CHROM POS ID REF ALT
NC_000009.11 136131068 . TGAG T
My question is: why are the coordinates and the bases different for the same input?
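Part of the offset between the two outputs can be checked by hand. By VCF convention, a deletion is written with the preceding base as an anchor, so POS points one base before the first deleted base. The snippet below is a minimal sketch of that convention only; the function name is hypothetical and it assumes a simple anchored deletion where ALT is a prefix of REF.

```python
def vcf_deletion_span(pos, ref, alt):
    """Return (start, end, deleted_bases) for a simple VCF deletion record.

    Coordinates are 1-based and inclusive. Assumes ALT is a prefix of REF,
    i.e. the record is a plain deletion anchored on the preceding base(s).
    """
    if len(alt) >= len(ref) or not ref.startswith(alt):
        raise ValueError("not a simple anchored deletion")
    start = pos + len(alt)       # first base that is actually removed
    end = pos + len(ref) - 1     # last base that is actually removed
    return start, end, ref[len(alt):]


# Values copied from the VCF output in the question
print(vcf_deletion_span(136131068, "TGAG", "T"))
```

For the VCF record in the question this gives 136131069 to 136131071 with deleted bases GAG. That is still one base off from the 136131070 to 136131072 (CCT) in the full mapping report, so the anchor-base convention accounts for only part of the difference.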
Hi Sudhir,
I have adjusted the image for you using the formatting options. You need to elaborate more on the question and add appropriate tags. Tags help you get quick attention from fellow members.
Thanks Vijay
Thank you, Vijay, for the suggestions.
I have updated the question.
Hi There,
The content in the attached image is unreadable. Could you either update the image or paste the raw data here?
I have added the raw data.
It looks like the NCBI record for this accession has been updated to NC_000009.12. Could you check whether the discrepancies still exist in the updated version of this record?
But both outputs are from the same assembly version: NC_000009.11
Hello Sudhir,
Some comments and questions on this:
All these comments and questions bring us back to the first point: What is your goal? Maybe we can find a better way.
fin swimmer
Thank you, fin, for your reply!
I have population genomics data in VCF files (hg19). I want to extract blood group variants from these VCF files.
I hope it helps.
So you have an XY problem?!
What does this task have to do with "LRG transcripts", "remapping", variants given in HGVS notation, ...?
I would suggest you go back to the start and explain:
fin swimmer
I have experimentally validated blood group SNPs from the ISBT database. These blood group variants are only available in cDNA (RefSeq transcript) form. To compare these variants with my hg19 VCF, I want to convert them to hg19 genomic coordinates.
Ensembl's VEP can do this.
Input:
You can export the result as VCF:
fin swimmer
EDIT:
I contacted Emily_Ensembl to have a look at this. I think the result should be
I took a closer look at it. And it's ... difficult.
In the Community Annotations on the LRG Site you'll find this comment:
And yes, the existing transcripts have a lot of changes compared to the reference genome, not just SNVs but also insertions and deletions. There is at least one position where the transcripts contain one more base in the coding region. Therefore the whole c. description changes.
Maybe I've found a little workaround:
We use Ensembl's VEP for this again. But this time we use the version for hg38, and instead of NM_020469 we use ENST00000611156.4. Note that these two transcripts are not 100% identical: NM_020469 has one extra triplet between c.255 and c.256. So for every variant located after this position we have to subtract 3 bases.
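The offset rule described above can be sketched in a few lines of Python. This is an illustration of the rule only, not the input actually used in this post: the function name is hypothetical, the threshold (c.255) and the shift (3 bases) are taken from the description above, and only plain coding positions are handled (no intronic offsets such as c.100+5, no UTR positions, and no variant that spans the c.255/c.256 boundary).

```python
import re

EXTRA_TRIPLET_AFTER = 255  # NM_020469 has 3 extra bases after c.255 (per the post)
SHIFT = 3                  # number of bases to subtract beyond that point

# Matches e.g. "NM_020469.2:c.1046_1048delAGG" or "NM_020469.2:c.297A>G".
# The change part must start with a letter, so c.100+5G>A or c.-10A>G are rejected.
HGVS_C = re.compile(r"[^:]+:c\.(\d+)(?:_(\d+))?([A-Za-z].*)")


def shift_position(pos):
    """Shift a coding position by SHIFT if it lies after the extra triplet."""
    return pos - SHIFT if pos > EXTRA_TRIPLET_AFTER else pos


def refseq_to_ensembl_hgvs(hgvs, target="ENST00000611156.4"):
    """Rewrite a simple c. description from NM_020469 numbering to the Ensembl transcript."""
    match = HGVS_C.fullmatch(hgvs)
    if match is None:
        raise ValueError(f"unsupported HGVS description: {hgvs}")
    start, end, change = match.groups()
    new_start = shift_position(int(start))
    if end is None:
        return f"{target}:c.{new_start}{change}"
    return f"{target}:c.{new_start}_{shift_position(int(end))}{change}"


print(refseq_to_ensembl_hgvs("NM_020469.2:c.1046_1048delAGG"))
```

For the example variant this prints ENST00000611156.4:c.1043_1045delAGG. Whether the shifted description is valid for a given variant still needs to be checked against the transcript sequence, since the two transcripts may differ in more than this one triplet.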
Finally, for the given example, our input would look like this:
Run it and export the output as VCF:
With this result one can go to the NCBI Remapping Service, take the Assembly-Assembly tab, and choose GRCh38.p11 :: Primary Assembly as source and GRCh37.p13 :: Primary Assembly as target. Copy and paste the VCF obtained from VEP.
The exported Annotation Data file now looks like this:
That seems to be fine. Whether this all works for your other variants, I don't know. Give it a try.
fin swimmer
@Sudhir: You should contact NCBI support with this example if this is happening.
Thank you @fin and @Emily_Ensembl,
I have tried Ensembl's VEP before as well, but it does not convert all of my RefSeq inputs to hg19; it works for only a few of them.
I will contact NCBI support and see what they reply.