Would a string composed of `#CHROM`, `POS`, `REF` and `ALT` uniquely identify a variant?
9 months ago
gernophil

Hey everyone,

It's a simple question, but the answer might be tricky :) (or not). In my current workflow, I build a unique identifier for variants from a VCF out of these substrings: #CHROM + "_" + POS + "_" + ALT.

For my data this is unique for every variant. But this might not always be the case, might it? There could be a deletion of the G in a GT, leaving a T as ALT, but there could also be a simple substitution of that G by a T, which gives the same combination of #CHROM, POS and ALT. Or would the deletion change the POS?

So I wanted to extend it with REF: #CHROM + "_" + POS + "_" + REF + "_" + ALT.
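A minimal sketch of both identifiers, using two hypothetical records at the same position: a G>T substitution and a REF=GT/ALT=T record for the deleted G. Whether a caller would actually write that deletion this way depends on its representation (normalized VCFs usually anchor a deletion on the preceding base), but if both records occur, the shorter key collides while the one including REF does not. The function names are illustrative only.

```python
def key_without_ref(chrom: str, pos: int, alt: str) -> str:
    """Original identifier: #CHROM_POS_ALT (hypothetical helper)."""
    return f"{chrom}_{pos}_{alt}"


def key_with_ref(chrom: str, pos: int, ref: str, alt: str) -> str:
    """Extended identifier: #CHROM_POS_REF_ALT (hypothetical helper)."""
    return f"{chrom}_{pos}_{ref}_{alt}"


# Two hypothetical records at chr1:100: a G>T substitution and a REF=GT/ALT=T record.
snv = ("chr1", 100, "G", "T")
deletion = ("chr1", 100, "GT", "T")

print(key_without_ref(snv[0], snv[1], snv[3]))                 # chr1_100_T
print(key_without_ref(deletion[0], deletion[1], deletion[3]))  # chr1_100_T (collision)
print(key_with_ref(*snv))       # chr1_100_G_T
print(key_with_ref(*deletion))  # chr1_100_GT_T
```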

So two questions:

  1. Is my assumption correct that #CHROM + "_" + POS + "_" + ALT might not always be unique?
  2. Would #CHROM + "_" + POS + "_" + REF + "_" + ALT lead to a unique string in every possible case (according to the VCF definitions)?
VCF

Yes, provided that both the genome build and the nomenclature system are stated.

9 months ago

Well, it should be OK in most cases, but:

  • the chromosome name might differ: 1 vs chr1
  • you should add the genome build: grch37:1:1234:A:T
  • there are symbolic alleles: chr1:77:A:<DELETION>
  • multiallelic variants: chr1:777:A:T,C
  • normalization: chr1:10:A:ATT vs chr1:10:AA:AAT
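A minimal sketch of a key that takes the first four points into account: it prefixes the genome build, emits one key per ALT allele of a multiallelic record, and refuses symbolic or breakend ALTs, whose identity also depends on INFO fields such as END or SVLEN. Stripping a leading "chr" is an assumed convention, not a standard; it does not reconcile other naming differences (e.g. chrM vs MT). `variant_keys` is a hypothetical helper, and normalization is not handled here.

```python
from typing import List


def variant_keys(build: str, chrom: str, pos: int, ref: str, alt_field: str) -> List[str]:
    """Return one build-qualified key per ALT allele of a VCF record (hypothetical helper)."""
    # Assumed convention: drop a leading "chr" so that "chr1" and "1" give the same key.
    # This does NOT reconcile other naming differences such as "chrM" vs "MT".
    chrom_norm = chrom[3:] if chrom.lower().startswith("chr") else chrom
    keys = []
    # A multiallelic ALT column such as "T,C" yields one key per allele.
    for alt in alt_field.split(","):
        # Symbolic (<DEL>, <DUP>, ...) and breakend ALTs do not spell out the
        # alternate sequence, so CHROM/POS/REF/ALT alone may not identify the event.
        if alt.startswith("<") or "[" in alt or "]" in alt:
            raise ValueError(f"symbolic or breakend ALT not handled: {alt}")
        keys.append(f"{build}:{chrom_norm}:{pos}:{ref}:{alt}")
    return keys


print(variant_keys("GRCh37", "chr1", 777, "A", "T,C"))
# ['GRCh37:1:777:A:T', 'GRCh37:1:777:A:C']
```

Raising on symbolic alleles is a deliberate choice for the sketch: silently building a key from `<DEL>` would make two deletions of different lengths at the same position look identical.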

Thanks for the reply :).

  • the chromosome name might differ: 1 vs chr1

True, but this won't happen within a single analysis, where you usually stick to the same genome/annotation.

  • you should add the genome build: grch37:1:1234:A:T

I think the argument from the previous point applies here too. For uniqueness across genome builds, that would be a good idea.

  • there are symbolic alleles: chr1:77:A:<DELETION>

Does this happen often? Isn't a deletion normally represented by fewer bases in ALT than in REF?

  • multiallelic variants: chr1:777:A:T,C

Also true, but I almost always split them.

  • normalization: chr1:10:A:ATT vs chr1:10:AA:AAT

I don't understand this point. Could you elaborate?
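The idea behind the normalization point is that one event can be written with different REF/ALT strings. As a hypothetical illustration, if the reference has AA at positions 10-11, then chr1:10:A:ATT and chr1:10:AA:ATTA both describe an insertion of TT after position 10, yet they give different keys. A minimal sketch of the trimming half of normalization (removing bases shared by REF and ALT while keeping at least one base in each) makes such pairs agree. `trim_alleles` is a hypothetical helper; it does not left-align indels inside repeats, which needs the reference sequence.

```python
from typing import Tuple


def trim_alleles(pos: int, ref: str, alt: str) -> Tuple[int, str, str]:
    """Trim bases shared by REF and ALT, keeping at least one base in each.

    Only the parsimony part of normalization; no left-alignment is done.
    """
    # Trim shared trailing bases first; POS is unaffected.
    while len(ref) > 1 and len(alt) > 1 and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]
    # Then trim shared leading bases, moving POS one base to the right each time.
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt


print(trim_alleles(10, "AA", "ATTA"))  # (10, 'A', 'ATT')
print(trim_alleles(10, "A", "ATT"))    # (10, 'A', 'ATT')
print(trim_alleles(10, "CAG", "CTG"))  # (11, 'A', 'T')
```

In practice, running `bcftools norm -f ref.fa -m -any` before building keys left-aligns and trims indels against the reference and splits multiallelic records, which is usually safer than hand-rolled trimming.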
