Would a string composed of `#CHROM`, `POS`, `REF` and `ALT` uniquely identify a variant?
10 months ago
gernophil ▴ 120

Hey everyone,

It's a simple question, but the answer might be tricky :) (or not). In my current workflow, I made a unique identifier for variants from a VCF composed of these substrings: #CHROM + "_" + POS + "_" + ALT.

For my data this is unique for every variant. But this might not always be the case, would it? There could be a deletion of the G in a GT leading to a T as ALT, but there could also be a simple mutation of that G to a T leading to the same combination of #CHROM, POS and ALT. Or would this change the POS?

So I wanted to extend this by REF to #CHROM + "_" + POS + "_" + REF + "_" + ALT.

So two questions:

  1. Is my assumption correct that #CHROM + "_" + POS + "_" + ALT might not always be unique?
  2. Would #CHROM + "_" + POS + "_" + REF + "_" + ALT lead to a unique string in every possible case (according to the VCF definitions)?
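
To make question 1 concrete, here is a minimal sketch in Python. The two records are made up for illustration (they are not from my data): two deletions of different length anchored at the same base share #CHROM, POS and ALT and differ only in REF. The function names are just placeholders.

```python
def key_without_ref(chrom, pos, ref, alt):
    """Identifier as used in the current workflow: #CHROM_POS_ALT."""
    return f"{chrom}_{pos}_{alt}"


def key_with_ref(chrom, pos, ref, alt):
    """Extended identifier: #CHROM_POS_REF_ALT."""
    return f"{chrom}_{pos}_{ref}_{alt}"


# Hypothetical records (chrom, pos, ref, alt): both deletions keep the
# anchor base A at position 100, but remove a different number of bases.
records = [
    ("chr1", 100, "AC", "A"),   # deletes C
    ("chr1", 100, "ACC", "A"),  # deletes CC
]

short_keys = {key_without_ref(*r) for r in records}
long_keys = {key_with_ref(*r) for r in records}

# The keys without REF collapse into one; the keys with REF stay distinct.
print(len(short_keys), len(long_keys))
```

So at least for indels of different length at the same anchor position, leaving out REF can merge two different variants into one identifier.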
VCF • 707 views

Yes, if both the genome build and the nomenclature system are stated.

10 months ago

Well, it should be OK in most cases, but:

  • the name of the chromosome might change 1 vs chr1
  • you should add the build grch37:1:1234:A:T
  • there are symbolic alleles : chr1:77:A:<DELETION>
  • multiallelic variants: chr1:777:A:T,C
  • normalization : chr1:10:A:ATT vs chr1:10:AA:AAT
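
A rough sketch of how the first four points could be handled when building the key, assuming Python. The helper names `strip_chr` and `variant_keys` are invented for this example, and stripping a leading "chr" is only one possible convention (it does not reconcile names such as chrM and MT).

```python
def strip_chr(chrom):
    """Drop a leading 'chr' so that '1' and 'chr1' give the same name."""
    return chrom[3:] if chrom.lower().startswith("chr") else chrom


def variant_keys(build, chrom, pos, ref, alt_field):
    """Return one build-qualified key per ALT allele of a single record.

    alt_field is the raw ALT column, e.g. 'T' or 'T,C'.
    Symbolic alleles are passed through unchanged.
    """
    keys = []
    for alt in alt_field.split(","):
        parts = [build.lower(), strip_chr(chrom), str(pos), ref, alt]
        keys.append(":".join(parts))
    return keys


# Multiallelic record from the list above, split into one key per allele.
print(variant_keys("GRCh37", "chr1", 777, "A", "T,C"))
# The same site written without the 'chr' prefix gives the same key.
print(variant_keys("grch37", "1", 777, "A", "T"))
```

For symbolic alleles the key built this way says nothing about the extent of the event, which is normally carried in the INFO column, so two different structural variants starting at the same base could still collide.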

Thanks for the reply :).

  • the name of the chromosome might change 1 vs chr1

True, but this will not happen within one analysis, where you mostly stick to the same genome/annotation.

  • you should add the build grch37:1:1234:A:T

The argument from the point above also applies here, I think. For cross-genome uniqueness that would be a good idea.

  • there are symbolic alleles : chr1:77:A:<DELETION>

Does this happen often? Isn't a deletion normally shown by fewer ALT than REF bases?

  • multiallelic variants: chr1:777:A:T,C

Also true, but I almost always split them.

  • normalization : chr1:10:A:ATT vs chr1:10:AA:AAT

This point I don't get. Could you elaborate on this?
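
As far as I understand the general idea, the same indel can be written with different amounts of flanking sequence, so two callers may report different POS/REF/ALT for one and the same change. A minimal sketch of trimming shared bases, assuming Python; `trim_alleles` is a made-up helper and the records are invented. It does not left-align indels inside repeats, which needs the reference sequence (tools such as bcftools norm or vt normalize do that).

```python
def trim_alleles(pos, ref, alt):
    """Remove bases shared by REF and ALT, keeping at least one base each."""
    # Trim shared trailing bases first; POS is unaffected.
    while len(ref) > 1 and len(alt) > 1 and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]
    # Then trim shared leading bases, moving POS to the right.
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt


# Three hypothetical ways to write the same insertion of a T after the A
# at position 10 (assuming the reference reads C A G at positions 9-11).
representations = [
    (10, "A", "AT"),    # already minimal
    (10, "AG", "ATG"),  # extra base on the right
    (9, "CA", "CAT"),   # extra base on the left
]

trimmed = {trim_alleles(*r) for r in representations}
print(trimmed)
```

Without such a step, the three records would give three different #CHROM_POS_REF_ALT strings for one variant.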
