Question

How to convert alleles to Illumina AB format?

0

6.1 years ago

MichaelTrev ▴ 10

Hi, I have a file with three columns: one is an rs ID, and the others are Allele1 and Allele2. Alleles are given as nucleotides:

A,G
G,C
G,G
etc.

I need to create a file in Illumina AB format, so I need to convert AG, GC, GG to AB, AA, or BB (depending on the SNP). Can someone explain the best way to do that? And is it even possible with only the information I have?

illumina AB Allels • 3.2k views

ADD COMMENT • link updated 6.1 years ago by Charles Warden 8.3k • written 6.1 years ago by MichaelTrev ▴ 10

1

NB May 16, 2019 - although I discuss A and B alleles mostly in relation to major and minor allele in my comment (below), on the Illumina genotyping arrays, A and B relate to TOP and BOT (coding and non-coding strands).

---------------------------------

It is not an easy feat because A and B alleles can mean different things in different contexts. The common interpretation is that A relates to the major allele, whereas B relates to the minor allele. This raises the question: in which cohort are these the major and minor alleles? The usual reference for this is 1000 Genomes data, but it can also be your study cohort.

Most likely, there will be an annotation file available for the microarray platform that was used, which will [hopefully] contain information on which allele is A or B - ask your colleagues if they know anything about this. If they know nothing, determine the microarray platform that was used and search for the annotation file online.

My other suggestion to you: confirm with your colleagues why AB format is required, and confirm the results that are requested to be obtained. Do the results necessitate a conversion to AB format?

Finally, if all else fails, annotate each of your records for 1000 Genomes Phase III allele frequencies, and then set A and B alleles manually based on the allele frequencies for each (A = major; B = minor). This will take you a bit of extra work; however, it is feasible to do.
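The frequency-based fallback could be sketched roughly as follows in Python. This is only an illustration under stated assumptions: the function names `ab_from_frequencies` and `encode_genotype` are hypothetical, the frequencies are made-up numbers rather than real 1000 Genomes values, and each SNP is assumed to be biallelic with both alleles reported on the same strand as the frequency annotation.

```python
def ab_from_frequencies(freqs):
    """Pick (A allele, B allele) for one SNP from {allele: frequency}.

    A = major (more frequent) allele, B = minor allele.
    On an exact tie, the allele listed first in `freqs` becomes A.
    """
    if len(freqs) != 2:
        raise ValueError("expected exactly two alleles for a biallelic SNP")
    major, minor = sorted(freqs, key=freqs.get, reverse=True)
    return major, minor


def encode_genotype(allele1, allele2, allele_a, allele_b):
    """Translate a nucleotide genotype into AA, AB, or BB."""
    code = {allele_a: "A", allele_b: "B"}
    # Sorting makes G,A and A,G both come out as AB.
    return "".join(sorted(code[allele1] + code[allele2]))


# Made-up frequencies for a hypothetical SNP.
a, b = ab_from_frequencies({"G": 0.2, "A": 0.8})
print(a, b)                              # A G
print(encode_genotype("G", "A", a, b))   # AB
print(encode_genotype("G", "G", a, b))   # BB
```

Note that the result depends on the reference cohort chosen, so the same SNP can receive different A/B labels with a different frequency source.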

Kevin

ADD REPLY • link 6.1 years ago by Kevin Blighe 89k

0

Thanks for the answers! I learned something about Illumina. AB format is needed because the database works in it. I got additional information that all alleles in the document are TOP alleles. Does that change anything, or do I still need to use the manifest file and try to deal with R? Best regards!

ADD REPLY • link 6.1 years ago by MichaelTrev ▴ 10

0

In that case, your data is just A alleles. While the chip likely originally included A and B alleles, for downstream processing, sometimes we filter out all SNPs from one strand, i.e., those on the non-coding strand, as I mention in step 6, here: Produce PCA bi-plot for 1000 Genomes Phase III - Version 2

ADD REPLY • link 6.1 years ago by Kevin Blighe 89k

score 2 · Answer 1 · 2019-05-15

Hi @hektor102

I agree with Kevin on asking your colleagues why the AB notation is needed, since it's less informative than the actual alleles, especially using Illumina's definition of A and B.

If you are referring to the A and B alleles as defined by Illumina for their SNP-array technology, then the A and B designation has nothing to do with population frequency and is defined solely based on the sequence context (this is what produces the symmetry in the BAF plots in SNP arrays; see the top panel in this image). The AB definition is explained in this technical note.

To transform your alleles into AB alleles, I think your best bet would be to get an Illumina manifest file (or a results file) for the exact platform used and use R or similar to match the SNPs by rs ID and then transform them to AB using the corresponding column in the manifest file. As an example, the manifest of the Illumina OmniExpress is at ftp://webdata2:webdata2@ussd-ftp.illumina.com/downloads/productfiles/humanomniexpress-24/v1-3/infinium-omniexpress-24-v1-3-a1-manifest-file-csv.zip
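As a rough illustration of that matching step, here is a minimal Python sketch (R would work just as well). Several things in it are assumptions to check against your own files: that the manifest table has a `Name` column holding the rs ID and a `SNP` column formatted like `[A/G]`, that the first allele in the brackets is A and the second is B, and that your alleles are reported on the same strand as that column. The rs IDs are made up, and `build_ab_lookup` and `to_ab` are hypothetical helper names. Real manifests typically carry extra header lines before the column row, which would need to be skipped first.

```python
import csv
import io

# Toy stand-in for the assay table of a manifest. The rs IDs are made up,
# and the column names "Name" and "SNP" are assumptions to verify.
MANIFEST_CSV = """Name,SNP
rs0000001,[A/G]
rs0000002,[C/G]
"""


def build_ab_lookup(manifest_text):
    """Return {rs ID: (A allele, B allele)} from manifest-like CSV text."""
    lookup = {}
    for row in csv.DictReader(io.StringIO(manifest_text)):
        # "[A/G]" is read as: first allele = A, second allele = B (assumption).
        allele_a, allele_b = row["SNP"].strip("[]").split("/")
        lookup[row["Name"]] = (allele_a, allele_b)
    return lookup


def to_ab(rsid, allele1, allele2, lookup):
    """Convert one nucleotide genotype to AA/AB/BB, or None if unmappable."""
    if rsid not in lookup:
        return None  # SNP not present in the manifest
    allele_a, allele_b = lookup[rsid]
    code = {allele_a: "A", allele_b: "B"}
    if allele1 not in code or allele2 not in code:
        # Allele absent from the manifest entry: possibly a strand mismatch
        # or a no-call, so leave it for manual inspection.
        return None
    return "".join(sorted(code[allele1] + code[allele2]))


lookup = build_ab_lookup(MANIFEST_CSV)
print(to_ab("rs0000001", "A", "G", lookup))  # AB
print(to_ab("rs0000002", "G", "C", lookup))  # AB
print(to_ab("rs0000001", "G", "G", lookup))  # BB
```

Returning `None` for unmatched SNPs or unexpected alleles keeps strand problems visible instead of silently producing a wrong AB call.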

Hope this helps

Bernat

score 1 · Answer 2 · 2019-05-16

1

6.1 years ago

Charles Warden 8.3k

I don't think I saw anybody else say it, but you can define those from GenomeStudio (if you have the .idats, or you can ask to get access to them).

If at all possible, that is what I would recommend. Otherwise, I would agree with the feedback that you have received.

Best of luck with your project!

ADD COMMENT • link 6.1 years ago by Charles Warden 8.3k