Question

hg18 vs hg19 in SNP6.0 data


Asked by cbst, 8.7 years ago

I have Affymetrix Genome-Wide Human SNP Array 6.0 (SNP6.0) data that were generated in 2010-2011. I use ASCAT to process them, and the ASCAT website recommends using PennCNV to create the LogR and BAF files.

ASCAT's protocol suggests using hg19 as the genome build in the PennCNV procedure. Does anyone know how using hg18 versus hg19 affects the coordinates of the SNP6.0 probes?

Tags: hg18, hg19, SNP6.0, ASCAT, PennCNV


score 1 · Answer 1 · 2016-05-23

I contacted Affymetrix directly, and they recommend always using the latest version of the annotation file for the appropriate array. In my case, that is the GenomeWideSNP_6 annotation file, CSV format, Release 35 (313 MB, 4/30/15), which is the latest release at the time of writing. The annotation file contains the SNP positions, so LiftOver is not required here.
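To make this concrete, here is a minimal Python sketch (my own illustration, not part of the Affymetrix reply) that reads probe positions straight from a NetAffx-style annotation CSV. It assumes header comment lines start with "#", that the columns are named "Probe Set ID", "Chromosome" and "Physical Position", and that unmapped probes are marked "---"; check these against your actual GenomeWideSNP_6 release file. The probe IDs and positions in the example are made up.

```python
import csv
import io
from typing import Dict, Iterable, Tuple


def load_probe_positions(lines: Iterable[str]) -> Dict[str, Tuple[str, int]]:
    """Map probe set ID -> (chromosome, physical position) from an annotation CSV.

    Assumes a NetAffx-style layout: '#' header lines, then a quoted CSV header
    with 'Probe Set ID', 'Chromosome' and 'Physical Position' columns.
    """
    data_lines = (line for line in lines if not line.startswith("#"))
    positions: Dict[str, Tuple[str, int]] = {}
    for row in csv.DictReader(data_lines):
        chrom, pos = row["Chromosome"], row["Physical Position"]
        if chrom == "---" or pos == "---":
            continue  # probe has no mapped position in this release
        positions[row["Probe Set ID"]] = (chrom, int(pos))
    return positions


# Made-up example rows; a real file would be opened with open(path, newline="").
example = io.StringIO(
    "#%comment line (made up)\n"
    '"Probe Set ID","Chromosome","Physical Position"\n'
    '"SNP_A-0000001","1","1000"\n'
    '"SNP_A-0000002","---","---"\n'
)
positions = load_probe_positions(example)
print(positions)  # {'SNP_A-0000001': ('1', 1000)}
```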

Here is their exact reply:

Please know that the reason why you see different versions of the annotation file is because we design our arrays using information from public databases, however these databases are constantly updated and annotation updates incorporate current releases from GenBank, RefSeq, Ensembl, UniGene, Entrez Gene, UniProt and UCSC, as well as sequences from other organism-specific databases;

Some of our arrays were designed several years ago, at which time the information in the databases was different, and most of our arrays were last updated in April 2015. Some previously entered genes have not been validated over the years and have therefore been taken out of the database, but the probes on our arrays are still designed to target them.

As more genes are discovered, and others are removed if they fail validation by the scientific community, the number of gene assignments in the annotation files will fluctuate. The "well-annotated" gene counts are for gene symbols that are associated with transcripts having the best experimental support, such as the reviewed and validated records of RefSeq or those with accessions beginning with "NM_";

So you should always use only the current NetAffx annotation file to analyse your data, regardless of how old your array is or when it was processed, since it will give you the most up-to-date annotation information available.

Please know that, regardless of the annotation variation, all the probes are always at the same array coordinates; this is determined by the library file, which is unique for each array;

If we add or remove probes, we change the array name (as has happened several times), but for each array, regardless of the annotation version, all the probes are in the same place;

This is valid for our whole array portfolio, so for SNP6 also.
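Because the probe layout is fixed by the library file and only the annotation changes between releases, one way to see what a newer release changes is to match two releases on Probe Set ID and compare their positions. A minimal Python sketch, assuming each release has already been loaded into a dict of probe ID -> (chromosome, position); the function name and the example values are hypothetical.

```python
from typing import Dict, List, Tuple

Position = Tuple[str, int]  # (chromosome, position)


def changed_positions(
    old: Dict[str, Position], new: Dict[str, Position]
) -> List[Tuple[str, Position, Position]]:
    """Probes present in both releases whose (chromosome, position) differs."""
    changes = []
    for probe_id in sorted(old.keys() & new.keys()):  # match on probe ID only
        if old[probe_id] != new[probe_id]:
            changes.append((probe_id, old[probe_id], new[probe_id]))
    return changes


# Made-up values standing in for an older and a newer annotation release.
old_release = {"SNP_A-0000001": ("1", 1000), "SNP_A-0000002": ("2", 5000)}
new_release = {"SNP_A-0000001": ("1", 1500), "SNP_A-0000002": ("2", 5000)}
changes = changed_positions(old_release, new_release)
print(changes)  # [('SNP_A-0000001', ('1', 1000), ('1', 1500))]
```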

score 0 · Answer 2 · 2016-04-05


Answered by WouterDeCoster, 8.7 years ago

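hg18 and hg19 are different genome builds, and the same probe generally has different coordinates in each. Mixing positions from one build with references or annotations from the other will affect your results drastically and make them wrong.

You could use UCSC LiftOver to convert between builds: https://genome.ucsc.edu/cgi-bin/hgLiftOver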
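The LiftOver web page accepts BED input. As a minimal sketch (hypothetical helper name, made-up example probe), SNP positions from an array annotation could be written as single-base BED intervals like this. It assumes the positions are 1-based (check your annotation's documentation); BED starts are 0-based and ends are exclusive, so position p becomes start p-1, end p. The "chr" prefix and the MT -> chrM renaming are assumptions about how your annotation names chromosomes.

```python
from typing import Dict, List, Tuple


def to_bed_lines(positions: Dict[str, Tuple[str, int]]) -> List[str]:
    """Write 1-based SNP positions as single-base BED lines (0-based, end-exclusive)."""
    lines = []
    for probe_id, (chrom, pos) in sorted(positions.items()):
        name = chrom if chrom.startswith("chr") else "chr" + chrom  # UCSC-style names
        if name == "chrMT":
            name = "chrM"  # UCSC names the mitochondrial genome chrM
        lines.append(f"{name}\t{pos - 1}\t{pos}\t{probe_id}")
    return lines


bed = to_bed_lines({"SNP_A-0000001": ("1", 1000)})
print("\n".join(bed))  # chr1, 999, 1000, SNP_A-0000001 separated by tabs
```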



Thank you for your answer. I am not sure whether I can use LiftOver with Genome-Wide Human SNP6.0 array data, since I don't have sequencing data. Do you have experience with using LiftOver on SNP6.0 data?

Reply by cbst, 8.7 years ago


LiftOver converts coordinates from one build to another. So you take the positions in your data and convert them from hg18 to hg19, or vice versa. Or to hg38 :)
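To illustrate what "converting coordinates" means, here is a toy Python sketch of chain-based mapping, the idea behind UCSC LiftOver: a chain lists ungapped aligned blocks between the old build (the "t" side of an over.chain file) and the new build (the "q" side), and a position is shifted by the offset of the block it falls in. This is a simplified illustration, not a replacement for the liftOver tool: it handles a single forward-strand chain only, positions that fall in a gap are left unmapped, and the chain and function names below are made up.

```python
from typing import List, Optional, Tuple

Block = Tuple[int, int, int]  # (source_start, target_start, size), 0-based


def parse_chain(text: str) -> Tuple[str, str, List[Block]]:
    """Parse ONE chain record (header + alignment lines) into ungapped blocks."""
    rows = [line.split() for line in text.strip().splitlines() if line.strip()]
    header = rows[0]  # chain score tName tSize tStrand tStart tEnd qName qSize qStrand qStart qEnd id
    if header[0] != "chain":
        raise ValueError("expected a 'chain' header line")
    if header[4] != "+" or header[9] != "+":
        raise NotImplementedError("this sketch only handles +/+ chains")
    t_name, t_pos = header[2], int(header[5])
    q_name, q_pos = header[7], int(header[10])
    blocks: List[Block] = []
    for fields in rows[1:]:
        size = int(fields[0])
        blocks.append((t_pos, q_pos, size))
        if len(fields) == 3:  # 'size dt dq': gaps follow this block
            t_pos += size + int(fields[1])
            q_pos += size + int(fields[2])
    return t_name, q_name, blocks


def lift_1based(chrom: str, pos: int, chain_text: str) -> Optional[Tuple[str, int]]:
    """Map a 1-based position through one chain; None if it is not covered."""
    t_name, q_name, blocks = parse_chain(chain_text)
    if chrom != t_name:
        return None
    pos0 = pos - 1  # chain coordinates are 0-based
    for t_start, q_start, size in blocks:
        if t_start <= pos0 < t_start + size:
            return q_name, q_start + (pos0 - t_start) + 1
    return None  # position falls in a gap: no mapping


# Made-up chain: 30 aligned bases, a 5-base gap in the source and a 10-base gap
# in the target, then 65 more aligned bases.
chain_text = (
    "chain 1000 chrA 100 + 0 100 chrB 105 + 0 105 1\n"
    "30 5 10\n"
    "65\n"
)
print(lift_1based("chrA", 11, chain_text))  # ('chrB', 11)
print(lift_1based("chrA", 33, chain_text))  # None (falls in a gap)
print(lift_1based("chrA", 51, chain_text))  # ('chrB', 56)
```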

Reply by WouterDeCoster, 8.7 years ago