Getting Probeset Sequence Information From A Custom CDF?
14.3 years ago
Sam ▴ 90

Hello, I downloaded a custom CDF file for Affymetrix U133 Plus 2.0 arrays. I am trying to see whether I can get the probeset sequence information from this file for a particular gene. Can anyone help me do this? I looked in the file and see some information about cbase, pbase, and tbase. Is that the place to find the information?

[EDIT: text below moved here from answer]

I am actually using one of the remapped CDFs for the U133 Plus 2.0, so I am most interested in the probe-level information: the sequences that actually make up my new probeset. I suppose this is a tougher task than anticipated.

affymetrix • 6.9k views

Problem with custom CDFs is that the contents vary, because they're...customised. Can you post a link to the custom CDF download location, so we can look at it?

14.3 years ago

The CDF does not contain probe sequences. That information can be downloaded from Affymetrix's web site under Support (free registration required), then select Annotation Files for the platform you want. Sequence information is stored for probes in a FASTA file you can download; the one I think you want is

http://www.affymetrix.com/analysis/downloads/data/HG-U133Plus2.probe_fasta.zip

So long as the probeset IDs (e.g. "1007_s_at") can be pulled out of your file, you should be able to match them to this FASTA file. The probes have identifiers of the form:

probe:HG-U133A_2:1007_s_at:416:177; Interrogation_Position=3330; Antisense;
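A header like this can be split into its fields with base R. The following is a minimal sketch, assuming the colon-separated layout probe:<array>:<probeset>:<x>:<y> followed by semicolon-separated annotations, and assuming that the IDs carry underscores as in the downloaded files (e.g. 1007_s_at). The function name parse_probe_header is made up for illustration.

```r
# Split an Affymetrix-style probe FASTA header into its fields.
# Assumed layout: probe:<array>:<probeset>:<x>:<y>; <key>=<value>; <strand>;
parse_probe_header <- function(header) {
  header <- sub("^>", "", header)                        # drop the FASTA ">" if present
  parts  <- trimws(strsplit(header, ";", fixed = TRUE)[[1]])
  id     <- strsplit(parts[1], ":", fixed = TRUE)[[1]]
  list(
    array    = id[2],
    probeset = id[3],
    x        = as.integer(id[4]),
    y        = as.integer(id[5]),
    position = as.integer(sub("^.*=", "", parts[2])),    # value after "="
    strand   = parts[3]
  )
}

hdr  <- "probe:HG-U133A_2:1007_s_at:416:177; Interrogation_Position=3330; Antisense;"
info <- parse_probe_header(hdr)
info$probeset
```

The probeset field is the part to compare against the IDs in your CDF; the two numbers after it are taken here to be the probe's X and Y position on the chip.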

Thanks, I noticed that immediately after I posted it. The link is correct in the original response.

14.3 years ago
Neilfws 49k

In general, CDFs do not contain sequence information unless they have been customised to contain a SEQUENCE field. A CDF maps probes to probesets and probesets to (X,Y) coordinates on the chip, hence the name (chip description file). CBASE, PBASE and TBASE refer to the nucleotides at positions 12, 13 and 14 in the probe.
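Because a CDF identifies each probe by its (X,Y) position, a remapped CDF can in principle be matched back to the manufacturer's probe sequences through those coordinates. The sketch below illustrates the idea with two toy data frames. Every probeset name, coordinate and sequence in it is invented, and it assumes the custom CDF reuses the physical probes of the original array rather than defining new ones.

```r
# Toy stand-in for what a custom CDF encodes: probeset membership by chip position.
# All IDs, coordinates and sequences below are invented for illustration.
custom_cdf <- data.frame(
  probeset = c("custom_set_A", "custom_set_A", "custom_set_B"),
  x = c(10L, 11L, 12L),
  y = c(20L, 21L, 22L),
  stringsAsFactors = FALSE
)

# Toy stand-in for the manufacturer's probe table (one sequence per chip position).
probe_table <- data.frame(
  x = c(10L, 11L, 12L, 13L),
  y = c(20L, 21L, 22L, 23L),
  sequence = c("ACGTACGTACGTACGTACGTACGTA",
               "CGTACGTACGTACGTACGTACGTAC",
               "GTACGTACGTACGTACGTACGTACG",
               "TACGTACGTACGTACGTACGTACGT"),
  stringsAsFactors = FALSE
)

# Join on the coordinates to attach a sequence to each probe of each custom probeset.
annotated <- merge(custom_cdf, probe_table, by = c("x", "y"))
annotated[order(annotated$probeset), ]
```

Probes that the custom CDF dropped (here the one at 13,23) simply fall out of the join.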

To get probe sequences for the U133 Plus 2.0 file, go to the Affymetrix product page for that array. From there, you can download either a FASTA file or a tabular file. You'll need to create an account and/or login first.

Even if your CDF is customised, there should be matching probeset IDs with the original product file. If you want to get probeset IDs for a particular gene, you can use BioMart, either via the web, or using the Bioconductor biomaRt package. Here is some sample R code, to find the probesets for gene HOXB13:

library(biomaRt)
# Connect to the Ensembl human gene dataset
mart <- useMart(biomart = "ensembl", dataset = "hsapiens_gene_ensembl")
# Look up the U133 Plus 2.0 probesets annotated to HOXB13
results <- getBM(attributes = c("ensembl_gene_id", "hgnc_symbol",
                                "affy_hg_u133_plus_2"),
                 filters = "hgnc_symbol",
                 values = "HOXB13", mart = mart)
results
#   ensembl_gene_id hgnc_symbol affy_hg_u133_plus_2
# 1 ENSG00000159184      HOXB13           230105_at
# 2 ENSG00000159184      HOXB13           209844_at

From there, you can go back to your FASTA file and pull out the probe sequences for those probesets.
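That last step could look roughly like the sketch below, in base R. It assumes one sequence line per record and headers of the form >probe:<array>:<probeset>:<x>:<y>; ... . The array name, coordinates, positions and sequences in the sample lines are all placeholders, and in practice fasta_lines would come from readLines() on your downloaded file.

```r
# Keep only the FASTA records whose header names one of the wanted probesets.
# 'fasta_lines' stands in for readLines("<path to your probe FASTA>");
# coordinates, positions and sequences here are invented placeholders.
fasta_lines <- c(
  ">probe:HG-U133_Plus_2:230105_at:1:2; Interrogation_Position=100; Antisense;",
  "ACGTACGTACGTACGTACGTACGTA",
  ">probe:HG-U133_Plus_2:209844_at:3:4; Interrogation_Position=200; Antisense;",
  "CGTACGTACGTACGTACGTACGTAC",
  ">probe:HG-U133_Plus_2:1007_s_at:5:6; Interrogation_Position=300; Antisense;",
  "GTACGTACGTACGTACGTACGTACG"
)
wanted <- c("230105_at", "209844_at")   # probeset IDs returned for HOXB13

is_header <- startsWith(fasta_lines, ">")
headers   <- fasta_lines[is_header]
seqs      <- fasta_lines[!is_header]    # assumes exactly one sequence line per header

# The probeset ID is the third colon-separated field of the header.
probeset <- vapply(strsplit(headers, ":", fixed = TRUE), `[`, character(1), 3)

hits <- data.frame(probeset = probeset, sequence = seqs, stringsAsFactors = FALSE)
hits <- hits[hits$probeset %in% wanted, ]
hits
```

For a real file with many probes per probeset, hits will contain one row per probe.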


I am actually using one of those remapped CDFs of the U133plus2.0, so I am most interested in the probe level information...the sequences that are actually making up my new probeset. I suppose this is a tougher task than anticipated.

14.3 years ago
Will 4.6k

I'm not sure that info is actually contained in the CDF file. My understanding has always been that the CDF file only keeps track of which probes are in each probeset. If your custom CDF is in GEO, the record often has a link to the sequences. If you got it from some other website, you'll have to root around there.

12.8 years ago
Karolis ▴ 20

You can also get probe sequence information for stock CDFs this way:

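if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")   # biocLite() has been retired in favour of BiocManager
BiocManager::install("hgu133plus2probe")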
library("hgu133plus2probe")
head(hgu133plus2probe)

For a custom CDF, you have to download and install the matching probe package. For example, for the Brainarray ENSG custom CDF (version 14.1.0): http://brainarray.mbni.med.umich.edu/Brainarray/Database/CustomCDF/14.1.0/ensg.download/hgu133plus2hsensgprobe_14.1.0.tar.gz This is linked from: http://brainarray.mbni.med.umich.edu/Brainarray/Database/CustomCDF/14.1.0/ensg.asp

Then install the probe file:

R CMD INSTALL hgu133plus2hsensgprobe_14.1.0.tar.gz

Then run these commands in R:

library(hgu133plus2hsensgprobe)
head(hgu133plus2hsensgprobe)
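
Once loaded, a probe package is an ordinary data frame, so the probes of a single probeset can be pulled out by name. Here is a minimal sketch using a toy data frame that mimics the usual columns (sequence, x, y, Probe.Set.Name). The rows are invented, and the probeset name assumes that the ENSG custom CDF names its probesets as the Ensembl gene ID plus "_at"; check the output of head() on your own package to confirm both.

```r
# Toy stand-in for a probe-package data frame; with the real package loaded,
# use the package object (e.g. hgu133plus2hsensgprobe) in place of 'probes'.
# Sequences and coordinates are invented placeholders.
probes <- data.frame(
  sequence = c("ACGTACGTACGTACGTACGTACGTA",
               "CGTACGTACGTACGTACGTACGTAC",
               "GTACGTACGTACGTACGTACGTACG"),
  x = c(10L, 11L, 12L),
  y = c(20L, 21L, 22L),
  Probe.Set.Name = c("ENSG00000159184_at", "ENSG00000159184_at",
                     "ENSG00000000003_at"),
  stringsAsFactors = FALSE
)

target <- "ENSG00000159184_at"   # assumed naming: Ensembl gene ID + "_at"

# Keep the rows belonging to the target probeset, with the columns of interest.
hoxb13 <- probes[probes$Probe.Set.Name == target,
                 c("Probe.Set.Name", "x", "y", "sequence")]
hoxb13
```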