Question

Several gene names for one probeID in an Affymetrix annotation file

10.4 years ago

nazaninhoseinkhan ▴ 530

Dear all,

I am trying to map geneIDs from the S. aureus annotation file to probeIDs.

The problem is that in over 2000 rows, more than 2 geneIDs are listed for a single probeID.

Here is row 3544 of the annotation file as an example:

sa_i10207dr_x_at    1120534 // gi|1120534|ref|NC_002758.2|NC_002758.2(GI:57634611):629461-632324(+) Staphylococcus aureus subsp. aureus Mu50, GENE=sdrC  LOCUS=SAV0561 // ncbi_bacterial // 13 // --- /// 1120535 // gi|1120535|ref|NC_002758.2|NC_002758.2(GI:57634611):632689-636848(+) Staphylococcus aureus subsp. aureus Mu50, GENE=sdrD  LOCUS=SAV0562 // ncbi_bacterial // 116 // --- /// 1120536 // gi|1120536|ref|NC_002758.2|NC_002758.2(GI:57634611):637240-640667(+) Staphylococcus aureus subsp. aureus Mu50, GENE=sdrE  LOCUS=SAV0563 // ncbi_bacterial // 27 // --- /// 1122655 // gi|1122655|ref|NC_002758.2|NC_002758.2(GI:57634611):2782009-2784642(-) Staphylococcus aureus subsp. aureus Mu50, GENE=clfB PRODUCT=Clumping factor B LOCUS=SAV2630 // ncbi_bacterial // 14 // --- /// 1123324 // gi|1123324|ref|NC_002745.2|NC_002745.2(GI:29165615):605214-608077(+) Staphylococcus aureus subsp. aureus N315, GENE=sdrC  LOCUS=SA0519 // ncbi_bacterial // 13 // --- /// 1123325 // gi|1123325|ref|NC_002745.2|NC_002745.2(GI:29165615):608442-612601(+) Staphylococcus aureus subsp. aureus N315, GENE=sdrD  LOCUS=SA0520 // ncbi_bacterial // 116 // --- /// 1123326 // gi|1123326|ref|NC_002745.2|NC_002745.2(GI:29165615):612993-616420(+) Staphylococcus aureus subsp. aureus N315, GENE=sdrE  LOCUS=SA0521 // ncbi_bacterial // 28 // --- /// 1125352 // gi|1125352|ref|NC_002745.2|NC_002745.2(GI:29165615):2718295-2720928(-) Staphylococcus aureus subsp. aureus N315, GENE=clfB PRODUCT=Clumping factor B LOCUS=SA2423 // ncbi_bacterial // 14 // --- /// 3236072 // gi|3236072|ref|NC_002951.2|NC_002951.2(GI:57650036):635788-639935(+) Staphylococcus aureus subsp. aureus COL, GENE=sdrD PRODUCT=sdrD protein LOCUS=SACOL0609 // ncbi_bacterial // 164 // --- /// 3236073 // gi|3236073|ref|NC_002951.2|NC_002951.2(GI:57650036):640327-643829(+) Staphylococcus aureus subsp. 
aureus COL, GENE=sdrE PRODUCT=sdrE protein LOCUS=SACOL0610 // ncbi_bacterial // 34 // --- /// 3236353 // gi|3236353|ref|NC_002951.2|NC_002951.2(GI:57650036):632578-635423(+) Staphylococcus aureus subsp. aureus COL, GENE=sdrC PRODUCT=sdrC protein LOCUS=SACOL0608 // ncbi_bacterial // 13 // --- /// 3237041 // gi|3237041|ref|NC_002951.2|NC_002951.2(GI:57650036):2711036-2713777(-) Staphylococcus aureus subsp. aureus COL, GENE=clfB PRODUCT=clumping factor B LOCUS=SACOL2652 // ncbi_bacterial // 15 // ---

Is it correct to consider only the first geneID in each row?

I would appreciate any advice.

Nazanin

redundancy gene-names • 2.5k views

updated 3.3 years ago by Ram 45k • written 10.4 years ago by nazaninhoseinkhan ▴ 530

@nazaninhoseinkhan didn't you ask this question before in another form? See "corresponding gene names for probeIDs".

updated 3.3 years ago by Ram 45k • written 10.4 years ago by Mo ▴ 920

No, in that question I wanted to know how to summarize probeIDs (merge the same probeIDs), while in this one I want to assign gene names to each probeID.

updated 3.3 years ago by Ram 45k • written 10.4 years ago by nazaninhoseinkhan ▴ 530

Different things indeed. But the custom annotation from brainarray that I mentioned in my answer should solve both.

10.4 years ago by Chris Evelo 10k

I checked brainarray, but it seems it does not support bacteria.

10.4 years ago by nazaninhoseinkhan ▴ 530

Ram · Answer 1 · 2015-03-14

Yes, that is a common problem with Affymetrix probesets. A probeset can often hit multiple genes, either because those genes share the sequence the probes cover or because the probeset simply turned out to be less specific than intended. Affymetrix is aware of the problem and documents it. You can even see it directly from the probeset name: names ending in _x_at are supposed to have the problem you describe. A description for a mouse array here explains that; the explanation is not mouse-specific, though.

Third-party solutions to that problem exist. Typically, probes are realigned to the latest annotated genomes to get updated probesets, which no longer contain a fixed number of probes. Such custom probe annotations are, for instance, available from brainarray.

If you use our microarray quality control, normalisation and analysis pipelines at arrayanalysis.org, you can choose to use these custom annotations (custom CDFs).
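
Before deciding what to do with such rows, it may help to see what each one actually contains. The following is only a rough Python sketch, not an official parser. It assumes the layout seen in the row quoted in the question: hits separated by " /// ", fields within a hit separated by " // ", and GENE= / LOCUS= tags inside the description. The function names (parse_row, distinct_genes) are made up for illustration, and ROW is an excerpt holding four of the twelve hits from that row.

```python
import re

# Excerpt of the row quoted in the question (4 of its 12 hits), copied as-is.
ROW = (
    "sa_i10207dr_x_at    "
    "1120534 // gi|1120534|ref|NC_002758.2|NC_002758.2(GI:57634611):629461-632324(+) "
    "Staphylococcus aureus subsp. aureus Mu50, GENE=sdrC  LOCUS=SAV0561 // ncbi_bacterial // 13 // --- /// "
    "1120535 // gi|1120535|ref|NC_002758.2|NC_002758.2(GI:57634611):632689-636848(+) "
    "Staphylococcus aureus subsp. aureus Mu50, GENE=sdrD  LOCUS=SAV0562 // ncbi_bacterial // 116 // --- /// "
    "1123324 // gi|1123324|ref|NC_002745.2|NC_002745.2(GI:29165615):605214-608077(+) "
    "Staphylococcus aureus subsp. aureus N315, GENE=sdrC  LOCUS=SA0519 // ncbi_bacterial // 13 // --- /// "
    "1125352 // gi|1125352|ref|NC_002745.2|NC_002745.2(GI:29165615):2718295-2720928(-) "
    "Staphylococcus aureus subsp. aureus N315, GENE=clfB PRODUCT=Clumping factor B LOCUS=SA2423 // ncbi_bacterial // 14 // ---"
)

# Tag patterns assumed from the example row; other annotation files may differ.
GENE_RE = re.compile(r"GENE=(\S+)")
LOCUS_RE = re.compile(r"LOCUS=(\S+)")
STRAIN_RE = re.compile(r"subsp\. aureus ([^,]+),")


def _first_group(pattern, text):
    """Return the first captured group of pattern in text, or None."""
    match = pattern.search(text)
    return match.group(1) if match else None


def parse_row(row):
    """Split one annotation row into (probe_id, list of hit dicts)."""
    probe_id, field = row.split(None, 1)      # probe ID, then the rest
    hits = []
    for entry in field.split(" /// "):        # one entry per gene hit
        parts = [p.strip() for p in entry.split(" // ")]
        description = parts[1] if len(parts) > 1 else ""
        hits.append({
            "gene_id": parts[0],
            "gene": _first_group(GENE_RE, description),
            "locus": _first_group(LOCUS_RE, description),
            "strain": _first_group(STRAIN_RE, description),
        })
    return probe_id, hits


def distinct_genes(hits):
    """Gene names in order of first appearance, duplicates removed."""
    return list(dict.fromkeys(h["gene"] for h in hits if h["gene"]))


if __name__ == "__main__":
    probe_id, hits = parse_row(ROW)
    print(probe_id, "cross-hybridising suffix:", probe_id.endswith("_x_at"))
    for hit in hits:
        print(hit["gene_id"], hit["strain"], hit["gene"], hit["locus"])
    print("distinct gene names:", distinct_genes(hits))
```

In this excerpt several hits are the same gene annotated in different strains (sdrC in both Mu50 and N315), while others are genuinely different genes (sdrD, clfB). Keeping only the first geneID would silently drop the latter, so listing the distinct names per probeset first seems a safer basis for deciding how to handle each row.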