Question

match gives me all NA in annotating my genes

6.8 years ago

zizigolu ★ 4.3k

Hi, I am doing a GO analysis in R.

I downloaded this annotation

https://www.affymetrix.com/analysis/downloads/na33/wtgene-32_2/HuGene-1_0-st-v1.na33.2.hg19.probeset.csv.zip

    annot = read.csv(file = "HuGene-1_0-st.csv", header = TRUE)
    dim(annot)
    probes = names(datExpr)
    > head(probes)
    [1] "MKL2"    "MAST2"   "KAT5"    "WWC2"    "UBE2Z"   "PHYHIPL"

    probes2annot = match(probes, annot$transcript_cluster_id)

This gives me all NA:

    sum(is.na(probes2annot))

It should return 0 but returns 7243.

What am I doing wrong?

> head(annot)
  probeset_id seqname strand  start   stop probe_count
1     7896739    chr1      +  63033  63649          31
2     7896741    chr1      +  69109  70008          24
3     7896743    chr1      + 334144 334272           6
4     7896745    chr1      + 367693 368597          36
5     7896747    chr1      + 564951 565019          28
6     7896751    chr1      + 568069 568136          28
  transcript_cluster_id  exon_id   psr_id
1               7896738 96595544 97686467
2               7896740 96595546 97686470
3               7896742 96595548 97686473
4               7896744 96595550 97686476
5               7896746 96595552 97686479
6               7896750 96595556 97686485
                                                                                                                                                        gene_assignment
1                                                                                                                                            ENST00000492842 // OR4G11P
2                                                                 BC136848 // OR4F17 /// NM_001005240 // OR4F17 /// NM_001004195 // OR4F4 /// ENST00000318050 // OR4F17
3                                                                                                                                                                   ---
4 NM_001005277 // OR4F16 /// NM_001005221 // OR4F29 /// NM_001005504 // OR4F21 /// ENST00000456475 // OR4F29 /// ENST00000456475 // OR4F16 /// ENST00000456475 // OR4F3
5                                                                                                                                                                   ---
6                                                                                                                                                                   ---
                                                                                                                                                                                    mrna_assignment
1                                                                                                                                                   ENST00000492842 // chr1 // 100 // 31 // 31 // 0
2    BC136848 // chr1 // 100 // 24 // 24 // 0 /// NM_001005240 // chr1 // 100 // 24 // 24 // 0 /// NM_001004195 // chr1 // 100 // 24 // 24 // 0 /// ENST00000318050 // chr1 // 100 // 24 // 24 // 0
3               ENST00000455207 // chr1 // 100 // 6 // 6 // 0 /// TCONS_l2_00002387-XLOC_l2_000726 // chr1 // 100 // 6 // 6 // 0 /// TCONS_l2_00002388-XLOC_l2_000726 // chr1 // 100 // 6 // 6 // 0
4 NM_001005277 // chr1 // 100 // 36 // 36 // 0 /// NM_001005221 // chr1 // 100 // 36 // 36 // 0 /// NM_001005504 // chr1 // 89 // 32 // 36 // 0 /// ENST00000456475 // chr1 // 100 // 36 // 36 // 0
5                                                                                                                                                           AK074482 // chr1 // 79 // 22 // 28 // 0
6                                                                                                                                                         NC_001807 // chr1 // 100 // 24 // 24 // 0
  crosshyb_type number_independent_probes number_cross_hyb_probes
1             3                         0                       0
2             3                         0                       0
3             3                         0                       0
4             3                         0                       0
5             3                         0                       0
6             3                         0                       0
  number_nonoverlapping_probes level bounded noBoundedEvidence
1                            4   ---       0                 0
2                            7   ---       0                 0
3                            0   ---       0                 0
4                            6   ---       0                 0
5                            0   ---       0                 0
6                            0   ---       0                 0
  has_cds fl mrna est vegaGene vegaPseudoGene ensGene sgpGene
1       0  0    0   0        0              0       1       0
2       0  1    0   0        0              0       1       0
3       0  0    0   0        0              0       1       0
4       0  3    0   0        0              0       1       0
5       0  0    0   0        0              0       1       0
6       0  0    0   0        0              0       1       0
  exoniphy twinscan geneid genscan genscanSubopt mouse_fl
1        0        0      0       0             0        0
2        0        0      0       0             0        0
3        0        0      0       0             0        0
4        0        0      0       0             0        0
5        0        0      0       0             0        0
6        0        0      0       0             0        0
  mouse_mrna rat_fl rat_mrna microRNAregistry rnaGene mitomap
1          0      0        0                0       0       0
2          0      0        0                0       0       0
3          0      0        0                0       0       0
4          0      0        0                0       0       0
5          0      0        0                0       0       0
6          0      0        0                0       0       0
  probeset_type
1          main
2          main
3          main
4          main
5          main
6          main
>

R gene annotation • 1.6k views

updated 6.8 years ago by Satyajeet Khare ★ 1.6k • written 6.8 years ago by zizigolu ★ 4.3k

score 1 · Answer 1 · 2018-03-16

6.8 years ago

michael.ante ★ 3.9k

Hi,

In your annot table, the column transcript_cluster_id contains numeric IDs, whereas your probes are gene symbols, so there cannot be any match. In that case the match function returns the value given by its 'nomatch' parameter, which is NA by default.
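
To illustrate the point about 'nomatch', here is a minimal sketch with toy data. The three symbols and three IDs are copied from the question; the variable names `cluster_ids`, `n_unmatched` and `idx_zero` are made up for this example.

```r
# Toy stand-ins for the data in the question (values copied from the post).
probes <- c("MKL2", "MAST2", "KAT5")          # gene symbols
cluster_ids <- c(7896738, 7896740, 7896742)   # numeric Affymetrix IDs

# Symbols never equal numeric IDs, so every lookup falls back to nomatch (NA).
idx <- match(probes, cluster_ids)
n_unmatched <- sum(is.na(idx))

# The fallback value can be changed through the nomatch argument.
idx_zero <- match(probes, cluster_ids, nomatch = 0L)

print(idx)          # NA NA NA
print(n_unmatched)  # 3
print(idx_zero)     # 0 0 0
```

With the real data the same thing happens for all 7243 entries, which is why the count of NA values equals the number of probes.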

I guess you can try matching on the gene_assignment column. As far as I remember, a lot of Affymetrix-specific annotation is also provided in R (see here).
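
Note that a direct match on gene_assignment would probably also give NA, because each cell holds several "accession // symbol" pairs joined by " /// " rather than a bare symbol. Below is a rough sketch of one way to split that column into symbols first. It assumes the two-field layout visible in the head(annot) output above; `extract_symbols` and `lookup` are hypothetical names, and the toy rows are copied from the question.

```r
# Toy annotation table; rows copied from the head(annot) output in the question.
annot <- data.frame(
  transcript_cluster_id = c(7896738, 7896740, 7896742),
  gene_assignment = c(
    "ENST00000492842 // OR4G11P",
    "BC136848 // OR4F17 /// NM_001005240 // OR4F17 /// NM_001004195 // OR4F4 /// ENST00000318050 // OR4F17",
    "---"
  ),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the unique symbols found in one gene_assignment cell.
# Assumes entries are separated by " /// " and fields within an entry by " // ".
extract_symbols <- function(x) {
  if (is.na(x) || x == "---") return(character(0))
  entries <- strsplit(x, " /// ", fixed = TRUE)[[1]]
  fields <- strsplit(entries, " // ", fixed = TRUE)
  unique(vapply(fields, function(f) trimws(f[2]), character(1)))
}

# Build a long lookup table with one row per (cluster ID, symbol) pair.
symbols <- lapply(annot$gene_assignment, extract_symbols)
lookup <- data.frame(
  transcript_cluster_id = rep(annot$transcript_cluster_id, lengths(symbols)),
  symbol = unlist(symbols),
  stringsAsFactors = FALSE
)

# Now symbols can be matched against symbols.
probes <- c("OR4F17", "OR4F4", "MKL2")
idx <- match(probes, lookup$symbol)
matched_ids <- lookup$transcript_cluster_id[idx]
print(matched_ids)  # 7896740 7896740 NA
```

Symbols that are not on the array (MKL2 in this toy table) still come back as NA, so it is worth checking sum(is.na(idx)) again after parsing.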

Thank you. My data is from GPL16791, Illumina HiSeq 2500 (Homo sapiens).

Following your suggestion, I also tried matching on gene_assignment, but that gives NA as well.

6.8 years ago by zizigolu ★ 4.3k

score 1 · Answer 2 · 2018-03-16

6.8 years ago

Satyajeet Khare ★ 1.6k

For Affy ST arrays, you can use the read.celfiles function from the oligo package like this:

    rawData <- read.celfiles(celFiles)

Then you can try normalization:

    Data <- rma(rawData)

And finally try annotation on the normalized data (annotateEset comes from the affycoretools package):

    Data <- annotateEset(Data, hugene10sttranscriptcluster.db)

You may have to change the annotation database; I am not very sure about that.

Thank you, but my data is from an Illumina HiSeq 2500.

6.8 years ago by zizigolu ★ 4.3k