I previously asked a question about this here: Follow-up question: how to import annotation file into R? and received very helpful tips from Olivier (to use AILUN) and David Westergaard (to use BioConductor). However, I'm still having trouble, and hope someone can help me a bit more with this.
Background: I am currently working on annotating features in our microarray study. I had previously referred to it as a human microarray, but it was apparently an Agilent Rat Whole Genome 4131F array all along, as I discovered when I uploaded the probe IDs to AILUN (thanks for the tip, Olivier). Our group works with xenograft models and uses both rat and human arrays, so I had simply misunderstood which array this particular experiment came from.
Nevertheless, I'm having quite a bit of trouble getting the 45,018 probes mapped to gene symbols (although, granted, a large number of these probes are control probes like DarkCorner and GE_BrightCorner).
My first attempt was to use Agilent's own annotation files, but these proved very difficult for me to import into R. The reason is that some probes have empty entries in some columns (example: A_32_P68142), which confuses my read.table() command in R (I even tried providing the na.strings="" and comment.char="" arguments, but nothing helped), as I pointed out in my previous post. Olivier also suggested that I try cleaning up the table with UNIX commands, but I am not familiar enough with these to manage this.
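One way to narrow down why read.table() stumbles is to inspect the raw file from the command line before importing it. The following is only a rough sketch, assuming the annotation file is tab-delimited; the sample rows, probe IDs and column names are made up for illustration and are not taken from the real Agilent file.

```shell
# Hypothetical stand-in for the annotation file (assumed tab-delimited).
tmp=$(mktemp)
printf 'ProbeID\tGeneSymbol\tDescription\n'          >  "$tmp"
printf 'A_00_P000001\tAbc1\tsome gene\n'              >> "$tmp"
printf 'A_00_P000002\t\t\n'                           >> "$tmp"   # empty entries
printf "A_00_P000003\tXyz2\tprotein 5' end\n"         >> "$tmp"   # stray quote

# How many lines have how many tab-separated fields?
# A single "fields count" pair means the rows are not ragged.
field_counts=$(awk -F'\t' '{ n[NF]++ } END { for (k in n) print k, n[k] }' "$tmp" | sort -n)
echo "$field_counts"

# How many lines contain a quote or hash character, which read.table()
# treats specially under its default quote and comment.char settings?
quote_lines=$(grep -c "['\"#]" "$tmp")
echo "$quote_lines"
```

If every line turns out to have the same number of tab-separated fields, the empty entries are probably not the problem on their own, and the sep, quote and fill arguments of read.table() are worth a look.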
My second attempt was to go via Bioconductor's annotation packages, but they only have a package for 4131A, not 4131F. I hoped that this would be good enough, but it only mapped 24,552 out of 45,018 probes. Examples of probes not mapped: A_44_P330643, A_44_P549509, A_43_P14989.
My third attempt was to use the annotation file from AILUN itself. While they say that they have a 99.96% match, they still seem to map only 21,498 out of 45,018 probes. Examples of probes not mapped: A_43_P14989, A_44_P553249, A_44_P260580.
Does anyone have any tips for me?
Thanks for the extensive suggestion. I might eventually make that complaint letter to Agilent like you suggest, but to begin with, I just need to make a bit of progress with this particular study. As such, I first tried your suggested command (I outputted to a differently named file, otherwise identical).
However, when I run the following command in R
I still seem to get the error message
It seems I'm still stuck :(
Oh, sorry, I didn't check whether that actually resolves things; I was sure it should. I will try to see what is causing this. In the meantime, could you try running the original file through dos2unix and repeat, just in case?
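For reference, a quick way to check whether Windows line endings are present at all, sketched on a made-up two-line file. It uses tr -d '\r' as a stand-in on the assumption that dos2unix may not be installed; for plain CRLF files the result should be the same, although tr removes every carriage return rather than only those at line ends.

```shell
# Hypothetical sample file with Windows (CRLF) line endings.
tmp=$(mktemp)
printf 'ProbeID\tGeneSymbol\r\nA_00_P000001\tAbc1\r\n' > "$tmp"

cr=$(printf '\r')                       # a literal carriage return

# Number of lines containing a carriage return before conversion.
before=$(grep -c "$cr" "$tmp")

# Strip carriage returns into a new file, leaving the original untouched.
tr -d '\r' < "$tmp" > "$tmp.unix"

# Same count after conversion (grep exits non-zero when it finds nothing).
after=$(grep -c "$cr" "$tmp.unix" || true)

echo "lines with CR: before=$before after=$after"
```

A non-zero count before conversion would mean the file came from a Windows tool and that the conversion step is doing real work.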
Update 1: I contacted Agilent, and they sent me a newer annotation file that should work better.
Update 2: I ran the following commands:
And in R:
Getting the error message
Update 3: I opened the file annotation_features.txt and removed all columns except for ProbeID and GeneSymbol. This time, the command ran to completion with no error message, which suggests that the free-text columns were the issue. However, a number of probes are still not being annotated (examples: A_44_P898256, A_44_P187524, A_43_P14989). In fact, quite a large proportion seem not to be annotated:
Should I have any expectation that I should ultimately manage to annotate all probes that are not DarkCorner or GE_BrightCorner?
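To put a number on that question, the unannotated probes can be counted with the controls left out. This is a minimal sketch on invented rows, assuming a tab-delimited, two-column file with a header line and with ProbeID first and GeneSymbol second, and assuming the control features are named exactly DarkCorner and GE_BrightCorner; other control names, if any, would need to be added.

```shell
# Hypothetical two-column file: ProbeID, GeneSymbol (tab-delimited).
tmp=$(mktemp)
printf 'ProbeID\tGeneSymbol\n'      >  "$tmp"
printf 'A_00_P000001\tAbc1\n'       >> "$tmp"
printf 'A_00_P000002\t\n'           >> "$tmp"   # no gene symbol
printf 'DarkCorner\t\n'             >> "$tmp"   # control feature
printf 'GE_BrightCorner\t\n'        >> "$tmp"   # control feature
printf 'A_00_P000003\tXyz2\n'       >> "$tmp"

summary=$(awk -F'\t' '
  NR == 1 { next }                                         # skip the header
  $1 == "DarkCorner" || $1 == "GE_BrightCorner" { next }   # skip controls
  { total++; if ($2 == "") missing++ }                     # empty symbol = unannotated
  END { printf "%d of %d non-control probes lack a gene symbol\n", missing, total }
' "$tmp")
echo "$summary"
```

Running the same count on the real file would show how much of the gap is down to control features and how much is probes that genuinely have no gene symbol in the annotation.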
I think I have the solution for you; please look at the update in the answer.
It worked! Thank you so much :)