Hi,
I have a tab-delimited table of protein IDs that looks like this:
45 FBpp0070037
46 FBpp0070039;FBpp0070040
47 FBpp0070041;FBpp0070042;FBpp0070043
48 FBpp0070044;FBpp0110571
...
For each of these protein IDs I would like to extract the gene ID (FBgn....) into a third column. The output table should look like this:
45 FBpp0070037 FBgn001234
46 FBpp0070039;FBpp0070040 FBgn00094432;FBgn002345
47 FBpp0070041;FBpp0070042;FBpp0070043 FBgn0001936;FBgn000102;FBgn004527
48 FBpp0070044;FBpp0110571 FBgn0097234;FBgn00183
...
I was thinking of using biomaRt, but I could not find a way of automating it for all the protein IDs on a line.
I would appreciate your ideas.
Thanks, A.
The file is good, but not exactly what I was looking for. Thanks.
Can you be more specific?
My problem is not getting the data from biomaRt, but getting it while keeping the structure of the table. If I run the column as one ID per line, it will then be difficult to match the gene IDs back to their corresponding protein IDs.
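One possible way to keep the table structure is to fetch the protein-to-gene mapping once (one ID per line, for example from biomaRt) and then rebuild the third column row by row. The sketch below is only an illustration in Python, not a biomaRt solution: `add_gene_column` and `protein_to_gene` are hypothetical names, the lookup values are the placeholder IDs from the example above, and it assumes the file really is tab-delimited with `;` between protein IDs.

```python
def add_gene_column(rows, protein_to_gene, missing="NA"):
    """Append a ';'-joined gene ID column, keeping row order and structure.

    rows            -- iterable of tab-delimited lines (number, protein IDs)
    protein_to_gene -- dict mapping one protein ID to one gene ID (assumed
                       to come from a separate one-ID-per-line lookup)
    missing         -- placeholder written when a protein ID has no mapping
    """
    out = []
    for row in rows:
        fields = row.rstrip("\n").split("\t")
        # Split the second column into individual protein IDs
        proteins = fields[1].split(";")
        # Look up each protein in order, so positions stay aligned
        genes = [protein_to_gene.get(p, missing) for p in proteins]
        out.append("\t".join(fields + [";".join(genes)]))
    return out


# Placeholder lookup built from the example IDs in the question
protein_to_gene = {
    "FBpp0070037": "FBgn001234",
    "FBpp0070039": "FBgn00094432",
    "FBpp0070040": "FBgn002345",
}

rows = ["45\tFBpp0070037", "46\tFBpp0070039;FBpp0070040"]
result = add_gene_column(rows, protein_to_gene)
for line in result:
    print(line)
```

Because each protein ID is looked up in its original position, the gene IDs in the new column line up with the protein IDs in the second column, and any ID without a mapping shows up as `NA` instead of shifting the others.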
See the edited answer, which now includes your request.