Retrieving Taxonomy from Uniprot/Swissprot ACC_ID From Blastx Results
7.4 years ago
ladypurrsia ▴ 60

I have completed a blastx run on my samples and have obtained the following result (example):

$ head blastx_result.txt

NS500162:172:HG5CJBGXX:1:11101:2522 ZWIP2_ARATH 52.500 40 19 0 2 121 25 64 8.26e-07 44.3

I would like to take the ACC_ID (in this case, ZWIP2_ARATH) and find its taxonomic information. After searching, I found this site: UniProtKB/Swiss-Prot entries.

Here is what the linked .txt file looks like:

ENTRY NAME  AC     nb AA   Description - Biological Source 
ZWIP2_ARATH Q9SVY1   383   Zinc finger protein WIP2 (Protein  TRANSMITTING TRACT) (WIP-                                                     
                           domain protein 2) (AtWIP2) [Gene: WIP2 or NTT or At3g57670 
                           or F15B8.140] - Arabidopsis thaliana (Mouse-ear cress)
023R_IIV3   Q197D7   106   Uncharacterized protein 023R [Gene: IIV3-023R] -                                                                   
                           Invertebrate iridescent virus 3 (IIV-3) (Mosquito virus)

This text file lists every ACC_ID together with its function and taxonomy. The taxonomy follows the final '-' delimiter (a record can contain more than one '-'). However, a simple grep command (grep -e 001R_FRG3G shortdes.txt) will not work, because one ACC_ID can span 1, 2, or 3 lines, so the taxonomy is often not on the line that contains the ACC_ID.

So, I thought about removing the newlines:

awk '{ printf "%s", $0 }' shortdes.txt

but this makes a mess of the file: it keeps all the tabs and wide spacing, and everything ends up on a single line, which is not practical.
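
One way to avoid collapsing the whole file is to join only the wrapped lines of each record. The following is a minimal sketch, not a tested solution for the full listing. It assumes that a record starts in column 1 with an entry name of the form NAME_SPECIES, that continuation lines are indented, and that the organism itself never contains ' - '. The function name flatten_shortdes and the sample file are made up for the example.

```shell
# Toy copy of the listing: the header line plus two wrapped records.
tmp=$(mktemp -d)
cat > "$tmp/sample_shortdes.txt" <<'EOF'
ENTRY NAME  AC     nb AA   Description - Biological Source
ZWIP2_ARATH Q9SVY1   383   Zinc finger protein WIP2 (Protein  TRANSMITTING TRACT) (WIP-
                           domain protein 2) (AtWIP2) [Gene: WIP2 or NTT or At3g57670
                           or F15B8.140] - Arabidopsis thaliana (Mouse-ear cress)
023R_IIV3   Q197D7   106   Uncharacterized protein 023R [Gene: IIV3-023R] -
                           Invertebrate iridescent virus 3 (IIV-3) (Mosquito virus)
EOF

# Hypothetical helper: print one line per record as "entry name <TAB> organism".
flatten_shortdes() {
    awk '
    # Emit the record collected so far, then reset the buffer.
    function emit() {
        if (rec == "") return
        gsub(/[ \t]+/, " ", rec)      # collapse the column padding
        sub(/ $/, "", rec)            # drop a trailing space, if any
        # The greedy ".*" makes the match end at the LAST " - " delimiter.
        if (match(rec, /.* - /))
            print name "\t" substr(rec, RSTART + RLENGTH)
        rec = ""
    }
    # A line starting in column 1 opens a new record (or is header text).
    /^[^ \t]/ {
        emit()
        if ($1 ~ /^[A-Z0-9]+_[A-Z0-9]+$/) { name = $1; rec = $0 }
        next
    }
    # An indented line continues the record that is currently open.
    rec != "" { rec = rec " " $0 }
    END { emit() }
    ' "$1"
}

flat=$(flatten_shortdes "$tmp/sample_shortdes.txt")
printf '%s\n' "$flat"
```

On the real file the call would be flatten_shortdes shortdes.txt, redirected to a file of your choice. Lines that start in column 1 but do not begin with something shaped like an entry name (such as the header) are skipped by the pattern check; whether that covers every non-record line in the real listing is an assumption worth verifying.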

I should add that I have more than 500,000 of these ACC_IDs to look up and map to taxonomy!
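
At that scale, one grep per ACC_ID would rescan the listing hundreds of thousands of times. A common alternative is a single awk pass that loads a lookup table into memory and then annotates every hit. The sketch below assumes a two-column, tab-separated table of entry name and organism already exists (the file name entry_to_organism.tsv, the function annotate_hits, and the second toy hit are all made up for illustration) and that the subject ID is in column 2 of the BLAST output.

```shell
tmp=$(mktemp -d)

# Hypothetical lookup table: entry name <TAB> organism.
printf 'ZWIP2_ARATH\tArabidopsis thaliana (Mouse-ear cress)\n' > "$tmp/entry_to_organism.tsv"
printf '023R_IIV3\tInvertebrate iridescent virus 3 (IIV-3) (Mosquito virus)\n' >> "$tmp/entry_to_organism.tsv"

# Toy BLAST hits; the second line is invented to show the "no match" case.
cat > "$tmp/blastx_result.txt" <<'EOF'
NS500162:172:HG5CJBGXX:1:11101:2522 ZWIP2_ARATH 52.500 40 19 0 2 121 25 64 8.26e-07 44.3
toy_read_without_match UNKNOWN_ID 50.000 40 20 0 2 121 25 64 1.00e-05 40.0
EOF

# Hypothetical helper: annotate_hits LOOKUP_TABLE BLAST_RESULT
annotate_hits() {
    awk '
    # First file: store organism by entry name (split on TAB, since
    # organism names contain spaces).
    NR == FNR { split($0, col, "\t"); organism[col[1]] = col[2]; next }
    # Second file: print query, subject ID, and organism ("NA" if unknown).
    { print $1 "\t" $2 "\t" (($2 in organism) ? organism[$2] : "NA") }
    ' "$1" "$2"
}

annotated=$(annotate_hits "$tmp/entry_to_organism.tsv" "$tmp/blastx_result.txt")
printf '%s\n' "$annotated"
```

Each hit costs one in-memory array lookup, so the run time grows with the size of the two files rather than with their product.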

There must be a simple way to extract the taxonomy from this file, or to get it by some other means. Any pointer toward a more practical approach would be greatly appreciated!

Thanks a ton!

Uniprot blastx Taxonomy • 2.2k views
7.4 years ago

Using XML/XPath:

$ curl -sL "http://www.uniprot.org/uniprot/001R_FRG3G.xml" | xmllint  --xpath "//*[name() ='organism']/*[name()='name' and @type='scientific']/text()" -

Frog virus 3 (isolate Goorha)

Pierre: Thank you so much! Is there a way to supply a file of these ACC_IDs? I have more than 500,000 to look up and cannot enter each one manually.


There is a UniProt batch query: http://www.uniprot.org/help/uploadlists
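
A batch query takes a list of identifiers, and a BLAST result usually repeats the same subject ID many times, so it may help to reduce the hits to unique IDs first. This is a minimal sketch assuming the subject ID is in column 2 of blastx_result.txt; the toy reads after the first line and the output file name unique_ids.txt are made up for illustration.

```shell
tmp=$(mktemp -d)

# Toy BLAST hits; only the first line comes from the question.
cat > "$tmp/blastx_result.txt" <<'EOF'
NS500162:172:HG5CJBGXX:1:11101:2522 ZWIP2_ARATH 52.500 40 19 0 2 121 25 64 8.26e-07 44.3
toy_read_2 001R_FRG3G 48.000 50 26 0 1 150 10 59 2.00e-06 42.0
toy_read_3 ZWIP2_ARATH 55.000 40 18 0 2 121 25 64 5.00e-07 45.1
EOF

# Take column 2, then sort and drop duplicates. LC_ALL=C keeps the
# ordering byte-wise and independent of the local language settings.
awk '{ print $2 }' "$tmp/blastx_result.txt" | LC_ALL=C sort -u > "$tmp/unique_ids.txt"

cat "$tmp/unique_ids.txt"
```

If the upload form turns out to limit the list size, split -l can cut unique_ids.txt into smaller files; any such limit should be checked on the help page above rather than assumed.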
