How can I annotate multiple blast results?
4.0 years ago
Assa Yeroslaviz ★ 1.9k

Hi, I have a list of hits from a magicblast run against a FASTQ file. Is there a way to annotate the hits I get? The run was done against the nt database.

The list of hits looks like this:

NB500982:283:HH3WJAFX2:1:21210:15709:8224   gi|1040217674   100 0   0   0   1...
NB500982:283:HH3WJAFX2:1:21210:20062:8227   gi|1040160167   100 0   0   0   1...
NB500982:283:HH3WJAFX2:1:21210:4790:8228    gi|1040197389   100 0   0   0   1...
NB500982:283:HH3WJAFX2:1:21210:12133:8228   gi|164790   98.6667 0   0   0   1...

The gene ID is in the second column.

Thanks

Assa

blast annotation • 1.1k views

What exactly do you mean by 'annotate'? Do you want to get the description line of each hit?


Yes, I would like to get the gene name and organism.


After looking into the magicblast manual, I don't think you can get that directly from the magicblast output.

You can, however, run all the hit IDs through Entrez (or a similar service) and retrieve their descriptions.


I know I can't get it via magicblast. My question is how I can get it otherwise.

4.0 years ago
GenoMax 147k

Using EntrezDirect:

$ more gi
1040217674
1040160167
1040197389
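
A gi file like the one above could be built straight from the hit table. This is only a sketch: extract_gi is a hypothetical helper name, and it assumes the hits are whitespace-delimited with the gi|NUMBER identifier in the second column, as in the question, and that any header lines start with '#'.

```shell
# Hypothetical helper: print the unique GI numbers found in column 2
# of a hit table. Reads the named files, or stdin if none are given.
extract_gi() {
    awk '!/^#/ && $2 ~ /^gi[|][0-9]+/ {
             split($2, f, "|")   # "gi|1040217674" -> f[1]="gi", f[2]="1040217674"
             print f[2]
         }' "$@" | sort -u       # drop duplicates; many reads can hit one record
}
```

Usage would be something like `extract_gi hits.tsv > gi`, where hits.tsv stands in for whatever the magicblast output file is called.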

$ for i in $(cat gi); do printf '%s\t' "$i"; esearch -db nuccore -query "$i" | elink -target gene | esummary | xtract -pattern DocumentSummary -element Name,Description,ScientificName; done
1040217674  PROKR2  prokineticin receptor 2 Oryctolagus cuniculus
1040160167  CCDC180 coiled-coil domain containing 180   Oryctolagus cuniculus
1040197389  SORCS1  sortilin related VPS10 domain containing receptor 1 Oryctolagus cuniculus
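
To get from a table like this back to annotated hits, the annotation can be joined onto the original hit list by GI number. A minimal sketch, assuming both files are tab-separated, the annotation table has the GI in column 1 (as in the output above), and the hit table has gi|NUMBER in column 2. The name add_annotation and the "NA" placeholder are made up for illustration.

```shell
# Hypothetical helper: append the annotation columns to each hit line.
# $1 = annotation table (GI in column 1), $2 = hit table (gi|NUMBER in column 2).
add_annotation() {
    awk -F'\t' -v OFS='\t' '
        # First file: remember everything after the GI column, keyed by GI.
        NR == FNR { key = $1; sub(/^[^\t]*\t?/, ""); ann[key] = $0; next }
        # Second file: pass header lines through unchanged.
        /^#/      { print; next }
        # Second file: look up the GI from column 2; use "NA" when nothing was found.
        { split($2, f, "|"); gi = f[2]
          print $0, ((gi in ann) ? ann[gi] : "NA") }
    ' "$1" "$2"
}
```

For example, `add_annotation annotation.tsv hits.tsv > hits.annotated.tsv`, with both file names being placeholders. Hits whose GI returned no annotation get a single "NA" column, so they are easy to pick out afterwards.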

Thanks, I was just looking into efetch and esearch and trying to put together a pipeline for that, but you beat me to it. Thanks again.


Hi again, I'm not sure what elink is supposed to do, but with it in the command I don't get any results:

$ esearch -db nuccore -query 34809228 | elink -target gene | esummary | xtract -pattern DocumentSummary -element  Id,Caption,Organism

Here I get no results, but without elink I do:

$ esearch -db nuccore -query 34809228 | esummary | xtract -pattern DocumentSummary -element Id,Caption,Organism
34809228" term="34809228    AY386695    rabbit

You should have stayed away from GI numbers (if you had that option when you ran magicblast). GI numbers are deprecated for end-user use, and this may be one of those cases where things don't work. Searching with that GI on the NCBI site does return a result, but the command-line query does indeed not seem to work reliably.
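
One possible workaround for GI numbers that have no linked Gene record is to try the elink route first and fall back to the nucleotide record's own summary when it returns nothing. This is a sketch only: annotate_gi is a hypothetical helper, it assumes EDirect is installed, and the Title field in the fallback is an assumption about what is useful to report (the thread itself only used Id, Caption and Organism). Since GI queries may not behave reliably from the command line, the result should be spot-checked.

```shell
# Hypothetical helper: print one tab-separated annotation line for a GI number.
annotate_gi() {
    local gi ann
    gi=$1
    # First try the linked Gene record, as in the answer above.
    ann=$(esearch -db nuccore -query "$gi" | elink -target gene | esummary |
          xtract -pattern DocumentSummary -element Name,Description,ScientificName)
    # No linked gene found: fall back to the nucleotide record itself.
    if [ -z "$ann" ]; then
        ann=$(esearch -db nuccore -query "$gi" | esummary |
              xtract -pattern DocumentSummary -element Caption,Title,Organism)
    fi
    printf '%s\t%s\n' "$gi" "$ann"
}
```

It could be dropped into the same loop as before, e.g. `for i in $(cat gi); do annotate_gi "$i"; done > annotation.tsv` (the output file name is a placeholder). If a record links to more than one gene, the annotation will span several lines, so the table may need a quick look before further processing.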


Yes, this is true, but we have to work with what we have, and I am stuck with the GI numbers :-( So I need to make the best of it. I don't think magicblast can output anything else. But thanks for the help.
