Hi,
This question will seem INCREDIBLY similar to the question here, but it's not.
I ran diamond blastx on a large dataset with the output options -f 6 qseqid sseqid bitscore.
This gives me a file that looks like:
OHJ07_1_contig_1 ELU09376.1 66.6
OHJ07_1_contig_1 KFM76682.1 65.9
OHJ07_1_contig_1 JAT94707.1 63.2
OHJ07_1_contig_1 JAT94707.1 46.6
OHJ07_1_contig_1 XP_002400485.1 62.4
OHJ07_1_contig_1 XP_002400485.1 46.2
OHJ07_1_contig_1 XP_014787375.1 61.6
OHJ07_1_contig_1 XP_014787375.1 40.4
OHJ07_1_contig_1 KOF67573.1 61.6
And I'd like to get taxIds for each of these accession versions. The previous question, linked above, seems to only work on accession numbers, not accession version numbers.
I have also downloaded the NCBI prot.accession2taxid file and tried to grep my accession version numbers against it, but the output comes back in the order of the NCBI file, not mine. The NCBI file looks like:
accession accession.version taxid gi
P29373 P29373.2 9606 132401
P22935 P22935.2 10090 132402
P18902 P18902.1 9913 132403
P02753 P02753.3 9606 62298174
P27485 P27485.2 9823 3041715
P06912 P06912.2 9986 1710096
My grep output (head) looks like this
P07201 P07201.2 6584 266918
P21329 P21329.1 7221 134082
P21328 P21328.1 7227 134083
P04571 P04571.1 126592 134316
P04572 P04572.1 6357 134318
P19217 P19217.1 9913 135052
P17248 P17248.3 9913 110283011
So obviously I can't just paste the grep output alongside my diamond blast output.
Eventually I need a file that looks like my blast output, but with taxids in place of the accession version numbers, i.e.
OHJ07_1_contig_1 283909 66.6
OHJ07_1_contig_1 407821 65.9
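One possible way to get a file of this shape, offered as a sketch rather than a tested pipeline: let awk read the blast output once to note which accession versions are needed, stream prot.accession2taxid once to pick up their taxids, and then read the blast output again to print it in its original order. The function name blast_to_taxid, the "NA" placeholder for accession versions that are not found, and the tab-separated output are assumptions made for illustration; the column positions are taken from the two file layouts shown above.

```shell
# Hypothetical helper: replace column 2 (accession.version) of a
# "qseqid sseqid bitscore" table with the matching taxid.
# Assumes whitespace-separated input and the 4-column
# "accession accession.version taxid gi" layout of prot.accession2taxid.
blast_to_taxid() {
    blast="$1"     # DIAMOND output: qseqid sseqid bitscore
    acc2tax="$2"   # NCBI mapping file (its header line is skipped naturally)
    awk -v OFS='\t' '
        FNR == 1  { pass++ }                              # each new input file starts a new pass
        pass == 1 { want[$2]; next }                      # pass 1: note the accession.versions we need
        pass == 2 { if ($2 in want) tax[$2] = $3; next }  # pass 2: keep taxids only for those
        { print $1, (($2 in tax) ? tax[$2] : "NA"), $3 }  # pass 3: print rows in the original order
    ' "$blast" "$acc2tax" "$blast"
}

# Usage (file names are placeholders):
#   blast_to_taxid diamond_output.tsv prot.accession2taxid > diamond_with_taxids.tsv
```

Only the accession versions that actually occur in the blast output are held in memory, so the size of the NCBI file should matter for run time but not much for RAM. The sketch assumes neither input file is empty, since the pass counter relies on every file having a first line.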
The way I see it, there are two approaches: either somehow use the accession version numbers with the script provided here by Pierre and Steve, or sort my grep output from the NCBI file I downloaded so that I can paste the files together and continue with the rest of the analysis.
I have tried the latter, to sort my grep file, but my script (below) is hideously slow and will take about 3 months to finish the whole job:
while read line; do grep -m 1 "$line" prot.accession2taxid; done < test > taxIds
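The loop is slow because every query line makes grep scan prot.accession2taxid again from the top. If the bulk grep output already exists, it may be enough to reorder it against the blast file with a single lookup table instead. The following is a minimal sketch under that assumption; reorder_hits, the "NA" placeholder, and the tab-separated output are illustrative choices, not part of any existing script.

```shell
# Hypothetical helper: reuse an existing grep result (lines of
# prot.accession2taxid in any order) and print the blast table in its own
# order, with the taxid in place of the accession.version.
reorder_hits() {
    hits="$1"    # grep output: accession accession.version taxid gi
    blast="$2"   # DIAMOND output: qseqid sseqid bitscore
    awk -v OFS='\t' '
        NR == FNR { tax[$2] = $3; next }                  # first file: accession.version -> taxid
        { print $1, (($2 in tax) ? tax[$2] : "NA"), $3 }  # second file: look up each hit in order
    ' "$hits" "$blast"
}

# Usage (file names are placeholders):
#   reorder_hits grep_output.txt diamond_output.tsv > diamond_with_taxids.tsv
```

Because the lookup is an exact match on the whole accession.version field, it also sidesteps grep treating the "." in an accession version as a regular-expression wildcard. The NR == FNR test assumes the grep output file is not empty.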
Any help would be appreciated!
EDIT: The "test" file I use in the above "solution" is just the column of accession versions pulled out of my blast output file.
Accession versions should not matter, correct? Version differences/changes will not change the taxid.
I'm not sure exactly how EUtils works, and the example uses accession numbers, not versions. Using my blast output and the bash script from the other question only gave me a blank output, so I thought the problem might be accession numbers?
I also just tried entering http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&id=$ELU09376 (and the same with .1 after the accession number), which gave me a text file reading:
so I guess the problem is not my Accession Versions, but something else?