Question

How to convert XP_002531646.1 to sp|P05661|MYSA_DROME in Diamond blastX output

8.0 years ago

Farbod ★ 3.4k

Dear BioStars, Hi.

I have run blastx against NCBI nr using Diamond with tabular output and the sensitive option.

The result is as below:

TRINITY_DN212758_c0_g1_i1.......XP_002531646.1.....81.3 107 20 0 3 323 199 305 2.9e-41 176.4

TRINITY_DN212728_c0_g1_i1.....XP_014502021.1....89.2 37 4 0 3 113 403 439 8.6e-10 71.2

TRINITY_DN212793_c0_g1_i1.....XP_015200040.1.....91.8 61 5 0 665 483 238 298 9.7e-23 115.9

But I need something like this in the second column:

sp|P05661|MYSA_DROME

sp|Q7KRI2|LOLAL_DROME

sp|A1ZAU8|SSP4_DROME

.

Q: (1) Is there a converter tool for this task, or if not, (2) which option must I add to my Diamond command?

NOTE1: Maybe these are SwissProt IDs?

NOTE2: my Diamond script:

diamond blastx -d nr -q Trinity_FM.fasta -o blastX-all-sesnitive.outfmt6 -f 6 -p 22 --evalue 0.000001 -k 1 --sensitive

NOTE3: these are the Diamond software tabular options

Value 6 may be followed by a space-separated list of these keywords:

qseqid means Query Seq-id
qlen means Query sequence length
sseqid means Subject Seq-id
sallseqid means All subject Seq-id(s), separated by a ';'
slen means Subject sequence length
qstart means Start of alignment in query
qend means End of alignment in query
sstart means Start of alignment in subject
send means End of alignment in subject
qseq means Aligned part of query sequence
sseq means Aligned part of subject sequence
evalue means Expect value
bitscore means Bit score
score means Raw score
length means Alignment length
pident means Percentage of identical matches
nident means Number of identical matches
mismatch means Number of mismatches
positive means Number of positive-scoring matches
gapopen means Number of gap openings
gaps means Total number of gaps
ppos means Percentage of positive-scoring matches
qframe means Query frame
stitle means Subject Title
salltitles means All Subject Title(s), separated by a '<>'
qcovhsp means Query Coverage Per HSP
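
For orientation, the ten numeric columns in the hits above line up with the usual default field order of BLAST-style tabular output (format 6 with no keyword list). The following is only a minimal Python sketch of reading one such line. It assumes tab-separated output and that default order; `DEFAULT_FIELDS` and `parse_hit` are illustrative names, not part of Diamond.

```python
# Assumed default column order for tabular format 6 when no keyword
# list follows the "6"; adjust this list if custom keywords were given.
DEFAULT_FIELDS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def parse_hit(line, fields=DEFAULT_FIELDS):
    """Split one tab-separated hit line into a dict keyed by field name."""
    values = line.rstrip("\n").split("\t")
    if len(values) != len(fields):
        raise ValueError(f"expected {len(fields)} columns, got {len(values)}")
    return dict(zip(fields, values))


# First hit from the output above, with real tabs instead of dots.
hit = parse_hit(
    "TRINITY_DN212758_c0_g1_i1\tXP_002531646.1\t81.3\t107\t20\t0"
    "\t3\t323\t199\t305\t2.9e-41\t176.4"
)
print(hit["sseqid"])  # the second column, i.e. the subject ID
```

Values are kept as strings here; convert `pident`, `evalue`, and so on to numbers only where needed.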

~ Best

blast • 3.0k views

ADD COMMENT • link 8.0 years ago by Farbod ★ 3.4k

What database did you use for diamond? And what was the source of sequences used to populate it?

ADD REPLY • link 8.0 years ago by Tonor ▴ 480

Dear Tonor, Hi

I have used NCBI nr, and the query is my transcriptome assembly FASTA file.

I have seen a pie chart in this Nature paper (please have a look at Figure 2: Species percentages in BLASTX hits), and I have tried to create a similar chart for my data.

The authors mentioned that they used the NCBI database: "homology search between our contigs and the NCBI database".

But I guess they used SwissProt!

What do you think?

ADD REPLY • link 8.0 years ago by Farbod ★ 3.4k

Does XP_002531646.1 directly relate to sp|P05661|MYSA_DROME? Or is it just an example of the format?

ADD REPLY • link 8.0 years ago by Tonor ▴ 480

Hi, sorry Tonor,

I should have mentioned that it is just an example.

ADD REPLY • link 8.0 years ago by Farbod ★ 3.4k

What is the full line currently for:

TRINITY_DN212758_c0_g1_i1.......XP_002531646.1.....81.3 107 20 0 3 323 199 305 2.9e-41 176.4

What are the ... replacing?

ADD REPLY • link 8.0 years ago by Tonor ▴ 480

Nothing! It is just that on the Biostars screen, extra spaces do not necessarily show up as more spacing,

so I have used dots instead of spaces.

In that line, the only part that is important is the second column, which I have shown in bold.

ADD REPLY • link 8.0 years ago by Farbod ★ 3.4k

So at the moment you have:

Accession

But you want it:

sp|Accession|ProtName

ADD REPLY • link 8.0 years ago by Tonor ▴ 480

@Farbod wants to convert NCBI IDs to SwissProt IDs.

@Farbod: You could use the UniProt ID converter or do the search again using a UniProt database. If you are looking to do in-place replacements, then you may need to convert the IDs first and then do the replacement.
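
The convert-then-replace route could look roughly like the Python sketch below. It is only an illustration under stated assumptions: the hits are tab-separated with the subject ID in column 2, and an accession-to-ID mapping has already been obtained separately. `replace_subject_ids` and `id_map` are hypothetical names, and the single mapping entry is the pairing reported elsewhere in this thread, used here purely as an example.

```python
def replace_subject_ids(lines, id_map):
    """Yield tabular hit lines with column 2 swapped for its mapped ID.

    Lines whose subject ID has no entry in id_map pass through unchanged,
    so unmapped hits remain visible in the output.
    """
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) > 1:
            cols[1] = id_map.get(cols[1], cols[1])
        yield "\t".join(cols)


# Hypothetical mapping; in practice it would be built from the
# results of an ID-conversion step.
id_map = {"XP_002531646.1": "B9T077"}

hits = [
    "TRINITY_DN212758_c0_g1_i1\tXP_002531646.1\t81.3",
    "TRINITY_DN212728_c0_g1_i1\tXP_014502021.1\t89.2",
]
for converted in replace_subject_ids(hits, id_map):
    print(converted)
```

Leaving unmapped IDs untouched, rather than dropping those rows, makes it easy to see afterwards which hits still need a conversion.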

ADD REPLY • link 8.0 years ago by GenoMax 147k

It seems that there is no NCBI nr in "2-Select Options". What should I select instead?

ADD REPLY • link 8.0 years ago by Farbod ★ 3.4k

Use EMBL/GenBank/DDBJ. You may also need to look in EMBL/GenBank/DDBJ CDS.

ADD REPLY • link 8.0 years ago by GenoMax 147k

Hi,

I have selected some of my NCBI nr IDs and used EMBL/GenBank/DDBJ (and also CDS), but I received "Sorry, no results were found"!

Examples:

XP_015458527.1

XP_017541064.1

XP_012987512.1

XP_015682166.1

XP_017537366.1

XP_015457441.1

XP_015214365.1

ADD REPLY • link 8.0 years ago by Farbod ★ 3.4k

XP_* records are computational predictions/submissions and they are not part of UniProt. You may need to query the UniParc database to get information on those. Remove the version number at the end of each record (the trailing ".1") when you query.
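
Stripping the version numbers can be done in bulk. Here is a minimal Python sketch, assuming the version is always a trailing dot followed by digits; `strip_version` is an illustrative helper name.

```python
def strip_version(accession):
    """Drop a trailing '.N' version suffix; return other IDs unchanged."""
    base, dot, version = accession.rpartition(".")
    return base if dot and version.isdigit() else accession


ids = ["XP_015458527.1", "XP_017541064.1", "XP_012987512.1"]
print([strip_version(a) for a in ids])
# ['XP_015458527', 'XP_017541064', 'XP_012987512']
```

IDs that carry no version suffix are returned as they are, so the helper is safe to run over a mixed list.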

ADD REPLY • link 8.0 years ago by GenoMax 147k

BTW: XP_002531646.1 seems to translate to B9T077 not P05661.

ADD REPLY • link 8.0 years ago by GenoMax 147k