Dear BioStars, Hi.
I have run BLASTX against NCBI nr using DIAMOND with tabular output and the --sensitive option.
The result is as below:
TRINITY_DN212758_c0_g1_i1.......XP_002531646.1.....81.3 107 20 0 3 323 199 305 2.9e-41 176.4
TRINITY_DN212728_c0_g1_i1.....XP_014502021.1....89.2 37 4 0 3 113 403 439 8.6e-10 71.2
TRINITY_DN212793_c0_g1_i1.....XP_015200040.1.....91.8 61 5 0 665 483 238 298 9.7e-23 115.9
But I need something like this in the second column:
sp|P05661|MYSA_DROME
sp|Q7KRI2|LOLAL_DROME
sp|A1ZAU8|SSP4_DROME
.
Q: (1) Is there a converter tool for this task, or if not, (2) which option must I add to my DIAMOND command?
NOTE1: Maybe these are Swiss-Prot IDs?
NOTE2: My DIAMOND command:
diamond blastx -d nr -q Trinity_FM.fasta -o blastX-all-sesnitive.outfmt6 -f 6 -p 22 --evalue 0.000001 -k 1 --sensitive
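With -f 6 and no keywords, DIAMOND writes the 12 standard BLAST tabular columns, so the second column (sseqid) is the subject ID exactly as it appears in the database; for nr that is an NCBI accession such as XP_002531646.1, not a Swiss-Prot ID. A minimal sketch that labels the default columns, assuming real tab-separated rows (the dots in the rows above only stand in for whitespace); parse_default_row is a hypothetical helper name:

```python
# Default column order for DIAMOND "-f 6" with no keywords
# (the same 12 columns as BLAST+ tabular output).
DEFAULT_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def parse_default_row(line: str) -> dict:
    """Split one tab-separated DIAMOND row into a column-name -> value dict."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != len(DEFAULT_COLUMNS):
        raise ValueError(f"expected 12 columns, got {len(fields)}")
    return dict(zip(DEFAULT_COLUMNS, fields))


if __name__ == "__main__":
    # First row from the question, with the dots turned back into tabs.
    row = "TRINITY_DN212758_c0_g1_i1\tXP_002531646.1\t81.3\t107\t20\t0\t3\t323\t199\t305\t2.9e-41\t176.4"
    hit = parse_default_row(row)
    print(hit["sseqid"])  # the subject accession, i.e. the second column
    print(hit["evalue"])
```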
NOTE3: These are DIAMOND's tabular output options:
Value 6 may be followed by a space-separated list of these keywords:
qseqid means Query Seq - id
qlen means Query sequence length
sseqid means Subject Seq - id
sallseqid means All subject Seq - id(s), separated by a ';'
slen means Subject sequence length
qstart means Start of alignment in query
qend means End of alignment in query
sstart means Start of alignment in subject
send means End of alignment in subject
qseq means Aligned part of query sequence
sseq means Aligned part of subject sequence
evalue means Expect value
bitscore means Bit score
score means Raw score
length means Alignment length
pident means Percentage of identical matches
nident means Number of identical matches
mismatch means Number of mismatches
positive means Number of positive - scoring matches
gapopen means Number of gap openings
gaps means Total number of gaps
ppos means Percentage of positive - scoring matches
qframe means Query frame
stitle means Subject Title
salltitles means All Subject Title(s), separated by a '<>'
qcovhsp means Query Coverage Per HSP
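A plain -f 6 file has no header line, so the keyword list passed on the command line is the only record of what each column holds. As a sketch, assuming a run such as "diamond blastx ... -f 6 qseqid sseqid pident evalue bitscore stitle" (this keyword choice is only an example), adding stitle puts the subject's full title next to its ID, and the rows can be read back with the same keyword list. KEYWORDS and read_custom_table are hypothetical names:

```python
# Hypothetical keyword list: it must repeat, in the same order, the keywords
# that were passed after "-f 6" on the DIAMOND command line.
KEYWORDS = "qseqid sseqid pident evalue bitscore stitle".split()


def read_custom_table(lines, keywords=KEYWORDS):
    """Yield one dict per DIAMOND row, keyed by the output keywords."""
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # ignore blank lines
        # Columns are tab-separated; titles (stitle) may contain spaces but not tabs.
        fields = line.split("\t")
        if len(fields) != len(keywords):
            raise ValueError(f"expected {len(keywords)} columns, got {len(fields)}")
        yield dict(zip(keywords, fields))


if __name__ == "__main__":
    import io

    # Made-up example row, only to show the layout.
    demo = io.StringIO("query_1\tSUBJECT_A.1\t81.3\t2.9e-41\t176.4\texample protein [Genus species]\n")
    for hit in read_custom_table(demo):
        print(hit["sseqid"], "|", hit["stitle"])
```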
~ Best
What database did you use for DIAMOND, and what was the source of the sequences used to build it?
Dear Tonor, Hi
I used NCBI nr, and the query is my transcriptome assembly FASTA file.
I have seen a pie chart in this Nature paper (please have a look at Figure 2: Species percentages in BLASTX hits), and I have tried to create a similar chart for my data.
The authors state that they used the NCBI database: "homology search between our contigs and the NCBI database".
But I suspect they actually used Swiss-Prot!
What do you think?
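For a species chart like that one, a hedged option that stays with nr is to add stitle to the output and read the organism from each subject title. This sketch assumes nr-style titles that end with the organism in square brackets (e.g. "... [Genus species]"); if a title holds several merged entries, only the last bracketed name is used, and titles without brackets count as "unknown". With -k 1 there is one hit per query, so the percentages are per annotated contig. It only computes the numbers; the plotting tool is up to you, and species_of and species_percentages are hypothetical names:

```python
import re
from collections import Counter

# Assumed nr title convention: the organism name in square brackets at the end.
SPECIES_RE = re.compile(r"\[([^\[\]]+)\]\s*$")


def species_of(title: str) -> str:
    """Return the bracketed organism at the end of a title, or 'unknown'."""
    match = SPECIES_RE.search(title)
    return match.group(1) if match else "unknown"


def species_percentages(titles):
    """Return {species: percentage of hits}, ordered from most to least frequent."""
    counts = Counter(species_of(t) for t in titles)
    total = sum(counts.values())
    return {sp: 100.0 * n / total for sp, n in counts.most_common()}


if __name__ == "__main__":
    # Made-up titles, only to show the calculation.
    demo = [
        "protein X [Genus alpha]",
        "protein Y [Genus alpha]",
        "protein Z [Genus beta]",
        "title without organism",
    ]
    for species, pct in species_percentages(demo).items():
        print(f"{species}\t{pct:.1f}%")
```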
Does XP_002531646.1 directly relate to sp|P05661|MYSA_DROME? Or is it just an example of the format?
Hi Tonor, sorry,
I should have mentioned that it is just an example.
What is the full line currently for:
TRINITY_DN212758_c0_g1_i1.......XP_002531646.1.....81.3 107 20 0 3 323 199 305 2.9e-41 176.4
What are the ... replacing?
Nothing! On the BioStars page, extra spaces are collapsed and do not show up as wider spacing,
so I used dots instead of spaces.
In that line, the only important part is the second column, which I showed in bold.
So at the moment you have:
But you want it:
@Farbod wants to convert NCBI IDs to Swiss-Prot IDs.
@Farbod: You could use the UniProt ID converter, or run the search again against a UniProt database. If you want to replace the IDs in place, you will need to convert the IDs first and then do the replacement.
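The "convert first, then replace" step can be scripted once a mapping is available. A minimal sketch, assuming a two-column tab-separated mapping file (NCBI accession, UniProt accession) with one header line, such as a TSV downloaded from the UniProt ID mapping results; check the real header and column layout of your export. load_mapping and replace_subject_ids are hypothetical helpers, and the pair XP_002531646 / B9T077 comes from the last reply in this thread:

```python
def load_mapping(lines, has_header=True):
    """Read 'source<TAB>target' lines into a dict, keeping the first target per source."""
    mapping = {}
    for i, line in enumerate(lines):
        if has_header and i == 0:
            continue  # skip the header line of the export
        parts = line.rstrip("\n").split("\t")
        if len(parts) < 2:
            continue  # ignore malformed or empty lines
        mapping.setdefault(parts[0], parts[1])
    return mapping


def replace_subject_ids(rows, mapping):
    """Replace column 2 (sseqid) of each DIAMOND row; unmapped IDs are left unchanged."""
    for line in rows:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2:
            acc = fields[1]
            base = acc.split(".")[0]  # the mapping may use accessions without version
            fields[1] = mapping.get(acc, mapping.get(base, acc))
        yield "\t".join(fields)


if __name__ == "__main__":
    mapping = load_mapping(["From\tTo\n", "XP_002531646\tB9T077\n"])
    rows = [
        "TRINITY_DN212758_c0_g1_i1\tXP_002531646.1\t81.3\n",
        "TRINITY_DN212728_c0_g1_i1\tXP_014502021.1\t89.2\n",
    ]
    for out in replace_subject_ids(rows, mapping):
        print(out)
```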
It seems that there is no NCBI nr option in "2-Select Options". What must I select instead?
Use
EMBL/GenBank/DDBJ
You may also need to look in EMBL/GenBank/DDBJ CDS.
Hi,
I selected some of my NCBI nr IDs and tried EMBL/GenBank/DDBJ (and also CDS), but I received "Sorry, no results were found"!
Examples:
XP_015458527.1
XP_017541064.1
XP_012987512.1
XP_015682166.1
XP_017537366.1
XP_015457441.1
XP_015214365.1
XP_* records are computational predictions/submissions, and they are not part of UniProt. You may need to query the UniParc database to get information on those. Remove the version number at the end of each accession (e.g. the ".1") when you query.
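Stripping the version suffixes before pasting a list into a query form is easy to script. A minimal sketch, assuming one accession per line as in the examples above; strip_version and unique_unversioned are hypothetical names, and duplicates are dropped while keeping the original order:

```python
import re

# Matches a trailing ".<digits>" version suffix, e.g. the ".1" in "XP_015458527.1".
VERSION_RE = re.compile(r"\.\d+$")


def strip_version(accession: str) -> str:
    """Return the accession without its trailing version number."""
    return VERSION_RE.sub("", accession.strip())


def unique_unversioned(accessions):
    """Strip versions and drop duplicates, preserving first-seen order."""
    seen = {}
    for acc in accessions:
        if acc.strip():
            seen.setdefault(strip_version(acc), None)
    return list(seen)


if __name__ == "__main__":
    examples = ["XP_015458527.1", "XP_017541064.1", "XP_012987512.1", "XP_015682166.1"]
    print("\n".join(unique_unversioned(examples)))
```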
BTW: XP_002531646.1 seems to translate to B9T077, not P05661.