6.1 years ago
BetterOrWorse
I need to automatically add the genus and species name of the best BLASTn match, together with its percentage identity, to the headers of a multi-FASTA file. How can I do this? For example:
Before:
>xxxx|yyyy|zzzz
ATCG...
>xxxx|yxyx|zxzx
ATCG...
After:
>xxxx|yyyy|zzzz|*Genus_species*_99%
ATCG...
>xxxx|yxyx|zxzx|*Genus_species_2*_100%
ATCG...
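One way to do the header rewriting is sketched below in Python. It rests on two assumptions: you already have a dictionary mapping each query ID to a (scientific name, percent identity) pair, and BLAST's qseqid column reproduces the first word of each FASTA header. Check the second point against your own output, because BLAST+ can reinterpret IDs that contain `|`. The function name `annotate_headers` is hypothetical, and the asterisks from the example are left out.

```python
def annotate_headers(lines, best_hits):
    """Append |Genus_species_NN% to every FASTA header that has a best hit.

    lines: FASTA lines, with or without trailing newlines.
    best_hits: dict mapping query ID -> (scientific name, percent identity).
    Returns the rewritten lines without trailing newlines.
    """
    out = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            words = line[1:].split()
            # Assumption: BLAST's qseqid is the first word of the header.
            qid = words[0] if words else ""
            hit = best_hits.get(qid)
            if hit is not None:
                name, pident = hit
                # "Genus species" -> "Genus_species"; 99.0 -> "99", 98.5 -> "98.5"
                line = f"{line}|{name.replace(' ', '_')}_{pident:g}%"
        out.append(line)
    return out


# Example with the placeholder names from the question.
fasta = [">xxxx|yyyy|zzzz", "ATCG", ">xxxx|yxyx|zxzx", "ATCG"]
hits = {"xxxx|yyyy|zzzz": ("Genus species", 99.0),
        "xxxx|yxyx|zxzx": ("Genus species_2", 100.0)}
print("\n".join(annotate_headers(fasta, hits)))
```

Headers without an entry in the dictionary are passed through unchanged, so queries with no BLAST hit keep their original header.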
Thanks!
Did you know that the singular form of the word species is species?
OK, thanks. I need to add the genus and species.
What does your BLAST output look like?
'6 qseqid sseqid stitle pident length evalue sstart send qlen slen'
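Since stitle already holds the subject description, one option is to read genus and species straight from it. A minimal Python sketch, assuming the tab-separated column order above and that each title begins with the organism name (often the case for NCBI nt titles, but not guaranteed for custom databases, where the title may start with an accession; check the first lines of your output). `parse_blast_tab` and `genus_species` are hypothetical helper names.

```python
# Column order taken from the -outfmt string in the post.
FIELDS = ["qseqid", "sseqid", "stitle", "pident", "length",
          "evalue", "sstart", "send", "qlen", "slen"]


def parse_blast_tab(lines):
    """Yield one dict per hit from tab-separated BLAST output with FIELDS."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(FIELDS, line.split("\t")))
        hit["pident"] = float(hit["pident"])
        hit["evalue"] = float(hit["evalue"])
        yield hit


def genus_species(stitle):
    """Guess 'Genus species' as the first two words of a subject title.

    Assumption: the title starts with the organism name.
    """
    return " ".join(stitle.split()[:2])
```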
What have you tried? If you give real, workable examples, someone here will likely do this for you.
What BLAST command did you use?
Species names are not that easy to get from BLAST. Choose the information you want in your BLAST output from the list of output format fields, such as qseqid, pident, scomnames and so on. The species name you want could be under sscinames (Subject Scientific Name(s), separated by a ';'), scomnames (Subject Common Name(s), separated by a ';') or sblastnames (Subject Blast Name(s), separated by a ';'). Then, keep the line with the best pident for each qseqid.
You can then use a scripting language such as Perl or Python (you can even do it with Unix tools if you want), with qseqid as key and scomnames + pident as value.
Thanks for answering! But I need help writing this script.
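The steps suggested here (keep the best pident per qseqid, then store qseqid as key and name + pident as value) could be sketched in Python as follows. The sketch assumes a hypothetical `-outfmt '6 qseqid sscinames pident'` column order and uses sscinames; swap in scomnames or sblastnames, or adjust the column indices, to match your own command. The helper name `best_hits` is also hypothetical. Note that the highest pident is not always the best hit, since a short alignment can reach 100% identity. BLAST's tabular output lists each query's hits ordered by e-value by default, so keeping the first line per query is an alternative criterion.

```python
def best_hits(lines):
    """Return {qseqid: (scientific name, pident)}, keeping the highest pident.

    Assumes tab-separated lines with the columns: qseqid, sscinames, pident.
    """
    best = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        qseqid, scinames, pident = line.split("\t")[:3]
        name = scinames.split(";")[0]  # several names may be ';'-separated
        pident = float(pident)
        if qseqid not in best or pident > best[qseqid][1]:
            best[qseqid] = (name, pident)
    return best
```

The resulting dictionary has the (name, pident) shape a header-rewriting step would need.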
I could, but you have to help me by giving me the BLAST command line you used and the attribute you want as species (sscinames, scomnames or sblastnames). If you don't know which attribute would be the best "species" for you, re-run your BLAST command with sscinames, scomnames and sblastnames added to the output format, and copy the first 10 lines of the BLAST output into your post.
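For reference, here is a Python sketch that builds such a command with the three name columns added. The file names, database name and the `blastn_command`/`run_blastn` helpers are hypothetical. The name columns are typically only filled if the database carries taxonomy IDs and the NCBI taxdb files are available to BLAST.

```python
import subprocess

# qseqid/sseqid/pident plus the three candidate "species" columns.
OUTFMT = "6 qseqid sseqid pident sscinames scomnames sblastnames"


def blastn_command(query, db, out, max_hits=5):
    """Return an argument list for blastn with tabular output (-outfmt 6)."""
    return ["blastn", "-query", query, "-db", db, "-out", out,
            "-outfmt", OUTFMT, "-max_target_seqs", str(max_hits)]


def run_blastn(query, db, out):
    # check=True raises CalledProcessError if blastn exits with an error.
    subprocess.run(blastn_command(query, db, out), check=True)
```

Calling, for example, `run_blastn("seqs.fasta", "nt", "hits.tsv")` writes the table; the first 10 lines of `hits.tsv` are what would be useful to paste here.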
It all depends on your reference, so if you can add that to your question it will be easier to help. If you have taxids in your database, it is "fairly easy" with Python: you need to add staxid to the output and use the rankedlineage.dmp file. But as I said, we don't know your reference or where you want to get the species names from.
Could you post a specific example of the input and expected output? The description is too generic.
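The staxid + rankedlineage.dmp route might look like this in Python. It is a sketch that assumes the tab-pipe-tab field layout and the column order of rankedlineage.dmp from NCBI's new_taxdump archive (tax_id, tax_name, species, genus, family, order, class, phylum, kingdom, superkingdom). The helper names are hypothetical.

```python
def parse_dmp_line(line):
    """Split one NCBI .dmp line (fields joined by TAB|TAB) into its fields."""
    line = line.rstrip("\n")
    if line.endswith("\t|"):
        line = line[:-2]  # drop the trailing TAB| terminator
    return line.split("\t|\t")


def load_ranked_lineage(lines):
    """Map tax_id -> (genus, species) from rankedlineage.dmp lines."""
    lineage = {}
    for line in lines:
        tax_id, tax_name, species, genus = parse_dmp_line(line)[:4]
        # If the species column is empty (e.g. the node itself is a species),
        # fall back to the node's own scientific name.
        lineage[tax_id] = (genus, species or tax_name)
    return lineage


def name_for_staxid(staxid, lineage):
    """Return the species for the first taxid in a BLAST staxid(s) value."""
    genus, species = lineage.get(staxid.split(";")[0], ("", ""))
    return species
```

The file can be loaded with `with open("rankedlineage.dmp", encoding="utf-8") as fh: lineage = load_ranked_lineage(fh)`. For nodes above species level the fallback returns a higher-rank name, so it is worth checking the results for those queries.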