4.2 years ago
eennadi
I downloaded the Swiss-Prot proteins in FASTA format. The sequence headers have this form:
sp|Q5ANA3|CDR1_CANAL Pleiotropic ABC efflux transporter of multiple drugs CDR1 OS=Candida albicans (strain SC5314 / ATCC MYA-2876) OX=237561 GN=CDR1 PE=1 SV=2
I want the description, "Pleiotropic ABC efflux transporter of multiple drugs CDR1", to appear in the blastp output.
However, when I ran
blastp -db ~/uniprot.fasta -query ~/CBS11016.genome.all.maker.proteins.fasta -out out.blastp -evalue .000001 -outfmt 6 -num_alignments 1 -seg yes -soft_masking true -show_gis -lcase_masking -max_hsps 1
The output is as seen below:
GEVV02003421.1 sp|Q59L89|SDHF3_CANAL 98.361 122 2 0 1 122 1 122 1.71e-85 242
GEVV02003995.1 sp|Q59S27|RT106_CANAL 99.024 410 3 1 1 410 1 409 0.0 831
GEVV02006224.1 sp|Q59RL7|CPH2_CANAL 99.179 853 7 0 1 853 1 853 0.0 1729
GEVV02000384.1 sp|Q6FNV5|HRD3_CANGA 29.960 247 131 6 63 271 97 339 1.47e-20 89.4
How can I make the output show the full name?
There must be a way to do this with the BLAST parameters, but I find it easier to just translate the spaces in the FASTA headers to underscores (or another easily removable character such as %).
-outfmt '6 stitle <other column identifiers>' would get you tabular output, with stitle being the column for the "full" name of the target.
Hi, I had the same problem. I plan to download the SwissProt headers in Excel format and convert them to CSV.
Grep the SwissProt IDs found in the blastp result from the CSV header file to get a second file, result_header_file.
Sort both files by SwissProt ID.
Then paste the two files together, both of which are comma-delimited.
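The lookup described above can also be done without a spreadsheet export. What follows is only a minimal sketch under some assumptions. It takes the description to be everything between the ID and " OS=" in the header, builds the ID-to-description table from the FASTA file itself, and swaps the grep, sort and paste steps for an awk lookup, so the row order of the BLAST output is kept. The file names, the placeholder sequence and the first demo hit are hypothetical.

```shell
#!/bin/sh
# Demo files in a temporary directory; all file names are hypothetical.
workdir=$(mktemp -d)

# One FASTA record using the header from the question (sequence is a placeholder).
printf '%s\n' \
  '>sp|Q5ANA3|CDR1_CANAL Pleiotropic ABC efflux transporter of multiple drugs CDR1 OS=Candida albicans (strain SC5314 / ATCC MYA-2876) OX=237561 GN=CDR1 PE=1 SV=2' \
  'MPLACEHOLDER' > "$workdir/uniprot.fasta"

# Two tabular hits: the first is a made-up row for the record above,
# the second is copied from the question and has no header in the demo FASTA.
printf 'demo_query\tsp|Q5ANA3|CDR1_CANAL\t100.000\t10\t0\t0\t1\t10\t1\t10\t1e-10\t50.0\n' > "$workdir/out.blastp"
printf 'GEVV02003421.1\tsp|Q59L89|SDHF3_CANAL\t98.361\t122\t2\t0\t1\t122\t1\t122\t1.71e-85\t242\n' >> "$workdir/out.blastp"

# Step 1: build an ID<TAB>description table from the FASTA header lines.
awk '/^>/ {
    id = $1; sub(/^>/, "", id)           # first token without the ">"
    desc = $0; sub(/^[^ ]+ /, "", desc)  # drop the ID token
    sub(/ OS=.*/, "", desc)              # drop the organism and later fields
    print id "\t" desc
}' "$workdir/uniprot.fasta" > "$workdir/headers.tsv"

# Step 2: append the description as a 13th column, or NA when the ID is unknown.
awk -F '\t' 'NR == FNR { name[$1] = $2; next }
    { print $0 "\t" (($2 in name) ? name[$2] : "NA") }' \
    "$workdir/headers.tsv" "$workdir/out.blastp" > "$workdir/annotated.tsv"

cat "$workdir/annotated.tsv"
```

Unlike paste, which pairs lines purely by position, the lookup matches on the subject ID in column 2, so the two files do not need the same number of rows or the same order.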