How to make blastp show complete gene name?
4.2 years ago
eennadi ▴ 40

I downloaded the Swiss-Prot proteins in FASTA format. The sequence headers have this form:

sp|Q5ANA3|CDR1_CANAL Pleiotropic ABC efflux transporter of multiple drugs CDR1 OS=Candida albicans (strain SC5314 / ATCC MYA-2876) OX=237561 GN=CDR1 PE=1 SV=2

I want the description, "Pleiotropic ABC efflux transporter of multiple drugs CDR1 OS", to appear in the blastp output.

However, when I ran

 blastp -db ~/uniprot.fasta -query ~/CBS11016.genome.all.maker.proteins.fasta -out out.blastp  -evalue  .000001 -outfmt 6 -num_alignments 1 -seg yes -soft_masking true  -show_gis  -lcase_masking -max_hsps 1

The output is as seen below:

GEVV02003421.1  sp|Q59L89|SDHF3_CANAL   98.361  122     2       0       1       122     1       122     1.71e-85        242
GEVV02003995.1  sp|Q59S27|RT106_CANAL   99.024  410     3       1       1       410     1       409     0.0     831
GEVV02006224.1  sp|Q59RL7|CPH2_CANAL    99.179  853     7       0       1       853     1       853     0.0     1729
GEVV02000384.1  sp|Q6FNV5|HRD3_CANGA    29.960  247     131     6       63      271     97      339     1.47e-20        89.4

How can I make the output show the full name?


There must be a way to do this with the BLAST parameters, but I find it easier to just replace the spaces with underscores (or another character that is easy to remove afterwards, like %):

cat original_headers.fasta | tr " " "_" > converted_headers.fasta
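
A slightly narrower variant of the same idea, as a sketch: restrict the substitution to header lines (those starting with `>`) so that sequence lines are never touched. The function name `underscore_headers` is made up for illustration; the file names are the ones used above.

```shell
# Replace spaces with underscores on FASTA header lines only.
# underscore_headers is a hypothetical helper name, not part of BLAST+.
underscore_headers() {
    sed '/^>/ s/ /_/g' "$1"
}

# Usage (file names as in the command above):
# underscore_headers original_headers.fasta > converted_headers.fasta
```

Either way, the BLAST database would need to be rebuilt from the converted file before rerunning blastp.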

-outfmt '6 stitle <other column identifiers>' would get you tabular output with stitle being the column for the "full" name of the target.
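
As a rough sketch of how that could look with the command from the question: `std` is the keyword for the twelve default columns, and `stitle` is appended as a thirteenth. The wrapper name `blastp_with_titles` and its argument order are made up for illustration, and only a subset of the original options is repeated.

```shell
# Hypothetical wrapper: tabular blastp output with the subject title appended.
# $1 = BLAST database, $2 = query FASTA, $3 = output file
blastp_with_titles() {
    blastp -db "$1" -query "$2" -out "$3" \
        -evalue 1e-6 -max_hsps 1 \
        -outfmt "6 std stitle"
}

# Usage (paths from the question):
# blastp_with_titles ~/uniprot.fasta ~/CBS11016.genome.all.maker.proteins.fasta out.blastp
```

Putting `stitle` last keeps the first twelve columns in the standard order, so existing scripts that read those columns should keep working.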


Hi, I had the same problem. My plan is to download the Swiss-Prot "header" information in Excel format and convert it to CSV.
Then grep the Swiss-Prot IDs found in the blastp result against the CSV header file and write the matches to a second header file.
Sort both files by Swiss-Prot ID.
Finally, paste the two files together, both of which are comma-delimited.
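
A similar join can be sketched without the spreadsheet step, assuming the titles are read straight from the FASTA file used to build the database instead of an Excel export (a swapped-in source, not the plan described above). It also assumes tab-separated `-outfmt 6` output with the subject ID in column 2; `annotate_hits` is a made-up name.

```shell
# Hypothetical helper: append the FASTA description to each tabular BLAST hit.
# $1 = FASTA file used to build the database
# $2 = tab-separated BLAST output (-outfmt 6), subject ID in column 2
annotate_hits() {
    awk -F '\t' -v OFS='\t' '
        # First file: map the first word of each header to the rest of the line
        NR == FNR {
            if (substr($0, 1, 1) == ">") {
                header = substr($0, 2)
                sp = index(header, " ")
                if (sp > 0) title[substr(header, 1, sp - 1)] = substr(header, sp + 1)
            }
            next
        }
        # Second file: print the hit plus its title, or NA if the ID is unknown
        { print $0, (($2 in title) ? title[$2] : "NA") }
    ' "$1" "$2"
}

# Usage (paths from the question; the output name is arbitrary):
# annotate_hits ~/uniprot.fasta out.blastp > out.annotated.tsv
```

Because the lookup is keyed on the subject ID, neither file has to be sorted first.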

4.2 years ago
JC 13k

You can add additional columns in your Blast output:

 Options 6, 7 and 10 can be additionally configured to produce a custom format specified by space delimited format specifiers, or by a token specified by the delim keyword. E.g.: "17 delim=@ qacc sacc score".
   The delim keyword must appear after the numeric output format specification.
   The supported format specifiers are:
            qseqid means Query Seq-id
               qgi means Query GI
              qacc means Query accesion
           qaccver means Query accesion.version
              qlen means Query sequence length
            sseqid means Subject Seq-id
         sallseqid means All subject Seq-id(s), separated by a ';'
               sgi means Subject GI
            sallgi means All subject GIs
              sacc means Subject accession
           saccver means Subject accession.version
           sallacc means All subject accessions
              slen means Subject sequence length
            qstart means Start of alignment in query
              qend means End of alignment in query
            sstart means Start of alignment in subject
              send means End of alignment in subject
              qseq means Aligned part of query sequence
              sseq means Aligned part of subject sequence
            evalue means Expect value
          bitscore means Bit score
             score means Raw score
            length means Alignment length
            pident means Percentage of identical matches
            nident means Number of identical matches
          mismatch means Number of mismatches
          positive means Number of positive-scoring matches
           gapopen means Number of gap openings
              gaps means Total number of gaps
              ppos means Percentage of positive-scoring matches
            frames means Query and subject frames separated by a '/'
            qframe means Query frame
            sframe means Subject frame
              btop means Blast traceback operations (BTOP)
            staxid means Subject Taxonomy ID
          ssciname means Subject Scientific Name
          scomname means Subject Common Name
        sblastname means Subject Blast Name
         sskingdom means Subject Super Kingdom
           staxids means unique Subject Taxonomy ID(s), separated by a ';'
                         (in numerical order)
         sscinames means unique Subject Scientific Name(s), separated by a ';'
         scomnames means unique Subject Common Name(s), separated by a ';'
        sblastnames means unique Subject Blast Name(s), separated by a ';'
                         (in alphabetical order)
        sskingdoms means unique Subject Super Kingdom(s), separated by a ';'
                         (in alphabetical order)
            stitle means Subject Title
        salltitles means All Subject Title(s), separated by a '<>'
           sstrand means Subject Strand
             qcovs means Query Coverage Per Subject
           qcovhsp means Query Coverage Per HSP
            qcovus means Query Coverage Per Unique Subject (blastn only)
   When not provided, the default value is:
   'qaccver saccver pident length mismatch gapopen qstart qend sstart send evalue bitscore', which is equivalent to the keyword 'std'
4.1 years ago
eennadi ▴ 40

I added -show_gis to the BLAST command line and it worked.
