Question

After BLAST against uniprot.fasta, how can I get an output file that includes the full sequence header of each hit protein?

9.7 years ago

Kurban ▴ 230

Hey guys,

I have downloaded uniprot.fasta, and now I want to BLAST my transcripts against its protein sequences.

uniprot.fasta file format:

kurban@kurban-X550VC:~/Desktop/Uniprot$ more uniprot_sprot.fasta
>sp|Q6GZX4|001R_FRG3G Putative transcription factor 001R OS=Frog virus 3 (isolate Goorha) GN=FV3-001R PE=4 SV=1
MAFSAEDVLKEYDRRRRMEALLLSLYYPNDRKLLDYKEWSPPRVQVECPKAPVEWNNPPS
EKGLIVGHFSGIKYKGEKAQASEVDVNKMCCWVSKFKDAMRRYQGIQTCKIPGKVLSDLD
AKIKAYNLTVEGVEGFVRYSRVTKQHVAAFLKELRHSKQYENVNLIHYILTDKRVDIQHL
EKDLVKDFKALVESAHRMRQGHMINVKYILYQLLKKHGHGPDGPDILTVKTGSKGVLYDD
SFRKIYTDLGWKFTPL

My query fasta file format:

kurban@kurban-X550VC:~/Desktop/Uniprot$ more truncated_cd-hit-est-Trinity_CD_and_CK.fasta
>TR1|c0_g1_i1
TAAGAGGTAAGAAAGCTAGAAAAGAGGAAATATTTTTAATAAAAATAATAAAACTTAATA
ATATAATAATAAGTATCTTTTTATAATATTATAATAAATAAAATAAGGTAGAAATTATAT
AAATTTATAAGAAAGTAATATTCTTATAATAAGAATTAACTTTTATTAATATTAAACTAG
CTAAAGTAAAAATATAAATTTAAAAAAAAGATAATAATAATAAAGATTTTAAAAAATA

Then I ran blastx:

blastx -db uniprot_sprot.fasta -query truncated_cd-hit-est-Trinity_CD_and_CK.fasta -out uniprot_sprot_truncated_cd-hit-est-Trinity_CD_and_CK_blastx_tabular -evalue 1e-5 -num_threads 3 -num_alignments 1 -outfmt 6

The output file I got:

kurban@kurban-X550VC:~/Desktop/Uniprot$ more uniprot_sprot_truncated_cd-hit-est-Trinity_CD_and_CK_blastx_tabular
TR4|c0_g1_i1    sp|Q9WVJ0|KCNH3_MOUSE    76.54    81    19    0    243    1    2    82    8e-40     144
TR21|c0_g1_i1    sp|Q99315|YG31B_YEAST    34.09    88    58    0    1    264    708    795    2e-06    49.3
TR22|c0_g1_i1    sp|Q06559|RS3_DROME    62.67    75    28    0    2    226    146    220    3e-28     107
TR51|c0_g1_i1    sp|Q9M4T8|PSA5_SOYBN    50.00    78    38    1    239    6    40    116    1e-21    89.4
TR52|c0_g1_i1    sp|Q9UBS5|GABR1_HUMAN    50.00    102    36    4    3    299    377    466    8e-24    99.8
TR70|c0_g1_i1    sp|Q9H5L6|THAP9_HUMAN    31.36    169    108    5    499    2    322    485    5e-17    82.8
TR72|c0_g1_i1    sp|Q13200|PSMD2_HUMAN    51.95    77    37    0    1    231    666    742    5e-20    88.2
TR81|c0_g1_i1    sp|Q12296|MAM3_YEAST    32.00    125    82    2    3    374    204    326    3e-14    73.9
TR82|c0_g1_i1    sp|Q6BSS8|APTH1_DEBHA    50.68    73    34    2    20    235    161    232    4e-16    73.9
TR84|c0_g1_i1    sp|P20825|POL2_DROME    54.17    72    33    0    6    221    300    371    4e-20    88.2
TR97|c0_g1_i1    sp|Q921I9|EXOS4_MOUSE    36.67    90    55    2    280    14    101    189    4e-10    58.2

The second column of the output file contains only the sequence ID, with no protein information. It would be great to get the full header of each hit sequence, or at least the protein description, alongside the second column. The BLAST output I want would look something like this:

TR4|c0_g1_i1    sp|Q9WVJ0|KCNH3_MOUSE    Uncharacterized protein 009R 76.54    81    19    0    243    1    2    82    8e-40     144
TR21|c0_g1_i1    sp|Q99315|YG31B_YEAST    Uncharacterized protein 042L 34.09    88    58    0    1    264    708    795    2e-06    49.3

or something like that.

Could you give me some suggestions on how to do that?


Ram · Answer 1 · 2015-12-02

9.7 years ago

dschika ▴ 320

Have you had a look at the outfmt options? Check the formatting options with:

blastx -help
blastx ... -outfmt "6 qseqid sseqid sgi ..."
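As a rough sketch of what such a custom format could look like for the command in the question: the `stitle` specifier asks BLAST+ for the subject title, so placing it after `sseqid` gives the 12 default tabular columns plus a description column. The output file name below is made up, and `-max_target_seqs 1` is my substitution for `-num_alignments 1`, which is intended for the pairwise report formats. How much of the FASTA header ends up in `stitle` may depend on how the database was built with makeblastdb, so check a few lines of the result.

```shell
#!/bin/sh
# Default outfmt 6 columns with the subject title (stitle) added after sseqid.
outfmt="6 qseqid sseqid stitle pident length mismatch gapopen qstart qend sstart send evalue bitscore"

# Query and database names are copied from the question.
query="truncated_cd-hit-est-Trinity_CD_and_CK.fasta"
db="uniprot_sprot.fasta"
out="uniprot_sprot_blastx_with_titles.tsv"   # hypothetical output name

# Only run the search if blastx and the query file are actually present.
if command -v blastx >/dev/null 2>&1 && [ -f "$query" ]; then
    blastx -db "$db" -query "$query" -out "$out" \
        -evalue 1e-5 -num_threads 3 -max_target_seqs 1 -outfmt "$outfmt"
else
    echo "blastx or $query not found; nothing was run" >&2
fi
```

The title contains spaces, but the columns are still separated by tabs, so the file can be split safely with `cut -f` or `awk -F'\t'`.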


I'm not sure makeblastdb can parse the info correctly from uniprot.fasta. One option would be to create a map file with two columns: the UniProt ID (e.g. sp|Q9WVJ0|KCNH3_MOUSE) in the first column and the other info OP wants in the second. OP could then use join to combine the BLAST output file (keyed on its column 2) with the map file (keyed on its column 1) and print the result in the desired format.
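A minimal sketch of this map-and-join idea, using two tiny stand-in files so it can be run as is. The file names and the "Placeholder description" text are invented for the demo; with real data you would point the awk step at uniprot_sprot.fasta and the sort step at the blastx tabular output. It assumes the BLAST output is tab-separated and that the ID in its second column matches the first word of the FASTA header.

```shell
#!/bin/sh
set -eu
LC_ALL=C; export LC_ALL          # sort and join must agree on ordering
TAB=$(printf '\t')
workdir=$(mktemp -d)

# Stand-in for uniprot_sprot.fasta (descriptions are placeholders).
printf '%s\n' \
  '>sp|Q9WVJ0|KCNH3_MOUSE Placeholder description A OS=Mus musculus' \
  'MAFSAEDVLK' \
  '>sp|Q99315|YG31B_YEAST Placeholder description B OS=Saccharomyces cerevisiae' \
  'EKGLIVGHFS' > "$workdir/demo.fasta"

# Stand-in for the blastx -outfmt 6 output (two rows from the question).
printf 'TR4|c0_g1_i1\tsp|Q9WVJ0|KCNH3_MOUSE\t76.54\t81\t19\t0\t243\t1\t2\t82\t8e-40\t144\n' > "$workdir/demo_blast.tsv"
printf 'TR21|c0_g1_i1\tsp|Q99315|YG31B_YEAST\t34.09\t88\t58\t0\t1\t264\t708\t795\t2e-06\t49.3\n' >> "$workdir/demo_blast.tsv"

# Map file: column 1 = ID (first word of the header), column 2 = rest of the header.
awk '/^>/ { id = substr($1, 2); desc = $0; sub(/^[^ ]+ ?/, "", desc); print id "\t" desc }' \
    "$workdir/demo.fasta" | sort -t "$TAB" -k1,1 > "$workdir/map.tsv"

# join needs both inputs sorted on the join field.
sort -t "$TAB" -k2,2 "$workdir/demo_blast.tsv" > "$workdir/blast.sorted.tsv"

# Join BLAST column 2 with map column 1; put the description right after the ID.
join -t "$TAB" -1 2 -2 1 \
    -o 1.1,1.2,2.2,1.3,1.4,1.5,1.6,1.7,1.8,1.9,1.10,1.11,1.12 \
    "$workdir/blast.sorted.tsv" "$workdir/map.tsv" > "$workdir/annotated.tsv"

cat "$workdir/annotated.tsv"
```

The joined rows come out ordered by subject ID rather than in the original BLAST order, and hits whose ID is missing from the map file are dropped unless you add join's `-a 1` option.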

Comment written 9.7 years ago by 5heikki