After BLAST against uniprot.fasta, how can I get an output file that includes the full sequence header of every matched protein?
9.0 years ago
Kurban ▴ 230

Hey guys,

I have downloaded uniprot.fasta and now want to BLAST my transcripts against its protein sequences.

uniprot.fasta file format:

kurban@kurban-X550VC:~/Desktop/Uniprot$ more uniprot_sprot.fasta
>sp|Q6GZX4|001R_FRG3G Putative transcription factor 001R OS=Frog virus 3 (isolate Goorha) GN=FV3-001R PE=4 SV=1
MAFSAEDVLKEYDRRRRMEALLLSLYYPNDRKLLDYKEWSPPRVQVECPKAPVEWNNPPS
EKGLIVGHFSGIKYKGEKAQASEVDVNKMCCWVSKFKDAMRRYQGIQTCKIPGKVLSDLD
AKIKAYNLTVEGVEGFVRYSRVTKQHVAAFLKELRHSKQYENVNLIHYILTDKRVDIQHL
EKDLVKDFKALVESAHRMRQGHMINVKYILYQLLKKHGHGPDGPDILTVKTGSKGVLYDD
SFRKIYTDLGWKFTPL

My query fasta file format:

kurban@kurban-X550VC:~/Desktop/Uniprot$ more truncated_cd-hit-est-Trinity_CD_and_CK.fasta
>TR1|c0_g1_i1
TAAGAGGTAAGAAAGCTAGAAAAGAGGAAATATTTTTAATAAAAATAATAAAACTTAATA
ATATAATAATAAGTATCTTTTTATAATATTATAATAAATAAAATAAGGTAGAAATTATAT
AAATTTATAAGAAAGTAATATTCTTATAATAAGAATTAACTTTTATTAATATTAAACTAG
CTAAAGTAAAAATATAAATTTAAAAAAAAGATAATAATAATAAAGATTTTAAAAAATA

I then ran blastx:

blastx -db uniprot_sprot.fasta -query truncated_cd-hit-est-Trinity_CD_and_CK.fasta -out uniprot_sprot_truncated_cd-hit-est-Trinity_CD_and_CK_blastx_tabular -evalue 1e-5 -num_threads 3 -num_alignments 1 -outfmt 6

The output file I got:

kurban@kurban-X550VC:~/Desktop/Uniprot$ more uniprot_sprot_truncated_cd-hit-est-Trinity_CD_and_CK_blastx_tabular
TR4|c0_g1_i1    sp|Q9WVJ0|KCNH3_MOUSE    76.54    81    19    0    243    1    2    82    8e-40     144
TR21|c0_g1_i1    sp|Q99315|YG31B_YEAST    34.09    88    58    0    1    264    708    795    2e-06    49.3
TR22|c0_g1_i1    sp|Q06559|RS3_DROME    62.67    75    28    0    2    226    146    220    3e-28     107
TR51|c0_g1_i1    sp|Q9M4T8|PSA5_SOYBN    50.00    78    38    1    239    6    40    116    1e-21    89.4
TR52|c0_g1_i1    sp|Q9UBS5|GABR1_HUMAN    50.00    102    36    4    3    299    377    466    8e-24    99.8
TR70|c0_g1_i1    sp|Q9H5L6|THAP9_HUMAN    31.36    169    108    5    499    2    322    485    5e-17    82.8
TR72|c0_g1_i1    sp|Q13200|PSMD2_HUMAN    51.95    77    37    0    1    231    666    742    5e-20    88.2
TR81|c0_g1_i1    sp|Q12296|MAM3_YEAST    32.00    125    82    2    3    374    204    326    3e-14    73.9
TR82|c0_g1_i1    sp|Q6BSS8|APTH1_DEBHA    50.68    73    34    2    20    235    161    232    4e-16    73.9
TR84|c0_g1_i1    sp|P20825|POL2_DROME    54.17    72    33    0    6    221    300    371    4e-20    88.2
TR97|c0_g1_i1    sp|Q921I9|EXOS4_MOUSE    36.67    90    55    2    280    14    101    189    4e-10    58.2

The second column of the output file contains only the sequence ID, with no protein information. It would be great if I could get the full header of each matched sequence, or at least the protein description, next to the ID in the second column. The BLAST output I would like might look like this:

TR4|c0_g1_i1    sp|Q9WVJ0|KCNH3_MOUSE    Uncharacterized protein 009R 76.54    81    19    0    243    1    2    82    8e-40     144
TR21|c0_g1_i1    sp|Q99315|YG31B_YEAST    Uncharacterized protein 042L 34.09    88    58    0    1    264    708    795    2e-06    49.3

or something similar.

Could you give me some suggestions on how to do that?

9.0 years ago
dschika ▴ 320

Have you had a look at the outfmt options? Check the formatting options with:

blastx -help
blastx ... -outfmt "6 qseqid sseqid sgi ..."
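As a hedged sketch of what such a format string could look like for this question: the `stitle` specifier (subject title) is one of the fields listed under `-outfmt` in the blastx help text. The command below reuses the options from the question and adds `stitle` as a third column. The output file name is made up for illustration, and whether `stitle` carries the full UniProt description depends on how the database was built with makeblastdb, so check a few lines of the result.

```shell
# The 12 default tabular columns, with stitle (subject title) added as column 3.
OUTFMT="6 qseqid sseqid stitle pident length mismatch gapopen qstart qend sstart send evalue bitscore"

if command -v blastx >/dev/null 2>&1; then
    # Same options as in the question; only -out and -outfmt differ.
    # The -out file name is a hypothetical choice.
    blastx -db uniprot_sprot.fasta \
           -query truncated_cd-hit-est-Trinity_CD_and_CK.fasta \
           -out uniprot_sprot_blastx_with_titles.tsv \
           -evalue 1e-5 -num_threads 3 -num_alignments 1 \
           -outfmt "$OUTFMT"
else
    echo "blastx not found on PATH" >&2
fi
```

Because titles contain spaces, the output should be parsed on tabs only.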

I'm not sure makeblastdb can parse the info correctly from uniprot.fasta. One option would be to create a map file with two columns: the UniProt ID (e.g. sp|Q9WVJ0|KCNH3_MOUSE) in the first column and the other info OP wants in the second. OP could then use the Unix join command to combine the BLAST output (on its column 2) with the map file (on its column 1) and print the result in the desired format.
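A minimal sketch of the map-file idea, under some assumptions: the BLAST output is tab-delimited (as `-outfmt 6` produces), the ID is the first word of each FASTA header, and the function names `make_map` and `annotate_hits` are hypothetical. It uses an awk lookup rather than join, because join needs both inputs sorted on the key and would reorder the hits. Subject IDs missing from the map get "NA".

```shell
# make_map FASTA
# Print "ID<TAB>description" for every header line of a FASTA file.
make_map() {
    awk '/^>/ {
        id = substr($1, 2)                  # first word without ">", e.g. sp|Q9WVJ0|KCNH3_MOUSE
        desc = substr($0, length($1) + 2)   # everything after the first space
        print id "\t" desc
    }' "$1"
}

# annotate_hits MAP HITS
# Print each tab-delimited BLAST hit with the description of its subject
# (looked up by column 2) inserted as a new third column.
annotate_hits() {
    awk -F'\t' -v OFS='\t' '
        NR == FNR { desc[$1] = $2; next }   # first file: load the map into memory
        {
            d = ($2 in desc) ? desc[$2] : "NA"
            $2 = $2 OFS d                   # append the description after the subject ID
            print
        }
    ' "$1" "$2"
}
```

With the files from the question, this could be run as `make_map uniprot_sprot.fasta > uniprot_map.tsv` followed by `annotate_hits uniprot_map.tsv uniprot_sprot_truncated_cd-hit-est-Trinity_CD_and_CK_blastx_tabular > annotated.tsv`, where `uniprot_map.tsv` and `annotated.tsv` are placeholder names.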
