Hi everyone,
I am running blastn on the command line and trying different output formatting options. This command:
    blastn -query blastme.fasta -out remote.blastn -db nr -evalue 1e-30 -outfmt 18 -max_target_seqs=1 &
gives output that looks like this:
    Accession Description Score E-value
    Rhodoferax saidenbachensis [b-proteobacteria]
    CP019239 Rhodoferax saidenbachensis strain DSM 22694, complete genome 176 2e-40
    Tax BLAST report
    Query= SRR8559322.121301.1 121301 length=221
    Length=221
    Organism Report
    Accession Description Score E-value
    Janthinobacterium sp. 1_2014MBL_MicDiv [b-proteobacteria]
    CP011319 Janthinobacterium sp. 1_2014MBL_MicDiv, complete genome 200 2e-47
    Tax BLAST report
    Query= SRR8559322.122717.1 122717 length=178
    Length=178
    Organism Report
    Accession... Description... Score... E-value...
    Tax BLAST report
    Query= SRR8559322.126209.1 126209 length=1952
    Length=1952
    Organism Report
    Accession Description Score E-value
    Massilia sp. NR 4-1 [b-proteobacteria]
    CP012201 Massilia sp. NR 4-1, complete genome 1857 0.0
    Tax BLAST report
    Query= SRR8559022.132866.1 132866 length=94
    Length=94
    Organism Report
    Accession... Description... Score... E-value...
    Tax BLAST report
I'd like to get output that looks like this:
    SRR8559322.119579.1    [b-proteobacteria]
    SRR8559322.121301.1    [b-proteobacteria]
    SRR8559322.122717.1
    SRR8559322.126209.1    [b-proteobacteria]
    SRR8559022.132866.1
Note that the second column should be blank where no hits were found.
There doesn't seem to be a BLAST option for anything like this. Can anyone suggest a grep/sed-type command that I could run on the results to put them into this tabular form?
Thanks for any advice.
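Not a grep/sed one-liner, but a minimal Python sketch of the filter asked for above: it reads the `-outfmt 18` text line by line, takes the first word after each `Query=` as the read ID, and takes the first line ending in a bracketed label (such as `[b-proteobacteria]`) as that query's group, leaving the second column empty when none appears. It assumes every organism line ends with its bracketed group, as in the excerpt above; the function and script names are hypothetical.

```python
import re
import sys
from typing import Iterable, Iterator, Optional, Tuple

# Assumption: in the -outfmt 18 text, each organism line ends with a
# bracketed group such as "[b-proteobacteria]". Anchoring at the end of the
# line skips leading brackets in names like "[Clostridium] scindens".
GROUP_AT_END = re.compile(r"\[([^\[\]]+)\]\s*$")


def summarize_organism_report(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (query_id, group) per query; group is "" when no group line was seen."""
    query: Optional[str] = None
    group = ""
    for raw in lines:
        line = raw.strip()
        if line.startswith("Query="):
            if query is not None:
                yield query, group  # emit the previous query before starting a new one
            fields = line[len("Query="):].split()
            query = fields[0] if fields else ""
            group = ""
        elif query is not None and not group:
            match = GROUP_AT_END.search(line)
            if match:
                group = f"[{match.group(1)}]"  # keep only the first group per query
    if query is not None:
        yield query, group  # emit the last query in the file


if __name__ == "__main__":
    # Hypothetical usage: python organism_report_to_tsv.py remote.blastn
    if len(sys.argv) != 2:
        print("usage: python organism_report_to_tsv.py REPORT", file=sys.stderr)
    else:
        with open(sys.argv[1]) as handle:
            for query_id, group in summarize_organism_report(handle):
                print(f"{query_id}\t{group}")
```

Run under the same assumptions as `python organism_report_to_tsv.py remote.blastn > groups.tsv`; each query becomes one tab-separated line, and queries without an organism line keep the blank second column.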
In the past, I have requested XML output (-outfmt 5) and converted the results with this Python script. The XML keeps much more detail per hit than the organism report, including the accession, description, score, e-value and the alignment itself.
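For the XML route, here is a minimal sketch using only Python's standard `xml.etree.ElementTree`. It is not the script mentioned above. It walks each `Iteration` element, takes the read ID from `Iteration_query-def`, and reports the accession and description of the first `Hit`, or empty fields when there is none. It assumes the default BLAST XML layout, where the full query defline sits in `Iteration_query-def`. The XML may not carry the taxonomic group label printed by the Organism Report, so the extra columns here describe the top hit instead. The function and script names are hypothetical.

```python
import sys
import xml.etree.ElementTree as ET
from typing import Iterator, Tuple


def top_hits(root: ET.Element) -> Iterator[Tuple[str, str, str]]:
    """Yield (query_id, accession, description) for the first hit of each query.

    accession and description are "" when the query has no hits.
    """
    for iteration in root.iter("Iteration"):
        # Assumption: the full query defline is stored in Iteration_query-def,
        # e.g. "SRR8559322.121301.1 121301 length=221"; keep its first word.
        fields = iteration.findtext("Iteration_query-def", default="").split()
        query_id = fields[0] if fields else ""
        hit = iteration.find("Iteration_hits/Hit")  # first hit listed, if any
        if hit is None:
            yield query_id, "", ""
        else:
            yield (
                query_id,
                hit.findtext("Hit_accession", default=""),
                hit.findtext("Hit_def", default=""),
            )


if __name__ == "__main__":
    # Hypothetical usage: python blast_xml_top_hits.py remote.xml
    if len(sys.argv) != 2:
        print("usage: python blast_xml_top_hits.py BLAST_XML", file=sys.stderr)
    else:
        for query_id, accession, description in top_hits(ET.parse(sys.argv[1]).getroot()):
            print(f"{query_id}\t{accession}\t{description}")
```

To try it, the same blastn command can be rerun with -outfmt 5 in place of -outfmt 18 and the XML file passed to the script; queries with no hits come out with empty accession and description fields.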