Blast Output with only one line
1
0
Entering edit mode
3.6 years ago
pablo ▴ 310

Hello.

I have a reference genome (>3000 sequences/contigs) and a de novo assembly (>600 contigs) obtained with PacBio reads.

I would like to compare these two genomes (~2.7Gb for both) in order to know the concordance between the contigs from both oh them.

For that, I'd like to use blast. I splitted the reference genome in separate files for each contig (so, I got >3000 contigs , ranging from 170Mb to few Kb). Then, I blast each of these files against my de novo assembly (with the variable $SLURM_ARRAY_TASK_ID in order to run the blasts in parallel).

I'd like to get the best hit for each blast, so I use :

blastn -query my_sequences.fasta -db denovo_assembly.fa -out mu_outputs.txt -max_target_seqs 1 -outfmt "6 qseqid qlen sseqid qstart qend evalue score length" -num_threads 32 

The output I get is like that (I show for one blast) :

NC_001913.1     17245   ptg000952l      1       17223   0.0     17081   17265
NC_001913.1     17245   ptg000952l      1       10420   0.0     10397   10428
NC_001913.1     17245   ptg000952l      10383   15925   0.0     5516    5547
NC_001913.1     17245   ptg000952l      10379   12626   0.0     2227    2251
NC_001913.1     17245   ptg000952l      16224   17223   0.0     909     1023
NC_001913.1     17245   ptg000952l      16610   17245   0.0     630     636
NC_001913.1     17245   ptg000952l      16610   17245   0.0     630     636
NC_001913.1     17245   ptg000952l      16610   17070   0.0     458     461
NC_001913.1     17245   ptg000952l      16610   17070   0.0     458     461
NC_001913.1     17245   ptg000952l      16763   17223   0.0     455     461
NC_001913.1     17245   ptg000952l      16763   17223   0.0     455     461
NC_001913.1     17245   ptg000952l      15951   16449   0.0     372     505
NC_001913.1     17245   ptg000952l      16610   16917   1.33e-158       308     308
NC_001913.1     17245   ptg000952l      16610   16917   1.33e-158       308     308
NC_001913.1     17245   ptg000952l      16916   17223   6.19e-157       305     308

Is it possible to get only one line of this output? Because what I want to know is if one contig from the reference matches with one contig from my de novo assembly. So, I don't mind to get all of these output lines, only one would be sufficient.

Best.

blast • 1.2k views
ADD COMMENT
0
Entering edit mode

Can you pipe the output to head?

ADD REPLY
0
Entering edit mode

I suppose the OP wants one subject hit per query sequence not just the first line of the output?

ADD REPLY
0
Entering edit mode

Thanks for the reply. I'm just looking for the first line of the output . So the -subject_besthit option is ok in my case.

ADD REPLY
1
Entering edit mode
3.6 years ago

I think you should add to blastn the option -subject_besthit

ADD COMMENT

Login before adding your answer.

Traffic: 2349 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6