Hello.
I have a reference genome (>3000 sequences/contigs) and a de novo assembly (>600 contigs) obtained with PacBio reads.
I would like to compare these two genomes (~2.7Gb for both) in order to know the concordance between the contigs from both oh them.
For that, I'd like to use blast. I splitted the reference genome in separate files for each contig (so, I got >3000 contigs , ranging from 170Mb to few Kb). Then, I blast each of these files against my de novo assembly (with the variable $SLURM_ARRAY_TASK_ID in order to run the blasts in parallel).
I'd like to get the best hit for each blast, so I use :
blastn -query my_sequences.fasta -db denovo_assembly.fa -out mu_outputs.txt -max_target_seqs 1 -outfmt "6 qseqid qlen sseqid qstart qend evalue score length" -num_threads 32
The output I get is like that (I show for one blast) :
NC_001913.1 17245 ptg000952l 1 17223 0.0 17081 17265
NC_001913.1 17245 ptg000952l 1 10420 0.0 10397 10428
NC_001913.1 17245 ptg000952l 10383 15925 0.0 5516 5547
NC_001913.1 17245 ptg000952l 10379 12626 0.0 2227 2251
NC_001913.1 17245 ptg000952l 16224 17223 0.0 909 1023
NC_001913.1 17245 ptg000952l 16610 17245 0.0 630 636
NC_001913.1 17245 ptg000952l 16610 17245 0.0 630 636
NC_001913.1 17245 ptg000952l 16610 17070 0.0 458 461
NC_001913.1 17245 ptg000952l 16610 17070 0.0 458 461
NC_001913.1 17245 ptg000952l 16763 17223 0.0 455 461
NC_001913.1 17245 ptg000952l 16763 17223 0.0 455 461
NC_001913.1 17245 ptg000952l 15951 16449 0.0 372 505
NC_001913.1 17245 ptg000952l 16610 16917 1.33e-158 308 308
NC_001913.1 17245 ptg000952l 16610 16917 1.33e-158 308 308
NC_001913.1 17245 ptg000952l 16916 17223 6.19e-157 305 308
Is it possible to get only one line of this output? Because what I want to know is if one contig from the reference matches with one contig from my de novo assembly. So, I don't mind to get all of these output lines, only one would be sufficient.
Best.
Can you pipe the output to
head
?I suppose the OP wants one subject hit per query sequence not just the first line of the output?
Thanks for the reply. I'm just looking for the first line of the output . So the
-subject_besthit
option is ok in my case.