Entering edit mode
7.3 years ago
Tamandua
•
0
Hi. I have a shotgun sequencing dataset and a few reference genomes. I would like to blast the reads against the genomes and retain only long and good hits (>100 bp, >e-40) in the output file. Is there an easy way of doing this? I have so far tried the following command, but that does not include the >100bp limit and the output does not seem to be what I need.
blastn -db database.fasta -query input.fasta -evalue 1e-60 -outfmt '6 sseqid sseq' | awk 'BEGIN{FS="\t"; OFS="\n"}{gsub(/-/, "", $2); print ">"$1,$2}' > output.fasta
you can add alignment length to output and filter according to it (using awk).
How many genomes, and what is the size of the genomes? It could be easier to map the reads with e.g., BWA, Bowtie or BBMap. If the number of genomes is small, you could also use BBSplit, setting the ambiguous reads splitting among references genomes as desired - probably you want ambiguous=all.