Blastn: Blast short reads against genomes and retain only good hits
0
0
Entering edit mode
7.3 years ago
Tamandua • 0

Hi. I have a shotgun sequencing dataset and a few reference genomes. I would like to blast the reads against the genomes and retain only long and good hits (>100 bp, >e-40) in the output file. Is there an easy way of doing this? I have so far tried the following command, but that does not include the >100bp limit and the output does not seem to be what I need.

blastn -db database.fasta -query input.fasta -evalue 1e-60 -outfmt '6 sseqid sseq'     | awk 'BEGIN{FS="\t"; OFS="\n"}{gsub(/-/, "", $2); print ">"$1,$2}'     > output.fasta
Blast next-gen sequencing genome • 2.6k views
ADD COMMENT
2
Entering edit mode

you can add alignment length to output and filter according to it (using awk).

ADD REPLY
0
Entering edit mode

How many genomes, and what is the size of the genomes? It could be easier to map the reads with e.g., BWA, Bowtie or BBMap. If the number of genomes is small, you could also use BBSplit, setting the ambiguous reads splitting among references genomes as desired - probably you want ambiguous=all.

ADD REPLY

Login before adding your answer.

Traffic: 2107 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6