Hello!
I've got a dozen different strains of bacteria whose whole genomes we've sequenced (we have paired-end reads, forward and reverse, for each strain). I want to find and locate a specific housekeeping gene in each strain.
Could I convert the fastq files into fasta files, set up a blast database containing the fasta short read files and then blast the query gene sequence against those? Or would I need to assemble each genome first and then make a database out of the assemblies and then blast the query gene sequence against those?
Would appreciate your input, thanks :-)
Don't do any of that... yet. Make a small reference "genome" containing only the gene(s) you need (use the known sequence, or choose examples from related strains) and then align each strain's reads to it with BBMap. Depending on how similar the "different" strains in your pool are, there is some risk that reads may multimap. It sounds like you are just looking to see whether a specific gene is there, so go ahead and use the option ambig=all with BBMap to allow reads to map at all possible locations. You could also try using BBSplit to bin the reads if you have the reference genomes for these strains.
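As a rough sketch of the BBMap step, the per-strain commands could be generated like this. The strain names, the `_R1`/`_R2` file naming, the directory layout, and the reference file name `housekeeping_gene.fasta` are all placeholders I made up, so adjust them to your data. The script only prints the commands; it assumes `bbmap.sh` is on your PATH when you actually run them.

```python
import shlex


def bbmap_command(strain, reads_dir, ref_fasta, out_dir):
    """Build a bbmap.sh command that maps one strain's paired reads to the gene reference.

    File naming (<strain>_R1.fastq.gz / <strain>_R2.fastq.gz) is an assumption.
    """
    return [
        "bbmap.sh",
        f"ref={ref_fasta}",                             # small FASTA holding only the gene(s) of interest
        f"in={reads_dir}/{strain}_R1.fastq.gz",         # forward reads
        f"in2={reads_dir}/{strain}_R2.fastq.gz",        # reverse reads
        f"outm={out_dir}/{strain}.mapped.sam",          # keep only reads that mapped to the gene
        f"covstats={out_dir}/{strain}.covstats.txt",    # per-sequence coverage summary
        "ambig=all",                                    # report multi-mapping reads at every location
        "nodisk=t",                                     # build the index in memory, write nothing to disk
    ]


if __name__ == "__main__":
    # Hypothetical strain names; replace with your own dozen.
    for strain in ["strain01", "strain02"]:
        cmd = bbmap_command(strain, "reads", "housekeeping_gene.fasta", "mapped")
        print(shlex.join(cmd))
```

If the gene is present in a strain, the covstats file should show reads covering most of its length; little or no coverage suggests it is absent or too diverged from the reference sequence you picked.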