8.7 years ago
spirowol
Hello there. I am trying to BLAST and sort thousands of contigs generated from my assemblies. The problem is that my target contigs belong to a bacterium, and the DNA I used for sequencing wasn't pure; I have a mixture of contigs from at least two different species, and I'd like to separate them by species once they are identified by BLAST. Has anybody done this before? Thanks.
Yes, I already filtered my reads to discard those from unwanted organisms, but since most of the DNA belongs to a large eukaryotic organism (not yet sequenced), I still see host DNA and other bacterial contaminants (whose reads I also filtered beforehand). Does the taxonomic output return the FASTA sequences or just the BLAST ID results?
Using tabular format (6) or a few others, you can set BLAST to output the taxonomic IDs along with the standard fields (query ID, subject ID, etc.). See the BLAST documentation for more detail.
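As a rough illustration of the point above, here is a minimal Python sketch that assembles a BLAST+ command line requesting the 12 standard tabular columns plus subject taxids (`-outfmt "6 std staxids"`). The function name `build_blast_command`, the choice of `blastn`, and the file and database names are assumptions for the example, not something from this thread; taxids are only reported if the database was built with taxonomy information.

```python
import subprocess


def build_blast_command(query_fasta, database, out_path, program="blastn"):
    """Return an argument list for a BLAST+ search with taxids in the output.

    'std' expands to the 12 default tabular columns; 'staxids' appends the
    subject taxonomy ID(s) as a 13th column.
    """
    return [
        program,
        "-query", query_fasta,
        "-db", database,
        "-outfmt", "6 std staxids",
        "-out", out_path,
    ]


def run_blast(query_fasta, database, out_path):
    """Run the search; requires BLAST+ on PATH (hypothetical paths)."""
    subprocess.run(build_blast_command(query_fasta, database, out_path), check=True)
```

Calling `run_blast("contigs.fasta", "nt", "contigs_vs_nt.tsv")` would write one line per hit, with the taxid(s) in the last column.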
If you want the full subject sequences, it would be fairly trivial to extract them from the database searched using blastdbcmd and a list of sequence IDs from your results.
I used -outfmt 6 and now have a list, with all the IDs, of my contigs that actually hit the desired bacterium. But I want to recover my BLASTed contigs (the queries), not the subject sequences. The objective is to create two datasets of contigs: one with the contaminant sequences and the other made only of contigs that belong to the target bacterium. The contigs belonging to the target bacterium will be used later for scaffolding and genome finishing.
Taxids are not output by default; you'll need to add them to the output. Run BLAST, then split the results by taxid into those matching contaminating species and those not matching contaminants. After that, use the query IDs in those files to filter your contigs accordingly.
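The split-and-filter steps just described could be sketched in Python roughly as follows. This is one possible approach under stated assumptions, not a definitive method: it assumes 13-column tabular output (the 12 standard columns with taxids appended last), classifies each contig by its single highest-bitscore hit, and expects you to supply every taxid you consider "target" (strain-level taxids included). All function names are made up for the example.

```python
def best_hit_taxids(blast_lines):
    """Map each query ID to the taxids of its highest-bitscore hit."""
    best = {}  # query ID -> (bitscore, set of taxids)
    for line in blast_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        query_id = fields[0]
        bitscore = float(fields[11])  # 12th standard column
        # The taxid column can hold several IDs separated by ';', or 'N/A'.
        taxids = {t for t in fields[12].split(";") if t and t != "N/A"}
        if query_id not in best or bitscore > best[query_id][0]:
            best[query_id] = (bitscore, taxids)
    return {query: taxids for query, (_, taxids) in best.items()}


def read_fasta(lines):
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line and header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def split_contigs(records, query_taxids, target_taxids):
    """Sort contigs into target, other, and unassigned (no usable hit)."""
    target, other, unassigned = [], [], []
    for header, seq in records:
        words = header.split()
        contig_id = words[0] if words else ""
        taxids = query_taxids.get(contig_id)
        if not taxids:
            unassigned.append((header, seq))
        elif taxids & target_taxids:
            target.append((header, seq))
        else:
            other.append((header, seq))
    return target, other, unassigned
```

In practice you would pass open file handles for the BLAST table and the contig FASTA, then write each returned list to its own FASTA file. Contigs with no hit are kept separate so you can decide how to treat them before scaffolding.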
Another option, again assuming your BLAST database stores the taxonomic IDs, would be to use blastdbcmd to extract taxids for your hits, then map those taxids to your contigs via the resulting file and filter accordingly. This would avoid having to run BLAST again if you've already run it and didn't collect taxids in your results.
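A hedged Python sketch of this second route: collect the unique subject IDs from an existing 12-column BLAST table, look them up with something like `blastdbcmd -db nt -entry_batch subject_ids.txt -outfmt "%a %T"` (accession and taxid per line), and join the two. The function names and file names are hypothetical, and the sketch assumes the subject IDs in column 2 of your BLAST table are written the same way as the accessions blastdbcmd prints; if they differ (for example, pipe-delimited IDs), they would need normalising first.

```python
from collections import defaultdict


def unique_subject_ids(blast_lines):
    """Collect subject IDs (column 2) to feed to blastdbcmd -entry_batch."""
    ids = set()
    for line in blast_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2:
            ids.add(fields[1])
    return sorted(ids)


def load_subject_taxids(map_lines):
    """Parse 'accession taxid' lines as printed with -outfmt '%a %T'."""
    mapping = {}
    for line in map_lines:
        parts = line.split()
        if len(parts) >= 2:
            mapping[parts[0]] = parts[1]
    return mapping


def queries_by_taxid(blast_lines, subject_taxids):
    """Group query (contig) IDs by the taxid of each subject they hit."""
    grouped = defaultdict(set)
    for line in blast_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue
        taxid = subject_taxids.get(fields[1])
        if taxid is not None:
            grouped[taxid].add(fields[0])
    return dict(grouped)
```

The resulting taxid-to-contig groups can then be used to pull the matching contigs out of your assembly FASTA, without rerunning BLAST.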