Hello fellow researchers!
I have BLAST results showing that only a small part of each sequence in my assembly matches the BLAST database. Most of these matches are to bacteria. Some bacterial contamination is expected, so my question is: why does, for instance, only half of an assembled sequence match bacteria according to BLAST, rather than the entire sequence, or 80% or 90% of it?
Thanks!
Thanks. I'm using a database that covers all species, not only bacteria. I'd love to hear your opinion: considering your answer, if I remove the sequences that align with contaminants such as bacteria, the sequences that remain might be from the species I'm interested in. Those remaining sequences could then be used for further analysis, such as alignment against the NCBI reference genome of the species of interest.
Does that sound reasonable?
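
For anyone wanting to try this, here is a rough sketch of the filtering step described above. It assumes the BLAST hits are in the default 12-column tabular format (-outfmt 6, where columns 7 and 8 are qstart and qend), that the hits passed in have already been restricted to bacterial subjects, and that contig lengths are available separately. The function names, the example data, and the 0.8 cutoff are all made up for illustration, not a recommended standard.

```python
from collections import defaultdict


def query_coverage(blast_lines, contig_lengths):
    """Return {contig: fraction of its length covered by at least one hit}.

    Assumes default BLAST tabular columns (qseqid ... qstart qend ...).
    Overlapping hits on the same contig are merged so bases are not counted twice.
    """
    intervals = defaultdict(list)
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        qstart, qend = int(fields[6]), int(fields[7])
        intervals[fields[0]].append((min(qstart, qend), max(qstart, qend)))

    coverage = {}
    for contig, spans in intervals.items():
        spans.sort()
        covered = 0
        cur_start, cur_end = spans[0]
        for start, end in spans[1:]:
            if start <= cur_end + 1:      # overlapping or adjacent: extend
                cur_end = max(cur_end, end)
            else:                         # gap: close the current interval
                covered += cur_end - cur_start + 1
                cur_start, cur_end = start, end
        covered += cur_end - cur_start + 1
        coverage[contig] = covered / contig_lengths[contig]
    return coverage


def split_by_coverage(coverage, contig_lengths, min_fraction=0.8):
    """Sort contigs into mostly-covered, partially covered, and no-hit groups.

    min_fraction is an arbitrary illustrative cutoff (assumption).
    """
    mostly = sorted(c for c, f in coverage.items() if f >= min_fraction)
    partial = sorted(c for c, f in coverage.items() if f < min_fraction)
    no_hit = sorted(c for c in contig_lengths if c not in coverage)
    return mostly, partial, no_hit


# Hypothetical example: bacterial hits only, invented contig names and lengths.
hits = [
    "contig1\tbactA\t98.0\t500\t10\t0\t1\t500\t100\t599\t1e-100\t900",
    "contig1\tbactB\t97.0\t301\t9\t0\t400\t700\t1\t301\t1e-80\t500",
    "contig2\tbactA\t95.0\t900\t45\t0\t51\t950\t1\t900\t0.0\t1500",
]
lengths = {"contig1": 1400, "contig2": 1000, "contig3": 800}

coverage = query_coverage(hits, lengths)
mostly, partial, no_hit = split_by_coverage(coverage, lengths)
print(coverage)  # contig1 is half covered, contig2 is 90% covered
print(mostly, partial, no_hit)
```

Contigs in the "mostly" group are the clearest candidates for removal. Those in the "partial" group (like the half-matching sequences in the original question) probably deserve a closer look before being discarded, since a partial match does not by itself show that the whole contig is bacterial.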