Hello. I assembled the transcriptome of a bird, using AlienTrimmer (adapter removal, trimming and filtering) and fqCleaner (removal of vector contaminants) for pre-processing, and Trinity for the assembly. To check whether any adapters or contaminant vectors were still present, I ran blastN with my assembly against a combined database of UniVec contaminants and vectors (NCBI), with an e-value cutoff of 1e-5. This gave me a short list of hits identifying possible contaminants, ranging from 60% to 100% identity, with an average e-value of 1e-10. Should I remove from my assembly only the contigs that showed 100% identity with contaminant vectors? Is there a minimum percent identity above which a contig should be removed in this case?
Does this mean fqCleaner did not do its job properly, or did it just miss some things? Are the hits you are seeing to a vector you would expect? A "hit" to a vector (is it 100% across the entire vector?) does not necessarily mean it is a genuine contaminant, especially if that vector is not expected to be there in the first place.
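To look at the hits one by one rather than relying on percent identity alone, something like the sketch below may help. It assumes the blastN results were saved in the default 12-column tabular format (`-outfmt 6`), which the question does not state. The function name `summarize_hits`, the example IDs, and the two cutoffs are hypothetical placeholders, not recommended values.

```python
import csv
import io
from collections import defaultdict

# Default column order of BLAST tabular output (-outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def summarize_hits(handle, min_identity=90.0, min_length=50):
    """Group vector hits by contig, keeping hits that pass both cutoffs.

    min_identity and min_length are illustrative placeholders only.
    Returns {contig_id: [(vector_id, identity, aln_length, start, end), ...]}.
    """
    flagged = defaultdict(list)
    for row in csv.reader(handle, delimiter="\t"):
        # Skip blank lines and comment lines (as written by -outfmt 7).
        if not row or row[0].startswith("#"):
            continue
        hit = dict(zip(COLUMNS, row))
        identity = float(hit["pident"])
        length = int(hit["length"])
        if identity >= min_identity and length >= min_length:
            # Order the query coordinates so that start <= end.
            start, end = sorted((int(hit["qstart"]), int(hit["qend"])))
            flagged[hit["qseqid"]].append(
                (hit["sseqid"], identity, length, start, end))
    return dict(flagged)


# Two made-up hits, for illustration only.
example = (
    "contig_1\tvector_A\t100.000\t120\t0\t0\t1\t120\t300\t419\t1e-60\t222\n"
    "contig_2\tvector_B\t62.500\t40\t15\t0\t500\t539\t10\t49\t1e-6\t30.1\n"
)
flagged = summarize_hits(io.StringIO(example))
for contig, hits in flagged.items():
    print(contig, hits)
```

The start and end coordinates show where on the contig the match falls, which is useful when deciding whether a hit looks like leftover vector at a contig end or a short, weak match in the middle of a real transcript.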