Hello everyone.
I'm new on the forum and not a native English speaker, so I will try to be as clear as possible in my question. Forgive me for any mistakes.
Currently, I'm a CS student learning and working with bioinformatics in a biology lab (for roughly 3 months). I'm still learning how to preprocess sequence data, and I need a tool to detect and remove contaminants. I have seen tools like Trimmomatic and Trim Galore! for filtering and trimming primers and adapters, which are straightforward to remove since they appear at the ends of the reads. However, my advisor wants me to find an approach to clear "intra-read" contaminants (to be clear, I don't mean low-quality bases, but alien sequences that don't belong to the organism being sequenced), ideally with a tool available on the Galaxy platform. I have found VecScreen, but it hasn't served us well because our datasets are too big to upload to a web-based tool.
The closest solutions I found were DeconSeq (http://deconseq.sourceforge.net/) and some approaches using Biopython and/or BioPerl to run BLAST+ alignments for detecting contaminants, along with other tools (like Prinseq) to clean the dataset. So far, I haven't seen a Galaxy server with DeconSeq (if there is one), which is why I'm trying to use the standalone version. If anyone has used this tool, please tell me whether it fulfills its purpose well. If anyone knows another approach, I would be grateful to hear about it.
I know my question may be very basic, but I decided to open a post because I have deadlines to deliver some results and I don't want to waste time on something that won't serve me.
Anyway, I'm very open to advice/suggestions from someone more experienced who knows how to deal with contaminants!
Many thanks.
Compared to BLAST, BBMap's SendSketch is a fast way to screen raw reads for contaminants. Usage:
Thank you very much! I didn't know BBMap, will take a look.
Could you refine your question? What do you mean by "contaminants"?
I mean a sequence of foreign origin that doesn't belong to the organism that was sequenced. More specifically, I'm seeking a solution to remove vector contaminants.
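If you end up going the BLAST+ route for vector contamination, the filtering step could be sketched roughly as below. This is only a minimal sketch, not how DeconSeq or any other tool mentioned in this thread works internally. It assumes you have already run blastn against a vector database (for example NCBI UniVec) with tabular output (`-outfmt 6`). The function names and the identity/length thresholds are hypothetical placeholders that you would need to tune for your data. Note that it discards whole reads with a hit rather than cutting out the contaminated segment.

```python
from typing import Iterable, Iterator, List, Set


def contaminated_ids(blast_lines: Iterable[str],
                     min_identity: float = 90.0,
                     min_length: int = 50) -> Set[str]:
    """Collect IDs of reads with at least one hit passing both thresholds.

    Expects BLAST tabular output (-outfmt 6) with the default columns:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send
    evalue bitscore. The thresholds are assumptions, not recommendations.
    """
    bad: Set[str] = set()
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        qseqid = fields[0]
        pident = float(fields[2])
        length = int(fields[3])
        if pident >= min_identity and length >= min_length:
            bad.add(qseqid)
    return bad


def filter_fastq(fastq_lines: Iterable[str], bad: Set[str]) -> Iterator[List[str]]:
    """Yield 4-line FASTQ records whose read ID is not in `bad`.

    Assumes strict 4-line records (no wrapped sequences).
    """
    record: List[str] = []
    for line in fastq_lines:
        record.append(line.rstrip("\n"))
        if len(record) == 4:
            # Header looks like "@read_id optional description"
            read_id = record[0][1:].split()[0]
            if read_id not in bad:
                yield record
            record = []


if __name__ == "__main__":
    # Tiny made-up example; real input would come from files.
    hits = ["read1\tvector1\t98.5\t120\t1\t0\t30\t149\t1\t120\t1e-50\t200"]
    reads = ["@read1\n", "ACGT\n", "+\n", "IIII\n",
             "@read2\n", "TTTT\n", "+\n", "IIII\n"]
    for rec in filter_fastq(reads, contaminated_ids(hits)):
        print("\n".join(rec))
```

With real data you would pass open file handles instead of lists, e.g. `contaminated_ids(open("hits.tsv"))`, where `hits.tsv` is a hypothetical name for the blastn output. Both functions stream line by line, so the FASTQ file never has to fit in memory, although the set of contaminated IDs does.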
Thanks genomax and h.mon! I will take a look at BBTools and the other options mentioned by genomax. :)
Please use ADD COMMENT/ADD REPLY when responding to existing posts to keep threads logically organized.