Hello everyone, I am a rookie just starting to learn bioinformatics, and I have a question.
I have a FASTA file of contigs that were assembled from reads with CLC Workbench, plus reference genome files for the same species. How can I filter the FASTA file to get a clean, decontaminated genome?
THANK YOU FOR ANSWERING!!!
If you suspected contamination in your sequence data, you should have tried to remove those sequences before assembling the data.
Do you have evidence that there is contamination in your assembly (you even used a reference genome of the same organism)?
Thank you for answering!
When I rechecked the CLC workflow, it turned out that no reference was used. It is a de novo assembly. The genome I want to assemble is from a protozoan. The data was produced on an Illumina HiSeq 2500, so I think there must be some contamination from bacteria or other organisms.
May I ask what I should do with the assembled contigs file?
Thank you again!
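For the filtering step itself, here is a minimal sketch in plain Python. It assumes you already have a set of contig IDs that you consider contaminants (for example, contigs whose best hits in a similarity search were bacterial; that search is not shown here). The function names and the example contig IDs are hypothetical, and the contig ID is assumed to be the first word of each FASTA header.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            # A new record starts; emit the previous one if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line and header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_contigs(records, contaminant_ids):
    """Yield only records whose ID (first word of the header) is not flagged."""
    for header, seq in records:
        words = header.split()
        contig_id = words[0] if words else ""
        if contig_id not in contaminant_ids:
            yield header, seq


def format_fasta(records, width=60):
    """Return records as FASTA text, wrapping sequences at `width` characters."""
    out = []
    for header, seq in records:
        out.append(">" + header)
        for i in range(0, len(seq), width):
            out.append(seq[i:i + width])
    return "\n".join(out) + ("\n" if out else "")


# Hypothetical example: three contigs, one of them flagged as a contaminant.
example = [
    ">contig_1 len=8\n", "ACGTACGT\n",
    ">contig_2 len=4\n", "GGCC\n",
    ">contig_3\n", "TTAA\n", "TT\n",
]
flagged = {"contig_2"}  # assumption: IDs you decided to remove
kept = list(filter_contigs(read_fasta(example), flagged))
print(format_fasta(kept), end="")
```

With a real file you would pass an open file handle to `read_fasta` instead of the example list and write the result of `format_fasta` to a new file. How to decide which contigs go into `flagged` is a separate question and depends on the evidence you have for contamination.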