Hi there,
I used SPAdes to assemble my metagenome shotgun dataset into contigs. I just realized, however, that there is human contamination in this assembly. Because of how long the assembly took, I'm trying to think of ways to remove those human contigs from the FASTA assembly. Any suggestions? And if I need to go back a step and remove them from the FASTQ files instead, how should I proceed? (I'd rather not use something like KneadData for removal of human contamination, btw.)
thanks!
You could simply align the data to the human genome (use blat, LAST, or LASTZ) and remove sequences that align. If you are willing to go back to the original data, then try: http://seqanswers.com/forums/showthread.php?t=42552
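The align-and-remove step could be sketched roughly as below. This is only an illustration under some assumptions that are not from the thread: the aligner output is a 12-column BLAST-style tabular file (query ID in column 1, query start/end in columns 7 and 8), and a contig is dropped only when at least half of its length aligns to human. The 0.5 cutoff and all function names are hypothetical, so adjust them to your data.

```python
from collections import defaultdict


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def contig_id(header):
    """Contig ID = first whitespace-delimited token of the FASTA header."""
    parts = header.split()
    return parts[0] if parts else ""


def covered_length(intervals):
    """Number of bases covered by 1-based inclusive intervals (overlaps merged)."""
    total, current_end = 0, 0
    for start, end in sorted(intervals):
        start = max(start, current_end + 1)
        if end >= start:
            total += end - start + 1
            current_end = end
    return total


def human_contig_ids(hit_lines, contig_lengths, min_fraction=0.5):
    """Return IDs of contigs whose aligned fraction is >= min_fraction.

    Assumes BLAST-style tabular hits against the human genome
    (column 1 = query ID, columns 7 and 8 = query start and end).
    """
    intervals = defaultdict(list)
    for line in hit_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        q_start, q_end = int(fields[6]), int(fields[7])
        intervals[fields[0]].append((min(q_start, q_end), max(q_start, q_end)))
    return {
        name
        for name, spans in intervals.items()
        if name in contig_lengths
        and covered_length(spans) >= min_fraction * contig_lengths[name]
    }


def filter_fasta(fasta_lines, hit_lines, min_fraction=0.5):
    """Return the (header, sequence) records that are NOT flagged as human."""
    records = list(read_fasta(fasta_lines))  # holds the whole assembly in memory
    lengths = {contig_id(h): len(s) for h, s in records}
    flagged = human_contig_ids(hit_lines, lengths, min_fraction)
    return [(h, s) for h, s in records if contig_id(h) not in flagged]
```

To use it, pass open file handles for the assembly and the hit table to `filter_fasta` and write the returned records back out as FASTA. Requiring a minimum aligned fraction, rather than removing every contig with any hit, is a design choice meant to avoid discarding microbial contigs that only have a short spurious match.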
BlobTools is great for this, although if you have too many contigs (hundreds of thousands or millions) the BLAST step may be too slow.
In addition to good suggestions that are already part of this thread, I think you should look at all similar posts on the far right side of this page. This is a fairly common problem and has been debated already.
You may want to consider binning your sequences with t-SNE or UMAP. Human contigs that are > 5 kb should separate easily from other sequences.
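The reply does not say which features to embed. One common choice, assumed here purely for illustration, is a tetranucleotide frequency profile per contig; the resulting vectors (one row per contig) could then be handed to a t-SNE or UMAP implementation. The function name and the choice of k = 4 are hypothetical, and reverse-complement k-mers are not collapsed in this simplified sketch.

```python
from itertools import product


def kmer_profile(sequence, k=4):
    """Return normalized k-mer frequencies in lexicographic ACGT order.

    Windows containing characters other than A, C, G, T (e.g. N) are skipped.
    """
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    seq = sequence.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:
            counts[kmer] += 1
    total = sum(counts.values())
    return [counts[kmer] / total if total else 0.0 for kmer in kmers]
```

Restricting the input to contigs longer than about 5 kb, as suggested above, keeps the profiles from being dominated by sampling noise on short sequences.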