6.9 years ago
MAPK
I have a fastq file that seems to be contaminated with sequences that came from my reagents during library preparation. If I know which sequences came from the reagents and have them in fasta format, can I remove the matching reads from my fastq file? How can I work this out?
I think you forgot to include a link to the program that does this.
The OP asked "How can I work this out?". The comment above outlines a pipeline for completing the OP's task, not a single program.
Several posts on Biostars can serve as a reference for parsing sequences out of FASTA files based on BLAST results.
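For what it's worth, here is a minimal sketch of the BLAST-based route. It assumes the reads were BLASTed against the reagent fasta with tabular output (-outfmt 6, default columns, so column 1 is the read ID, column 3 the percent identity and column 4 the alignment length). The function name and both cutoffs are made up for illustration and would need tuning for real data.

```python
import csv
import io

def contaminant_read_ids(blast_tsv, min_identity=95.0, min_length=50):
    """Collect query IDs (column 1) of hits that pass both cutoffs.

    blast_tsv is an open text handle on BLAST tabular output.
    The cutoffs are illustrative assumptions, not recommendations.
    """
    hits = set()
    for row in csv.reader(blast_tsv, delimiter="\t"):
        # Skip blank lines and comment lines (as written by -outfmt 7).
        if not row or row[0].startswith("#"):
            continue
        qseqid, pident, length = row[0], float(row[2]), int(row[3])
        if pident >= min_identity and length >= min_length:
            hits.add(qseqid)
    return hits

# Tiny invented example: read1 is a strong hit, read2 is too weak and too short.
example = io.StringIO(
    "read1\tadapterA\t100.0\t75\t0\t0\t1\t75\t1\t75\t1e-30\t139\n"
    "read2\tadapterA\t88.0\t40\t4\t0\t1\t40\t1\t40\t1e-05\t50\n"
)
ids = contaminant_read_ids(example)
print(sorted(ids))
```

The resulting set of IDs can then be used to drop the matching records from the fastq file.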
I was unaware, until your answer @genomax, that the BBMap suite had this option.
The way you wrote that made it seem like you had copied/pasted it from the GitHub description of a package :-)
BTW: @Brian includes a sequencing_artifacts.fa.gz file (in the resources directory) that I assume includes contaminants (which may be seen at other places, but I assume are seen at JGI). Various things that the BBMap suite can do are here, if you have not seen this post before.
Trying to avoid black box/turn key solutions, so one can learn in the process.
Hi, did you find a solution for that? If my contaminating reads and my true reads are both in fastq files, how do I remove the contaminating reads?
I gave you two additional answers in the other thread you posted this in: C: Subtracting one FASTAq file Reads from other FASTAq reads
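For anyone landing here with the same question, this is a rough sketch of subtracting one fastq from another by read ID, using only the Python standard library. It assumes plain four-line fastq records (no wrapped sequence lines), no blank lines, and that the read ID is the first whitespace-delimited token of the header. The helper names are hypothetical. It matches on names only, so it will not catch contaminant sequences that appear under a different read name.

```python
import io

def read_id(header):
    """Return the read ID: the first token after the leading '@'."""
    return header[1:].split()[0]

def fastq_records(handle):
    """Yield 4-line records (header, sequence, '+', qualities).

    A trailing incomplete record is silently dropped.
    """
    it = iter(handle)
    return zip(it, it, it, it)

def subtract_reads(all_reads, contaminant_reads, out):
    """Write records of all_reads whose ID is absent from contaminant_reads.

    Returns the number of records kept.
    """
    bad = {read_id(rec[0]) for rec in fastq_records(contaminant_reads)}
    kept = 0
    for rec in fastq_records(all_reads):
        if read_id(rec[0]) not in bad:
            out.writelines(rec)
            kept += 1
    return kept

# Invented two-read example: r2 is listed as a contaminant, r1 is not.
all_fq = io.StringIO("@r1 1:N:0\nACGT\n+\nIIII\n@r2 1:N:0\nTTTT\n+\nIIII\n")
contam_fq = io.StringIO("@r2 1:N:0\nTTTT\n+\nIIII\n")
clean_fq = io.StringIO()
kept = subtract_reads(all_fq, contam_fq, clean_fq)
print(kept)
print(clean_fq.getvalue(), end="")
```

With real files, the three handles would be opened with open() (or gzip.open(..., "rt") for .gz input) instead of io.StringIO. For paired-end data, both mates would need to be filtered with the same ID set to keep the files in sync.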