Hi everyone!
I'm processing the sequencing data from an experiment in which an unknown community was spiked with a known, sequenced E. coli. The DNA was sequenced with short reads, and we would now like to analyse the community without the E. coli that was artificially introduced. I was thinking of running bbduk to get rid of the E. coli reads as follows:
bbduk.sh in=reads_1.fq in2=reads_2.fq ref=E_coli_reference.fasta out=filtered_reads.fq
Will this do? I was also wondering about bbsplit. My concern is that our reference is not a single contig but a multi-FASTA file, and I'm not certain whether one tool has an advantage over the other.
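As a quick sanity check on the reference before filtering, something like the sketch below could report how many sequences the multi-FASTA file contains. The function name count_fasta_records is made up for illustration, and it only assumes that every record starts with a ">" header line.

```shell
# Count the sequence records in a FASTA file by counting header lines,
# i.e. lines that start with ">".
# count_fasta_records is a hypothetical helper name, not part of BBTools.
count_fasta_records() {
    grep -c '^>' "$1"
}

# Example call (file name taken from the command in this post):
# count_fasta_records E_coli_reference.fasta
```

A result greater than 1 simply confirms that the reference is split over several contigs.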
Many many thanks!
Hi GenoMax
Thanks for the reply. Yes, I'm aware that I will falsely discard some reads, but as you say that is a consequence of using short reads, and I don't think there is much I can do to avoid it other than sequencing with a different approach. I still think this is the best option I have at the moment, so I'll stick with bbduk and keep the risk in mind. Regarding the reference, would a multi-FASTA file work, or would it be better to merge the sequences with NNN spacers into a FASTA file containing a single sequence? Thanks for the outu= and outm= tip, by the way.
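For reference, this is roughly how I understand the outu= / outm= suggestion would look for the paired files. It is only a sketch: the output file names are placeholders, the wrapper function filter_spike_in is hypothetical, and I am assuming that outu2= and outm2= are the second-mate counterparts of outu= and outm=, which is worth checking against the bbduk.sh help text. The multi-FASTA reference is passed as it is. By default the function only prints the command; calling it with "run" would execute it.

```shell
# Build a bbduk.sh call that separates read pairs matching the spike-in
# reference from the rest of the community.
# filter_spike_in is a hypothetical wrapper; output names are placeholders.
filter_spike_in() {
    mode=${1:-dry}
    # outu*/outm*: pairs without / with a k-mer match to the reference.
    # outu2= and outm2= are assumed to be the mate-2 outputs.
    set -- bbduk.sh \
        in=reads_1.fq in2=reads_2.fq \
        ref=E_coli_reference.fasta \
        outu=community_1.fq outu2=community_2.fq \
        outm=ecoli_1.fq outm2=ecoli_2.fq
    if [ "$mode" = "run" ]; then
        "$@"          # execute bbduk.sh (requires BBTools on the PATH)
    else
        echo "$*"     # dry run: only print the command
    fi
}

# Dry run: show the command without executing it.
filter_spike_in dry
```

Keeping the matched reads in their own files makes it possible to check afterwards how much was removed as E. coli.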