Greetings,
I have a metagenomics dataset of Illumina overlapping paired-end reads in which the quality deteriorates rapidly. The sequences are already joined. When I filter by sliding window I end up losing 85% of the dataset.
Would separating the sequences, doing trailing-end filtering, and then using them as separate files be a viable approach? The catch is that I'm currently using Diamond, which only seems to accept one input file.
Any ideas welcome, and thank you for your time.
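To make the trailing-end idea concrete, here is a minimal Python sketch of what I mean, not a tested pipeline. It assumes the mates are available un-joined as plain four-line FASTQ with Phred+33 qualities. The function names and the thresholds (`min_q=20`, `min_len=30`) are placeholders I made up for illustration.

```python
def trim_trailing(seq, qual, min_q=20, offset=33):
    """Drop bases from the 3' end while their Phred score is below min_q.

    Assumes Phred+33 encoded quality strings (hypothetical default).
    """
    end = len(seq)
    while end > 0 and ord(qual[end - 1]) - offset < min_q:
        end -= 1
    return seq[:end], qual[:end]


def trim_fastq_records(lines, min_q=20, min_len=30):
    """Yield (header, seq, qual) for reads still >= min_len after trimming.

    Assumes plain four-line FASTQ records (no wrapped sequence lines).
    """
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # skip stray blank lines between records
        seq = next(it).strip()
        next(it)  # '+' separator line, ignored
        qual = next(it).strip()
        seq, qual = trim_trailing(seq, qual, min_q)
        if len(seq) >= min_len:
            yield header.strip(), seq, qual
```

Since each trimmed read keeps its own header, the surviving reads from both mates could in principle be written into one file, which would sidestep the single-input limitation.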
If you map to a reference, do you see the paired ends overlapping? What is your insert size distribution (after mapping)?
Hi, I'm not using a reference genome, just blasting against a list of specific protein sequences.
DNA to Protein blast?
Yes, like I said, I'm using Diamond.