Hi all,
I'm completely new to the metagenomics field, so I apologize in advance if this post is trivial or confusing. I have millions of Illumina HiSeq reads generated from deep-sea water. They are paired-end reads, but since the quality of the reverse reads is not very good, I'm planning to work only with the forward reads at first.

I'm particularly interested in analyzing the eukaryotic sequences using, for example, QIIME. However, the files are huge (around 60 GB per sample), and I can't get even the fastest OTU-picking strategy in QIIME (ucrss_fast_O29_r97) to complete: after more than 10 hours of running with 320 GB of memory and 29 CPUs, it had not finished even the first step. I ran the same QIIME commands on a small part of one sample (around 1.3 GB) and they worked very well, so I guess the problem is the size of the file rather than a mistake in the commands.

I've also tried using Kraken to separate the prokaryotic from the eukaryotic data, but it did not work: most of the reads (96%) came back unclassified, and I know from previous tests that this is not accurate.

I would like some suggestions for separating the eukaryotic from the prokaryotic data so that I can proceed with OTU picking in QIIME using only the eukaryotes. Any suggestion will be deeply appreciated.
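In case it helps anyone reproduce the small-subset test mentioned above, a subset of a forward-read file can be cut out with something like the minimal Python sketch below. It assumes an uncompressed FASTQ file with standard 4-line records; the function name, file names, and record count are illustrative placeholders. Note that it simply takes the first records in the file, which is not a random sample.

```python
import itertools


def take_fastq_records(in_path, out_path, n_records):
    """Copy the first n_records 4-line FASTQ records from in_path to out_path.

    Hypothetical helper for illustration; assumes uncompressed,
    4-line-per-record FASTQ. Returns the number of records written.
    """
    written = 0
    with open(in_path) as src, open(out_path, "w") as dst:
        while written < n_records:
            # Read one record: header, sequence, separator, qualities.
            record = list(itertools.islice(src, 4))
            if len(record) < 4:
                break  # end of file (or a truncated final record)
            if not record[0].startswith("@") or not record[2].startswith("+"):
                raise ValueError("input does not look like 4-line FASTQ")
            dst.writelines(record)
            written += 1
    return written


if __name__ == "__main__":
    # Placeholder file names and count; adjust to the real data.
    n = take_fastq_records("sample_R1.fastq", "sample_R1.subset.fastq", 1_000_000)
    print(f"wrote {n} records")
```

Reading record by record keeps memory use flat regardless of the input size, so the same approach should work on the full 60 GB files.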
Thank you!