Hi everyone,
I have two paired-end fastq files from the MiSeq platform containing sequences of the V3-V4 region of the 16S rRNA gene, and I want to cluster these samples. Following the mothur SOP, I used the following commands:
make.file(inputdir=., type=fastq, prefix=stability)
make.contigs(file=stability.files, processors=4)
summary.seqs(fasta=stability.trim.contigs.fasta)
screen.seqs(fasta=stability.trim.contigs.fasta, group=stability.contigs.groups, maxlength=570, minlength=400)
unique.seqs(fasta=stability.trim.contigs.good.fasta)
count.seqs(name=stability.trim.contigs.good.names, group=stability.contigs.good.groups)
align.seqs(fasta=stability.trim.contigs.good.unique.fasta, reference=silva.bacteria.fasta)
screen.seqs(fasta=stability.trim.contigs.good.unique.align, count=stability.trim.contigs.good.count_table, summary=stability.trim.contigs.good.unique.summary, start=6388, end=25316)
filter.seqs(fasta=stability.trim.contigs.good.unique.good.align, vertical=T, trump=.)
unique.seqs(fasta=stability.trim.contigs.good.unique.good.filter.fasta, count=stability.trim.contigs.good.good.count_table)
dist.seqs(fasta=stability.trim.contigs.good.unique.good.filter.unique.fasta, cutoff=0.20)
I deliberately omitted some steps (chimera search, removal of undesirables) because they are not important to me right now. However, I am having problems with the dist.seqs command: it keeps producing several files larger than 40 GB until I run out of hard drive space. What am I doing wrong?
Any help would be appreciated!
Found an answer myself: http://blog.mothur.org/2014/09/11/Why-such-a-large-distance-matrix/?/
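For anyone landing here with the same problem, a rough back-of-the-envelope sketch of why the output can get this big: the number of pairwise distances grows roughly with the square of the number of unique sequences passed to dist.seqs. The function names, the 40 bytes per output line, and the fraction of pairs falling under the cutoff are all assumptions I made up for illustration, not values taken from mothur, so treat the numbers as order-of-magnitude only.

```python
# Rough estimate of how large a column-format distance file could get.
# All parameters here are illustrative assumptions, not mothur internals.

def pairwise_count(n_unique: int) -> int:
    """Number of distinct sequence pairs among n_unique sequences."""
    return n_unique * (n_unique - 1) // 2


def estimated_size_gb(n_unique: int,
                      bytes_per_line: int = 40,
                      fraction_kept: float = 1.0) -> float:
    """Estimated file size in GB (1 GB = 1e9 bytes).

    bytes_per_line: assumed average length of one "nameA nameB distance" line.
    fraction_kept:  assumed share of pairs at or below the distance cutoff;
                    1.0 is the worst case where every pair is written.
    """
    return pairwise_count(n_unique) * fraction_kept * bytes_per_line / 1e9


if __name__ == "__main__":
    for n in (10_000, 100_000, 1_000_000):
        worst = estimated_size_gb(n)
        partial = estimated_size_gb(n, fraction_kept=0.1)
        print(f"{n:>9} unique seqs: worst case {worst:,.1f} GB, "
              f"10% of pairs kept {partial:,.1f} GB")
```

The count after the last unique.seqs step (for example from summary.seqs) is the number to plug in as n_unique. If it is in the hundreds of thousands, reducing the number of unique sequences before dist.seqs is likely to matter much more than changing the cutoff.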