How to reduce database size for USEARCH
20 months ago
lyehui1 • 0

Hello,

I am new to bioinformatics and trying to replicate a 16S rRNA analysis study. I am currently stuck at a section where the authors used UCHIME v4.2 to identify and remove chimeric sequences.

I have the following files intended as input:

  • Reference database file (.fasta, ~800 MB) downloaded from SILVA
  • Reads in the form of fastq.gz files (14 to 18 MB each)

I am using the 32-bit version of USEARCH, and there isn't enough memory to run chimera identification for any of the files.

The USEARCH website recommends reducing memory use by shrinking the database, either by clustering it or by splitting it. How do I go about doing this? Or are there other potential issues with my input files?

USEARCH • 729 views

Clustering or removing redundancy can be done using CD-HIT (LINK). Start there.
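To give a feel for what removing redundancy means, here is a minimal Python sketch that drops exact duplicate sequences from a FASTA file. This is not what CD-HIT does (CD-HIT clusters sequences at a similarity threshold, which shrinks the database far more); it only removes identical entries. The function names and file paths are made up for illustration.

```python
import hashlib
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (header without ">", sequence)


def parse_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs, joining multi-line sequences."""
    header = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def dereplicate(records: Iterable[Record]) -> List[Record]:
    """Keep the first record for each distinct sequence (case-insensitive)."""
    seen = set()
    unique: List[Record] = []
    for header, seq in records:
        # Store a 20-byte digest rather than the full sequence to save memory.
        key = hashlib.sha1(seq.upper().encode("ascii")).digest()
        if key not in seen:
            seen.add(key)
            unique.append((header, seq))
    return unique


def dereplicate_file(in_path: str, out_path: str) -> int:
    """Write the unique records of in_path to out_path; return how many were kept."""
    with open(in_path) as handle:
        unique = dereplicate(parse_fasta(handle))
    with open(out_path, "w") as out:
        for header, seq in unique:
            out.write(f">{header}\n{seq}\n")
    return len(unique)
```

Something like `dereplicate_file("silva.fasta", "silva_unique.fasta")` (hypothetical file names) would then produce the reduced file. How much this helps depends entirely on how many exact duplicates the download contains.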

If you are trying to replicate a study, doing something like this (if it was not done in the original study) is bound to keep you from reproducing the original results.

20 months ago
Darked89 4.7k

You can use VSEARCH, a free 64-bit reimplementation of USEARCH: https://github.com/torognes/vsearch
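As a rough sketch of what a reference-based chimera check with VSEARCH could look like when driven from Python: the snippet below only assembles and runs the command line. The `--uchime_ref`, `--db`, `--nonchimeras`, `--chimeras` and `--threads` options are VSEARCH options as far as I know, but check `vsearch --help` for your version. The file names and helper functions are hypothetical, and I believe this command expects FASTA input, so the FASTQ reads would need converting first.

```python
import shutil
import subprocess
from typing import List


def build_uchime_ref_command(
    reads_fasta: str,
    reference_fasta: str,
    nonchimeras_out: str,
    chimeras_out: str,
    threads: int = 1,
) -> List[str]:
    """Assemble the argument list for a reference-based chimera check."""
    return [
        "vsearch",
        "--uchime_ref", reads_fasta,      # query sequences to screen
        "--db", reference_fasta,          # reference database (e.g. SILVA)
        "--nonchimeras", nonchimeras_out, # sequences judged non-chimeric
        "--chimeras", chimeras_out,       # sequences flagged as chimeric
        "--threads", str(threads),
    ]


def run_uchime_ref(command: List[str]) -> None:
    """Run the command, failing early if vsearch is not on the PATH."""
    if shutil.which(command[0]) is None:
        raise RuntimeError("vsearch executable not found on PATH")
    subprocess.run(command, check=True)
```

For example, `run_uchime_ref(build_uchime_ref_command("reads.fasta", "silva.fasta", "nonchimeras.fasta", "chimeras.fasta", threads=4))`. Keep in mind that VSEARCH is a separate implementation, so its chimera calls may not match UCHIME v4.2 exactly.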

Depending on how many duplicated reads you have in your FASTQ files, it may be worthwhile to collapse identical reads using clumpify from BBMap.
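Before deciding whether collapsing is worth it, you can estimate how many reads are exact duplicates. The Python sketch below is only a stand-in for illustration, not clumpify itself (which does considerably more than exact matching). It assumes standard four-line FASTQ records, and the function names are my own.

```python
import gzip
from collections import Counter
from typing import Iterable


def count_unique_reads(fastq_lines: Iterable[str]) -> Counter:
    """Count occurrences of each read sequence in four-line FASTQ records."""
    counts: Counter = Counter()
    for index, line in enumerate(fastq_lines):
        # In each record, line 0 is the header, 1 the sequence,
        # 2 the "+" separator and 3 the quality string.
        if index % 4 == 1:
            counts[line.strip().upper()] += 1
    return counts


def duplicate_fraction(counts: Counter) -> float:
    """Fraction of reads that are repeats of an already-seen sequence."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return 1.0 - len(counts) / total


def count_unique_reads_in_file(path: str) -> Counter:
    """Same as count_unique_reads, reading a gzip-compressed FASTQ file."""
    with gzip.open(path, "rt") as handle:
        return count_unique_reads(handle)
```

Calling `duplicate_fraction(count_unique_reads_in_file("sample.fastq.gz"))` (hypothetical file name) gives a value between 0 and 1. If it is close to 0, collapsing identical reads will not save much.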
