Why can't I perform large genome assembly with Velvet and ABySS?
5.3 years ago
putita.jira ▴ 10

Hi everyone! I'm new to the bioinformatics field and I have some problems with de novo assembly, so I would like to ask for suggestions.

Some general information about my work:

  • Illumina paired-end, read length 150 bp
  • whole genome sequencing (estimated genome size is 224 million bp)
  • 100x coverage
  • around 70 million reads per file (forward and reverse)
  • I used an Amazon Web Services EC2 instance of type m4.xlarge (4 vCPUs, 16 GiB RAM) for all of the following steps.
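As a quick sanity check on the figures above, the expected coverage can be estimated as (number of reads × read length) / genome size. The sketch below is only illustrative: `estimate_coverage` is a hypothetical helper name, and it assumes every read is the full 150 bp, so trimmed data will come out somewhat lower.

```shell
# Estimate sequencing depth from read count, read length and genome size.
# Hypothetical helper; arguments: reads_per_file read_length_bp genome_size_bp
estimate_coverage() {
  # Paired-end data: two files, so total reads = 2 * reads_per_file.
  awk -v n="$1" -v len="$2" -v g="$3" \
    'BEGIN { printf "%.2f\n", (2 * n * len) / g }'
}

# Example with the numbers from the list above:
#   estimate_coverage 70000000 150 224000000
```

With about 70 million reads per file this gives a little under 100x, which is consistent with the stated coverage.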

After trimming the reads, I tried to assemble them with two programs, Velvet and ABySS, but neither worked.

For Velvet, I ran velveth with this command:

velveth /home/ubuntu/velvet21 21 -shortPaired -separate -fastq.gz /home/ubuntu//149-6_1_val_1.fq.gz /home/ubuntu//149-6_2_val_2.fq.gz

and got this output:

[0.000001] Reading FastQ file /home/ubuntu/149-6_1_val_1.fq.gz;
[0.002344] Reading FastQ file /home/ubuntu/149-6_2_val_2.fq.gz;
[924.933978] 139366234 sequences found in total in the paired sequence files
[924.933995] Done
[924.983130] Reading read set file /home/ubuntu/velvet21/Sequences;
[1228.465533] 139366234 sequences found
Killed

However, when I tried a much smaller genome (4.8 million bp, 1.4 million reads per file), it worked!

For ABySS, I ran this command:

abyss-pe k=21 name=abyss21 in='149-6_1_val_1.fq.gz 149-6_2_val_2.fq.gz'

The output looked like this:

 ABYSS -k21 -q3    --coverage-hist=coverage.hist -s output21-bubbles.fa  -o output21-1.fa 149-6_1_val_1.fq.gz 149-6_2_val_2.fq.gz
ABySS 2.0.2
ABYSS -k21 -q3 --coverage-hist=coverage.hist -s output21-bubbles.fa -o output21-1.fa 149-6_1_val_1.fq.gz 149-6_2_val_2.fq.gz
Reading `149-6_1_val_1.fq.gz'...
sparsehash FATAL ERROR: failed to allocate 10 groups
/usr/bin/abyss-pe:506: recipe for target 'output21-1.fa' failed
make: *** [output21-1.fa] Error 1

However, again, it ran successfully with a small synthetic data set from this page (ftp://ccb.jhu.edu/pub/dpuiu/Docs/ABYSS.html).

Does this have anything to do with RAM? How can I resolve this problem?

Thank you

Putita

assembly software error genome • 2.1k views

I am not experienced with genome assemblies, so the more experienced folks will tell you for sure, but 16 GB is pretty much nothing for many bioinformatics tasks. From what I have read, you need hundreds of GB for de novo assemblies. I would start by checking whether, and from where, you can get a cluster, service, or node with that amount of memory.


I agree. Boost it up to at least 64 GB of RAM.
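To confirm how much RAM an instance actually has before starting a long run, the total can be read from `/proc/meminfo` on Linux. This is a minimal sketch: `total_ram_gib` is a hypothetical helper name, and the optional file argument exists only so the function can be pointed at a saved copy of the file.

```shell
# Print total RAM in GiB from a meminfo-style file (default: /proc/meminfo).
# The MemTotal line is reported in kB, so divide by 1024 twice.
total_ram_gib() {
  awk '/^MemTotal:/ { printf "%.1f\n", $2 / 1024 / 1024 }' "${1:-/proc/meminfo}"
}

# Example on a Linux machine:
#   total_ram_gib
```

Running `free -h` shows the same information along with current usage.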


I increased RAM and it works!

Thank you, ATpoint and Kevin Blighe, for your suggestions :)


May I ask how much memory your job ended up using?
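One way to measure this, assuming GNU time is installed as `/usr/bin/time`, is to run the assembler under `time -v` and read the "Maximum resident set size" line from its report. The sketch below only parses such a report; `peak_rss_gib` and the log file name are hypothetical.

```shell
# Extract peak memory (in GiB) from a GNU time -v report.
# GNU time prints "Maximum resident set size (kbytes): N".
peak_rss_gib() {
  awk -F': ' '/Maximum resident set size/ { printf "%.2f\n", $2 / 1024 / 1024 }' "$1"
}

# Example (the -o option writes the report to a file):
#   /usr/bin/time -v -o velveth_time.log velveth <your velveth arguments>
#   peak_rss_gib velveth_time.log
```

Note that the figure covers a single command, so velveth and velvetg (or each ABySS stage) would need to be measured separately.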

5.3 years ago
User 59 13k
[1228.465533] 139366234 sequences found
Killed

You have run out of memory. The process is being killed by the Linux OOM killer, and you can probably see this in your syslog. So yes, you need more RAM.
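A minimal sketch of how to look for this in the logs, assuming an Ubuntu-style system where kernel messages are written to `/var/log/syslog` (the path and the exact message wording vary between distributions and kernel versions, and `find_oom_events` is a hypothetical helper name):

```shell
# Print log lines that look like OOM-killer activity.
# Argument: path to a log file (default: /var/log/syslog).
find_oom_events() {
  grep -i -E 'out of memory|oom-killer|killed process' "${1:-/var/log/syslog}"
}

# Examples (reading the logs may require sudo):
#   find_oom_events
#   dmesg | grep -i -E 'out of memory|oom-killer|killed process'
```

If a matching line names velveth or ABYSS at about the time the job died, the kill was caused by memory exhaustion.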


Thank you, Daniel Swan.

It now runs successfully with more RAM :)
