Question

BWA. Allocate memory

6.4 years ago

windsur ▴ 20

Dear all,

I have switched to a new workstation and am now using CentOS 7 as the operating system.

When I run the command below, it looks like I do not have enough memory to run a pipeline (in Python) to analyse an exome-seq sample:

> free -h
              total        used        free      shared  buff/cache   available
Mem:           7,5G        1,9G        665M        435M        5,0G        4,9G
Swap:          1,9G        1,1G        816M

I tried with a small sample, but when I start mapping the FASTQ files (using BWA), I get this:

---------------------------------------------------------------------------------------
                               Mapping fastq files (BWA)                                      
---------------------------------------------------------------------------------------
[M::bwa_idx_load_from_disk] read 0 ALT contigs
[bwa_idx2mem] Failed to allocate 4705743064 bytes at bwa.c line 346: Cannot allocate memory

So my question is: how much memory do you recommend I have, or is there another way?

Thanks!

P.S. I am talking about human samples, using hg19 as the reference genome.

next-gen sequencing alignment bwa memory • 4.9k views

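With only about 5G available, you are really close to the memory limit for mapping reads against the human genome with BWA; the allocation that failed in your log is about 4.7 GB (4705743064 bytes) on its own. What command lines do you use to index your reference genome and to align your reads?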

You can take a look at HISAT, which AFAIK has the lowest memory requirement.
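As a rough illustration of the memory comparison above, here is a minimal Python sketch that reads the `MemAvailable` field from `/proc/meminfo` (Linux only) and compares it with the size of the allocation that failed in the log. The helper names `parse_meminfo` and `has_enough_memory` are made up for this example, and the 4705743064-byte figure is only the single allocation reported in the error, so actual peak usage may well be higher.

```python
# Size of the allocation that failed in the BWA log above, in bytes.
REQUIRED_BYTES = 4705743064


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of field name -> bytes."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if not fields:
            continue  # skip blank or malformed lines
        value = int(fields[0])
        # Most fields are reported in kB; convert those to bytes.
        if len(fields) > 1 and fields[1] == "kB":
            value *= 1024
        info[key.strip()] = value
    return info


def has_enough_memory(meminfo_text, required_bytes=REQUIRED_BYTES):
    """Return True if MemAvailable is at least required_bytes."""
    available = parse_meminfo(meminfo_text).get("MemAvailable", 0)
    return available >= required_bytes


def read_meminfo(path="/proc/meminfo"):
    """Read the kernel's memory summary (assumes a Linux system)."""
    with open(path) as handle:
        return handle.read()
```

In a pipeline, something like `has_enough_memory(read_meminfo())` could be checked before launching the mapping step, so the run stops with a clear message instead of failing inside BWA. `MemAvailable` is itself an estimate by the kernel, so treat the result as a hint rather than a guarantee.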

ADD REPLY • link 6.4 years ago by Bastien Hervé 5.9k

Thanks! I will try to add more memory and also what you said.

What I use is:

call('bwa mem -t' + str(args.threads) + ' -R "@RG\tID:' + sample_name + '\tLB:library\tPL:illumina\tPU:library\tSM:' + sample_name + '" ' + genome_ref + ' ' + forward_paths[i] + ' ' + reverse_paths[i] + ' > ' + sample_path + '/' + sample_name + '_bwa.sam',shell = True)

After that, I take the SAM file and use samtools to create the sorted BAM file:

call("find " + sample_path +  "*.sam | parallel --no-notice -j" + str(args.parallelization) + " 'samtools sort {} -O BAM -@ " + str(args.threads / 2) + " -o {}_sorted.bam && samtools index {}_sorted.bam'", shell = True)

ADD REPLY • link 6.4 years ago by windsur ▴ 20

Did you successfully index your genome_ref?

In your Python code, try printing the command you pass to call and copy/paste it here; it's really hard to investigate with all these variables.
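One way to do that, sketched here under some assumptions, is to build the command string first, print it, and only then hand it to `call`. The function name `build_bwa_command` and all the example values (sample name, file names, thread count) are placeholders, not taken from the actual pipeline. The read group uses `\\t` so that the shell receives a literal backslash-t, which the bwa manual says is converted to a TAB in the output SAM.

```python
import shlex


def build_bwa_command(threads, sample_name, genome_ref, forward, reverse, out_sam):
    """Assemble the 'bwa mem' shell command as a single printable string."""
    # "\\t" keeps a backslash followed by 't' in the string instead of a real tab.
    read_group = (
        "@RG\\tID:{0}\\tLB:library\\tPL:illumina\\tPU:library\\tSM:{0}"
    ).format(sample_name)
    parts = ["bwa", "mem", "-t", str(threads), "-R", read_group,
             genome_ref, forward, reverse]
    # Quote each argument so spaces or special characters survive the shell.
    return " ".join(shlex.quote(p) for p in parts) + " > " + shlex.quote(out_sam)


# Placeholder values for illustration only.
cmd = build_bwa_command(4, "sample1", "hg19.fa",
                        "s1_R1.fastq", "s1_R2.fastq", "out/sample1_bwa.sam")
print(cmd)  # inspect or paste this line before running it
# from subprocess import call; call(cmd, shell=True)
```

Printing the exact string makes it possible to run the same command by hand in a terminal, which separates pipeline problems from BWA problems.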

ADD REPLY • link 6.4 years ago by Bastien Hervé 5.9k

For exome sequencing, it is better to stay with BWA mem, as it is the standard aligner for many downstream tools, including most variant and SV callers.

ADD REPLY • link 6.4 years ago by ATpoint 85k

True, that was just meant as a memory test first; if OP has no other choice, HISAT would be one solution.

ADD REPLY • link 6.4 years ago by Bastien Hervé 5.9k

Thanks Bastien, I will try HISAT too. To index my reference, I followed the GATK steps.

ADD REPLY • link 6.4 years ago by windsur ▴ 20

The question about your index was: was it successful?

ADD REPLY • link 6.4 years ago by Bastien Hervé 5.9k

Hey Bastien, yes, it was successful.

ADD REPLY • link 6.4 years ago by windsur ▴ 20

Exactly! After using samtools, I unload the reference genome:

call('bwa shm -d',shell = True)

Or do you mean I should not use samtools for the analysis?

ADD REPLY • link 6.4 years ago by windsur ▴ 20