Hi,
I'm running into an issue with MarkDuplicates. I'm working with large FASTQ files, and when I run MarkDuplicates it fails with an out-of-memory error. It doesn't look like any of the nodes I've tried has enough space; I've even tried a node with 1 TB of memory. I'm not sure how much space is actually available under --TMP_DIR /dev/shm/ because it varies from node to node. Is there a way to fix this?
MarkDuplicates --INPUT D01882/sample1.bam --OUTPUT sample1_marked.bam --METRICS_FILE sample1_metrics --ASSUME_SORT_ORDER queryname --OPTICAL_DUPLICATE_PIXEL_DISTANCE 2500 --TMP_DIR /dev/shm/sample1.md.tmp --VALIDATION_STRINGENCY SILENT --CREATE_MD5_FILE true --MAX_SEQUENCES_FOR_DISK_READ_ENDS_MAP 50000 --MAX_FILE_HANDLES_FOR_READ_ENDS_MAP 8000 --SORTING_COLLECTION_SIZE_RATIO 0.25 --TAG_DUPLICATE_SET_MEMBERS false --REMOVE_SEQUENCING_DUPLICATES false --TAGGING_POLICY DontTag --CLEAR_DT true --DUPLEX_UMI false --ADD_PG_TAG_TO_READS true --REMOVE_DUPLICATES false --ASSUME_SORTED false --DUPLICATE_SCORING_STRATEGY SUM_OF_BASE_QUALITIES --PROGRAM_RECORD_ID MarkDuplicates --PROGRAM_GROUP_NAME MarkDuplicates --READ_NAME_REGEX <optimized capture of last three ':' separated fields as numeric values> --MAX_OPTICAL_DUPLICATE_SET_SIZE 300000 --VERBOSITY INFO --QUIET false --COMPRESSION_LEVEL 5 --MAX_RECORDS_IN_RAM 500000 --CREATE_INDEX false --GA4GH_CLIENT_SECRETS client_secrets.json --help false --version false --showHidden false --USE_JDK_DEFLATER false --USE_JDK_INFLATER false
Won't that increase the run time?
If the current setup isn't working, there isn't much choice. Provided you have a performant file system, the hit shouldn't be too bad.
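A minimal sketch of the suggestion above, assuming the Picard command from the original post run through GATK4. /dev/shm is RAM-backed, so temporary files written there compete with the Java heap for the same memory; pointing --TMP_DIR at node-local disk instead removes that contention. The /scratch/$USER path is a placeholder for whatever local scratch directory your cluster provides, and the 32 GB heap and the lowered --SORTING_COLLECTION_SIZE_RATIO are illustrative values, not recommendations:

```shell
# Hypothetical rework of the command above.
# - --java-options "-Xmx32g" sets the JVM heap explicitly instead of
#   relying on the JVM default (32 GB is just an example).
# - --TMP_DIR now points at node-local disk (placeholder path), not /dev/shm,
#   so spill files no longer consume RAM.
# - --SORTING_COLLECTION_SIZE_RATIO is lowered from the default 0.25 so the
#   sorting collection spills to disk sooner, trading run time for memory.
gatk --java-options "-Xmx32g" MarkDuplicates \
    --INPUT D01882/sample1.bam \
    --OUTPUT sample1_marked.bam \
    --METRICS_FILE sample1_metrics \
    --ASSUME_SORT_ORDER queryname \
    --OPTICAL_DUPLICATE_PIXEL_DISTANCE 2500 \
    --TMP_DIR /scratch/$USER/sample1.md.tmp \
    --SORTING_COLLECTION_SIZE_RATIO 0.15 \
    --VALIDATION_STRINGENCY SILENT \
    --CREATE_MD5_FILE true
```

The remaining options from the original command were left at their defaults here for brevity; they can be carried over unchanged.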