Hi,
I'm running into an issue with MarkDuplicates. I'm working with large FASTQ files, and when I run MarkDuplicates it fails with an out-of-memory error. It doesn't look like any of the nodes I've tried has enough space; I've even tried a node with 1 TB of memory. I'm not sure how much space is actually available under --TMP_DIR /dev/shm/ because it varies from node to node. Is there a way to fix this?
MarkDuplicates --INPUT D01882/sample1.bam --OUTPUT sample1_marked.bam --METRICS_FILE sample1_metrics --ASSUME_SORT_ORDER queryname --OPTICAL_DUPLICATE_PIXEL_DISTANCE 2500 --TMP_DIR /dev/shm/sample1.md.tmp --VALIDATION_STRINGENCY SILENT --CREATE_MD5_FILE true --MAX_SEQUENCES_FOR_DISK_READ_ENDS_MAP 50000 --MAX_FILE_HANDLES_FOR_READ_ENDS_MAP 8000 --SORTING_COLLECTION_SIZE_RATIO 0.25 --TAG_DUPLICATE_SET_MEMBERS false --REMOVE_SEQUENCING_DUPLICATES false --TAGGING_POLICY DontTag --CLEAR_DT true --DUPLEX_UMI false --ADD_PG_TAG_TO_READS true --REMOVE_DUPLICATES false --ASSUME_SORTED false --DUPLICATE_SCORING_STRATEGY SUM_OF_BASE_QUALITIES --PROGRAM_RECORD_ID MarkDuplicates --PROGRAM_GROUP_NAME MarkDuplicates --READ_NAME_REGEX <optimized capture of last three ':' separated fields as numeric values> --MAX_OPTICAL_DUPLICATE_SET_SIZE 300000 --VERBOSITY INFO --QUIET false --COMPRESSION_LEVEL 5 --MAX_RECORDS_IN_RAM 500000 --CREATE_INDEX false --GA4GH_CLIENT_SECRETS client_secrets.json --help false --version false --showHidden false --USE_JDK_DEFLATER false --USE_JDK_INFLATER false
Won't that increase the run time?
If the current setup isn't working, there isn't much choice. Provided you have a performant file system, the hit shouldn't be too bad.
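A minimal sketch of the suggestion above, assuming the Picard command from the original post run through GATK4. /dev/shm is RAM-backed, so temporary files written there compete with the Java heap for the same memory; pointing --TMP_DIR at node-local disk instead removes that contention. The /scratch/$USER path is a placeholder for whatever local scratch directory your cluster provides, and the 32 GB heap and the lowered --SORTING_COLLECTION_SIZE_RATIO are illustrative values, not recommendations:

```shell
# Hypothetical rework of the command above.
# - --java-options "-Xmx32g" sets the JVM heap explicitly instead of
#   relying on the JVM default (32 GB is just an example).
# - --TMP_DIR now points at node-local disk (placeholder path), not /dev/shm,
#   so spill files no longer consume RAM.
# - --SORTING_COLLECTION_SIZE_RATIO is lowered from the default 0.25 so the
#   sorting collection spills to disk sooner, trading run time for memory.
gatk --java-options "-Xmx32g" MarkDuplicates \
    --INPUT D01882/sample1.bam \
    --OUTPUT sample1_marked.bam \
    --METRICS_FILE sample1_metrics \
    --ASSUME_SORT_ORDER queryname \
    --OPTICAL_DUPLICATE_PIXEL_DISTANCE 2500 \
    --TMP_DIR /scratch/$USER/sample1.md.tmp \
    --SORTING_COLLECTION_SIZE_RATIO 0.15 \
    --VALIDATION_STRINGENCY SILENT \
    --CREATE_MD5_FILE true
```

The remaining options from the original command were left at their defaults here for brevity; they can be carried over unchanged.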