Question

Picard MarkDuplicates shows error (removing PCR duplicates)

0

6.0 years ago

ashaneev07 ▴ 40

Hi, I got the following error while running Picard's MarkDuplicates. Does anyone have experience with this command in Picard? I need help.

> java -jar picard.jar MarkDuplicates  I=300BP.sorted O=marked_duplicates_300.bam M=marked_dup_metrics.txt  REMOVE_DUPLICATES=true &

********** NOTE: Picard's command line syntax is changing.
**********
********** For more information, please see:
********** https://github.com/broadinstitute/picard/wiki/Command-Line-Syntax-Transition-For-Users-(Pre-Transition)
**********
********** The command line looks like this in the new syntax:
**********
**********    MarkDuplicates -I /300BP.sorted -O marked_duplicates_300.bam -M marked_dup_metrics.txt -REMOVE_DUPLICATES true
**********


15:13:11.216 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/home/Documents/Tools_NGS/picard.jar!/com/intel/gkl/native/libgkl_compression.so
[Wed Nov 21 15:13:11 IST 2018] MarkDuplicates INPUT=[300BP.sorted] OUTPUT=marked_duplicates_300.bam METRICS_FILE=marked_dup_metrics.txt REMOVE_DUPLICATES=true    MAX_SEQUENCES_FOR_DISK_READ_ENDS_MAP=50000 MAX_FILE_HANDLES_FOR_READ_ENDS_MAP=8000 SORTING_COLLECTION_SIZE_RATIO=0.25 TAG_DUPLICATE_SET_MEMBERS=false REMOVE_SEQUENCING_DUPLICATES=false TAGGING_POLICY=DontTag CLEAR_DT=true ADD_PG_TAG_TO_READS=true ASSUME_SORTED=false DUPLICATE_SCORING_STRATEGY=SUM_OF_BASE_QUALITIES PROGRAM_RECORD_ID=MarkDuplicates PROGRAM_GROUP_NAME=MarkDuplicates READ_NAME_REGEX=<optimized capture of last three ':' separated fields as numeric values> OPTICAL_DUPLICATE_PIXEL_DISTANCE=100 MAX_OPTICAL_DUPLICATE_SET_SIZE=300000 VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false
[Wed Nov 21 15:13:11 IST 2018] Executing as home@home-Lenovo-H30-50 on Linux 4.4.0-31-generic amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_171-b11; Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: 2.18.14-SNAPSHOT
INFO    2018-11-21 15:13:11 MarkDuplicates  Start of doWork freeMemory: 240890984; totalMemory: 251658240; maxMemory: 3720871936
INFO    2018-11-21 15:13:11 MarkDuplicates  Reading input file and constructing read end information.
INFO    2018-11-21 15:13:11 MarkDuplicates  Will retain up to 13481420 data points before spilling to disk.
[Wed Nov 21 15:13:13 IST 2018] picard.sam.markduplicates.MarkDuplicates done. Elapsed time: 0.04 minutes.
Runtime.totalMemory()=1302331392
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
Exception in thread "main" htsjdk.samtools.SAMException: /tmp/home/CSPI.8946166571745516868.tmp/20922.tmpnot found
    at htsjdk.samtools.util.FileAppendStreamLRUCache$Functor.makeValue(FileAppendStreamLRUCache.java:64)
    at htsjdk.samtools.util.FileAppendStreamLRUCache$Functor.makeValue(FileAppendStreamLRUCache.java:49)
    at htsjdk.samtools.util.ResourceLimitedMap.get(ResourceLimitedMap.java:76)
    at htsjdk.samtools.CoordinateSortedPairInfoMap.getOutputStreamForSequence(CoordinateSortedPairInfoMap.java:180)
    at htsjdk.samtools.CoordinateSortedPairInfoMap.put(CoordinateSortedPairInfoMap.java:164)
    at picard.sam.markduplicates.util.DiskBasedReadEndsForMarkDuplicatesMap.put(DiskBasedReadEndsForMarkDuplicatesMap.java:65)
    at picard.sam.markduplicates.MarkDuplicates.buildSortedReadEndLists(MarkDuplicates.java:543)
    at picard.sam.markduplicates.MarkDuplicates.doWork(MarkDuplicates.java:232)
    at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:295)
    at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:103)
    at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:113)
Caused by: java.io.FileNotFoundException: /tmp/home/CSPI.8946166571745516868.tmp/20922.tmp (Too many open files)
    at java.io.FileOutputStream.open0(Native Method)
    at java.io.FileOutputStream.open(FileOutputStream.java:270)
    at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
    at htsjdk.samtools.util.FileAppendStreamLRUCache$Functor.makeValue(FileAppendStreamLRUCache.java:61)
... 10 more

snp alignment sequence • 4.1k views

ADD COMMENT • link updated 6.0 years ago by Kevin Blighe 88k • written 6.0 years ago by ashaneev07 ▴ 40

0

Can you try with:

java -Djava.io.tmpdir=. -jar picard.jar -I 300BP.sorted (etc...)

ADD REPLY • link 6.0 years ago by Pierre Lindenbaum 164k

0

I tried it as you suggested, and now it shows:

I=300BP.sorted.bam' is not a valid command

ADD REPLY • link 6.0 years ago by ashaneev07 ▴ 40

0

Sorry, I forgot the tool name after the jar:

java -Djava.io.tmpdir=. -jar picard.jar MarkDuplicates -I 300BP.sorted (etc...)

ADD REPLY • link 6.0 years ago by Pierre Lindenbaum 164k

0

********** NOTE: Picard's command line syntax is changing.
**********
********** For more information, please see:
********** https://github.com/broadinstitute/picard/wiki/Command-Line-Syntax-Transition-For-Users-(Pre-Transition)
**********
********** The command line looks like this in the new syntax:
**********
**********    MarkDuplicates -I 300BP.sorted -O marked_duplicates_300.bam -M marked_dup_metrics.txt -REMOVE_DUPLICATES true
**********


16:18:14.987 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/veena/Documents/Tools_NGS/picard.jar!/com/intel/gkl/native/libgkl_compression.so
[Wed Nov 21 16:18:15 IST 2018] MarkDuplicates INPUT=[300BP.sorted] OUTPUT=marked_duplicates_300.bam METRICS_FILE=marked_dup_metrics.txt REMOVE_DUPLICATES=true    MAX_SEQUENCES_FOR_DISK_READ_ENDS_MAP=50000 MAX_FILE_HANDLES_FOR_READ_ENDS_MAP=8000 SORTING_COLLECTION_SIZE_RATIO=0.25 TAG_DUPLICATE_SET_MEMBERS=false REMOVE_SEQUENCING_DUPLICATES=false TAGGING_POLICY=DontTag CLEAR_DT=true ADD_PG_TAG_TO_READS=true ASSUME_SORTED=false DUPLICATE_SCORING_STRATEGY=SUM_OF_BASE_QUALITIES PROGRAM_RECORD_ID=MarkDuplicates PROGRAM_GROUP_NAME=MarkDuplicates READ_NAME_REGEX=<optimized capture of last three ':' separated fields as numeric values> OPTICAL_DUPLICATE_PIXEL_DISTANCE=100 MAX_OPTICAL_DUPLICATE_SET_SIZE=300000 VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false
[Wed Nov 21 16:18:15 IST 2018] Executing as veena@veena-Lenovo-H30-50 on Linux 4.4.0-31-generic amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_171-b11; Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: 2.18.14-SNAPSHOT
[Wed Nov 21 16:18:15 IST 2018] picard.sam.markduplicates.MarkDuplicates done. Elapsed time: 0.00 minutes.
Runtime.totalMemory()=251658240
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
Exception in thread "main" htsjdk.samtools.SAMException: Cannot read non-existent file: file:///home/veena/Documents/Tools_NGS/300BP.sorted
    at htsjdk.samtools.util.IOUtil.assertFileIsReadable(IOUtil.java:430)
    at htsjdk.samtools.util.IOUtil.assertFileIsReadable(IOUtil.java:417)
    at htsjdk.samtools.util.IOUtil.assertInputIsValid(IOUtil.java:393)
    at htsjdk.samtools.util.IOUtil.assertInputsAreValid(IOUtil.java:469)
    at picard.sam.markduplicates.MarkDuplicates.doWork(MarkDuplicates.java:224)
    at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:295)
    at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:103)
    at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:113)

ADD REPLY • link updated 6.0 years ago by finswimmer 16k • written 6.0 years ago by ashaneev07 ▴ 40

0

This may not be the main problem, but the extension of 300BP.sorted should be 'sam'.

ADD REPLY • link 6.0 years ago by Pierre Lindenbaum 164k

0

What is the output of

file 300BP.sorted

?

ADD REPLY • link 6.0 years ago by Pierre Lindenbaum 164k

0

300BP.sorted file is a sorted bam file.

ADD REPLY • link 6.0 years ago by ashaneev07 ▴ 40

0

300BP.sorted file is a sorted bam file.

That is not the output of the 'file' command.

ADD REPLY • link 6.0 years ago by Pierre Lindenbaum 164k

0

Could you please explain your previous statement more fully? I didn't get any output file from the above command.

ADD REPLY • link 6.0 years ago by ashaneev07 ▴ 40

0

file <filename> prints information about the file type. For a valid BAM file you should get the following message in your terminal:

$ file input.bam
input.bam: gzip compressed data, extra field
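If the file utility is not available, the same first check can be approximated by looking at the leading bytes. The following is only a sketch: is_gzip is a hypothetical helper name, and it tests for the gzip magic bytes (1f 8b) that a BGZF-compressed BAM file starts with. It does not prove the file is a valid BAM.

```shell
# Hypothetical helper: succeed if the file starts with the gzip magic bytes 1f 8b.
# A BAM file is BGZF (gzip-compatible) compressed, so it should pass this test,
# but so will any other gzip file.
is_gzip() {
    # Read the first two bytes and print them as lowercase hex, e.g. "1f8b".
    magic=$(head -c 2 "$1" | od -An -tx1 | tr -d ' \n')
    [ "$magic" = "1f8b" ]
}
```

For example, is_gzip 300BP.sorted.bam && echo "gzip-compressed" prints the message only when the magic bytes match. If samtools is installed, samtools quickcheck gives a more meaningful check of the BAM itself.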

fin swimmer

ADD REPLY • link 6.0 years ago by finswimmer 16k

0

Yes, I got it.

$ file 300BP.sorted.bam
300BP.sorted.bam: gzip compressed data, extra field

ADD REPLY • link 6.0 years ago by ashaneev07 ▴ 40

0

Caused by: java.io.FileNotFoundException: /tmp/home/CSPI.8946166571745516868.tmp/20922.tmp (Too many open files)

It looks like MarkDuplicates needs to create many temporary files and keep them open at the same time. Most distributions have a default limit of 1024 open files per process. You can check this with ulimit -n. For the current shell you can set it to a higher number with, e.g., ulimit -n 2048.
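A minimal sketch of that check, assuming a bash-like shell. A non-root user can usually raise the soft limit only as far as the hard limit, so it helps to print both first. raise_soft_limit is a hypothetical helper name, not a standard command.

```shell
# Show the soft (currently enforced) and hard (ceiling) limits on open files.
soft=$(ulimit -Sn)
hard=$(ulimit -Hn)
echo "soft limit: $soft, hard limit: $hard"

# Hypothetical helper: raise the soft limit for this shell, but only if the
# requested value does not exceed the hard limit.
raise_soft_limit() {
    wanted=$1
    hard=$(ulimit -Hn)
    if [ "$hard" = "unlimited" ] || [ "$wanted" -le "$hard" ]; then
        ulimit -Sn "$wanted"
    else
        echo "requested $wanted exceeds hard limit $hard" >&2
        return 1
    fi
}
```

Calling raise_soft_limit 2048 then either changes the limit for the current shell (and the java process started from it) or reports that the hard limit is lower than the requested value.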

fin swimmer

ADD REPLY • link 6.0 years ago by finswimmer 16k

0

I got this:

$ ulimit -n
1024

$ ulimit -n 2048
bash: ulimit: open files: cannot modify limit: Operation not permitted

ADD REPLY • link 6.0 years ago by ashaneev07 ▴ 40

1

There seem to be many possible reasons for this message, and as many solutions. As I don't know your system, I would recommend searching the web for this error message to find a way to increase the limit on your system.
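If the limit cannot be raised, another option worth trying is to make Picard stay below it. The log in the question shows MAX_FILE_HANDLES_FOR_READ_ENDS_MAP=8000, which is well above a limit of 1024, and Picard's documentation for that option suggests a value a little lower than what ulimit -n reports. The sketch below is an assumption-laden illustration: pick_max_handles is a hypothetical helper, the headroom of 50 is an arbitrary safety margin, and the input file name is taken from the file output earlier in the thread.

```shell
# Hypothetical helper: choose a file-handle budget somewhat below the
# open-files limit, leaving room for the JVM's own open files.
pick_max_handles() {
    limit=$1      # value reported by `ulimit -n`
    headroom=50   # assumed safety margin, not a Picard recommendation
    if [ "$limit" = "unlimited" ]; then
        echo 8000                       # the default shown in the Picard log
    elif [ "$limit" -gt $((headroom * 2)) ]; then
        echo $((limit - headroom))
    else
        echo $((limit / 2))             # very small limit: use half of it
    fi
}

handles=$(pick_max_handles "$(ulimit -n)")

# Print the command instead of running it, so it can be reviewed first.
echo java -jar picard.jar MarkDuplicates \
    I=300BP.sorted.bam O=marked_duplicates_300.bam M=marked_dup_metrics.txt \
    REMOVE_DUPLICATES=true MAX_FILE_HANDLES_FOR_READ_ENDS_MAP="$handles"
```

With a limit of 1024 this proposes MAX_FILE_HANDLES_FOR_READ_ENDS_MAP=974. Whether that is low enough depends on how many other files the process has open, so the margin may need adjusting.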

fin swimmer

ADD REPLY • link 6.0 years ago by finswimmer 16k