Picard MarkDuplicates shows error (removing PCR duplicates)
6.0 years ago
ashaneev07 ▴ 40

Hi, I got the following error while running Picard's MarkDuplicates. Does anyone have experience with this command in Picard? I need help.

> java -jar picard.jar MarkDuplicates  I=300BP.sorted O=marked_duplicates_300.bam M=marked_dup_metrics.txt  REMOVE_DUPLICATES=true &

********** NOTE: Picard's command line syntax is changing.
**********
********** For more information, please see:
********** https://github.com/broadinstitute/picard/wiki/Command-Line-Syntax-Transition-For-Users-(Pre-Transition)
**********
********** The command line looks like this in the new syntax:
**********
**********    MarkDuplicates -I /300BP.sorted -O marked_duplicates_300.bam -M marked_dup_metrics.txt -REMOVE_DUPLICATES true
**********


15:13:11.216 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/home/Documents/Tools_NGS/picard.jar!/com/intel/gkl/native/libgkl_compression.so
[Wed Nov 21 15:13:11 IST 2018] MarkDuplicates INPUT=[300BP.sorted] OUTPUT=marked_duplicates_300.bam METRICS_FILE=marked_dup_metrics.txt REMOVE_DUPLICATES=true    MAX_SEQUENCES_FOR_DISK_READ_ENDS_MAP=50000 MAX_FILE_HANDLES_FOR_READ_ENDS_MAP=8000 SORTING_COLLECTION_SIZE_RATIO=0.25 TAG_DUPLICATE_SET_MEMBERS=false REMOVE_SEQUENCING_DUPLICATES=false TAGGING_POLICY=DontTag CLEAR_DT=true ADD_PG_TAG_TO_READS=true ASSUME_SORTED=false DUPLICATE_SCORING_STRATEGY=SUM_OF_BASE_QUALITIES PROGRAM_RECORD_ID=MarkDuplicates PROGRAM_GROUP_NAME=MarkDuplicates READ_NAME_REGEX=<optimized capture of last three ':' separated fields as numeric values> OPTICAL_DUPLICATE_PIXEL_DISTANCE=100 MAX_OPTICAL_DUPLICATE_SET_SIZE=300000 VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false
[Wed Nov 21 15:13:11 IST 2018] Executing as home@home-Lenovo-H30-50 on Linux 4.4.0-31-generic amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_171-b11; Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: 2.18.14-SNAPSHOT
INFO    2018-11-21 15:13:11 MarkDuplicates  Start of doWork freeMemory: 240890984; totalMemory: 251658240; maxMemory: 3720871936
INFO    2018-11-21 15:13:11 MarkDuplicates  Reading input file and constructing read end information.
INFO    2018-11-21 15:13:11 MarkDuplicates  Will retain up to 13481420 data points before spilling to disk.
[Wed Nov 21 15:13:13 IST 2018] picard.sam.markduplicates.MarkDuplicates done. Elapsed time: 0.04 minutes.
Runtime.totalMemory()=1302331392
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
Exception in thread "main" htsjdk.samtools.SAMException: /tmp/home/CSPI.8946166571745516868.tmp/20922.tmpnot found
    at htsjdk.samtools.util.FileAppendStreamLRUCache$Functor.makeValue(FileAppendStreamLRUCache.java:64)
    at htsjdk.samtools.util.FileAppendStreamLRUCache$Functor.makeValue(FileAppendStreamLRUCache.java:49)
    at htsjdk.samtools.util.ResourceLimitedMap.get(ResourceLimitedMap.java:76)
    at htsjdk.samtools.CoordinateSortedPairInfoMap.getOutputStreamForSequence(CoordinateSortedPairInfoMap.java:180)
    at htsjdk.samtools.CoordinateSortedPairInfoMap.put(CoordinateSortedPairInfoMap.java:164)
    at picard.sam.markduplicates.util.DiskBasedReadEndsForMarkDuplicatesMap.put(DiskBasedReadEndsForMarkDuplicatesMap.java:65)
    at picard.sam.markduplicates.MarkDuplicates.buildSortedReadEndLists(MarkDuplicates.java:543)
    at picard.sam.markduplicates.MarkDuplicates.doWork(MarkDuplicates.java:232)
    at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:295)
    at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:103)
    at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:113)
Caused by: java.io.FileNotFoundException: /tmp/home/CSPI.8946166571745516868.tmp/20922.tmp (Too many open files)
    at java.io.FileOutputStream.open0(Native Method)
    at java.io.FileOutputStream.open(FileOutputStream.java:270)
    at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
    at htsjdk.samtools.util.FileAppendStreamLRUCache$Functor.makeValue(FileAppendStreamLRUCache.java:61)
... 10 more
snp alignment sequence • 4.1k views

Can you try with:

java -Djava.io.tmpdir=. -jar picard.jar -I 300BP.sorted (etc...)

I tried it as you suggested, and now it shows:

I=300BP.sorted.bam' is not a valid command


Sorry, I forgot the tool name (MarkDuplicates) after the jar:

java -Djava.io.tmpdir=. -jar picard.jar MarkDuplicates -I 300BP.sorted (etc...)
********** NOTE: Picard's command line syntax is changing.
**********
********** For more information, please see:
********** https://github.com/broadinstitute/picard/wiki/Command-Line-Syntax-Transition-For-Users-(Pre-Transition)
**********
********** The command line looks like this in the new syntax:
**********
**********    MarkDuplicates -I 300BP.sorted -O marked_duplicates_300.bam -M marked_dup_metrics.txt -REMOVE_DUPLICATES true
**********


16:18:14.987 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/veena/Documents/Tools_NGS/picard.jar!/com/intel/gkl/native/libgkl_compression.so
[Wed Nov 21 16:18:15 IST 2018] MarkDuplicates INPUT=[300BP.sorted] OUTPUT=marked_duplicates_300.bam METRICS_FILE=marked_dup_metrics.txt REMOVE_DUPLICATES=true    MAX_SEQUENCES_FOR_DISK_READ_ENDS_MAP=50000 MAX_FILE_HANDLES_FOR_READ_ENDS_MAP=8000 SORTING_COLLECTION_SIZE_RATIO=0.25 TAG_DUPLICATE_SET_MEMBERS=false REMOVE_SEQUENCING_DUPLICATES=false TAGGING_POLICY=DontTag CLEAR_DT=true ADD_PG_TAG_TO_READS=true ASSUME_SORTED=false DUPLICATE_SCORING_STRATEGY=SUM_OF_BASE_QUALITIES PROGRAM_RECORD_ID=MarkDuplicates PROGRAM_GROUP_NAME=MarkDuplicates READ_NAME_REGEX=<optimized capture of last three ':' separated fields as numeric values> OPTICAL_DUPLICATE_PIXEL_DISTANCE=100 MAX_OPTICAL_DUPLICATE_SET_SIZE=300000 VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false
[Wed Nov 21 16:18:15 IST 2018] Executing as veena@veena-Lenovo-H30-50 on Linux 4.4.0-31-generic amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_171-b11; Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: 2.18.14-SNAPSHOT
[Wed Nov 21 16:18:15 IST 2018] picard.sam.markduplicates.MarkDuplicates done. Elapsed time: 0.00 minutes.
Runtime.totalMemory()=251658240
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
Exception in thread "main" htsjdk.samtools.SAMException: Cannot read non-existent file: file:///home/veena/Documents/Tools_NGS/300BP.sorted
    at htsjdk.samtools.util.IOUtil.assertFileIsReadable(IOUtil.java:430)
    at htsjdk.samtools.util.IOUtil.assertFileIsReadable(IOUtil.java:417)
    at htsjdk.samtools.util.IOUtil.assertInputIsValid(IOUtil.java:393)
    at htsjdk.samtools.util.IOUtil.assertInputsAreValid(IOUtil.java:469)
    at picard.sam.markduplicates.MarkDuplicates.doWork(MarkDuplicates.java:224)
    at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:295)
    at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:103)
    at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:113)
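
The exception above says that the path as typed does not exist; later in the thread the file turns out to be named 300BP.sorted.bam. A small pre-flight check would catch this before Java starts. This is only a sketch: the function name `check_input` is made up for illustration and is not part of Picard.

```shell
# Hypothetical helper: succeed only if the path is a readable regular file.
check_input() {
    if [ -f "$1" ] && [ -r "$1" ]; then
        echo "ok: $1"
    else
        echo "missing or unreadable: $1" >&2
        return 1
    fi
}

# File name as typed in the failing command; adjust to the real name.
check_input 300BP.sorted || echo "check the file name, e.g. with: ls 300BP.sorted*"
```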

This may not be the main problem, but the extension of 300BP.sorted should be 'sam'.


What is the output of

file 300BP.sorted

?


300BP.sorted file is a sorted bam file.


300BP.sorted file is a sorted bam file.

This is not the output of the command 'file'.


Could you please explain your previous statement more fully? I didn't get any output file from the above command.


file <filename> prints out information about the file type. For a valid BAM file you should get the following message in your terminal:

$ file input.bam
input.bam: gzip compressed data, extra field
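
As a rough cross-check that does not depend on `file`, one can look at the first two bytes directly: BAM is BGZF-compressed, which is gzip-compatible, so a BAM file starts with the gzip magic bytes 1f 8b. A minimal sketch, assuming a shell with `head -c` and `od` available; the function name `looks_gzipped` is made up for illustration, and passing this check does not prove the file is a valid BAM:

```shell
# Hypothetical helper: succeed if the file starts with the gzip magic bytes 1f 8b.
looks_gzipped() {
    magic=$(head -c 2 "$1" 2>/dev/null | od -An -tx1 | tr -d ' \n')
    [ "$magic" = "1f8b" ]
}

# Example use (file name taken from this thread):
if looks_gzipped 300BP.sorted.bam; then
    echo "gzip/BGZF header found"
else
    echo "not gzip-compressed, or file missing"
fi
```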

fin swimmer


Yes, I got it.

$ file 300BP.sorted.bam
300BP.sorted.bam: gzip compressed data, extra field

Caused by: java.io.FileNotFoundException: /tmp/home/CSPI.8946166571745516868.tmp/20922.tmp (Too many open files)

It looks like MarkDuplicates needs to create many temporary files and keep them open at the same time. Most distributions have a default limit of 1024 open files per process. You can check this with ulimit -n. For the current shell you can raise it, e.g. with ulimit -n 2048.
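
An alternative that does not require changing the limit: the log in the question shows MAX_FILE_HANDLES_FOR_READ_ENDS_MAP=8000, which is well above a limit of 1024 open files. Picard's documentation for this option suggests setting it a little lower than the value reported by ulimit -n. A sketch, assuming bash; the helper name `handles_below_limit` and the headroom of 100 are arbitrary choices for illustration, not Picard recommendations, and the input name 300BP.sorted.bam is the one reported later in this thread:

```shell
# Hypothetical helper: pick a file-handle count somewhat below the given limit.
handles_below_limit() {
    limit=$1
    headroom=100          # arbitrary safety margin (assumption)
    if [ "$limit" = "unlimited" ]; then
        echo 8000         # Picard's default, as shown in the log
    elif [ "$limit" -gt $((headroom * 2)) ]; then
        echo $((limit - headroom))
    else
        echo $((limit / 2))
    fi
}

handles=$(handles_below_limit "$(ulimit -n)")

# Print the command instead of running it, so it can be reviewed first.
echo java -jar picard.jar MarkDuplicates \
    I=300BP.sorted.bam O=marked_duplicates_300.bam M=marked_dup_metrics.txt \
    REMOVE_DUPLICATES=true MAX_FILE_HANDLES_FOR_READ_ENDS_MAP="$handles"
```

With a limit of 1024 this prints a command ending in MAX_FILE_HANDLES_FOR_READ_ENDS_MAP=924.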

fin swimmer


This is what I got:

$ ulimit -n
1024

$ ulimit -n 2048
bash: ulimit: open files: cannot modify limit: Operation not permitted


There seem to be many possible reasons and solutions for this message. As I don't know your system, I would recommend searching the web for this error message to find a way to increase the limit on your system.
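
One common cause, offered only as a guess since the system is unknown: a non-root user can raise the soft limit only up to the hard limit, so ulimit -n 2048 fails when the hard limit is lower than 2048. A minimal bash sketch to inspect both values and raise the soft limit as far as the hard limit allows; it runs in a subshell, so the current shell is left unchanged:

```shell
# Show the soft limit (currently enforced) and the hard limit (ceiling).
echo "soft limit: $(ulimit -Sn)"
echo "hard limit: $(ulimit -Hn)"

# Raise the soft limit to the hard limit, inside a subshell only.
(
    ulimit -Sn "$(ulimit -Hn)" && echo "soft limit now: $(ulimit -Sn)"
)
```

If the hard limit itself is too low, raising it needs administrator rights; on many Linux systems this is configured in /etc/security/limits.conf and takes effect after a new login.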

fin swimmer
