5.5 years ago
Cecelia
Hi, I was running Picard MarkDuplicates on a few BAM files:
java -jar /sw/bioinfo/picard/2.20.4/rackham/picard.jar MarkDuplicates INPUT=sorted.bam OUTPUT=md.bam METRICS_FILE=duplicate.txt READ_NAME_REGEX=null REMOVE_DUPLICATES=true CREATE_INDEX=true
I got no output file, only messages like this:
INFO 2019-11-20 02:53:04 MarkDuplicates Start of doWork freeMemory: 2037715440; totalMemory: 2058354688; maxMemory: 28631367680
INFO 2019-11-20 02:53:04 MarkDuplicates Reading input file and constructing read end information.
INFO 2019-11-20 02:53:04 MarkDuplicates Will retain up to 103736839 data points before spilling to disk.
[Wed Nov 20 02:53:05 CET 2019] picard.sam.markduplicates.MarkDuplicates done. Elapsed time: 0.02 minutes.
Runtime.totalMemory()=2058354688
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
Exception in thread "main" htsjdk.samtools.SAMException: Sequence name 'scaffold1,8899378,f8056Z8899378' doesn't match regex: '[0-9A-Za-z!#$%&+./:;?@^_|~-][0-9A-Za-z!#$%&*+./:;=?@^_|~-]*'
at htsjdk.samtools.SAMSequenceRecord.validateSequenceName(SAMSequenceRecord.java:211)
at htsjdk.samtools.SAMSequenceRecord.<init>(SAMSequenceRecord.java:94)
at htsjdk.samtools.SAMTextHeaderCodec.parseSQLine(SAMTextHeaderCodec.java:224)
at htsjdk.samtools.SAMTextHeaderCodec.decode(SAMTextHeaderCodec.java:114)
at htsjdk.samtools.BAMFileReader.readHeader(BAMFileReader.java:704)
at htsjdk.samtools.BAMFileReader.<init>(BAMFileReader.java:298)
at htsjdk.samtools.BAMFileReader.<init>(BAMFileReader.java:176)
at htsjdk.samtools.SamReaderFactory$SamReaderFactoryImpl.open(SamReaderFactory.java:396)
at picard.sam.markduplicates.util.AbstractMarkDuplicatesCommandLineProgram.openInputs(AbstractMarkDuplicatesCommandLineProgram.java:220)
at picard.sam.markduplicates.MarkDuplicates.buildSortedReadEndLists(MarkDuplicates.java:528)
at picard.sam.markduplicates.MarkDuplicates.doWork(MarkDuplicates.java:257)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:305)
at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:103)
at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:113)
The reference genome is the output of LINKS v1.8.7. Each contig header looks like this (scaffold, size, contig information):
>scaffold1,8899378,f8056Z8899378
>scaffold2,7251368,f8058Z7239915k15a0.13m100_f1079z11453
>scaffold3,6336565,f8055Z6291785k21a0.14m100_r10570z44780
If the problem is with the header, how should I change it without losing information?
I read in this post that setting READ_NAME_REGEX=null could solve the problem, but it did not work in my case.
https://sourceforge.net/p/samtools/mailman/message/32614448/
Any comments or suggestions would be appreciated.
Try adding a comma to the second part of the regex and using that as the READ_NAME_REGEX. The full regex would be:
'[0-9A-Za-z!#$%&+./:;?@^_|~-][0-9A-Za-z!#$%&*+./:;=?@^_|~,-]*'
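Another option, since the stack trace shows the exception coming from htsjdk's sequence-name validation while parsing the @SQ header lines (SAMSequenceRecord.validateSequenceName), would be to rename the scaffolds so they no longer contain commas. Below is a minimal sketch of that idea, not a tested pipeline. The function names and the "__" separator are assumptions made for illustration; the pattern is copied from the error message in the question. The BAM header would need the same names as the renamed reference, for example by re-aligning against it or by rewriting the header with samtools reheader.

```python
import re

# Pattern copied verbatim from the htsjdk error message in the question.
SEQ_NAME_RE = re.compile(
    r"[0-9A-Za-z!#$%&+./:;?@^_|~-][0-9A-Za-z!#$%&*+./:;=?@^_|~-]*"
)


def is_valid_name(name):
    """Return True if the whole name matches the pattern from the error."""
    return SEQ_NAME_RE.fullmatch(name) is not None


def sanitize_header(line, sep="__"):
    """Replace commas in a FASTA header line; other lines pass through.

    The "__" separator is an assumption: it keeps all three fields
    (scaffold, size, contig information) and stays distinct from the
    single underscores LINKS already uses inside the contig field.
    """
    if not line.startswith(">"):
        return line
    return line.replace(",", sep)


def sanitize_fasta(in_path, out_path, sep="__"):
    """Copy a FASTA file, rewriting only the header lines."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            dst.write(sanitize_header(line, sep))
```

Calling sanitize_fasta on the LINKS output (with your own input and output paths) writes a copy whose headers read like >scaffold1__8899378__f8056Z8899378. The commas can be restored later by replacing "__" with ",", provided no original name already contains a double underscore.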