Problem with Picard MarkDuplicates
1
0
Entering edit mode
5.1 years ago
Ming ▴ 110

Dear All,

I am trying to use the Picard MarkDuplicate tools, but came across the following error:

(base) tanshiming@S620100019205:~/Downloads/sorted-bam$ java -jar /home/tanshiming/tools/picard-2.21.1/picard.jar MarkDuplicates M=1B131118A.marked_dup_metrics.txt REMOVE_DUPLICATES=true I=1B131118B.sorted.bam  O=1B131118B.bwa.dedupe.bam
INFO    2019-10-15 16:43:25 MarkDuplicates  

********** NOTE: Picard's command line syntax is changing.
**********
********** For more information, please see:
********** https://github.com/broadinstitute/picard/wiki/Command-Line-Syntax-Transition-For-Users-(Pre-Transition)
**********
********** The command line looks like this in the new syntax:
**********
**********    MarkDuplicates -M 1B131118A.marked_dup_metrics.txt -REMOVE_DUPLICATES true -I 1B131118B.sorted.bam -O 1B131118B.bwa.dedupe.bam
**********


16:43:25.875 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/tanshiming/tools/picard-2.21.1/picard.jar!/com/intel/gkl/native/libgkl_compression.so
[Tue Oct 15 16:43:25 SGT 2019] MarkDuplicates INPUT=[1B131118B.sorted.bam] OUTPUT=1B131118B.bwa.dedupe.bam METRICS_FILE=1B131118A.marked_dup_metrics.txt REMOVE_DUPLICATES=true    MAX_SEQUENCES_FOR_DISK_READ_ENDS_MAP=50000 MAX_FILE_HANDLES_FOR_READ_ENDS_MAP=8000 SORTING_COLLECTION_SIZE_RATIO=0.25 TAG_DUPLICATE_SET_MEMBERS=false REMOVE_SEQUENCING_DUPLICATES=false TAGGING_POLICY=DontTag CLEAR_DT=true DUPLEX_UMI=false ADD_PG_TAG_TO_READS=true ASSUME_SORTED=false DUPLICATE_SCORING_STRATEGY=SUM_OF_BASE_QUALITIES PROGRAM_RECORD_ID=MarkDuplicates PROGRAM_GROUP_NAME=MarkDuplicates READ_NAME_REGEX=<optimized capture="" of="" last="" three="" ':'="" separated="" fields="" as="" numeric="" values=""> OPTICAL_DUPLICATE_PIXEL_DISTANCE=100 MAX_OPTICAL_DUPLICATE_SET_SIZE=300000 VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false
[Tue Oct 15 16:43:25 SGT 2019] Executing as tanshiming@S620100019205 on Linux 4.15.0-58-generic amd64; Java HotSpot(TM) 64-Bit Server VM 12.0.2+10; Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: 2.21.1-SNAPSHOT
INFO    2019-10-15 16:43:25 MarkDuplicates  Start of doWork freeMemory: 158532400; totalMemory: 167772160; maxMemory: 32178700288
INFO    2019-10-15 16:43:25 MarkDuplicates  Reading input file and constructing read end information.
INFO    2019-10-15 16:43:25 MarkDuplicates  Will retain up to 116589493 data points before spilling to disk.
[Tue Oct 15 16:45:48 SGT 2019] picard.sam.markduplicates.MarkDuplicates done. Elapsed time: 2.38 minutes.
Runtime.totalMemory()=17448304640
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
Exception in thread "main" htsjdk.samtools.SAMFormatException: Number of sequences in text header (34216887) != number of sequences in binary header (182084669) for file /home/tanshiming/Downloads/sorted-bam/1B131118B.sorted.bam
    at htsjdk.samtools.BAMFileReader.readHeader(BAMFileReader.java:712)
    at htsjdk.samtools.BAMFileReader.<init>(BAMFileReader.java:298)
    at htsjdk.samtools.BAMFileReader.<init>(BAMFileReader.java:176)
    at htsjdk.samtools.SamReaderFactory$SamReaderFactoryImpl.open(SamReaderFactory.java:396)
    at picard.sam.markduplicates.util.AbstractMarkDuplicatesCommandLineProgram.openInputs(AbstractMarkDuplicatesCommandLineProgram.java:220)
    at picard.sam.markduplicates.MarkDuplicates.buildSortedReadEndLists(MarkDuplicates.java:533)
    at picard.sam.markduplicates.MarkDuplicates.doWork(MarkDuplicates.java:257)
    at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:305)
    at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:103)
    at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:113)

Appreciate it if I could get any advice on this! Thank You.

mapping • 2.6k views
ADD COMMENT
1
Entering edit mode

Hi Ming, It seems there is something wrong with your bam file header. Can you post the code on how you get the input bam file (from mapping output to your input)?

ADD REPLY
0
Entering edit mode

there is a difference between the number of chromosomes declared in the SAM header and it's BAM. How was built 1B131118B.sorted.bam ?

ADD REPLY
0
Entering edit mode

Dear All,

Thanks for the reply. Below are the commands on how I got the bam file:

CONTIGS=contigs.fa 
REF=contigs-bowtie2-reference

# Build reference db 
bowtie2-build $CONTIGS $REF

# Mapping of reads 
for x in *_R1_001.fastq.gz
    do bowtie2 -p 50 -x $REF -1 $x -2 ${x%_R1_001*}_R2_001.fastq.gz -S ${x%_*_*_001*}.sam
done

# Convert sam to bam and sort
for x in *.sam
do samtools view -bS $x | samtools sort - -o ${x/.sam/.sorted.bam}
done

Thanks!

ADD REPLY
0
Entering edit mode

I haven't seen anything in your code could be the problem, have you compared the header of your sam file with that of your sorted bam file? Are they the same?

ADD REPLY
0
Entering edit mode

Hello @yztxwd,

How do I go about doing this?

Thanks

ADD REPLY
0
Entering edit mode

samtools -H can show you the header You can also follow the suggestions by @prishly, it implies you can use pybam to check the text header and binary header of bam file, sometimes simply reinstalling or updating your samtools can solve the problem

ADD REPLY
0
Entering edit mode

Dear @yztxwd,

Should I just inspect the text header of the bam and the sam files? I am guessing the header should match? Will a conda-installed version of Samtools be fine?

Thank you.

ADD REPLY
0
Entering edit mode

Dear All,

I have tried using the samtools version 1.9 (latest version), but I am still facing the same issue. Appreciate any help that I can get!

Thank you.

ADD REPLY
0
Entering edit mode

Likely not solving your problem, but the line below is more complicated than necessary:

samtools view -bS $x | samtools sort - -o ${x/.sam/.sorted.bam}

You could just use samtools sort directly:

samtools sort $x -o ${x/.sam/.sorted.bam}

Likewise, you could avoid the intermediate sam file and pipe directly to samtools sort to get a sorted bam:

bowtie2 -p 50 -x $REF -1 $x -2 ${x%_R1_001*}_R2_001.fastq.gz | samtools sort -o ${x%_*_*_001*}.bam
ADD REPLY
1
Entering edit mode
5.1 years ago
prishly ▴ 10

Take a look at this post. Gurus recommend updating samtools

ADD COMMENT

Login before adding your answer.

Traffic: 1637 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6