Question

Picard - MergeBamAlignment: Reads remaining on alignment iterator

1

Entering edit mode

4.8 years ago

godth13teen ▴ 70

Hi, I'm new to genome analysis and I need some advice on this. I'm currently follow GATK best practice, now I'm stuck at the MergeBamAlignment step of picard. I'm using the data from https://www.internationalgenome.org/, the reference genome is also downloaded from International genome database.

My command:

MergeBamAlignment -ALIGNED aln/HG02142.mapped.ILLUMINA.bwa.KHV.low_coverage.20130415.bam -UNMAPPED /aln/HG02142.unmapped.ILLUMINA.bwa.KHV.low_coverage.20130415.bam -O test.bam -R refgen/hs37d5.fa.gz -TMP_DIR tmp

I got many warning, and didn't get the output file:

WARNING 2020-02-23 21:54:56     SamAlignmentMerger      Exception merging bam alignment - attempting to sort aligned reads and try again: Underlying iterator is not queryname sorted: ERR233223.95759674 2/2 101b aligned to 1:47264561-47264660. > ERR233223.14389540 2/2 101b aligned to 1:47264588-47264671.
INFO    2020-02-23 21:54:56     SamAlignmentMerger      Finished reading 28356 total records from alignment SAM/BAM.
WARNING 2020-02-23 21:54:56     SAMSequenceDictionary   Found sequence entry for which tags differ: 1 and tag UR has the two values: ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5.fa.gz and file:/data4t/KHV1000G/refgen/hs37d5.fa.gz. Using ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5.fa.gz
WARNING 2020-02-23 21:54:56     SAMSequenceDictionary   Found sequence entry for which tags differ: 2 and tag UR has the two values: ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5.fa.gz and file:/data4t/KHV1000G/refgen/hs37d5.fa.gz. Using ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5.fa.gz
.... (many warnings in here, similar structure)
[Sun Feb 23 21:54:56 ICT 2020] picard.sam.MergeBamAlignment done. Elapsed time: 0.01 minutes.
Runtime.totalMemory()=886571008
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
Exception in thread "main" java.lang.IllegalStateException: Reads remaining on alignment iterator: ERR233223.100003813!
        at picard.sam.AbstractAlignmentMerger.mergeAlignment(AbstractAlignmentMerger.java:558)
        at picard.sam.SamAlignmentMerger.mergeAlignment(SamAlignmentMerger.java:186)
        at picard.sam.MergeBamAlignment.doWork(MergeBamAlignment.java:358)
        at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:305)
        at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:103)
        at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:113)

What have go wrong in my case? I downloaded all of my files directly from international genomes database ftp, without any modification. Please advice me on this, thank you

alignment genome • 4.0k views

ADD COMMENT • link updated 4.0 years ago by ramshahaya ▴ 10 • written 4.8 years ago by godth13teen ▴ 70

0

Entering edit mode

Please do not delete previous questions that received comments or answers. This is considered disrespectful towards the users who invested time to help you.

ADD REPLY • link 4.8 years ago by ATpoint 85k

0

Entering edit mode

I think you didn't tell us that you post-processed one of the bam using another REF sequence....

file:/data4t/KHV1000G/refgen/hs37d5.fa.gz

ADD REPLY • link 4.8 years ago by Pierre Lindenbaum 164k

0

Entering edit mode

I download both bam files from international genome, the REF sequence you state is just the path on my disk. I didn't process any of those files

ADD REPLY • link 4.8 years ago by godth13teen ▴ 70

0

Entering edit mode

Ah sorry, but where did you get this file refgen/hs37d5.fa.gz ? it's not the same as the one in the BAM. The associated dict file should look like this.

@SQ SN:1    LN:249250621    M5:1b22b98cdeb4a9304cb5d48026a85128 UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5.fa.gz AS:NCBI37   SP:Human
@SQ SN:2    LN:243199373    M5:a0d9851da00400dec1098a9255ac712e UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5.fa.gz AS:NCBI37   SP:Human
(...)
@SQ SN:NC_007605    LN:171823   M5:6743bd63b3ff2b5b8985d8933c53290a UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5.fa.gz AS:NCBI37   SP:Human
@SQ SN:hs37d5   LN:35477943 M5:5b6a4b3a81a2d3c134b7d14bf6ad39f1 UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5.fa.gz AS:NCBI37   SP:Human

ADD REPLY • link 4.8 years ago by Pierre Lindenbaum 164k

0

Entering edit mode

I used samtools to check @SQ tag of the bam files and got this link: ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5.fa.gz

I downloaded directly from it

ADD REPLY • link 4.8 years ago by godth13teen ▴ 70

0

Entering edit mode

what is the version of picard ? do you use the latest version ?

ADD REPLY • link 4.8 years ago by Pierre Lindenbaum 164k

0

Entering edit mode

did you create the associated dict file using https://broadinstitute.github.io/picard/command-line-overview.html#CreateSequenceDictionary

ADD REPLY • link 4.8 years ago by Pierre Lindenbaum 164k

0

Entering edit mode

I'm using picard 2.21.8, the latest version I think. I also created the dict using

java -jar picard.jar CreateSequenceDictionary R=refgen/hs37d5.fa.gz O=refgen/hs37d5.dict

ADD REPLY • link 4.8 years ago by godth13teen ▴ 70

0

Entering edit mode

So sorry for that, I didn't state clear enough in that one

ADD REPLY • link 4.8 years ago by godth13teen ▴ 70

score 1 · Answer 1 · 2020-02-23

1

Entering edit mode

4.8 years ago

Pierre Lindenbaum 164k

Re-generate the dict file with CreateSequenceDictionary but use the following values so they match the ones in the BAM

java -jar picard.jar CreateSequenceDictionary R=refgen/hs37d5.fa.gz O=refgen/hs37d5.dict U=ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5.fa.gz SPECIES=Human  GENOME_ASSEMBLY=NCBI37

ADD COMMENT • link 4.8 years ago by Pierre Lindenbaum 164k

0

Entering edit mode

Hi, thank you for your answer, I followed that step, this time there is less warning, but it still didn't output anyfile. The full log is here:

09:02:54.970 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/thanhnguyen/picard/build/libs/picard.jar!/com/intel/gkl/native/libgkl_compression.so
[Mon Feb 24 09:02:54 ICT 2020] MergeBamAlignment UNMAPPED_BAM=aln/HG02142.unmapped.ILLUMINA.bwa.KHV.low_coverage.20130415.bam ALIGNED_BAM=[aln/HG02142.mapped.ILLUMINA.bwa.KHV.low_coverage.20130415.bam] OUTPUT=HG02142/HG02142_fin.bam TMP_DIR=[tmp] REFERENCE_SEQUENCE=refgen/hs37d5.fa.gz    ADD_PG_TAG_TO_READS=true PAIRED_RUN=true CLIP_ADAPTERS=true IS_BISULFITE_SEQUENCE=false ALIGNED_READS_ONLY=false MAX_INSERTIONS_OR_DELETIONS=1 ATTRIBUTES_TO_REVERSE=[OQ, U2] ATTRIBUTES_TO_REVERSE_COMPLEMENT=[E2, SQ] READ1_TRIM=0 READ2_TRIM=0 ALIGNER_PROPER_PAIR_FLAGS=false SORT_ORDER=coordinate PRIMARY_ALIGNMENT_STRATEGY=BestMapq CLIP_OVERLAPPING_READS=true INCLUDE_SECONDARY_ALIGNMENTS=true ADD_MATE_CIGAR=true UNMAP_CONTAMINANT_READS=false MIN_UNCLIPPED_BASES=32 MATCHING_DICTIONARY_TAGS=[M5, LN] UNMAPPED_READ_STRATEGY=DO_NOT_CHANGE VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false
[Mon Feb 24 09:02:54 ICT 2020] Executing as myuser@abc.com on Linux 3.13.0-147-generic amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_181-b13; Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: 2.21.8-SNAPSHOT
INFO    2020-02-24 09:02:55     SamAlignmentMerger      Processing SAM file(s): [aln/HG02142.mapped.ILLUMINA.bwa.KHV.low_coverage.20130415.bam]
WARNING 2020-02-24 09:02:55     SamAlignmentMerger      Exception merging bam alignment - attempting to sort aligned reads and try again: Underlying iterator is not queryname sorted: ERR233223.95759674 2/2 101b aligned to 1:47264561-47264660. > ERR233223.14389540 2/2 101b aligned to 1:47264588-47264671.
INFO    2020-02-24 09:02:55     SamAlignmentMerger      Finished reading 28356 total records from alignment SAM/BAM.
[Mon Feb 24 09:02:55 ICT 2020] picard.sam.MergeBamAlignment done. Elapsed time: 0.01 minutes.
Runtime.totalMemory()=886571008
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
Exception in thread "main" java.lang.IllegalStateException: Reads remaining on alignment iterator: ERR233223.100003813!
        at picard.sam.AbstractAlignmentMerger.mergeAlignment(AbstractAlignmentMerger.java:558)
        at picard.sam.SamAlignmentMerger.mergeAlignment(SamAlignmentMerger.java:186)
        at picard.sam.MergeBamAlignment.doWork(MergeBamAlignment.java:358)
        at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:305)
        at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:103)
        at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:113)

ADD REPLY • link 4.8 years ago by godth13teen ▴ 70

0

Entering edit mode

I also tried using another sample, HG02088.mapped.ILLUMINA.bwa.KHV.low_coverage.20130415.bam (still from Human genome project), and it has the same problem, the warning is a bit different in those lines:

WARNING 2020-02-24 09:30:58     SamAlignmentMerger      Exception merging bam alignment - attempting to sort aligned reads and try again: Underlying iterator is not queryname sorted: SRR741416.45399016 2/2 101b aligned to 1:47264572-47264658. > SRR741416.1601628 1/2 101b aligned to 1:47264588-47264688.
INFO    2020-02-24 09:30:58     SamAlignmentMerger      Finished reading 34465 total records from alignment SAM/BAM.
[Mon Feb 24 09:30:58 ICT 2020] picard.sam.MergeBamAlignment done. Elapsed time: 0.01 minutes.
Runtime.totalMemory()=886571008
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
Exception in thread "main" java.lang.IllegalStateException: Reads remaining on alignment iterator: SRR741415.10010200!

ADD REPLY • link 4.8 years ago by godth13teen ▴ 70

0

Entering edit mode

~~at this point I think you want MergeSamFiles, NOT MergeBamAlignment Difference Between Picards Mergebamalignment And Mergesamfiles~~

ADD REPLY • link 4.8 years ago by Pierre Lindenbaum 164k

1

Entering edit mode

Underlying iterator is not queryname sorted: SRR741416.45399016 2/2 101b aligned to 1:47264572-47264658. > SRR741416.1601628 1/2 101b aligned to 1:47264588-47264688.

so did you sort the unampped BAM on read-names (not coordinates). Beware, it should be sorted using picard SortSam (NOT samtools) because of : https://github.com/samtools/htsjdk/issues/766

ADD REPLY • link 4.8 years ago by Pierre Lindenbaum 164k

0

Entering edit mode

I downloaded the file from human genome project without any modification. I checked their documents, it seems like they sorted the mapped file:

 samtools view -bSu $sam_file | samtools sort -n -o - samtools_nsort_tmp | 
 samtools fixmate /dev/stdin /dev/stdout | samtools sort -o - samtools_csort_tmp |
 samtools fillmd -u - $reference_fasta > $fixed_bam_file

How can I sort the unmapped bam in the same manner?

ADD REPLY • link 4.8 years ago by godth13teen ▴ 70

0

Entering edit mode

try to use picard SortSam using the correct sorting 'queryname'

ADD REPLY • link 4.8 years ago by Pierre Lindenbaum 164k

0

Entering edit mode

I have tried two cases: 1, Sorting unmapped then MergeAlignment

INFO    2020-02-25 12:13:30     SamAlignmentMerger      Processing SAM file(s): [aln/HG01842.mapped.ILLUMINA.bwa.KHV.low_coverage.20120522.bam]
WARNING 2020-02-25 12:13:30     SAMSequenceDictionary   Found sequence entry for which tags differ: 1 and tag UR has the two values: ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5.fa.gz        AS:NCBI37       SP:Human and ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5.fa.gz. Using ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5.fa.gz        AS:NCBI37       SP:Human
(...)
WARNING 2020-02-25 12:13:31     SAMSequenceDictionary   Found sequence entry for which tags differ: hs37d5 and tag UR has the two values: ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5.fa.gz        AS:NCBI37       SP:Human and ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5.fa.gz. Using ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5.fa.gz        AS:NCBI37       SP:Human
[Tue Feb 25 12:13:31 ICT 2020] picard.sam.MergeBamAlignment done. Elapsed time: 0.01 minutes.
Runtime.totalMemory()=886571008
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
Exception in thread "main" java.lang.IllegalStateException: Reads remaining on alignment iterator: ERR042514.10000846!
        at picard.sam.AbstractAlignmentMerger.mergeAlignment(AbstractAlignmentMerger.java:558)
        at picard.sam.SamAlignmentMerger.mergeAlignment(SamAlignmentMerger.java:186)
        at picard.sam.MergeBamAlignment.doWork(MergeBamAlignment.java:358)
        at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:305)
        at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:103)
        at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:113)

2, Sorting both files then MergeAlignment

INFO    2020-02-25 12:11:56     SamAlignmentMerger      Processing SAM file(s): [HG            01842.mapped.sorted.bam]
WARNING 2020-02-25 12:11:56     SAMSequenceDictionary   Found sequence entry for which tags differ: 1 and tag UR has the two values: ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5.fa.gz                    AS:NCBI37       SP:Human and ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5.fa.gz. Using ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5.fa.gz        AS:NCBI37       SP:Human
(...)
WARNING 2020-02-25 12:11:57     SAMSequenceDictionary   Found sequence entry for which tags differ: hs37d5 and tag UR has the two values: ftp://ftp.1000genomes.ebi.ac            .uk/vol1/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5.fa.gz                    AS:NCBI37       SP:Human and ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5.fa.gz. Using ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5.fa.gz        AS:NCBI37       SP:Human
[Tue Feb 25 12:11:57 ICT 2020] picard.sam.MergeBamAlignment done. Elapsed time: 0.01 minutes.
Runtime.totalMemory()=886571008
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
Exception in thread "main" java.lang.IllegalStateException: Reads remaining on alignment iterator: ERR042514.10000846!
        at picard.sam.AbstractAlignmentMerger.mergeAlignment(AbstractAlignmentMerger.java:558)
        at picard.sam.SamAlignmentMerger.mergeAlignment(SamAlignmentMerger.java:186)
        at picard.sam.MergeBamAlignment.doWork(MergeBamAlignment.java:358)
        at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:305)
        at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:103)
        at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:113)

And it still output nothing

ADD REPLY • link 4.8 years ago by godth13teen ▴ 70

0

Entering edit mode

no it's not, as you can see I got one mapped file and one unmapped file, I need the MergeBamAlignment not the Mergesamfiles

ADD REPLY • link 4.8 years ago by godth13teen ▴ 70

0

Entering edit mode

oh, I'm wrong again sorry.

ADD REPLY • link 4.8 years ago by Pierre Lindenbaum 164k

score 0 · Answer 2 · 2020-12-07

0

Entering edit mode

4.0 years ago

ramshahaya ▴ 10

Hi,

How did you create unmapped file for MergeBamAlignment?

ADD COMMENT • link 4.0 years ago by ramshahaya ▴ 10