Picard - MergeBamAlignment: Reads remaining on alignment iterator
2
1
Entering edit mode
4.8 years ago
godth13teen ▴ 70

Hi, I'm new to genome analysis and I need some advice on this. I'm currently follow GATK best practice, now I'm stuck at the MergeBamAlignment step of picard. I'm using the data from https://www.internationalgenome.org/, the reference genome is also downloaded from International genome database.

My command:

MergeBamAlignment -ALIGNED aln/HG02142.mapped.ILLUMINA.bwa.KHV.low_coverage.20130415.bam -UNMAPPED /aln/HG02142.unmapped.ILLUMINA.bwa.KHV.low_coverage.20130415.bam -O test.bam -R refgen/hs37d5.fa.gz -TMP_DIR tmp

I got many warning, and didn't get the output file:

WARNING 2020-02-23 21:54:56     SamAlignmentMerger      Exception merging bam alignment - attempting to sort aligned reads and try again: Underlying iterator is not queryname sorted: ERR233223.95759674 2/2 101b aligned to 1:47264561-47264660. > ERR233223.14389540 2/2 101b aligned to 1:47264588-47264671.
INFO    2020-02-23 21:54:56     SamAlignmentMerger      Finished reading 28356 total records from alignment SAM/BAM.
WARNING 2020-02-23 21:54:56     SAMSequenceDictionary   Found sequence entry for which tags differ: 1 and tag UR has the two values: ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5.fa.gz and file:/data4t/KHV1000G/refgen/hs37d5.fa.gz. Using ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5.fa.gz
WARNING 2020-02-23 21:54:56     SAMSequenceDictionary   Found sequence entry for which tags differ: 2 and tag UR has the two values: ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5.fa.gz and file:/data4t/KHV1000G/refgen/hs37d5.fa.gz. Using ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5.fa.gz
.... (many warnings in here, similar structure)
[Sun Feb 23 21:54:56 ICT 2020] picard.sam.MergeBamAlignment done. Elapsed time: 0.01 minutes.
Runtime.totalMemory()=886571008
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
Exception in thread "main" java.lang.IllegalStateException: Reads remaining on alignment iterator: ERR233223.100003813!
        at picard.sam.AbstractAlignmentMerger.mergeAlignment(AbstractAlignmentMerger.java:558)
        at picard.sam.SamAlignmentMerger.mergeAlignment(SamAlignmentMerger.java:186)
        at picard.sam.MergeBamAlignment.doWork(MergeBamAlignment.java:358)
        at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:305)
        at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:103)
        at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:113)

What have go wrong in my case? I downloaded all of my files directly from international genomes database ftp, without any modification. Please advice me on this, thank you

alignment genome • 4.0k views
ADD COMMENT
0
Entering edit mode

Please do not delete previous questions that received comments or answers. This is considered disrespectful towards the users who invested time to help you.

ADD REPLY
0
Entering edit mode

I think you didn't tell us that you post-processed one of the bam using another REF sequence....

file:/data4t/KHV1000G/refgen/hs37d5.fa.gz
ADD REPLY
0
Entering edit mode

I download both bam files from international genome, the REF sequence you state is just the path on my disk. I didn't process any of those files

ADD REPLY
0
Entering edit mode

Ah sorry, but where did you get this file refgen/hs37d5.fa.gz ? it's not the same as the one in the BAM. The associated dict file should look like this.

@SQ SN:1    LN:249250621    M5:1b22b98cdeb4a9304cb5d48026a85128 UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5.fa.gz AS:NCBI37   SP:Human
@SQ SN:2    LN:243199373    M5:a0d9851da00400dec1098a9255ac712e UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5.fa.gz AS:NCBI37   SP:Human
(...)
@SQ SN:NC_007605    LN:171823   M5:6743bd63b3ff2b5b8985d8933c53290a UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5.fa.gz AS:NCBI37   SP:Human
@SQ SN:hs37d5   LN:35477943 M5:5b6a4b3a81a2d3c134b7d14bf6ad39f1 UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5.fa.gz AS:NCBI37   SP:Human
ADD REPLY
0
Entering edit mode

I used samtools to check @SQ tag of the bam files and got this link: ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5.fa.gz

I downloaded directly from it

ADD REPLY
0
Entering edit mode

what is the version of picard ? do you use the latest version ?

ADD REPLY
0
Entering edit mode
ADD REPLY
0
Entering edit mode

I'm using picard 2.21.8, the latest version I think. I also created the dict using

java -jar picard.jar CreateSequenceDictionary R=refgen/hs37d5.fa.gz O=refgen/hs37d5.dict
ADD REPLY
0
Entering edit mode

So sorry for that, I didn't state clear enough in that one

ADD REPLY
1
Entering edit mode
4.8 years ago

Re-generate the dict file with CreateSequenceDictionary but use the following values so they match the ones in the BAM

java -jar picard.jar CreateSequenceDictionary R=refgen/hs37d5.fa.gz O=refgen/hs37d5.dict U=ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5.fa.gz SPECIES=Human  GENOME_ASSEMBLY=NCBI37
ADD COMMENT
0
Entering edit mode

Hi, thank you for your answer, I followed that step, this time there is less warning, but it still didn't output anyfile. The full log is here:

09:02:54.970 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/thanhnguyen/picard/build/libs/picard.jar!/com/intel/gkl/native/libgkl_compression.so
[Mon Feb 24 09:02:54 ICT 2020] MergeBamAlignment UNMAPPED_BAM=aln/HG02142.unmapped.ILLUMINA.bwa.KHV.low_coverage.20130415.bam ALIGNED_BAM=[aln/HG02142.mapped.ILLUMINA.bwa.KHV.low_coverage.20130415.bam] OUTPUT=HG02142/HG02142_fin.bam TMP_DIR=[tmp] REFERENCE_SEQUENCE=refgen/hs37d5.fa.gz    ADD_PG_TAG_TO_READS=true PAIRED_RUN=true CLIP_ADAPTERS=true IS_BISULFITE_SEQUENCE=false ALIGNED_READS_ONLY=false MAX_INSERTIONS_OR_DELETIONS=1 ATTRIBUTES_TO_REVERSE=[OQ, U2] ATTRIBUTES_TO_REVERSE_COMPLEMENT=[E2, SQ] READ1_TRIM=0 READ2_TRIM=0 ALIGNER_PROPER_PAIR_FLAGS=false SORT_ORDER=coordinate PRIMARY_ALIGNMENT_STRATEGY=BestMapq CLIP_OVERLAPPING_READS=true INCLUDE_SECONDARY_ALIGNMENTS=true ADD_MATE_CIGAR=true UNMAP_CONTAMINANT_READS=false MIN_UNCLIPPED_BASES=32 MATCHING_DICTIONARY_TAGS=[M5, LN] UNMAPPED_READ_STRATEGY=DO_NOT_CHANGE VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false
[Mon Feb 24 09:02:54 ICT 2020] Executing as myuser@abc.com on Linux 3.13.0-147-generic amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_181-b13; Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: 2.21.8-SNAPSHOT
INFO    2020-02-24 09:02:55     SamAlignmentMerger      Processing SAM file(s): [aln/HG02142.mapped.ILLUMINA.bwa.KHV.low_coverage.20130415.bam]
WARNING 2020-02-24 09:02:55     SamAlignmentMerger      Exception merging bam alignment - attempting to sort aligned reads and try again: Underlying iterator is not queryname sorted: ERR233223.95759674 2/2 101b aligned to 1:47264561-47264660. > ERR233223.14389540 2/2 101b aligned to 1:47264588-47264671.
INFO    2020-02-24 09:02:55     SamAlignmentMerger      Finished reading 28356 total records from alignment SAM/BAM.
[Mon Feb 24 09:02:55 ICT 2020] picard.sam.MergeBamAlignment done. Elapsed time: 0.01 minutes.
Runtime.totalMemory()=886571008
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
Exception in thread "main" java.lang.IllegalStateException: Reads remaining on alignment iterator: ERR233223.100003813!
        at picard.sam.AbstractAlignmentMerger.mergeAlignment(AbstractAlignmentMerger.java:558)
        at picard.sam.SamAlignmentMerger.mergeAlignment(SamAlignmentMerger.java:186)
        at picard.sam.MergeBamAlignment.doWork(MergeBamAlignment.java:358)
        at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:305)
        at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:103)
        at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:113)
ADD REPLY
0
Entering edit mode

I also tried using another sample, HG02088.mapped.ILLUMINA.bwa.KHV.low_coverage.20130415.bam (still from Human genome project), and it has the same problem, the warning is a bit different in those lines:

WARNING 2020-02-24 09:30:58     SamAlignmentMerger      Exception merging bam alignment - attempting to sort aligned reads and try again: Underlying iterator is not queryname sorted: SRR741416.45399016 2/2 101b aligned to 1:47264572-47264658. > SRR741416.1601628 1/2 101b aligned to 1:47264588-47264688.
INFO    2020-02-24 09:30:58     SamAlignmentMerger      Finished reading 34465 total records from alignment SAM/BAM.
[Mon Feb 24 09:30:58 ICT 2020] picard.sam.MergeBamAlignment done. Elapsed time: 0.01 minutes.
Runtime.totalMemory()=886571008
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
Exception in thread "main" java.lang.IllegalStateException: Reads remaining on alignment iterator: SRR741415.10010200!
ADD REPLY
0
Entering edit mode

at this point I think you want MergeSamFiles, NOT MergeBamAlignment Difference Between Picards Mergebamalignment And Mergesamfiles

ADD REPLY
1
Entering edit mode

Underlying iterator is not queryname sorted: SRR741416.45399016 2/2 101b aligned to 1:47264572-47264658. > SRR741416.1601628 1/2 101b aligned to 1:47264588-47264688.

so did you sort the unampped BAM on read-names (not coordinates). Beware, it should be sorted using picard SortSam (NOT samtools) because of : https://github.com/samtools/htsjdk/issues/766

ADD REPLY
0
Entering edit mode

I downloaded the file from human genome project without any modification. I checked their documents, it seems like they sorted the mapped file:

 samtools view -bSu $sam_file | samtools sort -n -o - samtools_nsort_tmp | 
 samtools fixmate /dev/stdin /dev/stdout | samtools sort -o - samtools_csort_tmp |
 samtools fillmd -u - $reference_fasta > $fixed_bam_file

How can I sort the unmapped bam in the same manner?

ADD REPLY
0
Entering edit mode

try to use picard SortSam using the correct sorting 'queryname'

ADD REPLY
0
Entering edit mode

I have tried two cases: 1, Sorting unmapped then MergeAlignment

INFO    2020-02-25 12:13:30     SamAlignmentMerger      Processing SAM file(s): [aln/HG01842.mapped.ILLUMINA.bwa.KHV.low_coverage.20120522.bam]
WARNING 2020-02-25 12:13:30     SAMSequenceDictionary   Found sequence entry for which tags differ: 1 and tag UR has the two values: ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5.fa.gz        AS:NCBI37       SP:Human and ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5.fa.gz. Using ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5.fa.gz        AS:NCBI37       SP:Human
(...)
WARNING 2020-02-25 12:13:31     SAMSequenceDictionary   Found sequence entry for which tags differ: hs37d5 and tag UR has the two values: ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5.fa.gz        AS:NCBI37       SP:Human and ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5.fa.gz. Using ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5.fa.gz        AS:NCBI37       SP:Human
[Tue Feb 25 12:13:31 ICT 2020] picard.sam.MergeBamAlignment done. Elapsed time: 0.01 minutes.
Runtime.totalMemory()=886571008
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
Exception in thread "main" java.lang.IllegalStateException: Reads remaining on alignment iterator: ERR042514.10000846!
        at picard.sam.AbstractAlignmentMerger.mergeAlignment(AbstractAlignmentMerger.java:558)
        at picard.sam.SamAlignmentMerger.mergeAlignment(SamAlignmentMerger.java:186)
        at picard.sam.MergeBamAlignment.doWork(MergeBamAlignment.java:358)
        at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:305)
        at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:103)
        at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:113)

2, Sorting both files then MergeAlignment

INFO    2020-02-25 12:11:56     SamAlignmentMerger      Processing SAM file(s): [HG            01842.mapped.sorted.bam]
WARNING 2020-02-25 12:11:56     SAMSequenceDictionary   Found sequence entry for which tags differ: 1 and tag UR has the two values: ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5.fa.gz                    AS:NCBI37       SP:Human and ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5.fa.gz. Using ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5.fa.gz        AS:NCBI37       SP:Human
(...)
WARNING 2020-02-25 12:11:57     SAMSequenceDictionary   Found sequence entry for which tags differ: hs37d5 and tag UR has the two values: ftp://ftp.1000genomes.ebi.ac            .uk/vol1/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5.fa.gz                    AS:NCBI37       SP:Human and ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5.fa.gz. Using ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5.fa.gz        AS:NCBI37       SP:Human
[Tue Feb 25 12:11:57 ICT 2020] picard.sam.MergeBamAlignment done. Elapsed time: 0.01 minutes.
Runtime.totalMemory()=886571008
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
Exception in thread "main" java.lang.IllegalStateException: Reads remaining on alignment iterator: ERR042514.10000846!
        at picard.sam.AbstractAlignmentMerger.mergeAlignment(AbstractAlignmentMerger.java:558)
        at picard.sam.SamAlignmentMerger.mergeAlignment(SamAlignmentMerger.java:186)
        at picard.sam.MergeBamAlignment.doWork(MergeBamAlignment.java:358)
        at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:305)
        at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:103)
        at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:113)

And it still output nothing

ADD REPLY
0
Entering edit mode

no it's not, as you can see I got one mapped file and one unmapped file, I need the MergeBamAlignment not the Mergesamfiles

ADD REPLY
0
Entering edit mode

oh, I'm wrong again sorry.

ADD REPLY
0
Entering edit mode
4.0 years ago
ramshahaya ▴ 10

Hi,

How did you create unmapped file for MergeBamAlignment?

ADD COMMENT

Login before adding your answer.

Traffic: 1309 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6