Hi, I'm new to genome analysis and I need some advice on this. I'm currently follow GATK best practice, now I'm stuck at the MergeBamAlignment step of picard. I'm using the data from https://www.internationalgenome.org/, the reference genome is also downloaded from International genome database.
My command:
MergeBamAlignment -ALIGNED aln/HG02142.mapped.ILLUMINA.bwa.KHV.low_coverage.20130415.bam -UNMAPPED /aln/HG02142.unmapped.ILLUMINA.bwa.KHV.low_coverage.20130415.bam -O test.bam -R refgen/hs37d5.fa.gz -TMP_DIR tmp
I got many warning, and didn't get the output file:
WARNING 2020-02-23 21:54:56 SamAlignmentMerger Exception merging bam alignment - attempting to sort aligned reads and try again: Underlying iterator is not queryname sorted: ERR233223.95759674 2/2 101b aligned to 1:47264561-47264660. > ERR233223.14389540 2/2 101b aligned to 1:47264588-47264671.
INFO 2020-02-23 21:54:56 SamAlignmentMerger Finished reading 28356 total records from alignment SAM/BAM.
WARNING 2020-02-23 21:54:56 SAMSequenceDictionary Found sequence entry for which tags differ: 1 and tag UR has the two values: ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5.fa.gz and file:/data4t/KHV1000G/refgen/hs37d5.fa.gz. Using ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5.fa.gz
WARNING 2020-02-23 21:54:56 SAMSequenceDictionary Found sequence entry for which tags differ: 2 and tag UR has the two values: ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5.fa.gz and file:/data4t/KHV1000G/refgen/hs37d5.fa.gz. Using ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5.fa.gz
.... (many warnings in here, similar structure)
[Sun Feb 23 21:54:56 ICT 2020] picard.sam.MergeBamAlignment done. Elapsed time: 0.01 minutes.
Runtime.totalMemory()=886571008
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
Exception in thread "main" java.lang.IllegalStateException: Reads remaining on alignment iterator: ERR233223.100003813!
at picard.sam.AbstractAlignmentMerger.mergeAlignment(AbstractAlignmentMerger.java:558)
at picard.sam.SamAlignmentMerger.mergeAlignment(SamAlignmentMerger.java:186)
at picard.sam.MergeBamAlignment.doWork(MergeBamAlignment.java:358)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:305)
at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:103)
at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:113)
What have go wrong in my case? I downloaded all of my files directly from international genomes database ftp, without any modification. Please advice me on this, thank you
Please do not delete previous questions that received comments or answers. This is considered disrespectful towards the users who invested time to help you.
I think you didn't tell us that you post-processed one of the bam using another REF sequence....
I download both bam files from international genome, the REF sequence you state is just the path on my disk. I didn't process any of those files
Ah sorry, but where did you get this file refgen/hs37d5.fa.gz ? it's not the same as the one in the BAM. The associated dict file should look like this.
I used samtools to check @SQ tag of the bam files and got this link: ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5.fa.gz
I downloaded directly from it
what is the version of picard ? do you use the latest version ?
did you create the associated
dict
file using https://broadinstitute.github.io/picard/command-line-overview.html#CreateSequenceDictionaryI'm using picard 2.21.8, the latest version I think. I also created the dict using
So sorry for that, I didn't state clear enough in that one