I have some BAM files that were generated against the hg19 reference, and I started running the GATK pipeline on them. But when I use Picard to reorder the files, I get this error:
Exception in thread "main" net.sf.picard.PicardException: Discordant contig lengths: read chrM LN=16569, ref chrM LN=16571
I guess this problem is caused by using a different reference genome. I know there are different versions of the hg19 reference (such as 1000 Genomes or UCSC), but from what I found online there should be no difference between them.
So I am confused. Does anyone know what the problem is?
Thank you
Thank you. Do you know where I can download a reference file based on the Cambridge Reference Sequence for GATK? It seems there is only one hg19 reference file on the GATK FTP.
You can download it from the Ensembl FTP site, where the FASTA DNA sequence is available. The sequences are provided unmasked, soft-masked (sm) and hard-masked (rm). More details can be found in the README file. Good luck with your analyses.
Thank you very much. But I notice that these files are GRCh37. So we are talking about the difference between GRCh37 and hg19.
Ok, I get it.
I guess it was the analysis software that fooled me. The software I used to get the BAM files is from Ion Torrent. We selected hg19 as the reference and got the BAM files. Then I tried to use the GATK pipeline to call variants from them. This problem came up, and I thought it shouldn't happen because I was also using hg19 as the reference there.
So it turns out that Ion Torrent was actually using GRCh37 as the reference. Am I correct?
Thank you.
In short, yes. The "hg19" on Ion Torrent uses the MT sequence from GRCh37, but it follows the UCSC hg19 naming conventions, so that sequence is named chrM. (Chromosomes in UCSC hg19, and also in Ion Torrent hg19, are named chr1, chr2, ..., chrX, chrY, chrM, whereas in GRCh37 they are named 1, 2, ..., X, Y, MT.) Except for chrM, the sequences of all other chromosomes are the same. The Ion Torrent hg19 also removes all the unlocalized/unplaced/alternate sequences ("chrUn*", etc.) that are present in UCSC hg19.
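If you want to check this kind of mismatch yourself before running Picard, here is a minimal sketch that compares contig lengths between a BAM header and a reference index. It assumes you have the header as text (for example from `samtools view -H`) and the contents of the reference's .fai file (from `samtools faidx`). The function names are just made up for illustration, and the example values are the chrM lengths from the error message above.

```python
def parse_sq_lengths(header_text):
    """Return {contig: length} from the @SQ lines of a SAM/BAM header."""
    lengths = {}
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        # Each field after the record type looks like TAG:value
        fields = dict(f.split(":", 1) for f in line.split("\t")[1:])
        lengths[fields["SN"]] = int(fields["LN"])
    return lengths


def parse_fai_lengths(fai_text):
    """Return {contig: length} from the text of a FASTA .fai index."""
    lengths = {}
    for line in fai_text.splitlines():
        if line.strip():
            # First two .fai columns are sequence name and length
            name, length = line.split("\t")[:2]
            lengths[name] = int(length)
    return lengths


def discordant_contigs(bam_lengths, ref_lengths):
    """Contigs present in both but with different lengths: {name: (bam, ref)}."""
    return {
        name: (length, ref_lengths[name])
        for name, length in bam_lengths.items()
        if name in ref_lengths and ref_lengths[name] != length
    }


# Toy inputs: only the chrM lengths are taken from the error message
header = "@HD\tVN:1.4\n@SQ\tSN:chr1\tLN:249250621\n@SQ\tSN:chrM\tLN:16569\n"
fai = "chr1\t249250621\t6\t50\t51\nchrM\t16571\t254235640\t50\t51\n"

mismatches = discordant_contigs(parse_sq_lengths(header), parse_fai_lengths(fai))
print(mismatches)
```

Any contig that shows up in the result was aligned against a different sequence than the one in your reference, which is what Picard is complaining about.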
Thank you so much.
It's not only the MT genome. See "chrY in 1000G vs hg19": http://plindenbaum.blogspot.fr/2013/07/g1kv37-vs-hg19.html
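Two contigs can have the same length and still differ in sequence, so a length check alone will not catch everything. A rough sketch for comparing two references contig by contig is to take the MD5 of each uppercased sequence, which is the same idea as the M5 tag in a SAM header. The function name is hypothetical and the FASTA records below are toy data, not real chrY sequence; for a whole genome you would stream the file rather than pass it as a string.

```python
import hashlib


def fasta_md5s(fasta_text):
    """Return {contig: md5 hex digest} of each uppercased FASTA sequence."""
    md5s = {}
    name, hasher = None, None
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Close the previous record before starting a new one
            if name is not None:
                md5s[name] = hasher.hexdigest()
            name = line[1:].split()[0]  # keep only the ID, drop the description
            hasher = hashlib.md5()
        elif name is not None:
            hasher.update(line.upper().encode("ascii"))
    if name is not None:
        md5s[name] = hasher.hexdigest()
    return md5s


# Toy records: same name, same length, different content
ref_a = ">chrY toy record\nacgtacgt\nACGT\n"
ref_b = ">chrY toy record\nNNNNacgt\nACGT\n"

same = fasta_md5s(ref_a)["chrY"] == fasta_md5s(ref_b)["chrY"]
print(same)
```

If the digests differ for a contig, the two references do not contain the same sequence for it, even when the names and lengths match.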
Sorry, I am now confused. I thought we were talking about the differences between versions of hg19. Your link is about the difference between GRCh37 and hg19. So the Ensembl version is GRCh37, not a different version of hg19?
Ah yes, sorry, I thought you wanted to compare different versions of the same build.