Question

diferent contigs chrM

0

Entering edit mode

8.8 years ago

cristina_sabiers ▴ 110

Hi!

I try to call snps to learn how to do it. I got myt bam file, I hadn't made it. I have the snp, indel report from this bam on excel file, thats because I thought the most easy way for me to learn call snps and see if I do right.

I get this error message (see above)

As I understand chrM has different length as my reference chrM (hg19). These people when aligned the file used another chrM reference than mine? or is another reason why I get this error?Is a way to fix it?

I follow this tutorial What Is The Best Pipeline For Human Whole Exome Sequencing?

Thanks!!! :)

java -jar /home/cri/Desktop/GATK/GenomeAnalysisTK.jar -T RealignerTargetCreator -R hg19.fa -I 010.bam  -o 010.intervals

##### ERROR MESSAGE: Input files reads and reference have incompatible contigs. Please see https://www.broadinstitute.org/gatk/guide/article?id=63for more information. Error details: Found contigs with the same name but different lengths or MD5s:
##### ERROR   contig  reads is named chrM with length 16569
##### ERROR   contig  reference is named chrM with length 16571 and MD5 d2ed829b8a1628d16cbeee88e88e39eb.
##### ERROR   reads contigs = [chr1, chr2, chr3, chr4, chr5, chr6, chr7, chr8, chr9, chr10, chr11, chr12, chr13, chr14, chr15, chr16, chr17, chr18, chr19, chr20, chr21, chr22, chrX, chrY, chrM]
##### ERROR   reference contigs = [chr1, chr2, chr3, chr4, chr5, chr6, chr7, chr8, chr9, chr10, chr11, chr12, chr13, chr14, chr15, chr16, chr17, chr18, chr19, chr20, chr21, chr22, chrX, chrY, chrM]

alignment • 5.8k views

ADD COMMENT • link updated 2.6 years ago by Maximilian Haeussler ★ 1.7k • written 8.8 years ago by cristina_sabiers ▴ 110

2

Entering edit mode

For UCSC hg38, "chrM" is the sequence that NCBI and Ensembl call "MT".

For UCSC hg19, a long time ago, had to pick a mitochondrial reference sequence as it wasn't part of the assembly release and UCSC picked a different mito genome not the Cambridge reference sequence. But they added the reference later. So in the updated hg19 UCSC genome files you have both "chrM" (for backwards compatibility) and "chrMT" (for newer results) See the discussion at https://genome.ucsc.edu/FAQ/FAQgenes.html#mito This is one good reason to download the sequence file "for analysis" (=alignment, etc) from https://hgdownload.soe.ucsc.edu/goldenPath/hg19/bigZips/analysisSet/ rather than the complete/full/patched files at https://hgdownload.soe.ucsc.edu/goldenPath/hg19/bigZips/latest/ the "latest" directory has fasta files with all and any sequences, patches, fixes, alternates, most of which you never need for alignments or most other analysis. The "latest" directory is rather made if you want the entire full sequence that the genome browser can show, including all the weird sequences that most people usually ignore or gloss over. :-)

ADD REPLY • link 2.6 years ago by Maximilian Haeussler ★ 1.7k

0

Entering edit mode

If you are using your own reference files then why are you getting the length mismatch error? If you are not using your own copy of the hg19 that you used for the alignment then use that.

ADD REPLY • link 8.8 years ago by GenoMax 150k

0

Entering edit mode

I dont use my own copy, I use this one instead because I have the results of snp calling already done, just for learn the steps and see if I do right. Isn't recommended?

Anyhow why this miss match in just chrM? Is any reason for that? (Just for know it)

Thanks again genomax!

ADD REPLY • link 8.8 years ago by cristina_sabiers ▴ 110

0

Entering edit mode

As you discovered you can't do that (since there may be subtle differences in the data, as you discovered here). In NGS analyses copy of the genome (and ALL allied files e.g. indexes, annotation) have to be identical from start to finish. If you got them from the same source (e.g. iGenomes) then you could use the same files that someone else did.

If you want to use some other copy of genome/indexes/data then you need to do the entire set of alignment steps again :-)

ADD REPLY • link 8.8 years ago by GenoMax 150k

0

Entering edit mode

oh...ok! Thanks for explanation...Didn't think all of this is sooooo tricky O_o

then I try with my bam file :)

ADD REPLY • link 8.8 years ago by cristina_sabiers ▴ 110

0

Entering edit mode

Hi, I've got a similar issue, just posted it around here yesterday Variant annotation help - predictCoding error message

I'm not sure how can I best move forward - I don't have the fastq files for the data used to generate the vcf files so the option re-map and align this data is not valid. Is there a way I could alter those for the annotation database (txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene) ?

So to put this in context, I'm following the tutorial here to annotate my VCF files https://rockefelleruniversity.github.io/RU_GenomicVariants/presentations/singlepage/RU_GenomicVariant_session2.html#Predict_amino_acid_changes25

and once I reach the step to run predictCoding to predict the amino acid changes that happen when I run this command:

coding <- predictCoding(vcf_chr1, txdb, seqSource = Hsapiens)

I get the following error message

Error in mergeNamedAtomicVectors(seqlengths(x), seqlengths(y), what = c("sequence", : sequence chrM has incompatible seqlengths: in 'x': 16569 in 'y': 16571

I had initially subset chr1, so the VCF file should contain chr1 only so I'm finding it a bit odd that it says sequence chrM has incompatible seqlengths. (Instead of vcf_chr1 above, my file name is HCC_Tissue_041_chr1)

HCC_Tissue_041_chr1 <- HCC_Tissue_041[grepl(names(HCC_Tissue_041), pattern = "chr1:")]
rd_041_chr1 <- rowRanges(HCC_Tissue_041_chr1)

Any suggestions, please? Thank you!

ADD REPLY • link 2.8 years ago by Rozita ▴ 40

score 4 · Answer 1 · 2016-07-02

4

Entering edit mode

8.8 years ago

lh3 33k

Most researchers who study human mitochondria use the revised cambridge reference sequence (rCRS). However, GRCh37 includes a different mitochondrial sequence. It is 2bp longer. For popgen analyses, 1000 Genomes project decided to use rCRS. Broad and a few other institutes adopted the 1000g reference as well. So, there are two common chrM, one with 16569bp (rCRS) and the other with 16571bp (GRCh37).

GRCh38 follows the convention and uses rCRS. There won't be such confusion for GRCh38.