Some questions about writing human mitochondrial variants into a VCF file
6.4 years ago
MatthewP ★ 1.4k

Hello, I have variant results from mtDNA sequencing. They look like this:

SampleID        Pos     Ref     Variant Major/Minor     Variant-Level   Coverage-FWD    Coverage-Rev    Coverage-Total
R07058.bam      9090    T       C       C/A     0.9974  2100    2136    4236

This result comes from mtDNA-Server. Major means the major nucleotide at a site, and minor means the opposite. Variant-Level seems to mean the fraction of the variant allele, but I am not sure about that.

I want to annotate these variants with snpEff, which needs a VCF file as input, so I am trying to write a Python script to convert this output to VCF format. I read the VCF format specification before I started.

Considering that the mitochondrial genome is haploid, I write each variant at the same site as a separate record in the VCF. In this example there would be 2 lines in the VCF:

#CHROM  POS     ID      REF     ALT ...
MT      9090    .       T       A ...
MT      9090    .       T       C ...

I hope this solution is right.

My question is about the INFO column in VCF. mtDNA is haploid, but a cell may contain many (an unknown number of) copies, so I don't know how to fill in these INFO tags:

  1. AC : allele count in genotypes, for each ALT allele, in the same order as listed.
  2. AN : total number of alleles in called genotypes.
  3. DP : combined depth across samples, e.g. DP=154. I know there will be several depth values, because more than 1 sample can be put on 1 line in a VCF. But I don't know what the combined depth across them is or how to calculate it.
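
As a rough illustration of the DP question: the VCF specification describes INFO/DP as the combined depth across samples, which is commonly taken to be the sum of the per-sample depths at the site. The sketch below assumes that reading. The function name `combined_depth` and the depth values for the second and third samples are hypothetical, made up for this example.

```python
def combined_depth(per_sample_depths):
    """Sum per-sample read depths at one site to get an INFO/DP value."""
    return sum(per_sample_depths)


# Coverage-Total values for three samples at the same position.
# Only the first value comes from the thread; the others are hypothetical.
depths = [4236, 3900, 4102]
info_dp = f"DP={combined_depth(depths)}"
print(info_dp)
```

AC and AN count alleles in called genotypes, so with an unknown mtDNA copy number they have no obvious value. Whether to omit them or approximate them is a choice the thread does not settle.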

Any help is appreciated.

VCF mtDNA • 3.9k views

Hello MatthewP,

could you please describe what the columns Major/Minor and Variant-Level are for? Why do you need a vcf file?

Also it is better to use the code button in the formatting bar to show file contents. I've done it for you this time.


fin swimmer


Thanks for your advice, I have re-edited the question and explained what Major/Minor means.


Hello MatthewP,

thank you for adding information to your question. But I still don't understand what is meant by Major/Minor, because in the Variant column there is only a C.

It is also necessary to understand why you need a VCF file. In the simplest case your VCF file just needs values in the CHROM, POS, REF and ALT columns. All other mandatory fields can be filled with . if that information isn't needed for downstream analyses.
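
A minimal sketch of that simplest case, assuming the eight fixed VCF columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO) and a `.` in every field that is not needed. The helper name `minimal_vcf_line` is hypothetical, and the contig name `MT` is taken from the example in the question; the annotation tool may expect a different name.

```python
def minimal_vcf_line(chrom, pos, ref, alt):
    """Build one tab-separated VCF data line with only CHROM, POS, REF, ALT set."""
    # Fixed VCF columns: CHROM POS ID REF ALT QUAL FILTER INFO
    fields = [chrom, str(pos), ".", ref, alt, ".", ".", "."]
    return "\t".join(fields)


header = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
]
record = minimal_vcf_line("MT", 9090, "T", "C")
print("\n".join(header + [record]))
```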

fin swimmer


Major/Minor is a column included in the results generated by mtDNA-Server. It creates two profiles based on variant allele frequencies, major and minor, and this information is used to perform haplogroup checks for each heteroplasmic site.


Thank you Nandini! Can I ask where you got all this information about mtDNA-Server? There is no detailed documentation in the GitHub project. I actually have to guess what all those tags mean.


Hi Matthew, I used mtDNA-Server before setting up my own pipeline for our lab. Have you read the paper for the tool? It should be described there.


Yes, I read the paper before I downloaded the tool. I also want to set up my own pipeline, but I don't know how, as I am just a beginner in bioinformatics. How do you do the variant calling? Do you have any guidance for me on building an mtDNA pipeline?


Sure, I can help you with that, but it would be useful to know the aim of your project. What samples are you analysing? Why do you need to convert the results into VCF format? Do you only need to call variants, or do you need to perform further downstream analysis?


Thank you! Can I have your e-mail address? I will send you an e-mail to discuss this.


Please don't ask for email addresses. We like to keep the discussion open and on the forum so it benefits everyone.


Well, I work at a company offering sequencing services. This is the first time our company has received an mtDNA order. Our client wants us to analyse the heterogeneity of mtDNA (variants) and copy number variants (CNV). They use multi-PCR to build the mtDNA library, so I think we can't get CNV from such data, as there is no nuclear genome to normalize between samples. I want to offer them a very good variant report. Here are my pipeline requirements:

  1. QC and mapping. I am currently using bwa for mapping, but I am confused about which reference to use. I am currently using rCRS, as recommended by "rCRS vs. RSRS vs. HG19 (Yoruba)". Is rCRS the same as chrMT of GRCh38 or hg38? (If I use the whole human genome as reference, some reads map to other chromosomes, especially chr2.)
  2. Variant calling. I have no idea how to do this.
  3. Annotation. snpEff seems good to me. Any other suggestions?
  4. If possible, I want to provide some biological or medical analysis of those variants, for example some SNPs may cause disease. I am trying to find databases that may be useful on MITOMAP. I have never done such work before, so maybe I need some tools besides those databases?

Details about the sequencing method: the library was constructed with the MultipSeqTm AImumiCap Panel, which uses 129 primer pairs to PCR-amplify the whole mtDNA.


There are several publications and automated pipelines that do this for you, but as you work for a company, you need to check whether this software is freely available for you to use.

So my pipeline for mtDNA analysis is as follows

  1. Mapping: BWA with rCRS (hg19)

  2. Mark duplicates with Picard

  3. Variant calling: samtools and varscan

  4. Variant annotation: annovar

  5. Additional annotation: Mitomap

Hope this helps. Good luck


We like to set up our own pipelines so they are easy to maintain and upgrade. Thank you very much, I will try your pipeline.


Okay. But definitely do some research before implementing the pipeline as some of the tools may or may not suit your requirements


OK, I need to annotate those variants using snpEff, which takes a VCF file as input.

6.4 years ago

I hope this solution is right.

No, in a valid VCF you should find only one line per CHROM/POS/REF. See the VCF spec: for an attribute associated with the ALT alleles (e.g. AF, Number=A), you should find the same number of values as the number of ALT alleles. Example:

##INFO=<ID=AN,Number=1,Type=Integer,Description="Total number of alleles in called genotypes">
##INFO=<ID=AC,Number=A,Type=Integer,Description="Allele count in genotypes, for each ALT allele, in the same order as listed">
##INFO=<ID=AF,Number=A,Type=Float,Description="Allele Frequency, for each ALT allele, in the same order as listed">
#CHROM  POS     ID      REF     ALT ... INFO
MT      9090    .       T       A,C ... AN=100;AC=1,50;AF=0.01,0.5
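
The grouping described above could be sketched in Python roughly as follows. This is only an illustration: the function name `to_multiallelic` and the `(pos, ref, alt, af)` tuple layout are assumptions, and the allele fractions are the ones from the example line above, not values from the mtDNA-Server output.

```python
from collections import defaultdict


def to_multiallelic(rows, chrom="MT"):
    """Group (pos, ref, alt, af) tuples so each CHROM/POS/REF yields one VCF line."""
    grouped = defaultdict(list)
    for pos, ref, alt, af in rows:
        grouped[(pos, ref)].append((alt, af))

    lines = []
    for (pos, ref), alleles in sorted(grouped.items()):
        alts = ",".join(alt for alt, _ in alleles)
        # Number=A fields carry one value per ALT allele, in ALT order.
        afs = ",".join(f"{af:g}" for _, af in alleles)
        fields = [chrom, str(pos), ".", ref, alts, ".", ".", f"AF={afs}"]
        lines.append("\t".join(fields))
    return lines


# Allele fractions copied from the example line; hypothetical for real data.
rows = [(9090, "T", "A", 0.01), (9090, "T", "C", 0.5)]
for line in to_multiallelic(rows):
    print(line)
```

Only AF is written here, because AC and AN depend on a genotype call that the input table does not provide.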

Thanks, I will check the VCF specification again! However, I still don't know how to decide the AC and AN values, because I don't know the copy number of mtDNA. If one of the variants is a deletion, should it also be on the same line as the SNP? Like:

#CHROM  POS     ID      REF     ALT ... 
MT    9090    .    AT    A, AC ...

Am I understanding this right?
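
One way to reason about this question, as a sketch rather than a definitive answer: when a SNP and a deletion start at the same POS, they can share a line if both are rewritten against the longest REF, padding each ALT with the reference bases its own record did not cover. The function `merge_alleles` and the example alleles are hypothetical. Note also that the ALT field is comma-separated without spaces.

```python
def merge_alleles(variants):
    """Merge (ref, alt) pairs that start at the same POS onto one shared REF.

    Assumes every REF is a prefix of the longest REF, which holds when all
    records start at the same position on the same reference sequence.
    """
    longest_ref = max((ref for ref, _ in variants), key=len)
    alts = []
    for ref, alt in variants:
        if not longest_ref.startswith(ref):
            raise ValueError(f"REF {ref!r} is not a prefix of {longest_ref!r}")
        # Pad the ALT with the reference bases this record did not cover.
        alts.append(alt + longest_ref[len(ref):])
    return longest_ref, ",".join(alts)


# Hypothetical site: a deletion AT>A and a SNP A>G, both anchored at the same POS.
ref, alt = merge_alleles([("AT", "A"), ("A", "G")])
print(ref, alt)
```

A VCF validator can then confirm whether the resulting line is well-formed.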


There are vcf validators. Try one of them.
