Hello, I have variant calls from mtDNA sequencing. The result looks like this:
SampleID Pos Ref Variant Major/Minor Variant-Level Coverage-FWD Coverage-Rev Coverage-Total
R07058.bam 9090 T C C/A 0.9974 2100 2136 4236
This result comes from mtDNA-Server. Major means the major nucleotide at a site, and minor means the opposite. Variant-Level seems to be the proportion of the variant, but I am not sure about that.
I want to annotate these variants with snpEff, which needs a VCF file as input, so I am trying to write a Python script to convert this output to VCF. I read the VCF format specification before I started.
Considering that the mitochondrial genome is haploid, I split the alleles at the same site into separate variants in the VCF. In this example that gives 2 lines in the VCF:
#CHROM POS ID REF ALT ...
MT 9090 . T A ...
MT 9090 . T C ...
I hope this solution is right.
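A minimal Python sketch of the conversion described above, assuming the columns are whitespace-separated in the order shown and that Major/Minor always has the form `X/Y`. The function name `row_to_vcf_lines` and the default chromosome name `MT` are hypothetical choices for illustration, not part of mtDNA-Server or snpEff.

```python
def row_to_vcf_lines(row, chrom="MT"):
    """Turn one mtDNA-Server row into one VCF data line per non-reference allele."""
    fields = row.split()
    pos, ref = fields[1], fields[2]
    # Assumption: column 5 holds "major/minor", e.g. "C/A".
    alleles = set(fields[4].split("/"))
    lines = []
    for allele in sorted(alleles):
        if allele == ref:
            continue  # the reference base is not an ALT allele
        # ID, QUAL, FILTER and INFO are left as "." (missing value).
        lines.append("\t".join([chrom, pos, ".", ref, allele, ".", ".", "."]))
    return lines


row = "R07058.bam 9090 T C C/A 0.9974 2100 2136 4236"
vcf_lines = row_to_vcf_lines(row)
for line in vcf_lines:
    print(line)
```

The VCF format also allows both alleles on a single line with a comma-separated ALT field (`T` to `A,C`); whether one line or two suits the annotation step better is a judgment call.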
My question is about the INFO column in VCF. mtDNA is haploid, but a cell may hold many (an unknown number of) copies, so I don't know how to fill in these INFO tags:
- AC : allele count in genotypes, for each ALT allele, in the same order as listed.
- AN : total number of alleles in called genotypes.
- DP : combined depth across samples, e.g. DP=154. I know there will be several depth values, because more than 1 sample is put on 1 line in a VCF, but I don't know what the combined depth across them is or how to calculate it.
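For DP, "combined depth across samples" can be read as the sum of the per-sample depths at that site. The sketch below sums Coverage-Total per position, assuming one row per sample per position; the second sample row is made up for illustration. AC and AN are defined on called genotypes, so they depend on how each sample's genotype is encoded, and they are not derived here.

```python
from collections import defaultdict


def combined_depth(rows):
    """Sum Coverage-Total (9th column) over all samples for each position."""
    depth = defaultdict(int)
    for row in rows:
        fields = row.split()
        pos, total = int(fields[1]), int(fields[8])
        depth[pos] += total
    return dict(depth)


rows = [
    "R07058.bam 9090 T C C/A 0.9974 2100 2136 4236",
    "R07059.bam 9090 T C C/A 0.9900 1000 1000 2000",  # hypothetical second sample
]
# Build the INFO string per position, e.g. "DP=<sum of sample depths>".
info = {pos: f"DP={dp}" for pos, dp in combined_depth(rows).items()}
print(info)
```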
Any help is appreciated.
Hello MatthewP,
could you please describe what the columns Major/Minor and Variant-Level are for? Why do you need a vcf file?
Also, it is better to use the code button in the formatting bar to show file contents. I've done it for you this time.
fin swimmer
Thanks for your advice. I have re-edited the question and explained what Major/Minor means.
Hello MatthewP,
thank you for adding information to your question. But I still don't understand what is meant by Major/Minor, because in the Variant column there is only a C.
Also, it is necessary to understand why you need a vcf file. In the simplest case your vcf file just needs values in the CHROM, POS, REF and ALT columns. All other mandatory fields can be filled with . if this information isn't needed for downstream analyses.
fin swimmer
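As a rough illustration of such a minimal file, the sketch below writes a header plus one line per variant, with `.` in ID, QUAL, FILTER and INFO. The function name `write_minimal_vcf` and the `VCFv4.2` header value are assumptions for the example; the chromosome name (`MT` here) would need to match whatever the annotation database uses.

```python
import io


def write_minimal_vcf(variants, handle):
    """Write (chrom, pos, ref, alt) tuples as a minimal, position-sorted VCF."""
    handle.write("##fileformat=VCFv4.2\n")
    handle.write("#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n")
    # Sort by chromosome, position, then ALT allele for a stable order.
    for chrom, pos, ref, alt in sorted(variants, key=lambda v: (v[0], v[1], v[3])):
        handle.write(f"{chrom}\t{pos}\t.\t{ref}\t{alt}\t.\t.\t.\n")


buffer = io.StringIO()
write_minimal_vcf([("MT", 9090, "T", "C"), ("MT", 9090, "T", "A")], buffer)
vcf_text = buffer.getvalue()
print(vcf_text, end="")
```

Passing an open file handle instead of the `io.StringIO` buffer writes the same content to disk.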
Major/Minor is a column included in the results generated by mtDNA-Server. It creates two profiles based on variant allele frequencies, major and minor, and this information is used to perform haplogroup checks for each heteroplasmic site.
Thank you Nandini! Can I ask where you got all this information about mtDNA-Server? There is no detailed documentation in the GitHub project. Actually, I have to guess what all those tags mean.
Hi Matthew, I used mtDNA-Server before setting up my own pipeline for our lab. Have you read the paper for the tool? It should be described there.
Yep, I read the paper before I downloaded the tool. I also want to set up my own pipeline, but I don't know how, as I am just a beginner in bioinformatics. How do you do the variant calling? Do you have any guidance for me on building an mtDNA pipeline?
Sure, I can help you with that, but it would be useful to know the aim of your project. What samples are you analysing? Why do you need to convert the results into VCF format? Do you only need to call variants, or do you need to perform further downstream analysis?
Thank you! Can I have your e-mail address? I will send an e-mail to discuss this with you.
Please don't ask for email addresses. We like to keep the discussion open and on the forum so it benefits everyone.
Well, I work at a company offering sequencing services. This is the first time our company has received an mtDNA order. Our client wants us to analyse mtDNA heterogeneity (variants) and copy number variants (CNV). They are using multiplex PCR to obtain the mtDNA library, so I think we can't get CNV from such data, as there is no nuclear genome to normalize between samples. I want to offer them a very good variant report. Here are my pipeline requirements:
Details of the sequencing method: library construction uses the MultipSeqTm AImumiCap Panel, which uses 129 primer pairs to PCR-amplify the whole mtDNA.
There are several publications and automated pipelines that do this for you, but as you work for a company, you need to check whether this software is freely available for you to use.
So my pipeline for mtDNA analysis is as follows:
1. Mapping: BWA with rCRS (hg19)
2. Mark duplicates with Picard
3. Variant calling: samtools and varscan
4. Variant annotation: annovar
5. Additional annotation: Mitomap
Hope this helps. Good luck
We like to set up our own pipelines so they are easy to maintain and upgrade. Thank you very much, I will try your pipeline.
Okay. But definitely do some research before implementing the pipeline, as some of the tools may not suit your requirements.
Ok, I need to annotate those variants using snpEff, which takes a VCF file as input.