8.0 years ago
confusedious
Hello everyone,
I have a large collection of human mitochondrial sequences and would like to characterise the variable sites: whether each is coding or non-coding; if coding, which codon position it occupies (1st, 2nd or 3rd); and whether the variants present are synonymous or non-synonymous.
Can anyone recommend software that can do this in batch? Ideally I'd like to input a single .fasta file and have it characterise all variable sites.
I have tried Mitomaster, but it only tells me which sites are variable, not anything about the nature of that variation.
So essentially you want to find the differences between FASTA files containing human mitochondrial sequences and the reference FASTA, and then characterize those variations?
Correct.
But not between multiple fasta files: I have one fasta file that is an alignment of human mitochondrial sequences. The sequences have been aligned to the rCRS (revised Cambridge Reference Sequence) so that the site numbering matches the conventions of human mitochondrial work.
From there I want each variable position labelled as coding or non-coding, with its codon position if coding, and whether the change is synonymous or non-synonymous.
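For what it's worth, the labelling described above can be sketched in a few lines of Python. This is only a minimal illustration under several assumptions, not a recommendation of a finished tool: the alignment columns are assumed to map one-to-one onto rCRS positions (no insertion columns), the gene coordinates are supplied by the user (the `toy_cds` list and toy sequences below are made up for the example and are not real rCRS annotations), every gene is assumed to be on the forward strand, and each variant is judged on its own against the reference codon. The function name `classify_sites` is hypothetical. Translation uses the vertebrate mitochondrial code (NCBI translation table 2).

```python
from itertools import product

# Vertebrate mitochondrial code (NCBI table 2): start from the standard code
# in TCAG order, then apply the four codons that differ.
BASES = "TCAG"
STANDARD_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), STANDARD_AA)}
CODON_TABLE.update({"AGA": "*", "AGG": "*", "ATA": "M", "TGA": "W"})


def classify_sites(reference, sequences, cds_regions):
    """Label every variable site relative to the reference.

    reference   -- reference sequence as a string (column i is position i + 1)
    sequences   -- aligned sequences, same length as the reference
    cds_regions -- (name, start, end) tuples, 1-based inclusive, forward
                   strand only (an assumption of this sketch)
    Returns tuples: (position, ref, alt, region, codon_position, effect).
    """
    reference = reference.upper()
    sequences = [s.upper() for s in sequences]
    results = []
    for i, ref_base in enumerate(reference):
        pos = i + 1
        # Gaps and ambiguity codes are ignored; only A/C/G/T count as alleles.
        alts = sorted({s[i] for s in sequences if s[i] in "ACGT"} - {ref_base})
        if not alts:
            continue
        gene = next((g for g in cds_regions if g[1] <= pos <= g[2]), None)
        for alt in alts:
            if gene is None:
                results.append((pos, ref_base, alt, "non-coding", None, None))
                continue
            name, start, end = gene
            offset = (pos - start) % 3   # 0, 1 or 2 within the codon
            first = pos - offset         # 1-based position of the codon start
            ref_codon = reference[first - 1:first + 2]
            if first + 2 > end or ref_codon not in CODON_TABLE:
                # Incomplete terminal codon or non-ACGT reference base.
                effect = "unknown"
            else:
                alt_codon = ref_codon[:offset] + alt + ref_codon[offset + 1:]
                same = CODON_TABLE[ref_codon] == CODON_TABLE[alt_codon]
                effect = "synonymous" if same else "non-synonymous"
            results.append((pos, ref_base, alt, name, offset + 1, effect))
    return results


# Toy data, invented for illustration only.
reference = "GGATGCTATGACC"
sequences = [
    "GGATGCTGTGACC",  # position 8 A>G
    "GAATGCTATGACC",  # position 2 G>A
    "GGATGCCATGACC",  # position 7 T>C
    "GGATACTATGACC",  # position 5 G>A
]
toy_cds = [("toy_gene", 3, 11)]

for row in classify_sites(reference, sequences, toy_cds):
    print(row)
```

For real data the sequences could be loaded with any FASTA parser (for example Biopython's `SeqIO.parse`), and the coordinate list would need to come from an rCRS annotation. Genes on the reverse strand (ND6 in humans) and overlapping genes would need extra handling that this sketch leaves out.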
So a pairwise alignment of two fasta files, one of which is the rCRS and the other an assembly of human mitochondrial sequences? I think you could do that with something from MUMmer: http://nebc.nerc.ac.uk/bioinformatics/documentation/mummer/dnadiff.README
Thank you again for your reply.
There is just the one fasta file, which is an alignment of human mitochondrial sequences following the rCRS numbering, but I can always generate another fasta file containing just the rCRS if the software needs it.
I will look at MUMmer and see how I go.
How did you obtain those human mitochondrial fasta sequences?
There are thousands of publicly available human mitochondrial sequences on GenBank.