Where can I find a fasta for mitochondrial genes?
3.6 years ago
blur ▴ 280

Hi,

I want to align reads to the mm10 and hg19 mitochondrial genes. Where can I get a fasta file of all these genes to use as a reference?

I found this: https://www.ncbi.nlm.nih.gov/genome/51?genome_assembly_id=1559677 but I don't think it will give me the genes. Any help would be appreciated.

mitochondrial genes • 2.2k views

Note that the hg19 mtDNA is not the rCRS. The rCRS was already present in GRCh37, which is one of the differences between hg19 and GRCh37, and it is also the mitochondrial sequence in hg38.

3.6 years ago
GenoMax 147k

Method 1:

  1. Download relevant annotation files from GENCODE. Human hg19 and Mouse mm10
  2. Fish out the Ensembl IDs of the genes from each file (this is the hg19 example)
    awk -F "\"" '$1 ~ /chrM/ && $3 ~ /gene/ {print $2}' gencode.v37lift37.annotation.gtf
  3. Use BioMart to retrieve sequence
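
For anyone who prefers a script to the awk one-liner in step 2, here is a minimal Python sketch of the same idea. It assumes the standard nine-column GTF layout with a gene_id "..." attribute, and it assumes that GENCODE lift37 IDs look like ENSG00000210049.1_2, so that cutting at the first dot gives the unversioned ID that BioMart usually expects. The function names are made up for illustration.

```python
import re

# Matches the gene_id attribute in the ninth GTF column.
GENE_ID_RE = re.compile(r'gene_id "([^"]+)"')


def chrm_gene_ids(gtf_lines, contig="chrM"):
    """Yield gene_id values for rows of type 'gene' on the given contig."""
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF row
        if fields[0] != contig or fields[2] != "gene":
            continue
        match = GENE_ID_RE.search(fields[8])
        if match:
            yield match.group(1)


def strip_version(gene_id):
    """Drop everything from the first dot onwards (assumed version suffix)."""
    return gene_id.split(".", 1)[0]
```

To use it, open the GTF with open("gencode.v37lift37.annotation.gtf") and pass the file handle to chrm_gene_ids(); the mouse annotation should work the same way as long as the contig is also named chrM.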

Method 2:

  1. Download annotation file from NCBI. Human hg19 or Mouse mm10
  2. Get the sequence using EntrezDirect
    awk -F "\t" '$1 ~ /NC_012920/ && $3 ~ /gene/ {print $1,$4,$5}' GRCh37_latest_genomic.gff | xargs -n 3 sh -c 'efetch -db nuccore -id "$0" -seq_start "$1" -seq_stop "$2" -format fasta' (human hg19)

  3. For mouse mm10
    awk -F "\t" '$1 ~ /NC_005089/ && $3 ~ /gene/ {print $1,$4,$5}' GCF_000001635.26_GRCm38.p6_genomic.gtf | xargs -n 3 sh -c 'efetch -db nuccore -id "$0" -seq_start "$1" -seq_stop "$2" -format fasta'
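
If the xargs pipeline is awkward to debug, the same steps can be sketched in Python. This is only a sketch: it assumes efetch from Entrez Direct is on your PATH and uses the same flags as the commands above. The helper names are hypothetical. One difference from the awk version: it matches the feature type exactly, so rows typed "pseudogene" are skipped, whereas $3 ~ /gene/ would keep them.

```python
import subprocess


def gene_intervals(gff_lines, accession):
    """Return (seqid, start, end) for 'gene' rows on the given accession."""
    intervals = []
    for line in gff_lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete GFF/GTF row
        seqid, feature = fields[0], fields[2]
        # startswith lets "NC_012920" match the versioned "NC_012920.1"
        if seqid.startswith(accession) and feature == "gene":
            intervals.append((seqid, int(fields[3]), int(fields[4])))
    return intervals


def efetch_command(seqid, start, stop):
    """Build the efetch argument list used in the one-liners above."""
    return [
        "efetch", "-db", "nuccore", "-id", seqid,
        "-seq_start", str(start), "-seq_stop", str(stop),
        "-format", "fasta",
    ]


def fetch_fasta(seqid, start, stop):
    """Run efetch and return the FASTA text (needs network access)."""
    result = subprocess.run(
        efetch_command(seqid, start, stop),
        check=True, capture_output=True, text=True,
    )
    return result.stdout
```

Loop over gene_intervals(handle, "NC_012920") for human or gene_intervals(handle, "NC_005089") for mouse and write each fetch_fasta() result to one output file.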


For Method 1 you could directly download the transcriptome fasta files from Gencode and pull the cDNA/spliced sequences from there (if this is what you want rather than the entire genomic gene sequence incl introns).
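
A rough sketch of that filtering step in Python, in case it helps. It assumes the transcript fasta headers are pipe-delimited with the gene ID as the second field (something like >ENST...|ENSG...|...), which you should check against the first few lines of your file. The function name is made up, and gene_ids would be the chrM gene IDs in the same versioned form the headers use.

```python
def filter_fasta_by_gene(fasta_lines, gene_ids):
    """Yield the lines of every record whose header gene ID is in gene_ids.

    Assumes headers look like '>transcript_id|gene_id|...'.
    """
    keep = False
    for line in fasta_lines:
        if line.startswith(">"):
            parts = line[1:].strip().split("|")
            # Decide once per record; sequence lines inherit the decision.
            keep = len(parts) > 1 and parts[1] in gene_ids
        if keep:
            yield line
```

Writing the yielded lines straight to a new file gives a fasta containing only the mitochondrial transcripts.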

3.6 years ago
m3hdad ▴ 10

There are different tools and repositories. GRCh37 (hg19) has been archived on Ensembl.

Here are some tips:

  • Make sure you have a good reason why you don't want to align your reads against an updated version.
  • Make sure you understand what "fasta" and "gtf/gff" files are.

Some tools you might want to read about are:

Good luck. I think finding data for mm10 should be straightforward now.
