mm9 RNA-seq reference genome
6.3 years ago
mikysyc2016 ▴ 120

Hi all

I want to align my RNA-seq reads to the mm9 cDNA reference. How do I download this FASTA file?

Thanks in advance,
Yachen

alignment sequencing RNA-seq gene • 3.3k views

Please explain what the file is, i.e., source, etc. Don't just paste a random link. Thanks.


Keep in mind that alignments should be performed against a genome rather than the cDNA reference. Do you want to do standard RNA-seq alignments?


I want to do standard RNA-seq analysis. The FTP site has DNA FASTA and cDNA FASTA files, but they are not labelled mm9 or mm10. Based on that, I think I need the cDNA reference for RNA-seq analysis, and that the genome reference is for DNA-seq analysis (like ChIP-seq). Did I misunderstand? Thanks!


RNA-seq:

  • TopHat / TopHat2 / HISAT / HISAT2 = reference genome
  • Kallisto / Salmon = reference cDNA transcriptome
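
As a minimal sketch of the rule above, the pairing can be written as a small shell lookup. The function name `reference_for` is hypothetical and only covers the tools named in this thread; anything else is reported as unknown.

```shell
# Map an RNA-seq tool name to the kind of reference it expects.
# Only the tools listed in this thread are covered (assumption: names are
# matched case-insensitively).
reference_for() {
  tool=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$tool" in
    tophat|tophat2|hisat|hisat2) echo "reference genome" ;;
    kallisto|salmon)             echo "reference cDNA transcriptome" ;;
    *)                           echo "unknown"; return 1 ;;
  esac
}

reference_for HISAT2
reference_for kallisto
```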
6.3 years ago

Edit: as ATpoint mentions, you will require either a reference genome FASTA or reference cDNA FASTA depending on what you are planning to do.

You can obtain this from GENCODE: https://www.gencodegenes.org/mouse_releases/reference_releases.html

The latest releases for each build are shown on that page.

  • NCBIM37 (NCBI Build 37, often written GRCm37) = mm9
  • GRCm38 = mm10

Other releases can be accessed via the drop-downs / tabs. You can download both GTF and corresponding FASTA files.

Kevin


Thank you. Can I use them for standard RNA-seq analysis? What is the difference between the genome FASTA (which I think is for DNA-seq) and the reference cDNA FASTA? Which one can I use? And what is the difference between downloading the GTF and the FASTA? Thanks in advance.


Not always the case. Take a look at my post above. Text also here:

RNA-seq:

  • TopHat / TopHat2 / HISAT / HISAT2 = reference genome
  • Kallisto / Salmon = reference cDNA transcriptome

------------------------------------------------------

The GTF contains extra information about the transcripts that are stored in the cDNA FASTA; mainly, it contains the genomic co-ordinates of UTRs, exons, etc.

The cDNA FASTA contains the transcribed mRNA sequence. Of these two files, only the cDNA FASTA can be used for alignment, because it contains the actual sequence; the GTF holds annotation and co-ordinates only.

An example:

From cDNA FASTA:

grep -e "Brca1" -A5 gencode.vM18.transcripts.fa

>ENSMUST00000191198.1|ENSMUSG00000017146.12|OTTMUSG00000002870.3|OTTMUST00000119752.2|RP23-328K2.8-007|Brca1|531|protein_coding|
ACAGAGGGTCTCAAGCCCCCCTTGAGACACGCGCTTAACCTCAGTCAGGAGAAAGTAGAA
ATGGAAGACAGTGAACTTGATACTCAGTATTTGCAGAATACATTTCAAGTTTCAAAGCGT
CAGTCATTTGCTTTATTTTCAAAACCTAGAAGTCCCCAAAAGGACTGTGCTCACTCTGTG
CCCTCAAAGGAACTGAGTCCAAAGGTGACAGCTAAAGGTAAACAAAAAGAACGTCAGGGA
CAGGAAGAATTTGAAATCAGTCACGTACAAGCAGTTGCGGCCACAGTGGGCTTACCTGTG

From GTF:

grep -e "Brca1" gencode.vM18.annotation.gtf

chr11   HAVANA  transcript  101532083   101551582   .   -   .   gene_id "ENSMUSG00000017146.12"; transcript_id "ENSMUST00000190862.1"; gene_type "protein_coding"; gene_name "Brca1"; transcript_type "protein_coding"; transcript_name "RP23-328K2.8-005"; level 2; protein_id "ENSMUSP00000139599.1"; transcript_support_level "5"; tag "mRNA_end_NF"; tag "cds_end_NF"; havana_gene "OTTMUSG00000002870.3"; havana_transcript "OTTMUST00000119754.2";
chr11   HAVANA  exon    101551526   101551582   .   -   .   gene_id "ENSMUSG00000017146.12"; transcript_id "ENSMUST00000190862.1"; gene_type "protein_coding"; gene_name "Brca1"; transcript_type "protein_coding"; transcript_name "RP23-328K2.8-005"; exon_number 1; exon_id "ENSMUSE00001328968.1"; level 2; protein_id "ENSMUSP00000139599.1"; transcript_support_level "5"; tag "mRNA_end_NF"; tag "cds_end_NF"; havana_gene "OTTMUSG00000002870.3"; havana_transcript "OTTMUST00000119754.2";
chr11   HAVANA  exon    101539981   101540034   .   -   .   gene_id "ENSMUSG00000017146.12"; transcript_id "ENSMUST00000190862.1"; gene_type "protein_coding"; gene_name "Brca1"; transcript_type "protein_coding"; transcript_name "RP23-328K2.8-005"; exon_number 2; exon_id "ENSMUSE00001218040.1"; level 2; protein_id "ENSMUSP00000139599.1"; transcript_support_level "5"; tag "mRNA_end_NF"; tag "cds_end_NF"; havana_gene "OTTMUSG00000002870.3"; havana_transcript "OTTMUST00000119754.2";
chr11   HAVANA  exon    101535528   101535605   .   -   .   gene_id "ENSMUSG00000017146.12"; transcript_id "ENSMUST00000190862.1"; gene_type "protein_coding"; gene_name "Brca1"; transcript_type "protein_coding"; transcript_name "RP23-328K2.8-005"; exon_number 3; exon_id "ENSMUSE00001241451.1"; level 2; protein_id "ENSMUSP00000139599.1"; transcript_support_level "5"; tag "mRNA_end_NF"; tag "cds_end_NF"; havana_gene "OTTMUSG00000002870.3"; havana_transcript "OTTMUST00000119754.2";
chr11   HAVANA  CDS 101535528   101535598   .   -   0   gene_id "ENSMUSG00000017146.12"; transcript_id "ENSMUST00000190862.1"; gene_type "protein_coding"; gene_name "Brca1"; transcript_type "protein_coding"; transcript_name "RP23-328K2.8-005"; exon_number 3; exon_id "ENSMUSE00001241451.1"; level 2; protein_id "ENSMUSP00000139599.1"; transcript_support_level "5"; tag "mRNA_end_NF"; tag "cds_end_NF"; havana_gene "OTTMUSG00000002870.3"; havana_transcript "OTTMUST00000119754.2";
chr11   HAVANA  start_codon 101535596   101535598   .   -   0   gene_id "ENSMUSG00000017146.12"; transcript_id "ENSMUST00000190862.1"; gene_type "protein_coding"; gene_name "Brca1"; transcript_type "protein_coding"; transcript_name "RP23-328K2.8-005"; exon_number 3; exon_id "ENSMUSE00001241451.1"; level 2; protein_id "ENSMUSP00000139599.1"; transcript_support_level "5"; tag "mRNA_end_NF"; tag "cds_end_NF"; havana_gene "OTTMUSG00000002870.3"; havana_transcript "OTTMUST00000119754.2";
chr11   HAVANA  exon    101533939   101534027   .   -   .   gene_id "ENSMUSG00000017146.12"; transcript_id "ENSMUST00000190862.1"; gene_type "protein_coding"; gene_name "Brca1"; transcript_type "protein_coding"; transcript_name "RP23-328K2.8-005"; exon_number 4; exon_id "ENSMUSE00000113052.1"; level 2; protein_id "ENSMUSP00000139599.1"; transcript_support_level "5"; tag "mRNA_end_NF"; tag "cds_end_NF"; havana_gene "OTTMUSG00000002870.3"; havana_transcript "OTTMUST00000119754.2";


et cetera
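
To make the difference concrete, here is a rough sketch that pulls the identifying fields out of a FASTA header and computes exon lengths from GTF co-ordinates. It runs on toy files built from the records quoted above, with the GTF attribute column trimmed for brevity; the `toy.*` file names are made up for the illustration. It assumes the `|`-separated header layout seen in the example (transcript ID, gene ID, gene name in field 6, length in field 7).

```shell
# Toy inputs built from the records quoted above (attributes trimmed).
tmp=$(mktemp -d)

# One GENCODE-style FASTA record: '|'-separated header, then sequence.
printf '%s\n' \
  '>ENSMUST00000191198.1|ENSMUSG00000017146.12|OTTMUSG00000002870.3|OTTMUST00000119752.2|RP23-328K2.8-007|Brca1|531|protein_coding|' \
  'ACAGAGGGTCTCAAGCCCCCCTTGAGACACGCGCTTAACCTCAGTCAGGAGAAAGTAGAA' \
  > "$tmp/toy.transcripts.fa"

# Two GTF exon rows: tab-separated, start and end in columns 4 and 5.
printf 'chr11\tHAVANA\texon\t101551526\t101551582\t.\t-\t.\tgene_name "Brca1"; exon_number 1;\n' > "$tmp/toy.annotation.gtf"
printf 'chr11\tHAVANA\texon\t101539981\t101540034\t.\t-\t.\tgene_name "Brca1"; exon_number 2;\n' >> "$tmp/toy.annotation.gtf"

# FASTA header -> transcript ID, gene ID, gene name, transcript length.
fasta_fields=$(awk -F'|' '/^>/ { sub(/^>/, "", $1); print $1, $2, $6, $7 }' "$tmp/toy.transcripts.fa")

# GTF exon rows -> start, end, length (co-ordinates are 1-based, inclusive).
exon_lengths=$(awk -F'\t' '$3 == "exon" { print $4, $5, $5 - $4 + 1 }' "$tmp/toy.annotation.gtf")

echo "$fasta_fields"
echo "$exon_lengths"
rm -rf "$tmp"
```

The FASTA side yields sequence plus identifiers, while the GTF side yields only positions on the genome, which is why the GTF alone cannot serve as an alignment target.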

Thank you! That makes sense. I plan to use kallisto to align the reads, since I can run it on my personal computer. Can I use the link you provided? Are those files the reference genome or the reference cDNA transcriptome?


At the link that I provided, you will find reference cDNA transcriptome FASTA files. You should use these for Kallisto.
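
For orientation, a typical kallisto run has two steps: build an index from the transcript FASTA, then quantify reads against it. The sketch below is a dry run that only prints the commands. Every file and directory name in it is a placeholder, not something taken from the GENCODE site, so substitute the transcript FASTA for the release you actually download.

```shell
# Dry run: print the kallisto commands instead of executing them.
# All file and directory names below are placeholders (assumptions).
transcripts="gencode.transcripts.fa.gz"   # transcript (cDNA) FASTA from GENCODE
index="gencode.transcripts.idx"           # index file that kallisto will write
reads_1="sample_R1.fastq.gz"              # paired-end reads, mate 1
reads_2="sample_R2.fastq.gz"              # paired-end reads, mate 2
outdir="kallisto_out"                     # output directory for abundance estimates

# Step 1: build the index from the transcript FASTA (once per reference).
index_cmd="kallisto index -i $index $transcripts"

# Step 2: quantify one paired-end sample against that index.
quant_cmd="kallisto quant -i $index -o $outdir $reads_1 $reads_2"

echo "$index_cmd"
echo "$quant_cmd"
```

Once the names are filled in, the two printed lines can be run as they are. Single-end data needs additional options, so check the kallisto manual for that case.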


I see. Thanks a lot!
