Hi all
I want to align my RNA-seq reads to the mm9 cDNA reference. How do I download this FASTA file?
Thanks in advance,
Yachen
Edit: as ATpoint mentions, you will require either a reference genome FASTA or reference cDNA FASTA depending on what you are planning to do.
You can obtain this from GENCODE: https://www.gencodegenes.org/mouse_releases/reference_releases.html
The latest releases for each build are shown on that page.
Other releases can be accessed via the drop-downs / tabs. You can download both the GTF and the corresponding FASTA files.
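As a rough sketch of the download step: the snippet below builds the file URLs for one release and prints the wget commands. The base URL and directory layout are my assumption about how the GENCODE mouse FTP area is organised, and M18 is simply the release tag that appears in the file names used in this thread; check the release page above for the release that matches your genome build before fetching anything.

```shell
# Assumed layout of the GENCODE mouse FTP area; verify it against the release page.
base="https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_mouse"
release="M18"   # release tag taken from the file names in this thread

gencode_url() {
  # $1 = release tag (e.g. M18), $2 = file suffix (e.g. transcripts.fa.gz)
  printf '%s/release_%s/gencode.v%s.%s\n' "$base" "$1" "$1" "$2"
}

fasta_url=$(gencode_url "$release" transcripts.fa.gz)
gtf_url=$(gencode_url "$release" annotation.gtf.gz)

# Dry run: print the download commands; drop "echo" to actually fetch the files.
echo wget "$fasta_url"
echo wget "$gtf_url"
```

Keeping the release tag in one variable makes it harder to end up with a FASTA and a GTF from different releases.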
Kevin
Not always the case. Take a look at my post above. Text also here:
RNA-seq:
The GTF contains extra information about the transcripts that are stored in the cDNA FASTA; mainly, it contains the genomic co-ordinates of UTRs, exons, etc.
The cDNA FASTA contains the transcribed mRNA sequences. Of these two files, only the cDNA FASTA can be used as an alignment target, because the GTF holds co-ordinates and annotation but no sequence.
An example:
From cDNA FASTA:
grep -e "Brca1" -A5 gencode.vM18.transcripts.fa
>ENSMUST00000191198.1|ENSMUSG00000017146.12|OTTMUSG00000002870.3|OTTMUST00000119752.2|RP23-328K2.8-007|Brca1|531|protein_coding|
ACAGAGGGTCTCAAGCCCCCCTTGAGACACGCGCTTAACCTCAGTCAGGAGAAAGTAGAA
ATGGAAGACAGTGAACTTGATACTCAGTATTTGCAGAATACATTTCAAGTTTCAAAGCGT
CAGTCATTTGCTTTATTTTCAAAACCTAGAAGTCCCCAAAAGGACTGTGCTCACTCTGTG
CCCTCAAAGGAACTGAGTCCAAAGGTGACAGCTAAAGGTAAACAAAAAGAACGTCAGGGA
CAGGAAGAATTTGAAATCAGTCACGTACAAGCAGTTGCGGCCACAGTGGGCTTACCTGTG
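Note that grep -A5 prints a fixed five lines after each hit, so longer transcripts are cut off, and any header that merely contains the string Brca1 would also match. A minimal sketch of a stricter lookup, assuming the pipe-delimited header layout shown above (gene name in field 6, length in field 7). The temp file is only for the demo, the first record is shortened to two sequence lines, and the second record is made up so that there is something to filter out:

```shell
fa=$(mktemp)
# Demo FASTA: record 1 is from the excerpt above (sequence cut to two lines);
# record 2 is hypothetical.
cat > "$fa" <<'EOF'
>ENSMUST00000191198.1|ENSMUSG00000017146.12|OTTMUSG00000002870.3|OTTMUST00000119752.2|RP23-328K2.8-007|Brca1|531|protein_coding|
ACAGAGGGTCTCAAGCCCCCCTTGAGACACGCGCTTAACCTCAGTCAGGAGAAAGTAGAA
ATGGAAGACAGTGAACTTGATACTCAGTATTTGCAGAATACATTTCAAGTTTCAAAGCGT
>DEMO_TRANSCRIPT.1|DEMO_GENE.1|-|-|Demo1-001|Demo1|8|protein_coding|
ACGTACGT
EOF

# On each header line, decide whether field 6 is exactly "Brca1"; then print the
# header and every following sequence line until the next header.
brca1_records=$(awk -F'|' '/^>/ { keep = ($6 == "Brca1") } keep' "$fa")
printf '%s\n' "$brca1_records"

# One row per transcript: transcript ID, gene name, annotated length.
header_table=$(awk -F'|' '/^>/ { sub(/^>/, "", $1); print $1 "\t" $6 "\t" $7 }' "$fa")
printf '%s\n' "$header_table"

rm -f "$fa"
```

On the real file, replace "$fa" with gencode.vM18.transcripts.fa.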
From GTF:
grep -e "Brca1" gencode.vM18.annotation.gtf
chr11 HAVANA transcript 101532083 101551582 . - . gene_id "ENSMUSG00000017146.12"; transcript_id "ENSMUST00000190862.1"; gene_type "protein_coding"; gene_name "Brca1"; transcript_type "protein_coding"; transcript_name "RP23-328K2.8-005"; level 2; protein_id "ENSMUSP00000139599.1"; transcript_support_level "5"; tag "mRNA_end_NF"; tag "cds_end_NF"; havana_gene "OTTMUSG00000002870.3"; havana_transcript "OTTMUST00000119754.2";
chr11 HAVANA exon 101551526 101551582 . - . gene_id "ENSMUSG00000017146.12"; transcript_id "ENSMUST00000190862.1"; gene_type "protein_coding"; gene_name "Brca1"; transcript_type "protein_coding"; transcript_name "RP23-328K2.8-005"; exon_number 1; exon_id "ENSMUSE00001328968.1"; level 2; protein_id "ENSMUSP00000139599.1"; transcript_support_level "5"; tag "mRNA_end_NF"; tag "cds_end_NF"; havana_gene "OTTMUSG00000002870.3"; havana_transcript "OTTMUST00000119754.2";
chr11 HAVANA exon 101539981 101540034 . - . gene_id "ENSMUSG00000017146.12"; transcript_id "ENSMUST00000190862.1"; gene_type "protein_coding"; gene_name "Brca1"; transcript_type "protein_coding"; transcript_name "RP23-328K2.8-005"; exon_number 2; exon_id "ENSMUSE00001218040.1"; level 2; protein_id "ENSMUSP00000139599.1"; transcript_support_level "5"; tag "mRNA_end_NF"; tag "cds_end_NF"; havana_gene "OTTMUSG00000002870.3"; havana_transcript "OTTMUST00000119754.2";
chr11 HAVANA exon 101535528 101535605 . - . gene_id "ENSMUSG00000017146.12"; transcript_id "ENSMUST00000190862.1"; gene_type "protein_coding"; gene_name "Brca1"; transcript_type "protein_coding"; transcript_name "RP23-328K2.8-005"; exon_number 3; exon_id "ENSMUSE00001241451.1"; level 2; protein_id "ENSMUSP00000139599.1"; transcript_support_level "5"; tag "mRNA_end_NF"; tag "cds_end_NF"; havana_gene "OTTMUSG00000002870.3"; havana_transcript "OTTMUST00000119754.2";
chr11 HAVANA CDS 101535528 101535598 . - 0 gene_id "ENSMUSG00000017146.12"; transcript_id "ENSMUST00000190862.1"; gene_type "protein_coding"; gene_name "Brca1"; transcript_type "protein_coding"; transcript_name "RP23-328K2.8-005"; exon_number 3; exon_id "ENSMUSE00001241451.1"; level 2; protein_id "ENSMUSP00000139599.1"; transcript_support_level "5"; tag "mRNA_end_NF"; tag "cds_end_NF"; havana_gene "OTTMUSG00000002870.3"; havana_transcript "OTTMUST00000119754.2";
chr11 HAVANA start_codon 101535596 101535598 . - 0 gene_id "ENSMUSG00000017146.12"; transcript_id "ENSMUST00000190862.1"; gene_type "protein_coding"; gene_name "Brca1"; transcript_type "protein_coding"; transcript_name "RP23-328K2.8-005"; exon_number 3; exon_id "ENSMUSE00001241451.1"; level 2; protein_id "ENSMUSP00000139599.1"; transcript_support_level "5"; tag "mRNA_end_NF"; tag "cds_end_NF"; havana_gene "OTTMUSG00000002870.3"; havana_transcript "OTTMUST00000119754.2";
chr11 HAVANA exon 101533939 101534027 . - . gene_id "ENSMUSG00000017146.12"; transcript_id "ENSMUST00000190862.1"; gene_type "protein_coding"; gene_name "Brca1"; transcript_type "protein_coding"; transcript_name "RP23-328K2.8-005"; exon_number 4; exon_id "ENSMUSE00000113052.1"; level 2; protein_id "ENSMUSP00000139599.1"; transcript_support_level "5"; tag "mRNA_end_NF"; tag "cds_end_NF"; havana_gene "OTTMUSG00000002870.3"; havana_transcript "OTTMUST00000119754.2";
et cetera
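The GTF lines above are easier to work with if you filter on the feature column and pull single attributes out of column 9. A minimal sketch, assuming the usual tab-separated GTF layout (the tabs appear as spaces in the paste above). The two demo records are abbreviated from the excerpt, with only some of the attributes kept, and the temp file is only for the demo:

```shell
gtf=$(mktemp)
# Two records abbreviated from the excerpt above; columns are tab-separated.
printf 'chr11\tHAVANA\ttranscript\t101532083\t101551582\t.\t-\t.\t%s\n' \
  'gene_id "ENSMUSG00000017146.12"; transcript_id "ENSMUST00000190862.1"; gene_name "Brca1"; transcript_type "protein_coding";' > "$gtf"
printf 'chr11\tHAVANA\texon\t101551526\t101551582\t.\t-\t.\t%s\n' \
  'gene_id "ENSMUSG00000017146.12"; transcript_id "ENSMUST00000190862.1"; gene_name "Brca1"; exon_number 1;' >> "$gtf"

# Keep only "transcript" features whose gene_name attribute is exactly Brca1,
# then print transcript ID, chromosome, start, end, and strand.
brca1_transcripts=$(awk -F'\t' '
  $3 == "transcript" && $9 ~ /gene_name "Brca1";/ {
    match($9, /transcript_id "[^"]+"/)
    # Skip the 15 characters of: transcript_id "  and drop the closing quote.
    id = substr($9, RSTART + 15, RLENGTH - 16)
    print id "\t" $1 "\t" $4 "\t" $5 "\t" $7
  }' "$gtf")
printf '%s\n' "$brca1_transcripts"

rm -f "$gtf"
```

The transcript IDs printed here are the same IDs that start each header in the cDNA FASTA, which is how the two files link up.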
Please explain what the file is, i.e., source, etc. Don't just paste a random link. Thanks.
Keep in mind that alignments should be performed against a genome rather than the cDNA reference. Do you want to do standard RNA-seq alignments?
I want to do a standard RNA-seq analysis. The FTP site has DNA FASTA and cDNA FASTA files, but they are not labeled mm9 or mm10. Based on that, I think I need the cDNA reference for RNA-seq analysis, and that the genome reference is for DNA-seq analyses (like ChIP-seq). Did I misunderstand? Thanks!
RNA-seq: