CDS vs cDNA vs transcript for mapping RNA-Seq reads
6.6 years ago

Hi,

I have a quick question that I managed to confuse myself with. I am about to index a reference sequence for some RNA-Seq data, and I am not sure whether to use the cDNA, CDS, or transcript FASTA file.

In the past I used a transcript FASTA, but I can't seem to find one for the species I am working with on Ensembl (http://useast.ensembl.org/info/data/ftp/index.html/).

Maybe I can just use the whole genome for the reference? I am using Salmon to map the reads.

Thanks in advance

Assembly rna-seq alignment • 12k views

I am in basically the same situation at the moment. I used Kallisto to map the raw data against the transcripts of the reference pathogen, and now I would like to download the transcripts of the host reference, but the drop-down list for the assembled genome only offers "CDS from genomic FASTA .fna", "translated CDS .faa", or "Protein FASTA .faa".

I was wondering which of these corresponds to the transcripts, or how to generate a transcripts file from the available resources.

Many thanks

6.6 years ago
Rob 6.9k

You should not map against the genome using Salmon. You can either download a transcriptome file directly, or download a genome file plus transcript annotations and use a tool like gffread to extract the transcript sequences. You most likely want to quantify against the cDNA rather than the CDS, so that features such as UTRs are accounted for.
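To make the cDNA vs CDS distinction concrete, here is a minimal Python sketch of what extracting transcript sequences from a genome plus annotation amounts to. Everything in it (the toy genome, the coordinates, the helper names `splice` and `clip_to_cds`) is made up for illustration and is not how gffread is implemented. For real data, `gffread -w transcripts.fa -g genome.fa annotation.gtf` writes the spliced exons and `-x` writes the spliced CDS; the file names there are placeholders.

```python
# Toy example: all sequences and coordinates are invented for illustration.
# Coordinates are 0-based, half-open [start, end), forward strand only.

GENOME = (
    "TTTT"          # intergenic
    "GCACCATGGCT"   # exon 1: 5' UTR (GCACC) + start of coding sequence (ATGGCT)
    "GTAAGTTTCAG"   # intron
    "GGATAACCTTG"   # exon 2: end of coding sequence (GGATAA) + 3' UTR (CCTTG)
    "AAAA"          # intergenic
)

EXONS = [(4, 15), (26, 37)]   # exon intervals on the genome
CDS_START, CDS_END = 9, 32    # genomic span from start codon to end of stop codon


def splice(genome, segments):
    """Concatenate the genomic segments in order (introns are skipped)."""
    return "".join(genome[start:end] for start, end in segments)


def clip_to_cds(exons, cds_start, cds_end):
    """Trim each exon to the coding span, dropping UTR-only parts."""
    clipped = []
    for start, end in exons:
        s, e = max(start, cds_start), min(end, cds_end)
        if s < e:
            clipped.append((s, e))
    return clipped


# cDNA / transcript sequence: all exons, UTRs included.
cdna = splice(GENOME, EXONS)

# CDS: only the protein-coding part of the same exons.
cds = splice(GENOME, clip_to_cds(EXONS, CDS_START, CDS_END))

print("cDNA:", cdna)
print("CDS: ", cds)
print("bases present only in the cDNA (UTRs):", len(cdna) - len(cds))
```

Reads that come from the UTRs have nowhere to map if the index is built from the CDS alone, which is why the cDNA file is usually the better reference for quantification.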


You cannot do it with Kallisto either. In fact, indexing will fail if you try to index a genome file.

6.6 years ago
GenoMax 148k

Here is an illustration of differences between CDS/transcripts.

You can find cDNAs and CDSs for Felis catus here.

Ideally you should align your data to the genome and then use an annotation file to do counting of reads.

Added: If you are using salmon then you should use transcripts.


You suggested aligning the data to the genome, but Rob suggested exactly the opposite. Which one is right?


It is your choice. If you use a program like salmon, then you need to align to transcripts (if they are available for your genome). If you use a conventional NGS aligner, then you can align to the genome and then count reads using a program like featureCounts or htseq-count.
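As a rough sketch of the two routes, the Python snippet below only builds and prints the command lines; it does not run anything. All file and directory names are placeholders, the salmon example assumes single-end reads (`-r`) with automatic library type detection (`-l A`), and the genome route assumes you already have a BAM file from whichever aligner you chose. The helper function names are hypothetical.

```python
import shlex


def salmon_commands(transcripts_fa, reads_fq,
                    index_dir="salmon_index", out_dir="salmon_quant"):
    """Route 1: index a transcript FASTA, then quantify single-end reads."""
    index_cmd = ["salmon", "index", "-t", transcripts_fa, "-i", index_dir]
    quant_cmd = ["salmon", "quant", "-i", index_dir, "-l", "A",
                 "-r", reads_fq, "-o", out_dir]
    return [index_cmd, quant_cmd]


def featurecounts_command(annotation_gtf, aligned_bam, out_file="counts.txt"):
    """Route 2: count reads from a genome alignment against an annotation."""
    return ["featureCounts", "-a", annotation_gtf, "-o", out_file, aligned_bam]


# Placeholder inputs; substitute your own files.
for cmd in salmon_commands("transcripts.fa", "reads.fastq.gz"):
    print(shlex.join(cmd))

print(shlex.join(featurecounts_command("annotation.gtf", "aligned.bam")))
```

Either list can be passed to `subprocess.run(cmd, check=True)` once the tools are installed. Paired-end data needs different options for both tools, so check each tool's help before adapting this.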

This would help: A: Alignment and mapping
