Question

CDS vs cDNA vs transcript for mapping RNA-Seq reads

1

6.9 years ago

williamsbrian5064 ▴ 530

Hi,

I have a quick question that I have managed to confuse myself with. I am about to index my reference sequence for some RNA-Seq data, and I am not sure whether to use the cDNA, CDS, or transcript FASTA file.

In the past I used a transcript FASTA, but I can't seem to find one on Ensembl for the species I am working with (http://useast.ensembl.org/info/data/ftp/index.html/)

Maybe I can just use the whole genome for the reference? I am using Salmon to map the reads.

Thanks in advance

Assembly rna-seq alignment • 13k views

ADD COMMENT • link updated 11 months ago by Antonio R. Franco ★ 5.2k • written 6.9 years ago by williamsbrian5064 ▴ 530

0

I am in basically the same situation at the moment. I used Kallisto to map the raw data against the transcripts of the reference pathogen. Now I would like to download the transcripts of the host reference, but the drop-down list for the assembled genome only offers "CDS from genomic FASTA. fna", "translated CDS. faa", or "Protein FASTA . faa".

I was wondering which of these corresponds to the transcripts, or how to generate a transcripts file from the resources that are available.

Many thanks

ADD REPLY • link 2.3 years ago by Xuhang Wu • 0

score 5 · Accepted Answer · 2018-05-02

5

6.9 years ago

Rob 7.1k

You should not map against the genome using Salmon. You can either download a transcriptome file directly, or download a genome file plus transcript annotations and use a tool like gffread to extract the transcript sequences. You most likely want to quantify against the cDNA rather than the CDS, so that features such as UTRs are accounted for.
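As a rough sketch of the second route (genome plus annotation), the Python snippet below only assembles the two command lines and prints them; it does not run anything. The file names `genome.fa`, `annotation.gtf`, `transcripts.fa` and the directory `salmon_index` are placeholders assumed for illustration, and the exact options you need may differ with your tool versions.

```python
import shlex


def build_commands(genome_fa, annotation, transcripts_fa, index_dir):
    """Return the gffread and salmon command lines as argument lists.

    Nothing is executed here; pass each list to subprocess.run() if the
    tools are installed and the (placeholder) file names match your data.
    """
    # gffread: write spliced transcript sequences (-w) using the genome (-g)
    extract = ["gffread", "-w", transcripts_fa, "-g", genome_fa, annotation]
    # salmon: build an index from the transcript FASTA (-t) into a directory (-i)
    index = ["salmon", "index", "-t", transcripts_fa, "-i", index_dir]
    return [extract, index]


# Placeholder file names, assumed for illustration only
commands = build_commands(
    "genome.fa", "annotation.gtf", "transcripts.fa", "salmon_index"
)
for cmd in commands:
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

Printing the commands first makes it easy to check the paths before running them by hand or through `subprocess.run`.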

ADD COMMENT • link 6.9 years ago by Rob 7.1k

0

You cannot do it with Kallisto either. In fact, indexing will fail if you try to index a genome file.

ADD REPLY • link 11 months ago by Antonio R. Franco ★ 5.2k

score 4 · Accepted Answer · 2018-05-02

4

6.9 years ago

GenoMax 150k

Here is an illustration of differences between CDS/transcripts.
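The same distinction can be sketched in a few lines of Python. This is a toy example with a made-up 34-base "genome" and invented coordinates, not real data: the cDNA (transcript) sequence is all exons joined together, including the UTRs, while the CDS is only the part from the start codon to the stop codon.

```python
def revcomp(seq):
    """Reverse-complement a DNA string (A, C, G, T, N only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq))


def splice(genome, intervals, strand="+"):
    """Join 1-based, inclusive intervals in genomic order.

    Minus-strand features are reverse-complemented after joining.
    """
    seq = "".join(genome[start - 1:end] for start, end in sorted(intervals))
    return revcomp(seq) if strand == "-" else seq


# Toy plus-strand gene (all sequences and coordinates are made up)
genome = (
    "TT"             # intergenic
    + "GGCATGAAA"    # exon 1: 5' UTR (GGC) + start of CDS (ATG AAA)
    + "GTAAGTTTCAG"  # intron, removed by splicing
    + "CCCTAGTTTT"   # exon 2: end of CDS (CCC TAG) + 3' UTR (TTTT)
    + "AA"           # intergenic
)

exons = [(3, 11), (23, 32)]         # whole exons, UTRs included
cds_segments = [(6, 11), (23, 28)]  # coding part of each exon only

cdna = splice(genome, exons)
cds = splice(genome, cds_segments)

print("cDNA:", cdna)
print("CDS: ", cds)
```

Reads that come from the UTRs have somewhere to map in the cDNA sequence but not in the CDS, which is why the choice of reference matters for quantification.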

You can find cDNAs and CDSs for Felis catus here.

Ideally you should align your data to the genome and then use an annotation file to count the reads.

Added: If you are using salmon then you should use transcripts.

ADD COMMENT • link 6.9 years ago by GenoMax 150k

0

You suggested aligning the data to the genome, but Rob suggested exactly the opposite. Which one is right?

ADD REPLY • link 6.2 years ago by scchess ▴ 640

0

It is your choice. If you use a program like salmon, then you need to align to transcripts (if they are available for your genome). If you use a normal NGS aligner, then you can align to the genome and then count using a program like featureCounts or htseq-count.

This would help: A: Alignment and mapping

ADD REPLY • link 6.2 years ago by GenoMax 150k