Should cDNAs provided by Ensembl be filtered by CCDS or biotype prior to running kallisto?
6.1 years ago
holgerbrandl ▴ 30

I typically download cDNAs directly from Ensembl (e.g. with wget ftp://ftp.ensembl.org/pub/release-93/fasta/homo_sapiens/cdna/Homo_sapiens.GRCh38.cdna.all.fa.gz), build a kallisto index, and run kallisto quant to estimate isoform abundance.

However, Ensembl tends to provide very detailed transcript models. Furthermore, the cDNA files from Ensembl also contain many transcripts with non-coding biotypes, ranging from NMD (nonsense-mediated decay) to retained intron.

So I was wondering whether it would be better practice to filter the provided cDNA FASTA down to just those transcripts with a CCDS ID, or to filter by biotype (such as "protein coding").

As an example, a CCDS filter would cut the number of cDNAs for https://www.ensembl.org/Homo_sapiens/Gene/Summary?db=core;g=ENSG00000077782 from 41 to 9.
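
To make the biotype variant of this filter concrete, here is a minimal Python sketch. It assumes the Ensembl cDNA FASTA headers carry a `transcript_biotype:<value>` field, which is what I see in the release I use; the function names (`read_fasta`, `filter_by_biotype`, `write_fasta`) and the toy records are made up for illustration and are not part of kallisto or Ensembl. A CCDS filter would need a separate list of transcript IDs, since the CCDS ID is not in the FASTA header as far as I can tell.

```python
import re

# Assumption: headers look like
# ">ENST... cdna chromosome:... gene:... gene_biotype:X transcript_biotype:Y ..."
BIOTYPE_RE = re.compile(r"\btranscript_biotype:(\S+)")


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, seq = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq)
            header, seq = line[1:], []
        elif header is not None:
            seq.append(line)
    if header is not None:
        yield header, "".join(seq)


def filter_by_biotype(lines, keep=frozenset({"protein_coding"})):
    """Keep only records whose transcript_biotype is in `keep`."""
    for header, seq in read_fasta(lines):
        match = BIOTYPE_RE.search(header)
        if match and match.group(1) in keep:
            yield header, seq


def write_fasta(records, handle, width=60):
    """Write (header, sequence) pairs as FASTA, wrapping sequence lines."""
    for header, seq in records:
        handle.write(f">{header}\n")
        for i in range(0, len(seq), width):
            handle.write(seq[i:i + width] + "\n")


# Toy input with hypothetical IDs, not real Ensembl records.
toy = [
    ">TOY0001.1 cdna gene:TOYG1.1 gene_biotype:protein_coding transcript_biotype:protein_coding\n",
    "ATGGCC\n",
    "AAGT\n",
    ">TOY0002.1 cdna gene:TOYG1.1 gene_biotype:protein_coding transcript_biotype:retained_intron\n",
    "ATGGCCGT\n",
    ">TOY0003.1 cdna gene:TOYG1.1 gene_biotype:protein_coding transcript_biotype:nonsense_mediated_decay\n",
    "ATGG\n",
]
kept = list(filter_by_biotype(toy))
```

For the real file, the same functions could be fed from `gzip.open("Homo_sapiens.GRCh38.cdna.all.fa.gz", "rt")` and the result written with `write_fasta` to a new FASTA that is then passed to kallisto index.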

How sensitive is kallisto to overly complex or redundant gene architectures?

kallisto isoforms • 1.1k views