I'm currently trying to identify differentially expressed (DE) genes from mm10 RNA-seq data. In parallel, I ran a genome-based approach (STAR -> featureCounts -> DESeq2 -> heatmap) and a transcriptome-based approach (kallisto -> DESeq2 -> heatmap) so that I could intersect the results.
The results largely overlap, except for one gene that comes out as highly expressed with kallisto but not with featureCounts.
This gene is a predicted gene, Gm4737. Its counts after library size normalization are:
dds <- DESeq(dds)
dds_norm <- counts(dds, normalized=TRUE)
Approach/Condition   A1    B1   C1   A2    B2    C2   A3    B3   C3
Kallisto              0  1019    0    0  1416  1226    0  1209   34
FeatureCounts       306   581  295  230   531   501  138   457  249
My reference genome:
url='ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_mouse/release_M16';
axel -q $url/GRCm38.p5.genome.fa.gz;
My reference transcriptome:
url='ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_mouse/release_M16';
axel -q $url/gencode.vM16.transcripts.fa.gz;
My annotation file:
url='ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_mouse/release_M16';
axel -q $url/gencode.vM16.chr_patch_hapl_scaff.annotation.gtf.gz;
I used the same annotation file for both approaches.
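A possible sanity check on the references, given here only as a rough sketch: list the transcript IDs that the GTF annotates for Gm4737, then see whether the same gene shows up in the transcript FASTA that kallisto was indexed on. The helper name list_transcripts is made up for illustration, and the FASTA check assumes GENCODE-style headers where the gene name sits between pipe characters.

```shell
# Hypothetical helper (name is illustrative): read a GENCODE-style GTF on stdin
# and print the transcript_id of every "transcript" record whose gene_name
# matches the first argument.
list_transcripts() {
  awk -F'\t' -v g="gene_name \"$1\";" '
    $3 == "transcript" && index($9, g) {
      # "transcript_id \"" is 15 characters; the match also includes the closing quote
      if (match($9, /transcript_id "[^"]+"/))
        print substr($9, RSTART + 15, RLENGTH - 16)
    }'
}

# Usage with the files downloaded above (left commented out, nothing is run here):
#   zcat gencode.vM16.chr_patch_hapl_scaff.annotation.gtf.gz | list_transcripts Gm4737
# Assumption: FASTA headers look like >transcript_id|gene_id|...|gene_name|...
#   zcat gencode.vM16.transcripts.fa.gz | grep -c '|Gm4737|'
```

If the two numbers disagree, the two pipelines are not quantifying the same set of Gm4737 transcripts.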
Any ideas about what could cause such a large difference in counts between the two approaches? Does kallisto have problems with predicted genes?
Thanks!
It is interesting that you have a specific example, but it may not be surprising given the fundamental difference between the two methodologies: alignment (STAR) versus mapping (kallisto) (see A: Alignment and mapping).
You seem to be right, according to what I see in IGV. I looked at the pseudoalignments from kallisto 0.44.0 at the Gm4737 locus: for condition A1 there are 0 reads mapped, whereas STAR aligns some reads to the same locus in A1. For condition B1, the pseudoalignments show a lot of reads mapped at the Gm4737 locus, which agrees with the counts table above.
Even though the two approaches are different, I am still stunned by the colossal difference in counts for some genes while others are quite similar...
Thanks for the link!
Forgot about this post: Mapping to a transcriptome can incorrectly report reads as mapping uniquely.
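If that is what is happening here, one way to look into it on the genome side (a rough sketch, not something run on this data) is to tally the STAR alignment records over the Gm4737 locus by their NH tag, which gives the number of loci a read aligned to. featureCounts skips multi-mapping reads unless -M is given, so a locus dominated by NH > 1 records would be counted very differently by the two pipelines. The helper name count_by_nh, the BAM file name, and the REGION variable are placeholders.

```shell
# Hypothetical helper (name is illustrative): read SAM records on stdin and
# tally them by the NH:i tag that STAR writes by default.
# Note: this counts alignment records, not fragments, so paired-end data
# contributes two records per properly aligned pair.
count_by_nh() {
  awk -F'\t' '
    !/^@/ {                                   # skip SAM header lines
      nh = 0
      for (i = 12; i <= NF; i++)              # optional tags start at column 12
        if ($i ~ /^NH:i:/) nh = substr($i, 6) + 0
      if (nh == 1) unique++
      else if (nh > 1) multi++
      else untagged++
    }
    END { printf "unique=%d multi=%d untagged=%d\n", unique, multi, untagged }'
}

# Usage (commented out; A1.bam and REGION are placeholders, and the BAM must be
# coordinate-sorted and indexed for region queries):
#   samtools view A1.bam "$REGION" | count_by_nh
```

Comparing the output for A1 and B1 would show whether the reads kallisto assigns to Gm4737 are ones STAR treats as multi-mapped.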