Question

DE analysis from kallisto estimated counts (only)

0

Entering edit mode

2 days ago

Lila M ★ 1.3k

Hello all, I have some 'raw' data from kallisto, and my aim is to perform a DE analysis in to conditions.

I have a count table from kallisto (estimated counts) like this:

head(x)
target_id  length sample1  sample2 sample3 
  <chr>       <dbl>                  <dbl>                  <dbl>                  <dbl>          
1 ENST00000…      8                      0                      0                      0                     
2 ENST00000…      9                      0                      0                      0                     
3 ENST00000…     13                      0                      0                      0                    
4 ENST00000…     23                      0                      0                      0

and a effective length table from kallisto like this:

I don't have any other files to be used to generate the txi table, so

head(y)
# A tibble: 6 × 427
  target_id  length sample1  sample2  sample3  
  <chr>       <dbl>                  <dbl>                  <dbl>                  <dbl>                  
1 ENST00000…      8                   9                       7.5                   9                     
2 ENST00000…      9                  10                       8.5                  10                
3 ENST00000…     13                   1                      12.5                   3.5                   
4 ENST00000…     23                   5.88                   13.5                   6.33      
5 ENST00000…     19                   4.4                    12.7                   7         
6 ENST00000…     31                  10.6                    12.2                  11

I know this data is not enough to perform the analysis properly, but I am considering if by using limma-voom and use the counts provided as a tx$count, and follow this approach:

First approach using limma-voom:

library(edgeR)
library(biomaRt)

#generate tx2gene: 

ensembl <- useMart("ensembl", dataset = "hsapiens_gene_ensembl")  # Use appropriate dataset
transcripts_list <- rownames(counts) 

tx2gene <- getBM(
      attributes = c("ensembl_transcript_id", "ensembl_gene_id", "hgnc_symbol"),  # Attributes you want to retrieve
      filters = "ensembl_transcript_id",  # Filter by transcript ID
      values = transcripts_list,  # Your list of transcript IDs
      mart = ensemble)

counts1 <- left_join(counts, tx2gene)

#aggregate the results as a gene-count table: 
gene_counts <- counts1 %>%
group_by(hgnc_symbol) %>% 
summarise(across(starts_with("count"), sum, na.rm = TRUE))

#perform Limma-vool
y <- DGEList(gene_counts)
design <- model.matrix(~ 0 + matrix$Group) 
colnames(design) <- levels(matrix$Group)
keep <- filterByExpr(y, design)
y <- y[keep, ]
y <- calcNormFactors(y)
v <- voom(y, design)
fit <- lmFit(v, design)
contrast.matrix <- makeContrasts(G1 - G2, levels = design)
fit2 <- contrasts.fit(fit, counts1)
fit2 <- eBayes(fit2)
res <- topTable(fit2, adjust="fdr", number = Inf)

Does anyone have any other advice/feedback about how to perform an analysis using this data? Do you think that approach would be correct for an exploratory analysis or should I just not use this data?

Thank you very much

deviation DE limma voom • 379 views

ADD COMMENT • link 7 hours ago by Lila M ★ 1.3k

0

Entering edit mode

This looks like edgeR not limma-voom. Anyway, what exactly is the problem here?

ADD REPLY • link 1 day ago by dsull ★ 6.9k

0

Entering edit mode

I just wanted to make sure if this approach is correct considering the inputs I am providing or if there is a different alternative I could use

Thanks!

ADD REPLY • link 1 day ago by Lila M ★ 1.3k

0

Entering edit mode

my recommendation is that you produce files with counts first then do the differential expression so that you don't have to keep redoing that step.

basically summarize counts by gene, save that into a file, and that would be script 1.

now in a new script, load these new counts and do the differential expression there,

that way you don't have redo that step each time you run the differential expression, now you can explore the effect of filtering rows based on minimal counts

you should also investigate these new resulting summarized counts to make sure they make sense

ADD REPLY • link 23 hours ago by Istvan Albert 102k

0

Entering edit mode

That is basically what I did, but as I haven't generated the counts (and they are per transcripts) I just summarised by gene counts and then I made the DE. My only concern is that the counts are the raw counts from Kallisto and I don't have the other required files to load and get the gene counts by using tximpot

ADD REPLY • link 20 hours ago by Lila M ★ 1.3k

0

Entering edit mode

There are no other "required files". The only files you need are the raw kallisto counts.

ADD REPLY • link 17 hours ago by dsull ★ 6.9k

0

Entering edit mode

I understand that to create the gene counts from kallisto you also need the abundance.tsv file, and then generate the gene counts from them and tx2gene.

ADD REPLY • link 7 hours ago by Lila M ★ 1.3k