DE analysis from kallisto estimated counts (only)
0
0
Entering edit mode
2 days ago
Lila M ★ 1.3k

Hello all, I have some 'raw' data from kallisto, and my aim is to perform a DE analysis in to conditions.

I have a count table from kallisto (estimated counts) like this:

head(x)
target_id  length sample1  sample2 sample3 
  <chr>       <dbl>                  <dbl>                  <dbl>                  <dbl>          
1 ENST00000…      8                      0                      0                      0                     
2 ENST00000…      9                      0                      0                      0                     
3 ENST00000…     13                      0                      0                      0                    
4 ENST00000…     23                      0                      0                      0            

and a effective length table from kallisto like this:

I don't have any other files to be used to generate the txi table, so

head(y)
# A tibble: 6 × 427
  target_id  length sample1  sample2  sample3  
  <chr>       <dbl>                  <dbl>                  <dbl>                  <dbl>                  
1 ENST00000…      8                   9                       7.5                   9                     
2 ENST00000…      9                  10                       8.5                  10                
3 ENST00000…     13                   1                      12.5                   3.5                   
4 ENST00000…     23                   5.88                   13.5                   6.33      
5 ENST00000…     19                   4.4                    12.7                   7         
6 ENST00000…     31                  10.6                    12.2                  11      

I know this data is not enough to perform the analysis properly, but I am considering if by using limma-voom and use the counts provided as a tx$count, and follow this approach:

First approach using limma-voom:

library(edgeR)
library(biomaRt)

#generate tx2gene: 

ensembl <- useMart("ensembl", dataset = "hsapiens_gene_ensembl")  # Use appropriate dataset
transcripts_list <- rownames(counts) 

tx2gene <- getBM(
      attributes = c("ensembl_transcript_id", "ensembl_gene_id", "hgnc_symbol"),  # Attributes you want to retrieve
      filters = "ensembl_transcript_id",  # Filter by transcript ID
      values = transcripts_list,  # Your list of transcript IDs
      mart = ensemble)

counts1 <- left_join(counts, tx2gene)

#aggregate the results as a gene-count table: 
gene_counts <- counts1 %>%
group_by(hgnc_symbol) %>% 
summarise(across(starts_with("count"), sum, na.rm = TRUE))

#perform Limma-vool
y <- DGEList(gene_counts)
design <- model.matrix(~ 0 + matrix$Group) 
colnames(design) <- levels(matrix$Group)
keep <- filterByExpr(y, design)
y <- y[keep, ]
y <- calcNormFactors(y)
v <- voom(y, design)
fit <- lmFit(v, design)
contrast.matrix <- makeContrasts(G1 - G2, levels = design)
fit2 <- contrasts.fit(fit, counts1)
fit2 <- eBayes(fit2)
res <- topTable(fit2, adjust="fdr", number = Inf)

Does anyone have any other advice/feedback about how to perform an analysis using this data? Do you think that approach would be correct for an exploratory analysis or should I just not use this data?

Thank you very much

deviation DE limma voom • 435 views
ADD COMMENT
0
Entering edit mode

This looks like edgeR not limma-voom. Anyway, what exactly is the problem here?

ADD REPLY
0
Entering edit mode

I just wanted to make sure if this approach is correct considering the inputs I am providing or if there is a different alternative I could use

Thanks!

ADD REPLY
0
Entering edit mode

my recommendation is that you produce files with counts first then do the differential expression so that you don't have to keep redoing that step.

basically summarize counts by gene, save that into a file, and that would be script 1.

now in a new script, load these new counts and do the differential expression there,

that way you don't have redo that step each time you run the differential expression, now you can explore the effect of filtering rows based on minimal counts

you should also investigate these new resulting summarized counts to make sure they make sense

ADD REPLY
0
Entering edit mode

That is basically what I did, but as I haven't generated the counts (and they are per transcripts) I just summarised by gene counts and then I made the DE. My only concern is that the counts are the raw counts from Kallisto and I don't have the other required files to load and get the gene counts by using tximpot

ADD REPLY
0
Entering edit mode

There are no other "required files". The only files you need are the raw kallisto counts.

ADD REPLY
0
Entering edit mode

I understand that to create the gene counts from kallisto you also need the abundance.tsv file, and then generate the gene counts from them and tx2gene.

ADD REPLY
0
Entering edit mode

The abundance.tsv is the raw kallisto output -- is there any reason that you can't get the abundance.tsv files? Running kallisto automatically gives you those files...

Otherwise, if not, what you're doing (i.e. manually summing up the counts) is probably a good enough approximation.

ADD REPLY

Login before adding your answer.

Traffic: 2420 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6