Getting gene counts from transcript counts using R package 'tximport'

14 months ago

arsala521 ▴ 60

Hi everyone,

I am working on an RNA-seq analysis and want to convert transcript counts to gene counts. I have estimated transcript counts with kallisto, and I want to use the tximport R package to get counts per gene instead of per transcript.

Basically, I have an input tsv file like this:

ENST00000431440.2       0
ENST00000390583.1       1
ENST00000390584.1       5.5
ENST00000390585.1       2.5
ENST00000430425.1       0

and I want the output to have gene IDs in the first column and their counts in the second column.

From the tximport vignette (https://bioconductor.org/packages/release/bioc/vignettes/tximport/inst/doc/tximport.html#kallisto), I mostly understand how to do this with tximport, but I do not understand how to prepare the input object. I gather that I have to use this:

txi.kallisto <- tximport(files, type = "kallisto", tx2gene = tx2gene, ignoreAfterBar = TRUE)

but what is files in this command, and how do I prepare it in my situation? I would appreciate any help.

TIA

tximport RNA-seq • 1.4k views

ADD COMMENT • link updated 14 months ago by Ram 44k • written 14 months ago by arsala521 ▴ 60

This works for me; maybe you can try it:

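### Load the packages used below
### (in recent Bioconductor releases makeTxDbFromGFF() lives in the txdbmaker package)
library(GenomicFeatures)
library(tximport)

### Locate the directory containing the files generated by kallisto
base_directory <- "C:/Users/kallisto_quantification"

### Create a vector of paths to every abundance.tsv, one per sample subdirectory
quant_files <- list.files(base_directory, pattern = "abundance\\.tsv$", recursive = TRUE, full.names = TRUE)

### Name each file after its sample directory; tximport uses these names as column names
names(quant_files) <- basename(dirname(quant_files))

### Check the files
quant_files

### Verify that the files exist
all(file.exists(quant_files))

### Path to the GTF annotation that matches the transcriptome used for kallisto
gtf_file <- "example.gtf"

### Check that the file exists
file.exists(gtf_file)

### Create a transcripts database using the GTF file with GenomicFeatures
txdb <- makeTxDbFromGFF(gtf_file)

### Inspect which key types and columns the database offers
keytypes(txdb)
columns(txdb)

### Build the transcript-to-gene table (transcript ID first, gene ID second)
k <- keys(txdb, keytype = "TXNAME")
tx_map <- select(txdb, keys = k, columns = "GENEID", keytype = "TXNAME")
head(tx_map)

### Save the tx2gene
tx2gene <- tx_map
write.csv(tx2gene, file = "tx2gene.csv", row.names = FALSE, quote = FALSE)

### Summarize to gene level; if the kallisto IDs carry version suffixes (e.g. ".2")
### that the GTF lacks, add ignoreTxVersion = TRUE
txi <- tximport(quant_files, type = "kallisto", tx2gene = tx2gene)

### Gene-level counts: one row per gene, one column per sample
head(txi$counts)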
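
If the input really is only the two-column table shown in the question, rather than a full kallisto abundance.tsv (which also has length, eff_length and tpm columns), a plain per-gene sum in base R may be enough. The sketch below is only an illustration under that assumption: the gene IDs GENE_A and GENE_B and the transcript-to-gene assignments are hypothetical placeholders, and a real tx2gene table would come from the GTF as above.

```r
# Transcript-level counts, copied from the question (two columns, no header).
tx_counts <- data.frame(
  tx_id = c("ENST00000431440.2", "ENST00000390583.1", "ENST00000390584.1",
            "ENST00000390585.1", "ENST00000430425.1"),
  count = c(0, 1, 5.5, 2.5, 0),
  stringsAsFactors = FALSE
)

# HYPOTHETICAL transcript-to-gene map; in practice build it from your GTF.
tx2gene <- data.frame(
  TXNAME = c("ENST00000431440", "ENST00000390583", "ENST00000390584",
             "ENST00000390585", "ENST00000430425"),
  GENEID = c("GENE_A", "GENE_B", "GENE_B", "GENE_B", "GENE_A"),
  stringsAsFactors = FALSE
)

# Drop the version suffix (".2", ".1", ...) so the IDs match the map.
tx_counts$tx_id <- sub("\\.[0-9]+$", "", tx_counts$tx_id)

# Look up the gene of each transcript, then sum the counts within each gene.
gene_of_tx <- tx2gene$GENEID[match(tx_counts$tx_id, tx2gene$TXNAME)]
gene_sums <- rowsum(tx_counts$count, group = gene_of_tx)

# Two-column result: gene ID first, summed count second.
gene_counts <- data.frame(gene_id = rownames(gene_sums),
                          count = gene_sums[, 1],
                          row.names = NULL)
print(gene_counts)
```

With real kallisto output, tximport is still the better route, because it also carries abundances and transcript lengths through to the gene level.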

ADD REPLY • link updated 14 months ago by Ram 44k • written 14 months ago by san96 ▴ 170

score 1 · Answer 1 · 2023-12-16

14 months ago

jv ★ 1.8k

files for tximport should be a character vector of filenames for the transcript-level abundances (as described in the function documentation/usage), i.e.

files <- c("sample_1.abundance.tsv", "sample_2.abundance.tsv")

One easy option is to have all of the abundance.tsv files in a directory and use list.files to read all the filenames into a vector.
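
As a rough sketch of the list.files approach, the snippet below builds a throwaway directory with one subfolder per sample and collects the paths into a named vector. The directory name kallisto_demo and the sample names are made up for the demonstration, and the empty abundance.tsv files are stand-ins for real kallisto output.

```r
# Mimic a kallisto output layout in a temporary directory:
#   <base_dir>/sample_1/abundance.tsv
#   <base_dir>/sample_2/abundance.tsv
base_dir <- file.path(tempdir(), "kallisto_demo")
for (s in c("sample_1", "sample_2")) {
  dir.create(file.path(base_dir, s), recursive = TRUE, showWarnings = FALSE)
  file.create(file.path(base_dir, s, "abundance.tsv"))  # empty placeholder file
}

# Collect the full path of every abundance.tsv below base_dir.
files <- list.files(base_dir, pattern = "^abundance\\.tsv$",
                    recursive = TRUE, full.names = TRUE)

# Name each path after its parent directory, i.e. the sample name.
names(files) <- basename(dirname(files))

print(files)
stopifnot(all(file.exists(files)))
```

Naming the vector is optional, but tximport uses the names as column names of the resulting matrices, which makes the samples easier to tell apart later.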

ADD COMMENT • link 14 months ago by jv ★ 1.8k

Got it. Thank you so much. I could not work it out from the vignette because it is written in a slightly complicated way and my R knowledge is very basic. Thanks again :)

ADD REPLY • link 14 months ago by arsala521 ▴ 60

A small educational note: if an answer was helpful, you should upvote it; if the answer resolved your question, you should mark it as accepted. You can accept more than one answer if they all work. If an answer was not really helpful or did not work, provide detailed feedback so others know not to use that answer.

ADD REPLY • link 14 months ago by Ram 44k