Getting gene counts from transcript counts using R package 'tximport'
11 months ago
arsala521 ▴ 60

Hi everyone,

I am working on an RNA-seq analysis and want to convert transcript-level counts to gene-level counts. I estimated the transcript counts with kallisto, and I want to use the tximport R package to get counts per gene instead of per transcript.

Basically, I have an input tsv file like this:

ENST00000431440.2       0
ENST00000390583.1       1
ENST00000390584.1       5.5
ENST00000390585.1       2.5
ENST00000430425.1       0

and I want output with gene IDs in the first column and their counts in the second column.

From the tximport vignette (https://bioconductor.org/packages/release/bioc/vignettes/tximport/inst/doc/tximport.html#kallisto), I mostly understand how to do this with tximport, but I do not understand how to prepare the input object. I gather that I have to use this:

txi.kallisto <- tximport(files, type = "kallisto", tx2gene = tx2gene, ignoreAfterBar = TRUE)

but what is files in this command, and how can I prepare this input object in my situation? Could someone please help me with this?

TIA

tximport RNA-seq • 1.2k views

This works for me, maybe you can try it:

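### Load the packages used below
library(GenomicFeatures)  # TxDb construction; also attaches AnnotationDbi (keys, select)
library(tximport)

### Locate the directory containing the files generated by kallisto
base_directory <- "C:/Users/kallisto_quantification"
dirs <- list.files(base_directory)

### Create a vector of paths to every abundance.tsv (one per sample)
quant_files <- list.files(base_directory, pattern = "abundance\\.tsv$", recursive = TRUE, full.names = TRUE)

### Check the files
quant_files

### Verify that the files exist
all(file.exists(quant_files))

### Path to the GTF annotation that matches the kallisto index
gtf_file <- "example.gtf"

### Check that the file exists
file.exists(gtf_file)

### Create a transcript database from the GTF file
### (in recent Bioconductor releases makeTxDbFromGFF is provided by the txdbmaker package)
txdb <- makeTxDbFromGFF(gtf_file)

### Inspect the available key types and columns
keytypes(txdb)
columns(txdb)

### Map every transcript name to its gene ID
k <- keys(txdb, keytype = "TXNAME")
tx_map <- AnnotationDbi::select(txdb, keys = k, columns = "GENEID", keytype = "TXNAME")
head(tx_map)

### Save the tx2gene table, then summarize the transcript counts to gene level
tx2gene <- tx_map
write.csv(tx2gene, file = "tx2gene.csv", row.names = FALSE, quote = FALSE)
### If the IDs in abundance.tsv and tx2gene differ only by a version suffix, add ignoreTxVersion = TRUE
txi <- tximport(quant_files, type = "kallisto", tx2gene = tx2gene)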
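With the txi object above, the gene-level counts are in txi$counts (genes in rows, one column per sample). For the single two-column table shown in the question, the same kind of per-gene summing can be sketched in base R. This is only an illustration: the tx2gene mapping and the gene IDs gene_A and gene_B below are invented, and a real mapping would come from the GTF as in the code above. Also note that, as far as I know, tximport with type = "kallisto" expects kallisto's full abundance.tsv rather than a trimmed two-column file.

```r
# Transcript-level estimated counts, copied from the question.
tx_counts <- data.frame(
  target_id  = c("ENST00000431440.2", "ENST00000390583.1", "ENST00000390584.1",
                 "ENST00000390585.1", "ENST00000430425.1"),
  est_counts = c(0, 1, 5.5, 2.5, 0),
  stringsAsFactors = FALSE
)

# HYPOTHETICAL transcript-to-gene mapping, for illustration only.
# The gene IDs are invented; build the real table from your GTF.
tx2gene <- data.frame(
  TXNAME = c("ENST00000431440", "ENST00000390583", "ENST00000390584",
             "ENST00000390585", "ENST00000430425"),
  GENEID = c("gene_A", "gene_B", "gene_B", "gene_B", "gene_A"),
  stringsAsFactors = FALSE
)

# Drop the version suffix (".2", ".1") so the IDs match the unversioned mapping.
tx_id <- sub("\\.[0-9]+$", "", tx_counts$target_id)

# Look up the gene of every transcript and stop if any transcript is unmapped.
gene_id <- tx2gene$GENEID[match(tx_id, tx2gene$TXNAME)]
stopifnot(!anyNA(gene_id))

# Sum the transcript counts within each gene.
summed <- rowsum(tx_counts$est_counts, group = gene_id)

# Two-column result: gene ID, then summed count.
gene_counts <- data.frame(
  gene_id = rownames(summed),
  count   = unname(summed[, 1]),
  stringsAsFactors = FALSE
)
print(gene_counts)
```

The result can be saved with write.table(gene_counts, file = "gene_counts.tsv", sep = "\t", quote = FALSE, row.names = FALSE). When going through tximport instead, ignoreTxVersion = TRUE plays the role of the sub() call here.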
11 months ago
jv ★ 1.8k

files for tximport should be a character vector of filenames for the transcript-level abundances (as described in the function documentation/usage), i.e.

files <- c("sample_1.abundance.tsv", "sample_2.abundance.tsv")

One easy option is to have all of the abundance.tsv files in a directory and use list.files to read all the filenames into a vector.
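A minimal sketch of the list.files approach, assuming one sub-directory per sample with an abundance.tsv inside (kallisto's usual output layout). The temporary directory and the sample names sample_1 and sample_2 are made up so the snippet runs anywhere; point base_directory at your own kallisto output instead.

```r
# Build a throwaway layout: one sub-directory per sample, each holding abundance.tsv.
# The directory and sample names are invented for this demonstration.
base_directory <- file.path(tempdir(), "kallisto_quantification")
for (s in c("sample_1", "sample_2")) {
  dir.create(file.path(base_directory, s), recursive = TRUE, showWarnings = FALSE)
  file.create(file.path(base_directory, s, "abundance.tsv"))
}

# Collect every abundance.tsv below the base directory.
# The pattern is a regular expression, so anchor it and escape the dot.
files <- list.files(base_directory, pattern = "^abundance\\.tsv$",
                    recursive = TRUE, full.names = TRUE)

# Name each path after its parent directory, so each entry carries its sample name.
names(files) <- basename(dirname(files))

# Confirm that every path points at an existing file.
stopifnot(all(file.exists(files)))
print(names(files))
```

The named vector is what goes in as files in tximport(files, type = "kallisto", tx2gene = tx2gene); tximport uses the names as the column names of the result.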


Got it. Thank you so much. I wasn't able to get this from the vignette page, as it is written in a somewhat complicated manner and my R knowledge is very basic. Thanks again :)


A small educational note: if an answer was helpful, you should upvote it; if the answer resolved your question, you should mark it as accepted. You can accept more than one answer if they all work. If an answer was not really helpful or did not work, provide detailed feedback so others know not to use that answer.