following Kallisto with DESeq2 using tximport package
8.8 years ago
Assa Yeroslaviz ★ 1.9k

Hi

I am trying to analyse the results from kallisto with the help of DESeq2. After a long search I found this post. It mentions the package tximport, which I am now trying to run. I have run the complete vignette without difficulties, but when I try to run it on my own data, I get this error message:

> txi <- tximport(files[1:3], type="kallisto", tx2gene=tx2gene, reader=read_tsv)
reading in files
1 2 3
transcripts missing genes: 173259
summarizing abundance
Error in split.default(1:nrow(m), f) :
  group length is 0 but data length > 0

The reason is probably that my kallisto files use Ensembl transcript IDs, while the tximport workflow assumes UCSC IDs.

My files look like this:

target_id    length    eff_length    est_counts    tpm
ENST00000415118    8    2.33333    0    0
ENST00000448914    13    6    0    0
ENST00000434970    9    3.33333    0    0
ENST00000390577    37    12.3793    14    116.948
ENST00000437320    19    10.1667    0    0

while the list of genes from the tximport workflow is:

> head(df)
  GENEID     TXNAME
1      1 uc002qsd.4
2      1 uc002qsf.2
3     10 uc003wyw.1
4    100 uc002xmj.3
5   1000 uc010xbn.1
6   1000 uc002kwg.2

So I was wondering whether there is a better way of working with the package (in the vignette, a separate list with RefSeq IDs is uploaded to fit the provided kallisto files).

Is there another package besides TxDb.Hsapiens.UCSC.hg19.knownGene, where I can map my ENST IDs to ENSG or even to gene names?

I know I can use biomaRt (this is what I am doing now), but it takes a long time, as my list of transcripts is 173260 rows long.

Thanks

Assa

tximport kallisto deseq2 ucsc ensembl • 8.6k views

Which version of Ensembl are you using (v75, hg19)? I do have a workaround that does not use the package.


Hi!

I am also interested in a workaround, as I'm using Ensembl and biomaRt is not working properly. Would you mind sharing yours? I'm using hg38 / Ensembl v83, though. But if you have a workaround for hg19, I can probably adapt it for my purposes.

Thanks!

8.8 years ago
Michael Love ★ 2.6k

hi Frymor,

A couple things:

tximport will be showing up on Bioconductor next week, so you can ask further questions on the Bioconductor support site (which runs on Biostars interface) and I will be notified to answer them.

You'll need to construct your own tx2gene table. The function needs to be able to group the transcript IDs into genes, so the names in the target_id column of the kallisto files must match the names in the first column of tx2gene.
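The matching requirement can be checked in base R before calling tximport. This is only a minimal sketch: the transcript IDs and the UCSC rows are copied from the question, while `match_fraction`, `tx2gene_ens` and the gene labels `geneA`/`geneB` are hypothetical placeholders, not real annotations.

```r
# Transcript IDs as they appear in the kallisto target_id column (from the question).
target_id <- c("ENST00000415118", "ENST00000448914", "ENST00000434970")

# Table in the layout shown by head(df): gene ID first, UCSC transcript name second.
df <- data.frame(
  GENEID = c("1", "1", "10"),
  TXNAME = c("uc002qsd.4", "uc002qsf.2", "uc003wyw.1"),
  stringsAsFactors = FALSE
)

# Put the transcript column first, since it is matched against target_id.
tx2gene_ucsc <- df[, c("TXNAME", "GENEID")]

# Hypothetical helper: fraction of kallisto transcripts found in column 1 of tx2gene.
match_fraction <- function(ids, tx2gene) mean(ids %in% tx2gene[[1]])

# UCSC names never match Ensembl IDs, so nothing can be assigned to a gene.
frac_ucsc <- match_fraction(target_id, tx2gene_ucsc)

# Hypothetical Ensembl-based table; the gene labels are placeholders only.
tx2gene_ens <- data.frame(
  TXNAME = target_id,
  GENEID = c("geneA", "geneA", "geneB"),
  stringsAsFactors = FALSE
)
frac_ens <- match_fraction(target_id, tx2gene_ens)

print(c(ucsc = frac_ucsc, ensembl = frac_ens))
```

A fraction of 0 is consistent with the "transcripts missing genes" message in the question, where essentially every transcript went unmatched.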

There is a code chunk in the vignette which shows how to build the tx2gene if you have a TxDb (Bioconductor object roughly equivalent to a GTF file).

You can certainly do this with biomaRt (which may be slow) or check out the ensembldb package:

http://bioconductor.org/packages/release/bioc/html/ensembldb.html
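For the ensembldb route, a tx2gene table might be built roughly as follows. This is a sketch under assumptions: `build_tx2gene_from_ensdb` is a hypothetical helper name, and the usage lines assume the EnsDb.Hsapiens.v75 annotation package (Ensembl 75, hg19) is installed; a different EnsDb object would be needed for other releases.

```r
# Hypothetical helper: extract a transcript-to-gene table from an EnsDb object.
# Requires the ensembldb package to be installed when the function is called.
build_tx2gene_from_ensdb <- function(edb) {
  tx <- ensembldb::transcripts(
    edb,
    columns = c("tx_id", "gene_id"),
    return.type = "data.frame"
  )
  # Keep the transcript ID first, since it is matched against target_id.
  tx2gene <- tx[, c("tx_id", "gene_id")]
  names(tx2gene) <- c("TXNAME", "GENEID")
  tx2gene
}

# Usage (assumes the annotation package is available):
# library(EnsDb.Hsapiens.v75)
# tx2gene <- build_tx2gene_from_ensdb(EnsDb.Hsapiens.v75)
# head(tx2gene)
```

Because the lookup reads from a local database, it avoids the remote queries that make biomaRt slow for long transcript lists.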


Hi

I have tried to use this, as it seems very easy with the prepared Ensembl 75/79 packages. Unfortunately I'm using hg38, and I cannot get openssl installed in order to build my own package with the API as described in the vignette. Do you have any idea how I can get tximport to run when I'm using Ensembl rather than UCSC, and neither ensembldb nor biomaRt is working for me?

Thanks in advance!


Sure, you can import any Ensembl GTF file to build a TxDb. See the makeTxDbFromGFF function in the GenomicFeatures package.
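The GTF route could look roughly like the sketch below. `build_tx2gene_from_gtf` is a hypothetical helper name and the file path in the usage lines is a placeholder. The calls assume GenomicFeatures and AnnotationDbi are installed; depending on the Bioconductor release, `makeTxDbFromGFF` may instead be provided by the txdbmaker package.

```r
# Hypothetical helper: build a transcript-to-gene table from an Ensembl GTF file.
# Requires GenomicFeatures and AnnotationDbi when the function is called.
build_tx2gene_from_gtf <- function(gtf_path) {
  # Import the GTF annotation as a TxDb object.
  txdb <- GenomicFeatures::makeTxDbFromGFF(gtf_path, format = "gtf")

  # Look up the gene ID for every transcript name in the TxDb.
  tx_names <- AnnotationDbi::keys(txdb, keytype = "TXNAME")
  df <- AnnotationDbi::select(
    txdb,
    keys = tx_names,
    keytype = "TXNAME",
    columns = "GENEID"
  )

  # Transcript ID first, gene ID second.
  df[, c("TXNAME", "GENEID")]
}

# Usage (placeholder path, not a real file):
# tx2gene <- build_tx2gene_from_gtf("path/to/annotation.gtf")
# head(tx2gene)
```

The GTF should come from the same Ensembl release as the transcriptome used to build the kallisto index, so that the transcript IDs agree.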
