Hi
I am trying to analyse the results from kallisto with the help of DESeq2. After a long search I found this post. It mentions the package tximport, which I am now trying to run. I ran the complete vignette without difficulties, but when I run it on my own data I get this error message:
> txi <- tximport(files[1:3], type="kallisto", tx2gene=tx2gene, reader=read_tsv)
reading in files
1 2 3
transcripts missing genes: 173259
summarizing abundance
Error in split.default(1:nrow(m), f) :
group length is 0 but data length > 0
The reason is probably that my kallisto files use Ensembl transcript IDs, while the tximport workflow assumes UCSC IDs, so my transcripts are not found in tx2gene.
My files look like this:
target_id length eff_length est_counts tpm
ENST00000415118 8 2.33333 0 0
ENST00000448914 13 6 0 0
ENST00000434970 9 3.33333 0 0
ENST00000390577 37 12.3793 14 116.948
ENST00000437320 19 10.1667 0 0
while the list of genes from the tximport workflow is:
> head(df)
GENEID TXNAME
1 1 uc002qsd.4
2 1 uc002qsf.2
3 10 uc003wyw.1
4 100 uc002xmj.3
5 1000 uc010xbn.1
6 1000 uc002kwg.2
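A quick way to confirm this kind of mismatch is to count how many of the kallisto target_id values occur in the TXNAME column. This is a minimal sketch in base R that uses only the IDs shown above; the variable names target_id, n_matched and n_missing are mine, for illustration only:

```r
# IDs copied from the kallisto output and from head(df) shown above
target_id <- c("ENST00000415118", "ENST00000448914", "ENST00000434970",
               "ENST00000390577", "ENST00000437320")
df <- data.frame(
  GENEID = c("1", "1", "10", "100", "1000", "1000"),
  TXNAME = c("uc002qsd.4", "uc002qsf.2", "uc003wyw.1",
             "uc002xmj.3", "uc010xbn.1", "uc002kwg.2"),
  stringsAsFactors = FALSE
)

# Count the kallisto transcripts that do / do not appear in the mapping
n_matched <- sum(target_id %in% df$TXNAME)
n_missing <- sum(!target_id %in% df$TXNAME)
print(c(matched = n_matched, missing = n_missing))
```

If nothing matches, as with these rows, the two tables use different ID systems and the mapping has to be rebuilt with Ensembl IDs.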
So I was wondering whether there is a better way of working with the package (in the vignette, a separate list with RefSeq IDs is uploaded to fit the provided kallisto files).
Is there another package besides TxDb.Hsapiens.UCSC.hg19.knownGene where I can map my ENST IDs to ENSG or even to gene names?
I know I can use biomaRt (this is what I am doing now), but it takes a long time, as my list of transcripts is 173260 rows long.
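One alternative to querying biomaRt might be to take the mapping straight from the Ensembl GTF file of the matching release, since the attribute column of each line carries both gene_id and transcript_id. Below is a rough sketch in base R, not a tested pipeline. The helper name gtf_to_tx2gene and the toy input lines are my own assumptions; for a real run the lines would come from readLines() on the downloaded GTF.

```r
# Hypothetical helper: derive a transcript-to-gene table from GTF lines.
gtf_to_tx2gene <- function(gtf_lines) {
  # Drop header comments and keep only lines that carry a transcript_id
  gtf_lines <- gtf_lines[!grepl("^#", gtf_lines)]
  gtf_lines <- gtf_lines[grepl('transcript_id "', gtf_lines, fixed = TRUE)]
  # The 9th tab-separated field holds the attributes (key "value"; pairs)
  attrs <- vapply(strsplit(gtf_lines, "\t", fixed = TRUE),
                  function(x) x[9], character(1))
  get_attr <- function(key) {
    sub(paste0('.*', key, ' "([^"]+)".*'), "\\1", attrs)
  }
  # Transcript ID in the first column, gene ID in the second
  tx2gene <- unique(data.frame(TXNAME = get_attr("transcript_id"),
                               GENEID = get_attr("gene_id"),
                               stringsAsFactors = FALSE))
  rownames(tx2gene) <- NULL
  tx2gene
}

# Toy lines in the Ensembl GTF layout (illustrative only, not a real file)
toy_gtf <- c(
  "# toy header line",
  paste("1", "toy", "exon", "11869", "12227", ".", "+", ".",
        'gene_id "ENSG00000223972"; transcript_id "ENST00000456328"; exon_number "1";',
        sep = "\t"),
  paste("1", "toy", "exon", "12613", "12721", ".", "+", ".",
        'gene_id "ENSG00000223972"; transcript_id "ENST00000456328"; exon_number "2";',
        sep = "\t"),
  paste("1", "toy", "exon", "29534", "29570", ".", "-", ".",
        'gene_id "ENSG00000227232"; transcript_id "ENST00000488147"; exon_number "1";',
        sep = "\t")
)

tx2gene <- gtf_to_tx2gene(toy_gtf)
print(tx2gene)
```

The resulting two-column table could then be passed as the tx2gene argument of tximport. Whether the IDs actually match will depend on using the GTF from the same Ensembl release as the kallisto index.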
Thanks
Assa
Which version of Ensembl are you using (v75, hg19)? I do have a workaround that does not use the package.
Hi!
I am also interested in a workaround, as I am using Ensembl and biomaRt is not working properly. Would you mind sharing your workaround? I'm using hg38/Ensembl v83, though. But if you have a workaround for hg19, I can probably adapt it for my purposes.
Thanks!