Question

using tximport to prepare salmon quantification files for DEG analysis

0

Entering edit mode

6.7 years ago

tarek.mohamed ▴ 370

Hi All,

I am using tximport to prepare quant.sf files generated from salmon for Deseq2 DEG analysis. My quant.sf and txgenes files looks fine. However I got a message telling me that I have 3468transcripts missing from tx2gene. I attached the code for tx2gene object generation. What is causing this?

library(biomaRt)
listMarts()
ensembl<-useMart("ensembl") 
listDatasets(ensembl) 
ensembl<-useDataset("hsapiens_gene_ensembl",mart=ensembl)
listFilters(ensembl)
ids<-read.delim(files[1],header = T,sep = "\t")
head(ids)
ids <- as.character(ids[,1])
ids
output<-getBM(filters="ensembl_transcript_id",
              attributes=c("ensembl_transcript_id","external_gene_name"),
              values=ids,mart=ensembl)

txi_test <- tximport("quant.sf", type="salmon",
                      tx2gene=output, dropInfReps=TRUE)

reading in files with read_tsv

1

transcripts missing from tx2gene: 3468

summarizing abundance

summarizing counts

summarizing length

#

 head(tnnt2neo_3uMdox_3uMdesp_rep2)
             Name Length EffectiveLength TPM NumReads
    1 ENST00000434970      9               2   0        0
    2 ENST00000448914     13               3   0        0
    3 ENST00000415118      8               2   0        0
    4 ENST00000632684     12               3   0        0
    5 ENST00000631435     12               3   0        0
    6 ENST00000430425     17               3   0        0

#

head(output)
  ensembl_transcript_id external_gene_name
1       ENST00000337114             PKD1L2
2       ENST00000358097             CYP2D7
3       ENST00000375726             CASP12
4       ENST00000377032          IGKV1D-12
5       ENST00000377712              NAT8B
6       ENST00000379435        TRBV20OR9-2

RNA-Seq tximport • 4.7k views

ADD COMMENT • link updated 5.7 years ago by Biostar 20 • written 6.7 years ago by tarek.mohamed ▴ 370

0

Entering edit mode

You may use setdiff() to investigate the missing entries:

setdiff( output$ensembl_transcript_id, tnnt2neo_3uMdox_3uMdesp_rep2$Name )

setdiff( tnnt2neo_3uMdox_3uMdesp_rep2$Name, output$ensembl_transcript_id )

Or use [ , 1 ], in case $ fails:

setdiff( output[ , 1 ], tnnt2neo_3uMdox_3uMdesp_rep2[ , 1 ] )

setdiff( tnnt2neo_3uMdox_3uMdesp_rep2[ , 1 ], output[ , 1 ] )

This would tell the missing transcripts, but not why they are missing. I guess the cause lies in how you created the tx2gene output table - which you didn't tell us.

EDIT: am I going crazy and I didn't see the getBM() code, or did you add it after my comment?

Anyway, what is the version of your reference transcriptome? How did you create it? Did you check the ids of the missing transcripts?

ADD REPLY • link 6.7 years ago by h.mon 35k