using tximport to prepare salmon quantification files for DEG analysis
0
0
Entering edit mode
6.6 years ago
tarek.mohamed ▴ 370

Hi All,

I am using tximport to prepare quant.sf files generated from salmon for Deseq2 DEG analysis. My quant.sf and txgenes files looks fine. However I got a message telling me that I have 3468transcripts missing from tx2gene. I attached the code for tx2gene object generation. What is causing this?

library(biomaRt)
listMarts()
ensembl<-useMart("ensembl") 
listDatasets(ensembl) 
ensembl<-useDataset("hsapiens_gene_ensembl",mart=ensembl)
listFilters(ensembl)
ids<-read.delim(files[1],header = T,sep = "\t")
head(ids)
ids <- as.character(ids[,1])
ids
output<-getBM(filters="ensembl_transcript_id",
              attributes=c("ensembl_transcript_id","external_gene_name"),
              values=ids,mart=ensembl)

txi_test <- tximport("quant.sf", type="salmon",
                      tx2gene=output, dropInfReps=TRUE)

reading in files with read_tsv

1

transcripts missing from tx2gene: 3468

summarizing abundance

summarizing counts

summarizing length

#

 head(tnnt2neo_3uMdox_3uMdesp_rep2)
             Name Length EffectiveLength TPM NumReads
    1 ENST00000434970      9               2   0        0
    2 ENST00000448914     13               3   0        0
    3 ENST00000415118      8               2   0        0
    4 ENST00000632684     12               3   0        0
    5 ENST00000631435     12               3   0        0
    6 ENST00000430425     17               3   0        0

#

head(output)
  ensembl_transcript_id external_gene_name
1       ENST00000337114             PKD1L2
2       ENST00000358097             CYP2D7
3       ENST00000375726             CASP12
4       ENST00000377032          IGKV1D-12
5       ENST00000377712              NAT8B
6       ENST00000379435        TRBV20OR9-2
RNA-Seq tximport • 4.6k views
ADD COMMENT
0
Entering edit mode

You may use setdiff() to investigate the missing entries:

setdiff( output$ensembl_transcript_id, tnnt2neo_3uMdox_3uMdesp_rep2$Name )

setdiff( tnnt2neo_3uMdox_3uMdesp_rep2$Name, output$ensembl_transcript_id )

Or use [ , 1 ], in case $ fails:

setdiff( output[ , 1 ], tnnt2neo_3uMdox_3uMdesp_rep2[ , 1 ] )

setdiff( tnnt2neo_3uMdox_3uMdesp_rep2[ , 1 ], output[ , 1 ] )

This would tell the missing transcripts, but not why they are missing. I guess the cause lies in how you created the tx2gene output table - which you didn't tell us.

EDIT: am I going crazy and I didn't see the getBM() code, or did you add it after my comment?

Anyway, what is the version of your reference transcriptome? How did you create it? Did you check the ids of the missing transcripts?

ADD REPLY

Login before adding your answer.

Traffic: 1660 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6