Entering edit mode
6.7 years ago
tarek.mohamed
▴
370
Hi All,
I am using tximport to prepare quant.sf files generated from salmon for Deseq2 DEG analysis. My quant.sf and txgenes files looks fine. However I got a message telling me that I have 3468transcripts missing from tx2gene. I attached the code for tx2gene object generation. What is causing this?
library(biomaRt)
listMarts()
ensembl<-useMart("ensembl")
listDatasets(ensembl)
ensembl<-useDataset("hsapiens_gene_ensembl",mart=ensembl)
listFilters(ensembl)
ids<-read.delim(files[1],header = T,sep = "\t")
head(ids)
ids <- as.character(ids[,1])
ids
output<-getBM(filters="ensembl_transcript_id",
attributes=c("ensembl_transcript_id","external_gene_name"),
values=ids,mart=ensembl)
txi_test <- tximport("quant.sf", type="salmon",
tx2gene=output, dropInfReps=TRUE)
reading in files with read_tsv
1
transcripts missing from tx2gene: 3468
summarizing abundance
summarizing counts
summarizing length
#
head(tnnt2neo_3uMdox_3uMdesp_rep2)
Name Length EffectiveLength TPM NumReads
1 ENST00000434970 9 2 0 0
2 ENST00000448914 13 3 0 0
3 ENST00000415118 8 2 0 0
4 ENST00000632684 12 3 0 0
5 ENST00000631435 12 3 0 0
6 ENST00000430425 17 3 0 0
#
head(output)
ensembl_transcript_id external_gene_name
1 ENST00000337114 PKD1L2
2 ENST00000358097 CYP2D7
3 ENST00000375726 CASP12
4 ENST00000377032 IGKV1D-12
5 ENST00000377712 NAT8B
6 ENST00000379435 TRBV20OR9-2
You may use
setdiff()
to investigate the missing entries:Or use
[ , 1 ]
, in case$
fails:This would tell the missing transcripts, but not why they are missing. I guess the cause lies in how you created the tx2gene
output
table - which you didn't tell us.EDIT: am I going crazy and I didn't see the
getBM()
code, or did you add it after my comment?Anyway, what is the version of your reference transcriptome? How did you create it? Did you check the ids of the missing transcripts?