Hi guys, I'm trying to analyze some RNA-seq data using salmon as follows:
#create the index:
salmon index -t gencode.v27.transcripts.fa -i human_index
#create the quant.sf files:
salmon quant -i human_index/ -l OSR -1 R1.fastq -2 R2.fastq -o salmon_quant
After that, my idea is to process all the files (1Q_S1_quant.sf, 2Q_S2_quant.sf .....16Q_S16_quant.sf) in R for downstream analysis with DESeq2. To do that, I've tried:
library(GenomicFeatures)
library(tximport)
library(readr)
library(rjson)
## Create a transcript-to-gene matching table (tx2gene) that will be used to aggregate transcript quantifications
## from Salmon to the gene level
txdb <- makeTxDbFromGFF("gencode.v27.annotation.gtf")
k <- keys(txdb, keytype = "GENEID")
df <- select(txdb, keys = k, columns = "TXNAME", keytype = "GENEID")
tx2gene <- df[, 2:1]
head(tx2gene)
## load salmon files
files <- list.files(pattern = "quant.sf", full.names = TRUE)
names(files) <- paste0("sample", 1:16)
all(file.exists(files))
#TRUE
txi_salmon <- tximport(files = files, type = "salmon", txOut = FALSE, tx2gene = tx2gene)
reading in files with read_tsv
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
Error in summarizeToGene(txi, tx2gene, ignoreTxVersion, countsFromAbundance) :
None of the transcripts in the quantification files are present
in the first column of tx2gene. Check to see that you are using
the same annotation for both.
But that is not true at all, because I looked in both files (quant.sf and tx2gene) and the same transcript for the same gene is present in both, e.g.:
#tx2gene
TXNAME GENEID
ENST00000373031.4 ENSG00000000005.5
ENST00000485971.1 ENSG00000000005.5
#1Q_S1.quant.sf
ENST00000373031.4|ENSG00000000005.5|OTTHUMG00000022001.1|OTTHUMT00000057481.1|TNMD-201|TNMD|1339|protein_coding| 1339 1156.86 0 0
ENST00000485971.1|ENSG00000000005.5|OTTHUMG00000022001.1|OTTHUMT00000057482.1|TNMD-202|TNMD|542|processed_transcript| 542 360.895 0 0
Any suggestions about what's going on with this funny error?
Thanks!
Hint: compare the first columns of the two files you posted. You'll note that they're not exactly the same. That's causing the error.
Hi Devon, can you explain how I can solve it? Thanks!
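You can probably do something like
sed -e 's/|[^[:space:]]*//' 1Q_S1.quant.sf
which strips everything from the first | up to the next tab, so only the transcript ID (e.g. ENST00000373031.4) is left in the first column.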
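For anyone who wants to check the idea before touching real output, here is a minimal sketch on a toy file. It uses awk instead of sed, so the substitution is explicitly limited to the first tab-separated column. The file names demo_quant.sf and demo_quant.clean.sf are made up for illustration, and the two data rows are the ones quoted in the question.

```shell
# Toy stand-in for a salmon quant.sf (tab-separated); file names are hypothetical.
printf 'Name\tLength\tEffectiveLength\tTPM\tNumReads\n' > demo_quant.sf
printf 'ENST00000373031.4|ENSG00000000005.5|OTTHUMG00000022001.1|OTTHUMT00000057481.1|TNMD-201|TNMD|1339|protein_coding|\t1339\t1156.86\t0\t0\n' >> demo_quant.sf
printf 'ENST00000485971.1|ENSG00000000005.5|OTTHUMG00000022001.1|OTTHUMT00000057482.1|TNMD-202|TNMD|542|processed_transcript|\t542\t360.895\t0\t0\n' >> demo_quant.sf

# In column 1 only, drop everything from the first "|" onward;
# the remaining columns are printed unchanged.
awk 'BEGIN { FS = OFS = "\t" } { sub(/\|.*/, "", $1); print }' demo_quant.sf > demo_quant.clean.sf

# Show the cleaned first column so it can be compared with TXNAME in tx2gene.
cut -f1 demo_quant.clean.sf
```

If the cleaned names match the TXNAME column of tx2gene, the same awk line can be run over each real quant.sf, writing to a new file rather than overwriting the original.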