Discrepancy between abundance.tsv and tx2gene.csv
7.0 years ago
Mozart ▴ 330

So I am testing the Kallisto/DESeq2 pipeline, and I am now struggling with tximport, which I need in order to prepare the tables produced by the analysis so far before launching DESeq2. For each sample I have an abundance.tsv file, and I need to combine it with a .csv file that I created ad hoc (with known gene/transcript correspondences). The problem is that the transcript IDs in the two files don't match. For example, my abundance file contains IDs like this:

ENSMUST00000103493.2

but I would like to obtain IDs without the version suffix, like this:

ENSMUST00000103493

so that they are recognised in my tx2gene.csv file.
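The transformation being asked for amounts to dropping the trailing ".<digits>" (the Ensembl transcript version) from each ID. Here is a minimal sketch for a single ID, written in shell like the answer's snippet. `strip_version` is a hypothetical helper name and is not part of kallisto or tximport:

```shell
# Hypothetical helper (not part of kallisto or tximport): remove a trailing
# ".<digits>" version suffix from a single Ensembl-style transcript ID.
strip_version() {
  local id="$1"
  local suffix="${id##*.}"   # text after the last dot (the whole ID if there is no dot)
  case "$suffix" in
    "$id"|""|*[!0-9]*)
      # no dot, empty suffix, or non-numeric suffix: leave the ID unchanged
      printf '%s\n' "$id" ;;
    *)
      # numeric suffix: drop the final ".<digits>"
      printf '%s\n' "${id%.*}" ;;
  esac
}

strip_version ENSMUST00000103493.2   # prints ENSMUST00000103493
strip_version ENSMUST00000103493     # prints ENSMUST00000103493 (already unversioned)
```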

Here is my code:

dir <- system.file("extdata", package = "tximportData")
list.files(dir)
samples <- read.table(file.path(dir, "samples.txt"), header = TRUE)
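library(tximport)
library(GenomicFeatures)
library(org.Mm.eg.db)

# Caution: select() on org.Mm.eg.db returns a plain data.frame, not a TxDb, so
# keys(txdb, keytype = "GENEID") below will fail on it; the GENEID/TXNAME lookups
# need a TxDb (or EnsDb) built from the same Ensembl annotation used for the kallisto index
txdb <- select(org.Mm.eg.db, keys(org.Mm.eg.db), "ACCNUM")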
txdb
k <- keys(txdb, keytype = "GENEID")
k
df <- select(txdb, keys = k, keytype = "GENEID", columns = "TXNAME")
df

# 'select()' returned 1:many mapping between keys and columns

tx2gene <- df[, 2:1]
head(tx2gene)

#  TXNAME             GENEID
#1 ENSMUST00000000001 ENSMUSG00000000001
#2 ENSMUST00000000003 ENSMUSG00000000003
#3 ENSMUST00000114041 ENSMUSG00000000003
#4 ENSMUST00000000028 ENSMUSG00000000028
#5 ENSMUST00000096990 ENSMUSG00000000028
#6 ENSMUST00000115585 ENSMUSG00000000028

Then I write the results to a CSV file:

# row.names = FALSE keeps TXNAME as the first column of the CSV
write.csv(tx2gene, file = "/tx2gene.csv", row.names = FALSE)
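The tximport call that follows uses the in-memory tx2gene data frame, so the CSV plays no part there. If the file is read back in later (for example with read.csv), a CSV written with write.csv's default row.names = TRUE starts with an unnamed row-number column. tximport would then treat those row numbers as the transcript IDs in the first column. Below is a minimal sketch for stripping that column from an already-written file. It assumes no commas inside the IDs, and `drop_rowname_column` and the file names are hypothetical:

```shell
# Hypothetical helper: drop the leading row-name column that R's write.csv()
# adds by default (row.names = TRUE), so transcript IDs become column 1 again.
# Assumes plain IDs with no embedded commas, as in the tx2gene table above.
drop_rowname_column() {
  local in_csv="$1" out_csv="$2"
  # keep fields 2..end of every line, header included
  cut -d, -f2- "$in_csv" > "$out_csv"
}

# Example (file names are placeholders):
# drop_rowname_column tx2gene.csv tx2gene_clean.csv
```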

files <- file.path(dir, "kallisto", samples$run, "abundance.tsv")
names(files) <- paste0("sample", 1:6)
txi.kallisto.tsv <- tximport(files, type = "kallisto", tx2gene = tx2gene)
head(txi.kallisto.tsv$counts)

Note: importing `abundance.h5` is typically faster than `abundance.tsv`
reading in files with read_tsv
1 2 3 4 5 6 
Error in summarizeToGene(txi, tx2gene, ignoreTxVersion, countsFromAbundance) : 

  None of the transcripts in the quantification files are present
  in the first column of tx2gene. Check to see that you are using
  the same annotation for both.
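The error means tximport found no overlap between the transcript IDs in the quantification files and the first column of tx2gene. You can confirm this outside R by counting the shared IDs directly. The sketch below assumes the CSV has transcript IDs in its first column (e.g. written with row.names = FALSE) and no embedded commas. `count_shared_ids` is a hypothetical helper:

```shell
# Hypothetical diagnostic: count transcript IDs present both in column 1 of a
# kallisto abundance.tsv and in column 1 of a tx2gene CSV (header lines skipped).
count_shared_ids() {
  local abundance="$1" tx2gene="$2" abundance_ids tx2gene_ids
  abundance_ids=$(mktemp)
  tx2gene_ids=$(mktemp)
  # IDs from the quantification file, sorted and deduplicated in the C locale
  tail -n +2 "$abundance" | cut -f1 | LC_ALL=C sort -u > "$abundance_ids"
  # IDs from the CSV, with write.csv's double quotes removed
  tail -n +2 "$tx2gene" | cut -d, -f1 | tr -d '"' | LC_ALL=C sort -u > "$tx2gene_ids"
  # lines common to both sorted lists, counted
  LC_ALL=C comm -12 "$abundance_ids" "$tx2gene_ids" | wc -l | tr -d ' '
  rm -f "$abundance_ids" "$tx2gene_ids"
}

# Example (paths are placeholders):
# count_shared_ids kallisto/sample1/abundance.tsv tx2gene.csv
```

With a versioned abundance.tsv and an unversioned tx2gene, this prints 0, which is the situation the error describes. After the versions are stripped, it should report a non-zero count.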

Any useful hints?

7.0 years ago
erwan.scaon ▴ 950

If you want to convert ENSMUST00000103493.2 -> ENSMUST00000103493 in your Kallisto abundance.tsv files, you can do the following:

for f in *.tsv; do
  # skip the header line, then drop a trailing ".<digits>" version from target_id (column 1)
  awk -F '\t' -v OFS='\t' 'NR > 1 {sub(/\.[0-9]+$/, "", $1)} 1' "$f" > "${f%.tsv}_renamed.tsv"
done
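In the question's layout, each sample has its own kallisto/<run>/abundance.tsv, so the loop above would have to be run inside every run directory. The sketch below applies essentially the same awk substitution to every abundance.tsv under one directory and writes abundance_renamed.tsv next to each original. `strip_versions_recursive` is a hypothetical name. The directory must be writable, which may not be the case for the tximportData example files installed inside the R library:

```shell
# Hypothetical helper: apply the version-stripping awk substitution to every
# abundance.tsv found under a directory (e.g. kallisto/<run>/abundance.tsv),
# writing abundance_renamed.tsv next to each original file.
strip_versions_recursive() {
  local root="$1"
  find "$root" -type f -name 'abundance.tsv' | while IFS= read -r f; do
    # skip the header, then drop a trailing ".<digits>" from target_id (column 1)
    awk -F '\t' -v OFS='\t' 'NR > 1 {sub(/\.[0-9]+$/, "", $1)} 1' "$f" \
      > "$(dirname "$f")/abundance_renamed.tsv"
  done
}

# Example (the directory is a placeholder):
# strip_versions_recursive kallisto
```

The `files` vector in R would then need to point at abundance_renamed.tsv instead of abundance.tsv.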
That's perfect. I solved my problem, thank you!
