Problem with tximport and Plasmodium falciparum
22 months ago
bioinfo ▴ 150

Hello,

I quantified my samples with kallisto against a Plasmodium falciparum transcriptome. I built the reference from Plasmodium_falciparum.ASM276v2.cdna.all.fa.gz, which I downloaded from http://ftp.ensemblgenomes.org/pub/protists/release55/fasta/plasmodium_falciparum/cdna/Plasmodium_falciparum.ASM276v2.cdna.all.fa.gz.

However, I am having issues with tximport.

The error that I get is:

Error in .local(object, ...) : 
  None of the transcripts in the quantification files are present
  in the first column of tx2gene. Check to see that you are using
  the same annotation for both.

Example IDs (file): [CAX64123, CAX64256, CZT99967, ...]

Example IDs (tx2gene): [CAD49011., CAD48976., CAD49073., ...]

  This can sometimes (not always) be fixed using 'ignoreTxVersion' or 'ignoreAfterBar'.

I understand that the problem seems to be with the mart object I created, and that I may be getting a different annotation version. However, I think the problem is the external gene name. The mart lists it as an attribute, but when I add it to t2g the column is empty. Has anyone had this issue before?

My script is below:

mart <- biomaRt::useMart("protists_mart", host= "https://protists.ensembl.org", "pfalciparum_eg_gene")
t2g <- biomaRt::getBM(attributes = c("ensembl_transcript_id", "ensembl_gene_id", "external_gene_name"), mart = mart)
t2g <- dplyr::rename(t2g, gene_symbol = external_gene_name)
t2g <- t2g[, c(ncol(t2g), 1:(ncol(t2g) - 1))]

accessions <- list.dirs(full.names = FALSE)[-1]
kallisto.dir <- paste0(accessions)
kallisto.files <- file.path(kallisto.dir, "abundance.tsv") # can also be abundance.h5
names(kallisto.files) <- accessions
tx.kallisto <- tximport::tximport(kallisto.files, type = "kallisto", tx2gene = t2g)

Thank you

22 months ago
bioinfo ▴ 150

I figured out that most of the external gene name column was empty, which is why it was not working. I ended up using only the ensembl_gene_id, and it works fine now.
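
For anyone hitting the same problem, here is a minimal sketch of that fix. It uses a made-up t2g table rather than real biomaRt output, so every ID below is hypothetical. The idea is to keep the transcript ID in column 1 and the gene ID in column 2 for tximport, and to treat the gene symbol as an optional label that falls back to the gene ID when it is empty.

```r
# Toy stand-in for the biomaRt result; every ID here is made up for illustration.
t2g <- data.frame(
  ensembl_transcript_id = c("TX0001", "TX0002", "TX0003", "TX0004"),
  ensembl_gene_id       = c("GENE_A", "GENE_A", "GENE_B", "GENE_C"),
  external_gene_name    = c("abc1", "abc1", "", NA),
  stringsAsFactors = FALSE
)

# How much of the symbol column is empty or missing?
empty_symbol <- is.na(t2g$external_gene_name) | t2g$external_gene_name == ""
frac_empty <- mean(empty_symbol)

# Two-column mapping for tximport: transcript ID first, gene ID second.
tx2gene <- t2g[, c("ensembl_transcript_id", "ensembl_gene_id")]

# Optional lookup for labelling results later:
# use the symbol where present, the gene ID otherwise.
gene_label <- ifelse(empty_symbol, t2g$ensembl_gene_id, t2g$external_gene_name)
names(gene_label) <- t2g$ensembl_gene_id
gene_label <- gene_label[!duplicated(names(gene_label))]
```

Checking frac_empty before deciding which column to summarise on makes it obvious when the symbol column is too sparse to use as a gene identifier.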

22 months ago
ATpoint 86k

Try removing that dot from the t2g names:

gsub("\\..*", "", t2g[,1])
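
Note that gsub returns a new vector and does not modify t2g in place, so the result has to be assigned back. A minimal sketch of the full check is below; the IDs and the column names tx and gene are made up for illustration, and the overlap counts are simply a quick way to see whether stripping the suffix helped.

```r
# Hypothetical IDs: the quantification files carry bare IDs,
# while the mapping carries a trailing dot or version suffix.
file_ids <- c("TX0001", "TX0002", "TX0003")
t2g <- data.frame(
  tx   = c("TX0001.", "TX0002.1", "TX0004."),
  gene = c("GENE_A", "GENE_A", "GENE_B"),
  stringsAsFactors = FALSE
)

# How many IDs from the files are found in the mapping as-is?
overlap_before <- sum(file_ids %in% t2g$tx)

# Drop the first dot and everything after it, and assign the result back.
t2g$tx <- sub("\\..*$", "", t2g$tx)

# Re-check the overlap and list any IDs that are still unmatched.
overlap_after <- sum(file_ids %in% t2g$tx)
missing_ids <- setdiff(file_ids, t2g$tx)
```

If missing_ids is still long after the cleanup, the two ID sets probably come from different annotations rather than differing only by a suffix.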

Thank you. It does not seem to be that. I changed my script a bit, and it now looks like this:

mart <- biomaRt::useMart("protists_mart", host= "https://protists.ensembl.org", "pfalciparum_eg_gene")
t2g <- biomaRt::getBM(attributes = c("ensembl_transcript_id", "ensembl_gene_id", "external_gene_name"), mart = mart)
#t2g <- biomaRt::getBM(attributes = c("ensembl_transcript_id", "ensembl_gene_id"), mart = mart)
t2g <- dplyr::rename( t2g, gene_symbol = external_gene_name)

accessions <- list.dirs(full.names = FALSE)[-1]
kallisto.dir <- paste0(accessions)
kallisto.files <- file.path(kallisto.dir, "abundance.tsv") # can also be abundance.h5
names(kallisto.files) <- accessions
tx.kallisto <- tximport::tximport(kallisto.files, type = "kallisto", tx2gene = t2g)

If I run the script as shown above, the counts in the tx.kallisto object contain just a single number. If I comment out the second line and use the third line for the getBM attributes instead, I get a table with the Ensembl gene IDs. Something about the external gene name seems to be causing the problem.
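
One possible explanation for the single number, offered as an assumption rather than a diagnosis of this exact script: if the column that ends up being used as the gene identifier is empty, summing per gene lumps every transcript into one group. The base R sketch below reproduces that effect with made-up counts and IDs; it does not call tximport.

```r
# Made-up transcript-level counts for one sample.
counts <- c(TX0001 = 10, TX0002 = 5, TX0003 = 7, TX0004 = 3)

gene_id     <- c("GENE_A", "GENE_A", "GENE_B", "GENE_C")
gene_symbol <- c("", "", "", "")   # an entirely empty symbol column

# Summing by gene ID keeps one row per gene.
by_id <- rowsum(counts, group = gene_id)

# Summing by the empty symbol collapses every transcript into a single row.
by_symbol <- rowsum(counts, group = gene_symbol)
```

Here by_id has three rows, while by_symbol has a single row holding the total of all counts.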


You did not do what I suggested above.


I figured out that most of the external gene name column was empty so that is why it was not working. I ended up just using the ensembl_gene_id and it works fine now. Thank you
