Dear All,
I'm Synat, a cancer research student. I'm running an RNA-seq experiment to look at differential expression between treatment conditions, followed by pathway analysis.
I'm doing the alignment with kallisto. I first built an index from a FASTA file and then mapped my reads to the reference using that index and the FASTQ files, as described in the kallisto manual. From these two steps I got an abundance.h5 file, an abundance.tsv file and a run_info file.
However, I have been told that this is not the end of the alignment, as I only get a target ID in abundance.tsv and abundance.h5. Hence I need another step, gene annotation (assigning a specific name to each gene), and that may need a GTF file.
The kallisto manual briefly mentions a GTF file and a chromosome file:
https://pachterlab.github.io/kallisto/starting
However, I am not sure whether that is for gene annotation or serves another purpose.
I got some sample R code from my colleagues, along with a folder for each sample containing abundance.tsv, abundance.h5 and run_info. When I run that code on their files it works: it produces gene annotations/names and finally generates a CSV file for further analysis. However, when I run the same code on my own files (abundance.tsv, abundance.h5), the gene names are missing.
My question is: how can I perform the annotation in kallisto using a GTF file and generate the abundance.tsv, abundance.h5 and run_info files? I am quite new to the field. I hope my question makes sense to everyone in the forum, and I look forward to hearing from you all.
Regards,
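For readers with the same question: a GTF file can be turned into a transcript-to-gene lookup table, which is the piece needed to attach gene names to kallisto target IDs. The following is only a minimal sketch in Python, assuming an Ensembl/GENCODE-style GTF whose ninth column carries transcript_id, gene_id and gene_name attributes. The function name tx2gene_from_gtf and the toy records are made up for illustration.

```python
import re

# Matches key "value" pairs in the GTF attribute column (column 9).
ATTR_RE = re.compile(r'(\w+) "([^"]*)"')


def tx2gene_from_gtf(lines):
    """Return {transcript_id: (gene_id, gene_name)} from GTF 'transcript' records."""
    mapping = {}
    for line in lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        fields = line.rstrip("\n").split("\t")
        # A GTF record has 9 tab-separated columns; column 3 is the feature type.
        if len(fields) < 9 or fields[2] != "transcript":
            continue
        attrs = dict(ATTR_RE.findall(fields[8]))
        tx_id = attrs.get("transcript_id")
        if tx_id:
            # gene_name may be absent in some annotations; fall back to "".
            mapping[tx_id] = (attrs.get("gene_id", ""), attrs.get("gene_name", ""))
    return mapping


# Toy GTF lines (coordinates and sources are illustrative, not real data).
toy_gtf = [
    "#!genome-build toy\n",
    '3\tensembl\tgene\t100\t200\t.\t-\t.\tgene_id "ENSMUSG00000000001"; gene_name "Gnai3";\n',
    '3\tensembl\ttranscript\t100\t200\t.\t-\t.\tgene_id "ENSMUSG00000000001"; '
    'transcript_id "ENSMUST00000000001"; gene_name "Gnai3";\n',
]
toy_map = tx2gene_from_gtf(toy_gtf)
```

With a real annotation, the same function can be given an open file handle, for example `with open("annotation.gtf") as fh: mapping = tx2gene_from_gtf(fh)`, where annotation.gtf is a placeholder name. The GTF should come from the same assembly and release as the FASTA used to build the kallisto index.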
Did you check the assembly version you are using for kallisto and for the gene annotation? I mean, are they consistent?
If you are getting ENS (Ensembl) IDs from kallisto, then I guess you can easily annotate them with biomaRt (using the appropriate assembly). If your aim is only to find all the Ensembl IDs and their associated gene symbols, you can parse the gtf columns (Ensembl IDs and gene symbols) OR just use the online version of BioMart to fetch all the annotations.
Dear Nitin,
Thank you so much for taking the time to respond to my question. I do appreciate it. I am wondering whether you have a quick moment to look through the code I used for the alignment. I am not quite sure about the gene assembly, and it seems that I did not specify any assembly in my code. I would appreciate it if you could have a look.
I will also try biomaRt as you suggest and let you know how I go with it. Have a good weekend.
Regards,
synat
You are using mouse assembly version GRCm39 (mm39). Can you check this assembly version in your gene annotation script?
Hi Nitin,
Thanks for your response. Honestly, I have not done the gene annotation via biomaRt in R yet, as I am still reading the tutorial.
As you mentioned the online version of BioMart, did you mean that I can get a FASTA file with all the annotation, rather than the normal FASTA file? Do you know the sequence of steps to get it? I really appreciate your help.
Kind Regards,
Synat,
I mean you can download Ensembl gene IDs and their associated gene symbols from BioMart. I suppose your kallisto abundance table will have Ensembl IDs (gene IDs) and the abundance, so in the next step you can easily match these IDs and fetch the associated gene symbols from the downloaded BioMart table.
Anyway, why do you want this information? I mean, you can do pretty much everything with the Ensembl IDs: annotation, pathway analysis, or GO enrichment.
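The matching step described above could look roughly like the sketch below. It assumes a tab-separated BioMart export with the columns "Transcript stable ID" and "Gene name" (adjust these to whatever attributes were actually exported), and it assumes the kallisto target_id is an Ensembl transcript ID, possibly with a ".N" version suffix that the export lacks. The helper names load_symbols and annotate, and all the numbers in the toy tables, are illustrative only.

```python
import csv
import io


def load_symbols(biomart_tsv, id_col="Transcript stable ID", name_col="Gene name"):
    """Read a BioMart TSV export into {stable_id: gene_name}."""
    reader = csv.DictReader(biomart_tsv, delimiter="\t")
    return {row[id_col]: row[name_col] for row in reader}


def annotate(abundance_tsv, symbols):
    """Add a gene_name column to the rows of a kallisto abundance.tsv."""
    rows = []
    for row in csv.DictReader(abundance_tsv, delimiter="\t"):
        # Drop a trailing version such as ".5" before the lookup (assumption:
        # the BioMart table holds unversioned IDs).
        base_id = row["target_id"].split(".")[0]
        row["gene_name"] = symbols.get(base_id, "")  # "" when no match is found
        rows.append(row)
    return rows


# Toy stand-ins for the two files (values are made up for illustration).
toy_biomart = io.StringIO(
    "Gene stable ID\tTranscript stable ID\tGene name\n"
    "ENSMUSG00000000001\tENSMUST00000000001\tGnai3\n"
)
toy_abundance = io.StringIO(
    "target_id\tlength\teff_length\test_counts\ttpm\n"
    "ENSMUST00000000001.5\t3262\t3063\t120\t15.2\n"
    "ENSMUST99999999999.1\t500\t301\t0\t0\n"
)
toy_rows = annotate(toy_abundance, load_symbols(toy_biomart))
```

If many rows come back with an empty gene_name, that is a hint that the annotation release or assembly does not match the FASTA used for the index.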
Hi Nitin,
Thank you so much for your assistance. I am looking at differential expression and pathway analysis too, as a bit of exploratory analysis to see whether anything comes up, so I will explore a variety of things there.
Also, I have just fixed the issue: I sourced the FASTA file from GENCODE, aligned with kallisto, and got the gene symbol/name there. So all good now. I really appreciate your response. Hope you have a good day!
Kind Regards,
synat
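A note on why the GENCODE FASTA likely helped: GENCODE transcript FASTA headers are pipe-delimited and carry the gene ID and gene name alongside the transcript ID, so an index built from that file produces target IDs that already contain the symbol. The sketch below splits such a target_id. The field positions follow the usual GENCODE layout (transcript ID first, gene ID second, gene name sixth); treat that as an assumption to check against your own abundance.tsv. The function name and the example ID string are illustrative.

```python
def parse_gencode_target_id(target_id):
    """Split a GENCODE-style pipe-delimited target_id into its main parts."""
    parts = target_id.rstrip("|").split("|")
    if len(parts) < 6:
        # Not a GENCODE-style header (e.g. a bare Ensembl ID): no names embedded.
        return {"transcript_id": target_id, "gene_id": "", "gene_name": ""}
    return {
        "transcript_id": parts[0],
        "gene_id": parts[1],
        "gene_name": parts[5],
    }


# Illustrative ID in the GENCODE layout (the middle fields are placeholders).
example = parse_gencode_target_id(
    "ENSMUST00000000001.5|ENSMUSG00000000001.5|-|-|Gnai3-201|Gnai3|3262|protein_coding|"
)
```

A bare Ensembl ID, as produced by an index built from an Ensembl cDNA FASTA, falls through to the first branch and needs a separate lookup table for gene names.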
Dear synat, I have the same problem too. Does it mean you created the .idx file from a .fasta file that includes the gene symbol/name? What I want to know is which file you used for indexing.
I know it's late, but I need help. Thanks, Lim