Hi There, I am a student and new to the field of bioinformatics.
Recently I started working on RNA-seq data from the common carp (Cyprinus carpio) transcriptome. I downloaded my reference genome and annotation from NCBI, as common carp is not a listed organism in most pipelines. However, after performing a differential expression analysis I have a list of genes identified by what I think are locus IDs, which look like LOC109056034, LOC109079975, LOC109101305, etc. If I search for these IDs in NCBI, they come up as predicted genes with specific characteristics.
My final goal is to perform a molecular pathway analysis. I tried to use these IDs in GO, AmiGO and DAVID, but none of them recognises my IDs, and my organism is not in their species lists. My questions are:
1) How can I perform a pathway analysis, and which tools (online tools if possible) are easiest or most suitable for a non-model organism?
2) Is it common practice to perform pathway analysis in a related species (e.g. zebrafish) using the same genes, since most common genes have conserved functions?
3) If I have to convert my gene IDs, what is the best way to do it? If I have to assign code names to the genes, what is the best way to do that?
4) Is there a step-by-step protocol?
I am very sorry for the lengthy questions. Can someone shed some light on this?
Regards, Raihan
If any of the above posts helped you solve the problem, please upvote them to encourage the contributors. Otherwise, if you found an alternative solution, you are welcome to share it here (only if you are willing to share).
Yes, as EagleEye said, I'm also very interested in how this question was resolved. If it is solved, maybe I can follow the same route.
I'm also stuck on a similar issue. I have a transcriptome of a mollusk species but I can't do the GO analysis, and I haven't found a proper solution for my needs. I'm working with a non-model organism and I have annotated my transcriptome against a homemade database composed of 9 proteomes. My proteomes were downloaded from UniProt, so using the UniProt format converter I have obtained the GO terms assigned to each entry of my transcriptome.
As input to a GO analysis I have an annotated transcriptome (protein-based annotation) and the related GO terms, but the tools I have found (like GeneSCF) always ask for a related species. Is it possible to do this another way?
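For what it's worth, an enrichment test does not strictly need a reference species if every transcript already carries its own GO terms: the annotated transcriptome can serve as the background and the differentially expressed transcripts as the study set. Below is a minimal sketch of that idea using a one-sided hypergeometric test. The function names, the toy transcript IDs and the GO labels are all made up for illustration, and in a real analysis the p-values would still need multiple-testing correction.

```python
from math import comb


def hypergeom_upper_tail(k, N, K, n):
    """P(X >= k) when n items are drawn from N, of which K carry the term."""
    upper = min(K, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


def go_enrichment(transcript2go, de_transcripts):
    """Test each GO term for over-representation among the DE transcripts.

    transcript2go: dict mapping transcript ID -> set of GO terms (the background).
    de_transcripts: iterable of differentially expressed transcript IDs.
    Returns (term, hits_in_study, hits_in_background, p_value), sorted by p-value.
    """
    background = set(transcript2go)
    study = set(de_transcripts) & background
    N, n = len(background), len(study)

    # Invert the annotation: GO term -> transcripts carrying it.
    term2members = {}
    for transcript, terms in transcript2go.items():
        for term in terms:
            term2members.setdefault(term, set()).add(transcript)

    results = []
    for term, members in term2members.items():
        k = len(members & study)
        if k == 0:
            continue  # term absent from the study set, nothing to test
        p = hypergeom_upper_tail(k, N, len(members), n)
        results.append((term, k, len(members), p))
    return sorted(results, key=lambda r: r[3])


# Toy annotation (hypothetical IDs and term labels).
toy_annotation = {
    "tx1": {"GO:term_A"},
    "tx2": {"GO:term_A"},
    "tx3": {"GO:term_B"},
    "tx4": {"GO:term_B"},
    "tx5": set(),
    "tx6": {"GO:term_A", "GO:term_B"},
}
toy_de = ["tx1", "tx2", "tx6"]

results = go_enrichment(toy_annotation, toy_de)
for term, k, K, p in results:
    print(term, k, K, round(p, 3))
```

Here the background is every annotated transcript, which is a design choice: restricting it to transcripts that were actually expressed is also common and changes the p-values.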
And just to clarify: if that is not possible, then in order to follow the GeneSCF solution (or similar), do I necessarily have to re-annotate my transcriptome against the organism I'll use as the base for my GO analysis? For example, if I choose Danio rerio because it is well annotated and at least an aquatic organism, do I have to BLAST my transcriptome against that species so that both share the same gene names?
Thank you for your time.
Hi pablo61991,
As Farbod mentioned in his query (here), you collect the genes from the closest species ('Danio rerio') and use that gene list in GeneSCF.
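The "collect the genes from the closest species" step can be sketched roughly as follows, assuming the query sequences were searched against Danio rerio proteins with BLAST and the results saved in the standard 12-column tabular format (-outfmt 6). The subject accessions, the accession-to-symbol table and the 1e-5 cutoff are placeholders, not values from this thread.

```python
import csv


def best_hits(blast_lines, max_evalue=1e-5):
    """Return {query_id: subject_id}, keeping the highest-bitscore hit per query.

    Expects BLAST tabular columns: qseqid sseqid pident length mismatch
    gapopen qstart qend sstart send evalue bitscore.
    """
    best = {}
    for row in csv.reader(blast_lines, delimiter="\t"):
        if len(row) < 12:
            continue  # skip blank or malformed lines
        query, subject = row[0], row[1]
        evalue, bitscore = float(row[10]), float(row[11])
        if evalue > max_evalue:
            continue  # hit too weak to transfer a gene name
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: hit[0] for query, hit in best.items()}


# Toy BLAST table: locus IDs from the question vs. made-up zebrafish accessions.
toy_blast = [
    "LOC109056034\tzf_prot_1\t85.0\t200\t30\t0\t1\t200\t1\t200\t1e-50\t300",
    "LOC109056034\tzf_prot_2\t60.0\t150\t60\t2\t1\t150\t5\t155\t1e-20\t120",
    "LOC109079975\tzf_prot_3\t40.0\t80\t48\t1\t1\t80\t1\t80\t0.5\t30",
    "LOC109101305\tzf_prot_2\t90.0\t210\t21\t0\t1\t210\t1\t210\t1e-80\t400",
]
# Hypothetical accession -> gene symbol table for the reference species.
toy_symbols = {"zf_prot_1": "geneA", "zf_prot_2": "geneB", "zf_prot_3": "geneC"}

carp_to_zf = best_hits(toy_blast)
gene_list = sorted({toy_symbols[s] for s in carp_to_zf.values() if s in toy_symbols})
print(carp_to_zf)
print(gene_list)
```

The resulting list of reference-species gene symbols is what would then be given to an enrichment tool that expects that species. Queries with no hit under the cutoff (the second locus in the toy data) simply drop out.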
Thank you for your answer, EagleEye.
Yes, I have read this topic, but I was interested in an alternative that would let me use several species during my annotation instead of one. However, what I have found here (from you and others) and in other forums and web resources makes me think there isn't a good alternative to the approach you highlight.
What do you think about using this approach to perform the GO analysis but then using the other annotation file to perform, for example, the DGE analysis? I think it won't be correct, but as I have little experience in this field I have to ask (sorry if it sounds like a stupid question).
Thank you again for your time.
It is a bit unclear what you are trying to ask:
'What do you think about using this approach to perform the GO analysis but then using the other annotation file to perform, for example, the DGE analysis?'
Could you please elaborate?
Sorry, I'll try to explain myself better.
I don't know if it's possible to annotate my transcriptome against the Danio rerio database (mRNA or protein), perform the GO analysis with that, and then generate another annotation based on a broader (homemade) database composed of more organisms, in order to obtain the list of genes I need for the DGE analysis in DESeq2 (I follow the kallisto > tximport > DESeq2 pipeline).
I have this doubt because when I annotated my transcriptome against D. rerio I obtained 30% fewer hits than when I did it against my homemade database. So if I use only the D. rerio annotation for the DGE I'll lose information, because some of the no-hits against D. rerio are in fact hits against my homemade database.
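To make the concern concrete, one possible way to look at it is to keep the transcript IDs as the primary key and attach both annotations as extra columns, so transcripts without a D. rerio hit are still visible. This is only a minimal sketch with invented IDs; the 7-versus-10 split just mimics the 30% gap mentioned, and the function name is hypothetical.

```python
def compare_annotations(transcripts, narrow_hits, broad_hits):
    """Attach two annotations to each transcript and list what only the broad one covers.

    narrow_hits / broad_hits: dicts mapping transcript ID -> annotation label.
    Returns (rows, only_broad), where rows are (transcript, narrow, broad) tuples
    and missing annotations are None.
    """
    rows = [(t, narrow_hits.get(t), broad_hits.get(t)) for t in transcripts]
    only_broad = [t for t, narrow, broad in rows if narrow is None and broad is not None]
    return rows, only_broad


# Toy data: 10 transcripts, all annotated by the broad database,
# only the first 7 annotated by the single-species database.
transcripts = [f"tx{i}" for i in range(1, 11)]
broad = {t: f"broad_{t}" for t in transcripts}
narrow = {t: f"zf_{t}" for t in transcripts[:7]}

rows, only_broad = compare_annotations(transcripts, narrow, broad)
loss = 1 - len(narrow) / len(broad)

print(only_broad)
print(f"{loss:.0%} fewer hits with the single-species annotation")
```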
I would suggest doing it in the following order,
Now you will have,
Perfect, I'll apply this approach. As soon as I have done it successfully, I'll come back to upvote this answer.
Thank you.