Gene pathway analysis
7.5 years ago
bioraihan • 0

Hi there, I am a student and new to the field of bioinformatics.

Recently I started working on RNA-seq data using the common carp (Cyprinus carpio) transcriptome. I downloaded my reference genome and annotation from NCBI, because common carp is not a listed organism in most pipelines. However, after performing a differential expression analysis, I have a list of genes identified by what I think are locus IDs, such as LOC109056034, LOC109079975 and LOC109101305. When I search these IDs in NCBI, they come up as predicted genes, each with a specific description.

My final goal is to carry out a molecular pathway analysis. I tried using these IDs in GO, AmiGO and DAVID, but none of them seem to recognise my IDs, and my organism is not in their species lists. My questions are:

  1. How can I perform a pathway analysis, and which tools (online tools if possible) are easier to use or better suited to a non-model organism?

  2. Is it common practice to perform the pathway analysis in a related species (e.g. zebrafish) using the same genes, since most common genes have conserved functions?

  3. If I have to convert my gene IDs, what is the best way to do it? If I have to assign gene names, what is the best way to do that?

  4. Is there a step-by-step protocol?

I am very sorry for the lengthy questions. Can someone shed some light on this?

Regards, Raihan

7.5 years ago

If you have the reference genome, you can use Blast2GO to annotate it against public databases, and then use that annotation file to annotate your RNA-seq genes with a matching script in R.
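
A minimal sketch of that matching step, written in Python for illustration (the answer suggests R). It assumes the annotation was exported as a tab-separated table with one row per sequence, a sequence-ID column and a column of semicolon-separated GO IDs. The column names `SeqName` and `GO IDs`, the gene IDs and the GO terms in the demo are hypothetical placeholders; check the header of your own export.

```python
import csv
import io
from typing import Dict, List, TextIO


def load_go_annotation(handle: TextIO, id_col: str = "SeqName",
                       go_col: str = "GO IDs") -> Dict[str, List[str]]:
    """Read a tab-separated annotation table into {sequence ID: [GO terms]}.

    id_col and go_col are assumed column names; adjust them to your export.
    """
    annotation: Dict[str, List[str]] = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        raw = row.get(go_col) or ""
        annotation[row[id_col]] = [t.strip() for t in raw.split(";") if t.strip()]
    return annotation


def annotate_de_genes(de_genes: List[str],
                      annotation: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Attach GO terms to each DE gene; genes missing from the table get []."""
    return {gene: annotation.get(gene, []) for gene in de_genes}


if __name__ == "__main__":
    # Toy table with hypothetical IDs and placeholder GO terms, not real annotations.
    table = io.StringIO(
        "SeqName\tGO IDs\n"
        "gene_A\tGO:0005524; GO:0004672\n"
        "gene_B\t\n"
    )
    annotation = load_go_annotation(table)
    print(annotate_de_genes(["gene_A", "gene_B", "gene_C"], annotation))
```

Genes that come back with an empty list are the ones your annotation could not cover; they are worth counting before interpreting any enrichment result.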

You can ask the AgriGO database authors to load your reference genome, and then you can upload your data. Here is the database:

http://bioinfo.cau.edu.cn/agriGO/analysis.php

7.5 years ago
328558608 ▴ 60

If your species has been well researched by others, you may find a formatted annotation file provided by other researchers.

Otherwise, you first need to use BLAST (or another program) to match your gene sequences to reference genes of known function. Then, following the similarity principle, you can say that your gene may have the same annotation as the reference gene of known function.
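
One common way to apply that similarity principle is to keep, for each of your sequences, the best-scoring BLAST hit against a reference of known function and transfer its annotation. Below is a minimal sketch that assumes BLAST+ tabular output (`-outfmt 6`) with its default 12 columns. The e-value cutoff of 1e-5 is an arbitrary example, not a recommendation from this thread, and the query/subject IDs are toy values.

```python
import io
from typing import Dict, TextIO, Tuple

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(handle: TextIO, max_evalue: float = 1e-5) -> Dict[str, Tuple[str, float]]:
    """Keep the highest-bitscore subject per query, skipping hits above max_evalue."""
    best: Dict[str, Tuple[str, float]] = {}
    for line in handle:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        rec = dict(zip(FIELDS, line.rstrip("\n").split("\t")))
        evalue, bitscore = float(rec["evalue"]), float(rec["bitscore"])
        if evalue > max_evalue:
            continue
        query = rec["qseqid"]
        if query not in best or bitscore > best[query][1]:
            best[query] = (rec["sseqid"], bitscore)
    return best


if __name__ == "__main__":
    # Toy BLAST lines: query, subject, then the remaining default outfmt 6 fields.
    blast = io.StringIO(
        "q1\ts1\t45.0\t200\t100\t2\t1\t200\t1\t200\t1e-10\t50.0\n"
        "q1\ts2\t60.0\t210\t80\t1\t1\t210\t1\t210\t1e-30\t80.0\n"
        "q2\ts3\t30.0\t90\t60\t3\t1\t90\t1\t90\t0.01\t25.0\n"
    )
    print(best_hits(blast))
```

Here q1 keeps s2 (the higher bitscore) and q2 is dropped by the e-value filter. Best-hit transfer is a heuristic: a gene whose best hit is only distantly similar may not share that hit's function.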

  1. Online: KEGG BlastKOALA/GhostKOALA (both on the official KEGG website) and KOBAS 2.0. I recommend clusterProfiler, an R package.

  2. I think so, yes.

  3. You can use BLAST or any other tool that can match your sequences to the reference.

  4. I would say yes.
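
Over-representation analysis, as offered by tools such as KOBAS and clusterProfiler, asks whether a term is annotated to more of your DE genes than expected by chance, given a background of annotated genes. Below is a minimal Python sketch of a one-sided hypergeometric test with Benjamini-Hochberg adjustment. It assumes you already have a term-to-genes mapping (for example, built from BLAST-based annotation). It is not the exact procedure of either tool, and the gene and term names are hypothetical.

```python
from math import comb
from typing import Dict, List, Set, Tuple


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) when drawing n genes from N, of which K carry the term."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def bh_adjust(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def enrich(de_genes: Set[str], universe: Set[str],
           term2genes: Dict[str, Set[str]]) -> List[Tuple[str, int, int, float, float]]:
    """Test each term with at least one DE gene; rows are (term, hits, term size, p, padj)."""
    de = de_genes & universe
    rows = []
    for term, genes in term2genes.items():
        annotated = genes & universe
        hits = len(de & annotated)
        if hits == 0:
            continue  # terms with no DE genes are not tested in this sketch
        p = hypergeom_sf(hits, len(universe), len(annotated), len(de))
        rows.append((term, hits, len(annotated), p))
    padj = bh_adjust([row[3] for row in rows])
    return sorted((row + (q,) for row, q in zip(rows, padj)), key=lambda r: r[3])


if __name__ == "__main__":
    universe = {f"g{i}" for i in range(1, 11)}  # toy background of 10 annotated genes
    term2genes = {"term_X": {"g1", "g2", "g3"}, "term_Y": {"g4", "g5"}}
    for row in enrich({"g1", "g2", "g3"}, universe, term2genes):
        print(row)
```

The choice of `universe` matters as much as the test itself. Using only genes that could be annotated (rather than a whole reference species) avoids inflating the apparent enrichment.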

7.5 years ago
bioraihan • 0

Thanks everyone for your kind help and support.

Regards, Raihan


If any of the above posts helped you solve the problem, please upvote them to encourage the people who answered. Otherwise, if you found an alternative solution, you are welcome to share it here (only if you are willing to).


Yes, as EagleEye said, I'm also very interested in how this question ends. If it is solved, maybe I can follow the same route.

I'm also stuck on a similar issue. I have a transcriptome of a mollusc species, but I can't do the GO analysis, and I haven't found a solution that fits my needs. I'm working with a non-model organism and I have annotated my transcriptome against a homemade database composed of 9 proteomes. The proteomes were downloaded from UniProt, so using UniProt's format converter I obtained the GO terms assigned to each entry of my transcriptome.

As input to a GO analysis I have an annotated transcriptome (protein-based annotation) and the related GO terms, but the tools I have found always ask for a related species (like GeneSCF). Is it possible to do this another way?

And just to clarify: if that is not possible, do I necessarily have to re-annotate my transcriptome against the organism I'll use as the basis of my GO analysis in order to follow the GeneSCF solution (or similar)? For example, if I choose Danio rerio because it is well annotated and at least an aquatic organism, do I have to BLAST my transcriptome against that species so that both share the same gene names?

Thank you for your time.
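
An annotation like the one described here (transcripts with GO terms taken from their UniProt hits) can in principle serve as a custom annotation for a basic over-representation analysis without picking a reference species. For example, clusterProfiler's `enricher()` accepts a two-column `TERM2GENE` data frame (term, gene) plus a `universe` of background genes. Below is a minimal Python sketch that flattens a transcript-to-GO mapping into such a table. It assumes the GO IDs are separated by `; `, and the transcript IDs and GO IDs are hypothetical. It uses only directly assigned terms and does not propagate annotations to parent GO terms, which dedicated GO tools normally do.

```python
import csv
import io
from typing import Dict, List, TextIO, Tuple


def parse_go_field(field: str) -> List[str]:
    """Split a '; '-separated GO ID field (assumed format) into a list of IDs."""
    return [t.strip() for t in field.split(";") if t.strip()]


def term2gene_rows(gene2go: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Flatten {transcript: [GO IDs]} into sorted, de-duplicated (term, gene) pairs."""
    pairs = {(term, gene) for gene, terms in gene2go.items() for term in terms}
    return sorted(pairs)


def write_term2gene(pairs: List[Tuple[str, str]], handle: TextIO) -> None:
    """Write the pairs as a tab-separated table with a 'term' / 'gene' header."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["term", "gene"])
    writer.writerows(pairs)


if __name__ == "__main__":
    # Hypothetical transcripts and placeholder GO IDs.
    gene2go = {
        "transcript_1": parse_go_field("GO:0005524; GO:0004672"),
        "transcript_2": parse_go_field("GO:0005524"),
        "transcript_3": parse_go_field(""),  # annotated hit without GO terms
    }
    out = io.StringIO()
    write_term2gene(term2gene_rows(gene2go), out)
    print(out.getvalue(), end="")
```

Written to a file, this table could be read into R and passed as `TERM2GENE`, with the universe typically set to the transcripts that were actually tested for differential expression.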


Hi pablo61991,

As Farbod mentioned in his question (here), collect the genes from the closest species (Danio rerio) and use that gene list in GeneSCF.


Thank you for your answer, EagleEye.

Yes, I have read that topic, but I was interested in an alternative that would let me use several species in my annotation instead of one. However, what I have found here (from you and others) and in other forums and web resources makes me think there isn't a good alternative to the approach you highlight.

What do you think about using this approach to perform the GO analysis, but then using the other annotation file to perform, for example, the DGE analysis? I think it won't be correct, but as I have little experience in this field I have to ask (sorry if it sounds like a stupid question).

Thank you again for your time.


It is a bit unclear what you are trying to ask:

'What do you think about use this approach to perform the GO analysis but then use the other annotation file to perform, for example, the DGE analysis?'

Could you please elaborate?


Sorry, I'll try to explain myself better.

I don't know if it's possible to annotate my transcriptome against a Danio rerio database (mRNA or protein) and perform the GO analysis with that, and then generate a second annotation based on a broader (homemade) database composed of more organisms to obtain the gene list I need for the DGE analysis in DESeq2 (I follow the kallisto > tximport > DESeq2 pipeline).

I ask because when I annotated my transcriptome against D. rerio I obtained 30% fewer hits than against my homemade database. So, if I used the D. rerio annotation for the DGE analysis I would lose information, because some of the sequences with no hit against D. rerio do have hits against my homemade database.


I would suggest doing it in the following order:

  • Perform the differential expression analysis on your transcriptome data using the pipeline you prefer
  • Filter the top/significantly differentially expressed genes (candidates)
  • For those candidates only, search for similar genes in D. rerio and extract those genes
  • Use the resulting D. rerio gene list in GeneSCF to perform the enrichment/GO/pathway analysis

Now you will have:

  • All differentially expressed (DE) genes from your transcriptome
  • A partial list of DE genes matching D. rerio, which you will use for the enrichment/GO/pathway analysis
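
The four steps above can be sketched in Python under two assumptions. First, the DESeq2 results were exported as a tab-separated table whose ID column is named `gene` (a hypothetical name), alongside DESeq2's `log2FoldChange` and `padj` columns. Second, you already have a transcript-to-D. rerio best-hit mapping, e.g. from BLAST. The cutoffs padj < 0.05 and |log2FoldChange| >= 1 are common but arbitrary, and all IDs are hypothetical.

```python
import csv
import io
from typing import Dict, List, TextIO, Tuple


def significant_genes(handle: TextIO, padj_cutoff: float = 0.05,
                      lfc_cutoff: float = 1.0) -> List[str]:
    """IDs from a tab-separated DESeq2 results table passing padj and |log2FC| cutoffs."""
    keep = []
    for row in csv.DictReader(handle, delimiter="\t"):
        if row["padj"] in ("", "NA"):
            continue  # DESeq2 reports NA padj for genes it filtered or flagged
        if float(row["padj"]) < padj_cutoff and abs(float(row["log2FoldChange"])) >= lfc_cutoff:
            keep.append(row["gene"])
    return keep


def to_reference(genes: List[str], best_hit: Dict[str, str]) -> Tuple[List[str], List[str]]:
    """Split DE genes into de-duplicated reference IDs and genes with no hit."""
    mapped: List[str] = []
    unmapped: List[str] = []
    seen = set()
    for gene in genes:
        target = best_hit.get(gene)
        if target is None:
            unmapped.append(gene)
        elif target not in seen:  # several transcripts can hit the same D. rerio gene
            seen.add(target)
            mapped.append(target)
    return mapped, unmapped


if __name__ == "__main__":
    results = io.StringIO(
        "gene\tlog2FoldChange\tpadj\n"
        "tx1\t2.5\t0.001\n"
        "tx2\t-1.2\t0.01\n"
        "tx3\t0.3\t0.001\n"
        "tx4\t3.0\tNA\n"
        "tx5\t1.5\t0.2\n"
        "tx6\t-2.0\t0.03\n"
    )
    de = significant_genes(results)
    best_hit = {"tx1": "dr_gene1", "tx2": "dr_gene1"}  # hypothetical BLAST best hits
    mapped, unmapped = to_reference(de, best_hit)
    print(mapped, unmapped)
```

The mapped list corresponds to the partial list used for enrichment. The unmapped list is the part of the DE results that a D. rerio-based analysis cannot see, which is why the DE analysis itself is best kept on the full transcriptome.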

Perfect, I'll apply this approach. As soon as I have done it successfully, I'll come back to vote for this answer.

Thank you.
