How To Get The List Of Genes Involved In A Biological Process
4
13.7 years ago
Dejian ★ 1.3k

In the silkworm genome paper, it is mentioned that "Of the 323 wing-development genes known in D. melanogaster, 300 are found in silkworm." The authors did not say how they obtained the 323 wing-development genes. My question is how to get the list of genes involved in a biological process, taking the fruit fly as an example. I tried to get the list of wing-development genes from FlyBase using QueryBuilder. However, I only got 33 genes, far fewer than 323. I stored my query, and the query parameters are as follows:

target=fbgn

output=HitList

format=-

species=Dmel

guistyle=1

AND go go-id CV term FBbt:00004729 no

Did I make some mistake or are there some other ways to do this? Any advice is welcome.

gene retrieval • 5.4k views
ADD COMMENT
0
Entering edit mode

In my experience, the FlyBase query builder often does not work or produces results that are difficult to interpret. If you end up using GO based queries, I suggest using the BioMart or FlyMine systems instead for more robust and reproducible results.

ADD REPLY
0
Entering edit mode

Casey, unfortunately, BioMart does not deal with the tree structure of the Gene Ontology, even though it would be possible to query for the cumulative list of identifiers in my answer below. FlyMine does deal with the tree structure, but I looked up 'wing disc development' and got 277 distinct gene identifiers as a result. I find that suspect, because that number is less than what Simon, Pierre and I have found using independent methods. Thanks for the FlyMine pointer though -- I keep forgetting it exists. :D
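To illustrate what "dealing with the tree structure" means in practice, here is a minimal Python sketch that collects a term together with all of its descendants from OBO-formatted text. It assumes the standard OBO stanza layout (`[Term]`, `id:`, `is_a:`, `relationship: part_of`). The function names and the `TOY:` identifiers are made up for illustration, and whether `part_of` links should be followed in addition to `is_a` is a choice you have to make for your own query.

```python
from collections import defaultdict, deque


def parse_children(obo_text, relations=("part_of",)):
    """Map each parent term ID to the set of its direct child term IDs."""
    children = defaultdict(set)
    in_term = False
    current = None
    for raw in obo_text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # A new stanza starts; only [Term] stanzas are of interest.
            in_term = (line == "[Term]")
            current = None
        elif in_term and line.startswith("id: "):
            current = line[4:]
        elif in_term and current and line.startswith("is_a: "):
            parent = line[6:].split(" ! ")[0].strip()
            children[parent].add(current)
        elif in_term and current and line.startswith("relationship: "):
            parts = line[14:].split()
            if len(parts) >= 2 and parts[0] in relations:
                children[parts[1]].add(current)
    return children


def descendants(children, root):
    """Return root plus every term reachable from it via child links."""
    seen = {root}
    queue = deque([root])
    while queue:
        term = queue.popleft()
        for child in children.get(term, ()):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


# Toy ontology with hypothetical IDs; a real run would read the GO OBO file instead.
TOY_OBO = """
[Term]
id: TOY:1
name: toy root

[Term]
id: TOY:2
is_a: TOY:1 ! toy root

[Term]
id: TOY:3
relationship: part_of TOY:2 ! toy child

[Term]
id: TOY:4
is_a: TOY:9 ! unrelated branch
"""

TOY_TREE = parse_children(TOY_OBO)
print(sorted(descendants(TOY_TREE, "TOY:1")))
```

The resulting set of IDs is the "cumulative list of identifiers" that a tool without tree support would need to be queried with.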

ADD REPLY
10
Entering edit mode
13.7 years ago

My first thought here would be to search the Gene Ontology. Sure enough, a search for 'wing development' returns us the slightly more specific term 'wing disc development' which is used to annotate 366 genes. If we filter this list to include only D. melanogaster, we reduce that number to 347, which is well within the ball park of 323 (allowing for increasing annotation since publication of the silkworm paper you reference).

ADD COMMENT
1
Entering edit mode

Another advantage of AmiGO is that it provides PubMed links to the original literature. Pretty useful!

ADD REPLY
0
Entering edit mode

This solution gives a clear view of the number of genes under each GO term. However, it is not convenient for downloading the related sequences, because the download link does not offer sequence downloads. Many thanks, anyway.

ADD REPLY
0
Entering edit mode

@Balance, to support Simon's answer, you didn't say that you also wanted to download the sequences.

ADD REPLY
0
Entering edit mode

@Balance - you actually can: click 'View all results' (at the top of the results, under the filters box), skip to the bottom of the page, click 'Select all', and then choose to download the selected sequences in FASTA format from the drop-down menu. Not straightforward, I realise, but certainly possible.

ADD REPLY
0
Entering edit mode

Yes, that's my fault. I appreciate Simon's help. He showed me how to use AmiGO. :)

ADD REPLY
0
Entering edit mode

That's great, Simon! Perfect!

ADD REPLY
6
Entering edit mode
13.7 years ago

I guess they used a controlled vocabulary (e.g. gene ontology) to find a list of genes annotated with a defined term. For example, if you search for GO:0035220

"GO:0035220 = wing disc development"

Progression of the wing disc over time, from its initial formation through to its metamorphosis to form adult structures including the wing hinge, wing blade and pleura.

http://flybase.org/cgi-bin/cvreport.html?rel=is_a&id=GO:0035220

you'll get 352 genes.

ADD COMMENT
0
Entering edit mode

You are right. It seems that I chose the wrong CV. But I want to get all genes related to the wing, so I am considering combining these results.

ADD REPLY
0
Entering edit mode

Dear Pierre

To do a categorized GO enrichment analysis, I have a list of about 300 GO IDs, and I would like to know which genes are linked to each GO ID based on GO release 2013-03-16.

For example, GO:0032392 should have 26 genes in the genome (GO release 2013-03-16, with IEA, ND and RCA annotations removed). I would like to know which 26 genes these are under those conditions.

How can I find this information? I am not a bioinformatician, so could you please show me an easy way, if possible?

Thanks in advance.

Sincerely,
Omid
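One way to approach a request like this is to filter a gene association (GAF) file yourself. The Python sketch below groups genes by GO ID and drops annotations whose evidence code is IEA, ND or RCA. It assumes the standard GAF column layout (column 2 = gene ID, column 5 = GO ID, column 7 = evidence code). The function name, the `gaf_row` helper and the sample gene IDs are made up for illustration. It counts direct annotations only, so if the expected 26 genes also include annotations to child terms, the numbers will differ.

```python
from collections import defaultdict

# Evidence codes to drop, as listed in the question above.
EXCLUDED_EVIDENCE = {"IEA", "ND", "RCA"}


def genes_per_term(lines, wanted_terms, excluded=EXCLUDED_EVIDENCE):
    """Return {GO ID: set of gene IDs} for the wanted terms, skipping excluded evidence codes."""
    result = defaultdict(set)
    for line in lines:
        if line.startswith("!"):  # GAF header and comment lines
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 7:
            continue
        gene, go_id, evidence = cols[1], cols[4], cols[6]
        if go_id in wanted_terms and evidence not in excluded:
            result[go_id].add(gene)
    return dict(result)


def gaf_row(gene, go_id, evidence):
    """Build a toy GAF line with only the columns used here filled in."""
    return "\t".join(["DB", gene, "symbol", "", go_id, "REF:0", evidence, "", "P"])


# Made-up annotations; a real run would iterate over the lines of the archived GAF file.
SAMPLE = [
    "!gaf-version: 2.0",
    gaf_row("gene_toy1", "GO:0032392", "IMP"),
    gaf_row("gene_toy2", "GO:0032392", "IEA"),  # dropped: electronic annotation
    gaf_row("gene_toy3", "GO:0032392", "IDA"),
    gaf_row("gene_toy4", "GO:0035220", "IMP"),  # not in the wanted list
]

for term, genes in genes_per_term(SAMPLE, {"GO:0032392"}).items():
    print(term, len(genes), sorted(genes))
```

For a list of 300 GO IDs, the same call works with all of them in `wanted_terms`; the file for the specific release would have to come from the relevant archive.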

ADD REPLY
0
Entering edit mode
13.7 years ago

If I assume that "AND go go-id CV term FBbt:00004729 no" looks up GO term 00004729 you probably simply selected the wrong GO term. If you select 0035220 (as mentioned by Pierre, and also the term Simon used), your query might work.

ADD COMMENT
0
Entering edit mode

FBbt:00004729 and GO:00004729 belong to different CV hierarchies. FBbt is defined according to fly anatomy. I want to get all genes related to the wing, so I am considering combining these results. Thanks :)
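Combining the hit lists from two controlled vocabularies comes down to a set union on the gene identifiers. A minimal Python sketch, assuming each query result has been saved as plain text with one gene ID per line (the function name and the IDs shown are made up):

```python
def read_ids(text):
    """Parse one gene ID per line, ignoring blank lines and surrounding whitespace."""
    return {line.strip() for line in text.splitlines() if line.strip()}


# Hypothetical exports from a GO-based query and an FBbt-based query.
GO_HITS = read_ids("gene_toy1\ngene_toy2\ngene_toy3\n")
FBBT_HITS = read_ids("gene_toy3\ngene_toy4\n")

combined = GO_HITS | FBBT_HITS     # genes found by either query
shared = GO_HITS & FBBT_HITS       # genes found by both queries
only_fbbt = FBBT_HITS - GO_HITS    # genes the anatomy term adds on top of GO

print(len(combined), len(shared), sorted(only_fbbt))
```

Looking at `shared` and `only_fbbt` also shows how much the two vocabularies overlap before you commit to the merged list.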

ADD REPLY
0
Entering edit mode

I see. Thanks for explaining. The "go go-id" part of your query made me think it might be simpler than it really is.

ADD REPLY
0
Entering edit mode
13.7 years ago
Joachim ★ 2.9k

The paper is from 2004, and new discoveries may have been made since then, or the underlying annotations may simply have changed (amended, corrected, expanded, ...). Regarding FlyBase, you can have a look at previous versions of it at ftp://ftp.flybase.net/releases/, which go back to 2006. The data are provided as XML, PostgreSQL dumps, or TSVs.

I downloaded two gene_association.fb files, one from 2006 and one from the current 2011/03 release, and then compared the number of genes that are linked to GO:0035220 and all its children.

The following grep counts the genes that are associated with GO:0035220 and its children (GO-IDs taken from the tree as it appears in the current 2011/03 release of FlyBase):

grep -E 'GO:(0035220|0035294|0007477|0048802|0035309|0035310|0035311|0007472|0007476|0090254|0090255|0090253|0090252|0048526|0035317|0008587|0008586|0007474|0035222|0048100|0048190|0007473|0001737|0035320|0035319|0035318|0035321|0007475)' gene_association.fb | cut -f 2 | sort | uniq | wc -l

Output:

  • 185 for gene_association of 2006/01
  • 348 for gene_association of 2011/03 (Why do I see one more entry than Simon!? Why-oh-why?)
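The grep above matches a GO ID anywhere on a line. A slightly stricter variant, sketched here in Python, checks only the GO ID column. It assumes the standard GAF layout (tab-separated, column 2 = gene ID, column 5 = GO ID, comment lines starting with `!`). The function name and the sample rows are made up, and this is not offered as an explanation of the one-gene difference.

```python
def count_annotated_genes(lines, go_ids):
    """Count distinct gene IDs (GAF column 2) whose GO ID (column 5) is in go_ids."""
    genes = set()
    for line in lines:
        if line.startswith("!"):  # GAF header and comment lines
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) >= 5 and cols[4] in go_ids:
            genes.add(cols[1])
    return len(genes)


# Toy rows with made-up gene IDs; real input would be the lines of gene_association.fb.
SAMPLE = [
    "!gaf-version: 1.0",
    "FB\tFBgn_toy1\tgeneA\t\tGO:0035220\tFB:ref\tIMP\t\tP",
    "FB\tFBgn_toy1\tgeneA\t\tGO:0007476\tFB:ref\tIMP\t\tP",
    "FB\tFBgn_toy2\tgeneB\t\tGO:0007476\tFB:ref\tIGI\t\tP",
    # Here GO:0035220 appears only in the With/From column, so the row is skipped,
    # whereas a whole-line grep would match it.
    "FB\tFBgn_toy3\tgeneC\t\tGO:0008150\tFB:ref\tIC\tGO:0035220\tP",
]

WING_TERMS = {"GO:0035220", "GO:0007476"}
print(count_annotated_genes(SAMPLE, WING_TERMS))
```

With a real file, something like `with open("gene_association.fb") as fh: count_annotated_genes(fh, WING_TERMS)` would do the same job, with `WING_TERMS` holding the full list of IDs from the grep pattern.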

My point is: unless you operate on the original data of the authors, you will pretty much always come to a different result. You could bulk-download the gene_association files from FlyBase and then plot the output of the command line above over time to see

  • if there is a sudden jump that shows that the GO annotation was amended/extended
  • if there is a slowly increasing slope that shows that new discoveries were made and then added bit-by-bit as they occurred
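The per-release comparison could be tabulated along these lines. This is only a Python sketch: the release labels and sample rows are made up, the function names are hypothetical, and it assumes GAF files with the gene ID in column 2 and the GO ID in column 5.

```python
def genes_for_terms(lines, go_ids):
    """Return the set of gene IDs (GAF column 2) annotated with any GO ID in go_ids (column 5)."""
    genes = set()
    for line in lines:
        if line.startswith("!"):  # GAF header and comment lines
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) >= 5 and cols[4] in go_ids:
            genes.add(cols[1])
    return genes


def growth_by_release(releases, go_ids):
    """releases: (label, lines) pairs in chronological order.

    Returns (label, gene count, change since the previous release) tuples.
    """
    rows = []
    previous = None
    for label, lines in releases:
        count = len(genes_for_terms(lines, go_ids))
        delta = None if previous is None else count - previous
        rows.append((label, count, delta))
        previous = count
    return rows


# Toy data standing in for two archived gene_association.fb files.
OLD_RELEASE = ["FB\tFBgn_toy1\tgeneA\t\tGO:0035220\tFB:ref\tIMP\t\tP"]
NEW_RELEASE = OLD_RELEASE + ["FB\tFBgn_toy2\tgeneB\t\tGO:0035220\tFB:ref\tIMP\t\tP"]

TABLE = growth_by_release(
    [("release_old", OLD_RELEASE), ("release_new", NEW_RELEASE)],
    {"GO:0035220"},
)
for label, count, delta in TABLE:
    print(label, count, delta)
```

A single large delta between two neighbouring releases would point to a bulk re-annotation, while many small deltas would suggest gradual additions.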
ADD COMMENT
0
Entering edit mode

Thank you for taking the trouble to examine the previous releases. I wasn't trying to reproduce the result. However, your reply reminds me to state the release number whenever FlyBase is used.

ADD REPLY
0
Entering edit mode

You are welcome. I think it is great that FlyBase keeps an archive, so that it is possible to back-track these issues over time.

ADD REPLY
