C. elegans WS220 genes
7.1 years ago
elisheva ▴ 120

Hi everyone! Does anybody know where I can find the C. elegans WS220 genes?
I looked on the WormBase website,
but when I choose WS220 under the data freezes option, I get WS230.
If there are other ways to get this data (a BED file or a GTF file), that would be very helpful.
Thanks!!!

c.elegans database • 2.5k views
7.1 years ago

Try the FTP site directly: ftp://ftp.wormbase.org/pub/wormbase/releases/WS220/


I tried it already,
but there is no gene information there.
It's only genomic data or transcripts,
or GFF files, which don't seem to have chromosome information and use 1-based coordinates.


These are the first two lines of the .gff3 file for C. elegans:

CHROMOSOME_II   Transposon_CDS  transposable_element    2308405 2309571 .   +   .   cds=B0281.2
CHROMOSOME_X    Transposon_CDS  transposable_element    17169322    17170161    .   +   .   cds=B0302.3

As you can see, the first column gives the chromosome for every annotated genomic feature. You may need to filter the file to keep only protein-coding genes, if that is what you are after. If you need the file in BED format, you can convert it with BEDOPS.
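To illustrate the filtering and conversion described above, here is a minimal sketch that uses plain awk rather than BEDOPS. The file names and the three toy rows are made up for illustration (they are not taken from the WS220 file), and the sketch assumes a tab-separated GFF3 with the feature type in column 3. GFF coordinates are 1-based and BED starts are 0-based, so the start is shifted by one.

```shell
# Toy GFF3-like input (tab-separated). These rows are hypothetical examples,
# not real WS220 annotations.
printf 'CHROMOSOME_II\tCoding_transcript\tgene\t100\t200\t.\t+\t.\tID=Gene:toyA\n' >  toy.gff3
printf 'CHROMOSOME_II\tCoding_transcript\texon\t100\t150\t.\t+\t.\tParent=toyA\n'  >> toy.gff3
printf 'CHROMOSOME_X\tPseudogene\tpseudogene\t500\t900\t.\t-\t.\tID=Gene:toyB\n'   >> toy.gff3

# Keep rows whose third column is exactly "gene" and write six BED columns:
# chrom, start (GFF start minus 1), end, name (raw attribute field), score, strand.
awk -F'\t' -v OFS='\t' '$3 == "gene" { print $1, $4 - 1, $5, $9, 0, $7 }' toy.gff3 > toy_genes.bed

cat toy_genes.bed
```

On the real file you would replace toy.gff3 with the downloaded annotation and probably trim the name column down to just the gene identifier.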

Hope this helps,

Carlo


Thanks a lot!! And just to make sure (I am not familiar with the GFF format):
if I'm interested only in the genes, should I select only the rows where the 9th column has the value ID=Gene? If so, I have a problem: I have a list of gene names that contains 45,461 genes,
and the number of rows with ID=Gene is only 43,164.
So something doesn't add up here...


You should filter on the third column (the "type" of genomic feature). You can get a quick summary with:

cut -f 3 Downloads/c_elegans.WS220.annotations.gff3 | sort | uniq -c

This tells us that there are 43280 annotated genes. This contrasts with your list of 45,461 genes, which is (sadly) very typical of differing gene annotations. So where does that list come from? Does it include pseudogenes? Is it from the same WS220 release?
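As a rough sketch of the same per-type tally, the snippet below runs on a small made-up file (the file names and rows are hypothetical). It uses awk so that header lines starting with "#" can be skipped; with plain cut, a line that contains no tabs is printed whole and would show up as its own "type".

```shell
# Toy input with a GFF3 header line; all rows are invented for illustration.
printf '##gff-version 3\n'                              >  toy_types.gff3
printf 'I\tsrc\tgene\t1\t10\t.\t+\t.\tID=a\n'           >> toy_types.gff3
printf 'I\tsrc\texon\t1\t5\t.\t+\t.\tParent=a\n'        >> toy_types.gff3
printf 'II\tsrc\tgene\t20\t40\t.\t-\t.\tID=b\n'         >> toy_types.gff3
printf 'X\tsrc\tpseudogene\t50\t90\t.\t+\t.\tID=c\n'    >> toy_types.gff3

# Count how often each feature type (column 3) occurs, ignoring comment lines.
awk -F'\t' '!/^#/ { n[$3]++ } END { for (t in n) print t, n[t] }' toy_types.gff3 \
  | sort > toy_counts.txt

cat toy_counts.txt
```

Reading the "gene" and "pseudogene" rows of such a tally side by side makes it easier to see which feature types a gene list from another source might include.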


When I tried this command:

sort -u c_elegans.WS220.annotations.gff3 | cut -f 3 | grep "gene" | wc -l

I got 48549.
Did I make a mistake?


Not really. With grep "gene" you match both the 43280 genes and the 5269 pseudogenes, which add up to 48549 :)
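The difference between the two counts comes down to substring versus whole-line matching, which can be sketched on a toy column of feature types (the file name and values are made up for illustration). The -x option makes grep match only lines that equal the pattern exactly.

```shell
# Toy "third column": two genes, one pseudogene, one exon (hypothetical values).
printf 'gene\ngene\npseudogene\nexon\n' > toy_col3.txt

# Substring match: counts every line containing "gene", including "pseudogene".
substring=$(grep -c 'gene' toy_col3.txt)

# Whole-line match: counts only lines that are exactly "gene".
exact=$(grep -cx 'gene' toy_col3.txt)

echo "substring: $substring, exact: $exact"
```

Applied after cut -f 3 on the real annotation file, the whole-line form would count genes without the pseudogenes.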
