7.1 years ago
elisheva
Hi everyone!
Does anybody know where I can find the C. elegans WS220 genes?
I tried looking on the WormBase website,
but when I select the data freezes option (WS220), I get WS230.
If there are other ways to get this data (as a BED or GTF file), that would be very helpful.
Thanks!!!
I tried it already.
But there is no gene information there.
There is only genomic sequence or transcript data,
or GFF files, which don't have chromosome information and use 1-based coordinates.
These are the first two lines of the .gff3 file for C. elegans:
As you can see, chromosome information is available for all annotated genomic features here. You might need to filter the file to keep only protein-coding gene information, if that is what you are after. If you need this file in BED format, it can be converted using BEDOPS.
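BEDOPS provides a `gff2bed` script for this conversion. To show what happens to the coordinates, here is a minimal Python sketch, assuming a tab-separated GFF3 file and that you only want rows whose column 3 is exactly "gene". The function names, the use of the `ID` attribute as the BED name and the placeholder score are illustrative choices, not a reproduction of `gff2bed` output. The key point is that GFF3 is 1-based and fully closed while BED is 0-based and half-open, so only the start coordinate changes.

```python
"""Minimal GFF3 -> BED6 sketch for rows whose type (column 3) is "gene"."""
from typing import Iterable, Iterator


def parse_attributes(field: str) -> dict:
    """Split a GFF3 column-9 string like 'ID=x;Name=y' into a dict."""
    attrs = {}
    for item in field.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attrs[key] = value
    return attrs


def gff3_genes_to_bed(lines: Iterable[str], feature_type: str = "gene") -> Iterator[str]:
    """Yield one BED6 line per GFF3 row whose type equals feature_type."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments and ## directives
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != feature_type:
            continue  # not a data row, or not the feature type we want
        seqid, _source, _type, start, end, _score, strand, _phase, attributes = cols
        name = parse_attributes(attributes).get("ID", ".")
        # GFF3 [start, end] (1-based, closed) -> BED [start-1, end) (0-based, half-open)
        bed_start = int(start) - 1
        # "0" is a placeholder score; the GFF3 score is not carried over.
        yield "\t".join([seqid, str(bed_start), end, name, "0", strand])


if __name__ == "__main__":
    # Made-up example rows for illustration only.
    example = [
        "##gff-version 3\n",
        "I\tWormBase\tgene\t100\t200\t.\t-\t.\tID=Gene:EXAMPLE1\n",
    ]
    for bed_line in gff3_genes_to_bed(example):
        print(bed_line)  # I  99  200  Gene:EXAMPLE1  0  -  (tab-separated)
```

On a real file you would pass an open file handle instead of the list, e.g. `gff3_genes_to_bed(open(path))`, where `path` is whatever your WS220 GFF3 file is called.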
Hope this helps,
Carlo
Thanks a lot! Just to make sure (I am not familiar with the GFF format):
if I'm only interested in the genes, should I select only the rows where the 9th column contains ID=Gene? Because if so, I have a problem: I have a list of gene names that contains 45,461 genes,
but the number of rows with ID=Gene is only 43,164.
So something doesn't add up here...
You should filter based on the 3rd column (the "type" of genomic feature). You can get a quick summary by doing:
This tells us that there are 43,280 annotated genes, which contrasts with your list of 45,461 genes. Discrepancies like this are (sadly) very typical between different gene annotations. So where does that list come from? Does it include pseudogenes? Is it from the same WS220 release?
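For anyone who prefers Python to shell tools, here is a minimal sketch of the same idea: tally column 3 (the feature type) over all data rows. It assumes a tab-separated GFF3 file; the function name and the command-line usage (a script called with the GFF3 path as its first argument) are hypothetical.

```python
"""Count GFF3 feature types (column 3), similar in spirit to cut | sort | uniq -c."""
import sys
from collections import Counter
from typing import Iterable


def count_feature_types(lines: Iterable[str]) -> Counter:
    """Tally column 3 across all 9-column data rows, skipping comments."""
    counts: Counter = Counter()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # blank lines, comments and ## directives
        cols = line.rstrip("\n").split("\t")
        if len(cols) == 9:  # also skips any trailing ##FASTA sequence lines
            counts[cols[2]] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python count_types.py your_WS220_file.gff3
    with open(sys.argv[1]) as handle:
        for feature_type, n in count_feature_types(handle).most_common():
            print(f"{n}\t{feature_type}")
```

The "gene" entry of the resulting tally is the number to compare against an external gene list, and the "pseudogene" entry helps answer whether that list includes pseudogenes.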
When I tried this command:
I got 48,549.
Did I make a mistake?
Not really: with grep "gene", you match both the 43,280 genes and the 5,269 pseudogenes, which add up to 48,549 :)
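A minimal Python sketch of why the counts differ, using made-up rows: a substring test for "gene" also matches "pseudogene", while an exact comparison on column 3 does not. The helper names are hypothetical, and the sketch restricts the substring test to column 3. A grep over whole lines checks the attribute column too, so in general it could pick up additional rows.

```python
"""Substring match vs exact match on the GFF3 type column."""
from typing import Iterable, List


def data_rows(lines: Iterable[str]) -> List[List[str]]:
    """Return the 9-column data rows of a GFF3 file, skipping comments."""
    rows = []
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) == 9:
            rows.append(cols)
    return rows


def count_substring(rows: List[List[str]], needle: str) -> int:
    """grep-like: count rows whose type merely contains the needle."""
    return sum(1 for cols in rows if needle in cols[2])


def count_exact(rows: List[List[str]], feature_type: str) -> int:
    """Count rows whose type (column 3) equals feature_type exactly."""
    return sum(1 for cols in rows if cols[2] == feature_type)


if __name__ == "__main__":
    # Made-up rows for illustration only.
    example = [
        "I\tWB\tgene\t1\t10\t.\t+\t.\tID=g1\n",
        "I\tWB\tgene\t20\t30\t.\t+\t.\tID=g2\n",
        "I\tWB\tpseudogene\t40\t50\t.\t-\t.\tID=p1\n",
    ]
    rows = data_rows(example)
    print(count_substring(rows, "gene"))  # 3: "pseudogene" contains "gene"
    print(count_exact(rows, "gene"))      # 2: only rows typed exactly "gene"
```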