Regions Of CpG Islands In hg18 And hg19 In GFF Format
2
1
10.6 years ago
ChIP ▴ 600

Hi!

Does somebody know of, or have, a one-liner script or something similar that can get me all the CpG islands with their information, such as name, length, cpgNum, gcNum, perCpg, perGc and obsExp?

Thank you

genomics perl python • 4.4k views
4
10.6 years ago

What about this? It's not GFF format, but it should be pretty easy to reformat as such.

mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -e \
    "select * from hg19.cpgIslandExt" > hg19.cpgIslandExt.txt

Replace hg19 with hg18 to get the hg18 version.

Sample output:

bin     chrom   chromStart      chromEnd        name    length  cpgNum  gcNum   perCpg  perGc   obsExp
585     chr1    28735   29810   CpG: 116        1075    116     787     21.6    73.2    0.83
586     chr1    135124  135563  CpG: 30 439     30      295     13.7    67.2    0.64
587     chr1    327790  328229  CpG: 29 439     29      295     13.2    67.2    0.62
588     chr1    437151  438164  CpG: 84 1013    84      734     16.6    72.5    0.64
588     chr1    449273  450544  CpG: 99 1271    99      777     15.6    61.1    0.84
...
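
As a quick sanity check on what the columns mean, the sample rows above can be compared against some simple arithmetic. This is only a sketch: the relationships below (length as chromEnd minus chromStart, perCpg as twice cpgNum over length, perGc as gcNum over length) are inferred from these five rows, not taken from the table's documentation, and the function name `derived` is made up for illustration.

```python
import math

# Sample rows copied from the output above:
# (chromStart, chromEnd, length, cpgNum, gcNum, perCpg, perGc)
rows = [
    (28735, 29810, 1075, 116, 787, 21.6, 73.2),
    (135124, 135563, 439, 30, 295, 13.7, 67.2),
    (327790, 328229, 439, 29, 295, 13.2, 67.2),
    (437151, 438164, 1013, 84, 734, 16.6, 72.5),
    (449273, 450544, 1271, 99, 777, 15.6, 61.1),
]


def derived(chrom_start, chrom_end, cpg_num, gc_num):
    """Recompute length, perCpg and perGc under the assumed definitions."""
    length = chrom_end - chrom_start          # implies a half-open interval
    per_cpg = round(100 * 2 * cpg_num / length, 1)  # each CpG covers 2 bases
    per_gc = round(100 * gc_num / length, 1)
    return length, per_cpg, per_gc


checks = []
for start, end, length, cpg_num, gc_num, per_cpg, per_gc in rows:
    d_len, d_cpg, d_gc = derived(start, end, cpg_num, gc_num)
    checks.append(
        d_len == length
        and math.isclose(d_cpg, per_cpg, abs_tol=1e-9)
        and math.isclose(d_gc, per_gc, abs_tol=1e-9)
    )

print(checks)
```

That length equals chromEnd minus chromStart is worth keeping in mind when converting to GFF, because it means chromStart is 0-based while GFF start positions are 1-based.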

EDIT: You can also convert to GFF with a homemade awk script. UCSC tables store 0-based, half-open coordinates (note that length = chromEnd - chromStart), while GFF is 1-based and inclusive, so the script adds 1 to chromStart:

awk 'BEGIN{FS="\t"; OFS="\t"}
    # NR>1 skips the header line; GFF columns are:
    # seqname, source, feature, start, end, score, strand, frame, attributes
    NR>1 {print \
    $2, \
    "cpgIslandExt", \
    "CpGi", \
    $3 + 1, \
    $4, \
    "0", \
    ".", \
    ".", \
    "name \""$5"\"; " \
    "length "$6"; " \
    "cpgNum "$7"; " \
    "gcNum "$8"; " \
    "perCpg "$9"; " \
    "perGc "$10"; " \
    "obsExp "$11"; " \
    }' hg19.cpgIslandExt.txt | head

Sample output:

chr1    cpgIslandExt    CpGi    28736    29810    0    .    .    name "CpG: 116"; length 1075; cpgNum 116; gcNum 787; perCpg 21.6; perGc 73.2; obsExp 0.83; 
chr1    cpgIslandExt    CpGi    135125    135563    0    .    .    name "CpG: 30"; length 439; cpgNum 30; gcNum 295; perCpg 13.7; perGc 67.2; obsExp 0.64; 
chr1    cpgIslandExt    CpGi    327791    328229    0    .    .    name "CpG: 29"; length 439; cpgNum 29; gcNum 295; perCpg 13.2; perGc 67.2; obsExp 0.62; 
chr1    cpgIslandExt    CpGi    437152    438164    0    .    .    name "CpG: 84"; length 1013; cpgNum 84; gcNum 734; perCpg 16.6; perGc 72.5; obsExp 0.64; 
chr1    cpgIslandExt    CpGi    449274    450544    0    .    .    name "CpG: 99"; length 1271; cpgNum 99; gcNum 777; perCpg 15.6; perGc 61.1; obsExp 0.84;
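
Since the question is also tagged python, here is a rough Python sketch of a similar conversion. It assumes the input is the tab-separated dump with a header line, as produced by the mysql command above. The function name `cpg_table_to_gff` is made up for this example. Unlike a plain column copy, it adds 1 to chromStart, because UCSC tables use 0-based starts and GFF uses 1-based starts. The source, feature and score values simply mirror the awk script.

```python
import csv
import io

ATTR_FIELDS = ["length", "cpgNum", "gcNum", "perCpg", "perGc", "obsExp"]


def cpg_table_to_gff(handle):
    """Yield one GFF line per row of a tab-separated cpgIslandExt dump.

    Columns are looked up by header name, so the leading "bin" column
    may be present or absent.
    """
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        attrs = 'name "{}"; '.format(row["name"])
        attrs += " ".join("{} {};".format(k, row[k]) for k in ATTR_FIELDS)
        yield "\t".join([
            row["chrom"],
            "cpgIslandExt",                # source
            "CpGi",                        # feature type
            str(int(row["chromStart"]) + 1),  # 0-based start -> 1-based
            row["chromEnd"],               # half-open end == inclusive end
            "0",                           # score, as in the awk script
            ".",                           # strand
            ".",                           # frame
            attrs,
        ])


# Small demonstration using the first sample row from the answer.
sample = (
    "bin\tchrom\tchromStart\tchromEnd\tname\tlength\t"
    "cpgNum\tgcNum\tperCpg\tperGc\tobsExp\n"
    "585\tchr1\t28735\t29810\tCpG: 116\t1075\t116\t787\t21.6\t73.2\t0.83\n"
)
gff_lines = list(cpg_table_to_gff(io.StringIO(sample)))
for line in gff_lines:
    print(line)
```

To run it on the real file, open hg19.cpgIslandExt.txt with `newline=""` and pass the file object to the function instead of the `io.StringIO` sample.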
0

As soon as I convert this to GFF format, I lose information. You can try it and see for yourself; I am using the ucsc_table2gff3.pl program from biotoolbox (https://code.google.com/p/biotoolbox/).

0

See the edited answer for converting to GFF. I haven't checked it very carefully, but I think something along these lines should do. (I'm not familiar with the tool you linked.)

3
10.6 years ago

http://genome.ucsc.edu/cgi-bin/hgTables?command=start -> All tables -> "cpgIslandExt", then select GTF as the output format.
