Hi, I am using Epigenome Roadmap data, and I can only find *.bed or *.wig files for mRNA-seq and ChIP-seq data (e.g. breast stem cell) for display on the genome browser. How can I get processed RPKM/FPKM expression level estimates and ChIP-seq peak calls?
Of course, one could analyze the *.bed or *.wig files and estimate expression levels or peaks oneself, but I was wondering if processed data is already available.
WIG is probably expression level, and BED is probably peaks. Be sure to investigate what the EpigenomeRoadmap people have to say about those files.
edit: this BED looks like raw read names and alignments. You could construct RPKM by carefully counting reads found at exon sites, or look for ChIP-seq peaks at promoter regions. The WIG looks like read depth, so you can ignore that.
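For reference, once reads have been counted per exon or gene, the RPKM arithmetic itself is short. This is only a minimal sketch of the standard definition (reads per kilobase of feature per million mapped reads); the function and parameter names are illustrative, not something taken from the Roadmap files, and it assumes you already know the total number of mapped reads in the library.

```python
def rpkm(read_count, feature_length_bp, total_mapped_reads):
    """Reads Per Kilobase of feature per Million mapped reads.

    read_count         -- reads assigned to the feature (e.g. summed over a gene's exons)
    feature_length_bp  -- feature length in base pairs
    total_mapped_reads -- total mapped reads in the library (assumed known)
    """
    if feature_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("feature length and total mapped reads must be positive")
    # 1e9 = 1e3 (bp per kilobase) * 1e6 (reads per million)
    return read_count * 1e9 / (feature_length_bp * total_mapped_reads)
```

For example, 100 reads on a 1,000 bp feature in a library of 1,000,000 mapped reads works out to an RPKM of 100.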
I am not sure if that is the case. Looking at the first few lines of the files, I think the bed files provide info about individual mapped reads, while the wig file provides depth of coverage on a per-bp basis.
It looks like your BED file reports reads from a Solexa machine. Reads reported as BED don't carry sequence information, so you'll have to look up their reference sequences or annotate them against feature regions.
Check bedtools for intersectBed -c: it will count the reads that hit each feature in your genes BED.
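To make clear what that counting step does, here is a rough pure-Python sketch of the same idea: for each gene interval, count the reads that overlap it by at least one base. It is not bedtools itself, it assumes both inputs are plain BED lines with 0-based, half-open coordinates, and the function names are made up for illustration. It is also a naive all-against-all scan per chromosome, so for real data bedtools will be far faster.

```python
from collections import defaultdict

def parse_bed(lines):
    """Yield (chrom, start, end, name) from BED lines, skipping headers and blanks."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        name = fields[3] if len(fields) > 3 else ""
        yield fields[0], int(fields[1]), int(fields[2]), name

def count_overlaps(gene_lines, read_lines):
    """Return [(chrom, start, end, name, n_reads)] for each gene interval."""
    # Group read intervals by chromosome so each gene only scans its own chromosome.
    reads_by_chrom = defaultdict(list)
    for chrom, start, end, _ in parse_bed(read_lines):
        reads_by_chrom[chrom].append((start, end))

    results = []
    for chrom, g_start, g_end, name in parse_bed(gene_lines):
        # Half-open intervals overlap when each one starts before the other ends.
        n = sum(1 for r_start, r_end in reads_by_chrom[chrom]
                if r_start < g_end and r_end > g_start)
        results.append((chrom, g_start, g_end, name, n))
    return results
```

The per-gene counts from this step are what you would then feed into an RPKM calculation, after summing over the exons of each gene if your features are exons.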
The WIG just says there's some read at location 11041 through 11100.
You'll have to trust their mapping strategy, so find their reference sequence or, better, the raw data.
I have now heard back from members of the Epigenetic Roadmap team about data availability.
Yes, right now you should only be able to access the .bed and .wig files which were submitted to the NCBI.
The peak calls and RPKM values for the roadmap data are part of the analysis section of the consortium manuscript, which has been submitted and is currently under revision.
This data will be available after the manuscript has been published. There will be web resource links in the manuscript that link to the dataset.
And that addresses my question. Karl, yes, I could analyze the data myself the way you suggested, but given the response from the Epigenetic Roadmap team, I will just wait to work with their processed data.