Question

Suggestions For Extracting Data (A Challenge Of Sorts!)

0

13.4 years ago

Atom Smasher ▴ 20

Hello all,

I have a problem which looks very complicated to me, but I am sure with suggestions from you folks, I'll go in the right direction.

I am still new to Bioinformatics data analysis and I would appreciate it if I could get ideas from you.

Ok... so, here we begin.

I wish to visualize another group's peak-called ChIP-Seq data in IGB (Integrated Genome Browser) to compare it with my own research group's data. Specifically, I need to load a particular chromosome's "bar" file into IGB, and that file should already reflect the peak analysis for the chromosome.

The "bar" files I am talking about are generated by a program called USEQ when it calls peaks. The problem is that, through Gene Expression Omnibus (GEO), I only have access to the following data from the other group:

1) Their raw data file
2) Their "eland_results.txt" file
3) Their "eland_export.txt" file
4) Their final peaks (bed) file

This other group does not use input files for calling peaks, so they have no input data. If input data were available, I could easily have re-processed their raw file against it with a peak-calling program like USEQ and obtained their peak-called chromosome "bar" files.

Also, I cannot simply convert their final peak files (in "bed" format) to "bar" files, as that would give me chromosome "bar" files containing only the peaks. I wish to visualize the whole chromosome with "regions having peaks" and "regions not having peaks".

I hope I am not being too ambiguous. How should I go about solving this problem?

Thank you.

bed peak-calling • 2.6k views

Updated 13.4 years ago by Istvan Albert 103k • written 13.4 years ago by Atom Smasher ▴ 20

0

I'm not clear why the BED files would not get you most of the way there. These represent the peaks called by the authors, do they not? Could you explain why that would not allow you to 'visualize the whole chromosome with "regions having peaks" and "regions not having peaks"'?

13.4 years ago by Sean Davis 27k
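To make the point about BED files concrete: the peak intervals already imply the non-peak regions, since those are simply the gaps between peaks along the chromosome. Below is a minimal sketch of that complement, assuming the peaks for one chromosome have been read into `(start, end)` pairs (0-based, half-open, as in BED) and that the chromosome length is known. The function name `non_peak_regions` and the example numbers are illustrative, not taken from the thread.

```python
def non_peak_regions(peaks, chrom_length):
    """Return the gaps between peak intervals on one chromosome.

    peaks: iterable of (start, end) pairs, 0-based half-open as in BED.
    chrom_length: total chromosome length (assumed to be supplied by the caller).
    """
    gaps = []
    cursor = 0  # first position not yet covered by any peak seen so far
    for start, end in sorted(peaks):
        if start > cursor:
            gaps.append((cursor, start))
        # max() handles overlapping or nested peaks
        cursor = max(cursor, end)
    if cursor < chrom_length:
        gaps.append((cursor, chrom_length))
    return gaps


# Hypothetical peaks on a 500 bp toy chromosome
example_peaks = [(100, 200), (150, 260), (400, 450)]
print(non_peak_regions(example_peaks, 500))
```

This only marks where peaks are and are not; it says nothing about signal height inside or outside a peak.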

score 0 · Answer 1 · 2012-03-24

What you are most likely looking for (although it is not clear from your post) is the per-base coverage over the entire genome. That coverage was used to call the peaks, but the peak data itself does not contain the actual shape of each peak.

Your best bet would be to convert your eland output to SAM format. Most genome browsers can generate the coverage from a SAM file.
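As an illustration of what "per-base coverage" means here, the sketch below counts how many aligned reads overlap each position and then collapses runs of equal depth into bedGraph-style lines. It is a toy under stated assumptions: reads are already available as `(start, end)` pairs (0-based, half-open) for a single chromosome, parsing of the eland or SAM files is not shown, and the names `per_base_coverage` and `to_bedgraph` are made up for this example. For a real chromosome you would normally let the genome browser or an existing coverage tool do this rather than hold one list entry per base in memory.

```python
from itertools import groupby


def per_base_coverage(reads, chrom_length):
    """Count reads overlapping each base, using a difference array."""
    diff = [0] * (chrom_length + 1)
    for start, end in reads:
        # Clip reads to the chromosome bounds
        start = max(start, 0)
        end = min(end, chrom_length)
        if start < end:
            diff[start] += 1  # depth rises where the read begins
            diff[end] -= 1    # and falls just past where it ends
    coverage = []
    depth = 0
    for delta in diff[:chrom_length]:
        depth += delta
        coverage.append(depth)
    return coverage


def to_bedgraph(chrom, coverage):
    """Collapse runs of equal non-zero depth into bedGraph lines."""
    lines = []
    pos = 0
    for depth, run in groupby(coverage):
        length = sum(1 for _ in run)
        if depth > 0:
            lines.append(f"{chrom}\t{pos}\t{pos + length}\t{depth}")
        pos += length
    return lines


# Two hypothetical overlapping reads on a 10 bp toy chromosome
cov = per_base_coverage([(2, 6), (4, 8)], 10)
print(cov)
print("\n".join(to_bedgraph("chr1", cov)))
```

The resulting profile shows both the height of each peak and the background between peaks, which is the information a peaks-only BED file lacks.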