Uploading HiC track on UCSC browser
1
0
Entering edit mode
8 months ago
arsala521 ▴ 50

Hi,

I have a .hic file that I did not generate myself; it comes from NCBI GEO: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE128800. I want to upload and visualize this file (GSM3685694_chimpanzee_panTro5_mapping.hic) in the UCSC Genome Browser. I am trying to follow the steps given on the UCSC website (copied below):

The typical workflow for generating a hic custom track is this:

  1. Prepare your data by processing it with the Juicer pipeline to create a file in the hic format.
  2. Move the hic file (my.hic) to an http, https, or ftp location.
  3. Construct a custom track using a single track line. The basic version of the track line will look something like this: track type=hic name="My HIC" bigDataUrl=http://myorg.edu/mylab/my.hic
  4. Paste the custom track line into the text box in the custom track management page, click "submit" and view in the Genome Browser.
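
Step 3 can be sketched in shell as follows. This is only an illustration: the helper name `make_hic_track_line` is hypothetical (it is not part of any UCSC tool), and the track name and URL are the placeholders from the workflow text above.

```shell
#!/bin/sh
# Hypothetical helper: print a single hic custom track line.
#   $1 = track name shown in the browser
#   $2 = http/https/ftp URL of the .hic file
make_hic_track_line() {
    printf 'track type=hic name="%s" bigDataUrl=%s\n' "$1" "$2"
}

# Placeholder values copied from the UCSC workflow text.
make_hic_track_line "My HIC" "http://myorg.edu/mylab/my.hic"
# prints: track type=hic name="My HIC" bigDataUrl=http://myorg.edu/mylab/my.hic
```

The printed line is what gets pasted in step 4; the URL must point at the file itself, not at a landing page.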

As the .hic file is already hosted at an https location (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE128800), I assume I can start from step 3. How should I modify bigDataUrl for my case? I tried bigDataUrl=https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE128800/GSM3685694_chimpanzee_panTro5_mapping.hic but it didn't work.

Second, on the 'add custom track' page there are two text boxes: "Paste URLs or data:" and "Optional track documentation:". For step 4, which text box should I use to paste the custom track line?

Thank you

custom_track UCSC HiC • 913 views
8 months ago

bigDataUrl=https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE128800/GSM3685694_chimpanzee_panTro5_mapping.hic

$ wget -O - -q "https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE128800/GSM3685694_chimpanzee_panTro5_mapping.hic" | file -
/dev/stdin: HTML document, ASCII text, with very long lines (557)

This URL links to a web page, not to the actual file.

I think you want a direct download link of this form: https://www.ncbi.nlm.nih.gov/geo/download/?acc=GSM3685696&format=file&file=GSM3685696%5Fgorilla%5FgorGor4%5Fmapping%2Ehic

Furthermore, the server hosting the data must accept range requests (byte serving): https://en.wikipedia.org/wiki/Byte_serving. I don't think NCBI can be used as a file provider (but maybe I'm wrong).

EDIT:

And the answer is 'no'. If I ask for only bytes 2-10 of the file, the range request is ignored: the response starts at byte 0 with the "HIC" magic string and continues well past byte 10.

$ curl -r 2-10 -s  "https://www.ncbi.nlm.nih.gov/geo/download/?acc=GSM3685696&format=file&file=GSM3685696%5Fgorilla%5FgorGor4%5Fmapping%2Ehic" | hexdump -C | head
00000000  48 49 43 00 08 00 00 00  e3 11 a0 66 00 00 00 00  |HIC........f....|
00000010  68 67 33 38 00 01 00 00  00 73 6f 66 74 77 61 72  |hg38.....softwar|
00000020  65 00 4a 75 69 63 65 72  20 54 6f 6f 6c 73 20 56  |e.Juicer Tools V|
00000030  65 72 73 69 6f 6e 20 31  2e 37 2e 36 00 1a 00 00  |ersion 1.7.6....|
00000040  00 41 4c 4c 00 9e 1f 2f  00 4d 00 b9 40 00 00 31  |.ALL.../.M..@..1|
00000050  00 06 c6 d6 0e 32 00 79  94 6f 0e 33 00 07 c0 d1  |.....2.y.o.3....|
00000060  0b 34 00 9b 71 56 0b 35  00 d3 0d d2 0a 36 00 db  |.4..qV.5.....6..|
00000070  4a 2e 0a 37 00 35 6d 7f  09 38 00 cc a3 a6 08 39  |J..7.5m..8.....9|
00000080  00 5d bc 3f 08 31 30 00  2e 96 f9 07 31 31 00 1e  |.].?.10.....11..|
00000090  42 0d 08 31 32 00 ad 9e  f1 07 31 33 00 a8 0f d1  |B..12.....13....|

So you need to find a server to host your data.

A range request that works (UCSC):

$ curl -r 2-10 -s  "http://hgdownload.soe.ucsc.edu/gbdb/hg19/bbi/hic/GSE63525_HMEC_combined.hic" | hexdump -C | head
00000000  43 00 08 00 00 00 59 81  3c                       |C.....Y.<|
00000009
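
The comparison above can be turned into a small check. This is a sketch, not an official test: `range_honored` is a hypothetical helper that only counts the bytes arriving on stdin, on the assumption that a server honoring `-r 2-10` sends exactly 9 bytes while one that ignores it sends more.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if stdin carries exactly $1 bytes.
# It reads at most $1 + 1 bytes, so an ignored range request does not
# pull the whole (possibly multi-gigabyte) file through the pipe.
range_honored() {
    got=$(head -c "$(( $1 + 1 ))" | wc -c | tr -d ' ')
    [ "$got" -eq "$1" ]
}

# Usage sketch (needs network access; URL taken from the post above):
#   curl -r 2-10 -s "http://hgdownload.soe.ucsc.edu/gbdb/hg19/bbi/hic/GSE63525_HMEC_combined.hic" \
#       | range_honored 9 && echo "range requests honored"
```

Another option is to look at the HTTP status code: a server that honors the range replies with 206 Partial Content rather than 200.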

Thank you so much for the detailed response. I am new to uploading custom tracks (particularly hic tracks) to the UCSC browser. What I understand from the answer is that I can't use the NCBI web server, and that I need to upload this .hic file to GitHub or some other web server. Could you please elaborate on the 'range request' part a little more? Thank you again.


Range requests are not available on all servers. Why are they needed? With range requests, UCSC does NOT have to download and store the whole file each time you want to display the data. It fetches only the part of the file where the data for the requested region (chrom-start-end) is known to be found (this is how tools like tabix work with remote files).
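
To make the idea concrete, here is a local imitation of what a server does for a request like "bytes 2-10". It is only a sketch: `byte_range` is a hypothetical helper working on a local file, and the demo file content is made up.

```shell
#!/bin/sh
# Hypothetical helper: print bytes $2..$3 (0-based, inclusive) of local
# file $1, i.e. the slice a server would send for "Range: bytes=$2-$3".
byte_range() {
    tail -c "+$(( $2 + 1 ))" "$1" | head -c "$(( $3 - $2 + 1 ))"
}

# Demo on a throwaway file with made-up content.
tmp=$(mktemp)
printf 'ABCDEFGHIJKLMNOP' > "$tmp"
byte_range "$tmp" 2 10   # prints CDEFGHIJK (9 bytes, offsets 2 to 10)
echo
rm -f "$tmp"
```

Only 9 bytes are produced no matter how large the file is, which is why the browser can display one region without transferring everything.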


I understand the range request concept now. Thank you so much. I read on a forum that Google Drive can be used for this purpose, so I uploaded the hic file to Google Drive and used the track line: track type=hic name="My HIC" bigDataUrl=[google drive link]. The browser now gives the error "Hi-C magic string is missing, does not appear to be a hic file". I am able to load this file in Juicebox. I hope I am using the right link: after uploading the file to Google Drive, I right-clicked it and chose "copy link" from the share drop-down menu.


I have exactly the same problem at the moment. I got a "contact matrix" txt file with heatmap values, reformatted it into a .hic file, uploaded it to Google Drive and tried to use the link to the file as the URL for the track. I got the "Hi-C magic string is missing, does not appear to be a hic file" error. I am not sure whether the link is not working or there is some issue with the file format, but I can open the file locally with Juicebox just fine.
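
One way to tell a bad link from a bad file is to look at the first bytes the link actually returns. The sketch below assumes, based on the hexdump earlier in this thread, that a hic file starts with the bytes "HIC"; `starts_with_hic_magic` is a hypothetical helper, and `$url` stands for whatever link is being tested.

```shell
#!/bin/sh
# Hypothetical helper: succeed if stdin begins with the three bytes "HIC".
# Only 3 bytes are read, so the NUL byte that follows the magic string
# never reaches the shell variable comparison.
starts_with_hic_magic() {
    [ "$(head -c 3)" = "HIC" ]
}

# Usage sketch (needs network access; $url is a placeholder):
#   curl -s -L "$url" | starts_with_hic_magic \
#       || echo "link does not return a hic file (maybe an HTML page?)"
```

If the same check passes on the local copy (`starts_with_hic_magic < my.hic`) but fails on the link, the link is returning something other than the file.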


Google Drive does not allow byte range requests. This is documented by UCSC: https://genome.ucsc.edu/goldenPath/help/hgTrackHubHelp.html#Hosting

You will need to find a hosting provider that allows normal file access with byte-range requests, e.g. your university, any normal web server, figshare, GitHub, etc.
