What is the meaning of the true diagonal in Hi-C?
3
1
8.3 years ago

Hi everyone. In my Hi-C contact matrix, I see non-zero values on the diagonal. What is the meaning of these self-contacts? Should I ignore the diagonal in further calculations?

Thanks

Hi-C Chia-pet 3C 4C 5C • 4.8k views
5
8.2 years ago

The true diagonal contains contacts between loci separated by distances below the size of a bin. The 2nd diagonal has such contacts too, i.e. contacts between pairs of loci located close to a bin boundary, but on either side of it.
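
To make the point about bin boundaries concrete, here is a minimal sketch in plain Python. The function name `diagonal_index` and the 40 kb bin size are illustrative assumptions, not part of any Hi-C library.

```python
def diagonal_index(pos_a, pos_b, bin_size):
    """Return |bin(pos_a) - bin(pos_b)|: 0 is the main diagonal, 1 the 2nd diagonal."""
    return abs(pos_a // bin_size - pos_b // bin_size)


BIN_SIZE = 40_000  # illustrative bin size in bp (an assumption for this example)

# Two loci 5 kb apart inside the same bin fall on the main diagonal.
same_bin = diagonal_index(10_000, 15_000, BIN_SIZE)

# Two loci only 2 kb apart, but on either side of a bin boundary,
# fall on the 2nd diagonal even though they are closer together.
across_boundary = diagonal_index(39_000, 41_000, BIN_SIZE)

print(same_bin, across_boundary)  # prints: 0 1
```

So a pair's diagonal depends on where the bin boundaries happen to fall, not only on the genomic separation.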

Our lab's usual recommendation is to discard the first two diagonals, because they are contaminated by non-informative artifacts of the Hi-C procedure: unligated and self-ligated molecules. The former are just pieces of undigested and unligated DNA; the latter are formed when the two ends of the same molecule get ligated and the resulting circle is then cleaved elsewhere. Both types of molecules look like short-distance contacts. Unligated DNA pieces look like contacts with a separation of a few hundred bp and with the sequencing directions pointing toward each other along the genome. Self-ligated molecules usually have a separation of a few kb to 10 kb and have sequencing directions pointing away from each other. Neither contains information about spatial organization. Because unligated DNA and self-circles cannot be distinguished from "true" contacts formed by two distinct ligated molecules, their presence essentially invalidates all statistics on short-distance contacts. For this reason, we usually discard the first two diagonals of Hi-C matrices at high resolutions (up to a few tens of kb), or only the first diagonal for low-resolution datasets (100 kb+).
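
As a rough illustration of discarding diagonals, here is a minimal sketch that masks the first `n_diagonals` diagonals of a small dense matrix. The helper name `mask_diagonals`, the `fill` value and the toy counts are all assumptions made for this example; real pipelines typically work on sparse matrices and may mark these entries as missing rather than zero.

```python
def mask_diagonals(matrix, n_diagonals=2, fill=0):
    """Return a copy of a square matrix with the first n_diagonals diagonals set to fill.

    n_diagonals=1 masks only the main diagonal; n_diagonals=2 also masks the
    diagonals directly above and below it.
    """
    n = len(matrix)
    return [
        [fill if abs(i - j) < n_diagonals else matrix[i][j] for j in range(n)]
        for i in range(n)
    ]


# Toy symmetric contact matrix (made-up counts).
contacts = [
    [90, 40, 10, 5],
    [40, 80, 35, 12],
    [10, 35, 85, 38],
    [5, 12, 38, 95],
]

high_res = mask_diagonals(contacts, n_diagonals=2)  # high-resolution case
low_res = mask_diagonals(contacts, n_diagonals=1)   # low-resolution case

for row in high_res:
    print(row)
```

Only bin pairs at least two bins apart keep their counts in `high_res`, while `low_res` keeps everything except the main diagonal.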

0

Is this recommendation on diagonal bias (i.e., data processing workflow) published?

1

uuuggghh, not really! :) The 4DN consortium is currently working on Hi-C data analysis standards, which will address this issue, among others. It will take some time to produce a document, though. Meanwhile, I'd recommend my all-time favorite guideline to Hi-C data analysis by Noam Kaplan and Bryan Lajoie from Dekker's group. It discusses the non-informative artifacts of Hi-C, though it doesn't suggest any particular threshold.

0

Thank you for your ideas!
I remembered your suggestion to remove the 1st and 2nd diagonals, which sounds very logical to me. But still, shouldn't we, in an ideal world, remove diagonal bias while normalising interaction matrices (i.e., O/E should remove super- and sub-diagonal bias)? This is not a question, just a random observation that should be tested :)
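
For reference, here is a minimal sketch of what a simple O/E (observed over expected) transform does to each diagonal. It assumes the simplest definition of "expected", the mean count of all bin pairs at the same separation; the function name and toy matrix are made up for illustration.

```python
def observed_over_expected(matrix):
    """Divide each entry by the mean of its diagonal (same bin separation |i - j|)."""
    n = len(matrix)
    expected = []
    for d in range(n):
        # All bin pairs separated by d bins, read from the upper triangle.
        values = [matrix[i][i + d] for i in range(n - d)]
        expected.append(sum(values) / len(values))
    return [
        [
            matrix[i][j] / expected[abs(i - j)] if expected[abs(i - j)] > 0 else 0.0
            for j in range(n)
        ]
        for i in range(n)
    ]


# Toy symmetric contact matrix (made-up counts).
toy = [
    [4, 2, 1],
    [2, 8, 2],
    [1, 2, 6],
]

oe = observed_over_expected(toy)
print(oe[0][0], oe[1][1], oe[0][1])
```

After this transform every diagonal has mean 1, so the average enrichment of the first diagonals is rescaled away. The counts on those diagonals are still a mixture of whatever molecules produced them, which is the part that would need testing.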

0
8.3 years ago
Nibua ▴ 70

What's the size of the bins in your matrix? Let's say the bins are 5 kb long. The values on the diagonal mean that you see interactions at very short distances (less than 5 kb). To me, these values should be the highest values.

0

Thanks for the answer. My bin size is 40 kb. You are correct that the diagonal should hold the highest values. But to find the true contact pairs, should I use the diagonal values as they are, or should I set the diagonal to 0? I am confused about that.

0

What is the method you will use to find the best contact pairs?

0

I am calculating the probability of a contacting pair using the method of Zhang and Wolynes (2015, PNAS): p(i,j) = min(1, C(i,j) / min(n_i, n_j)), where n_i = max[C(i-4,i), C(i-3,i), ..., C(i,i+2), C(i,i+3)] and C(i,j) is the contact frequency of the pair (i,j). So my question is: what value of C(i,i) should I use when calculating p(i,j)?
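
To show how much the choice matters, here is a minimal sketch of the formula as written above. It makes several assumptions: the window runs over offsets -4 to +3 as quoted, C is symmetric so C(i-4,i) is read as `C[i][i-4]`, and whether C(i,i) enters n_i is exposed as a hypothetical `include_diagonal` flag because the formula as quoted does not settle it. This is not the authors' reference implementation.

```python
def neighbor_max(C, i, include_diagonal, lo=-4, hi=3):
    """n_i: the largest C(i, i+k) for offsets k in [lo, hi] that stay inside the matrix."""
    n = len(C)
    values = []
    for k in range(lo, hi + 1):
        j = i + k
        if not 0 <= j < n:
            continue  # skip neighbors that fall off the edge of the matrix
        if k == 0 and not include_diagonal:
            continue  # optionally leave C(i,i) out of the maximum
        values.append(C[i][j])
    return max(values, default=0)


def contact_probability(C, i, j, include_diagonal=False):
    """p(i,j) = min(1, C(i,j) / min(n_i, n_j)); returns 0.0 if the denominator is 0."""
    denom = min(
        neighbor_max(C, i, include_diagonal),
        neighbor_max(C, j, include_diagonal),
    )
    if denom == 0:
        return 0.0
    return min(1.0, C[i][j] / denom)


# Toy symmetric contact matrix (made-up counts).
C = [
    [90, 40, 10, 5],
    [40, 80, 35, 12],
    [10, 35, 85, 38],
    [5, 12, 38, 95],
]

print(contact_probability(C, 0, 1, include_diagonal=False))  # prints: 1.0
print(contact_probability(C, 0, 1, include_diagonal=True))   # prints: 0.5
```

In this toy matrix, including the diagonal halves p(0,1), because the large C(i,i) values dominate n_i.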

0
8.3 years ago

So you are estimating contacts between fragments. But obviously, fragments that are already very close together along the sequence (i.e. in the same bin) are highly likely to be in close spatial proximity to each other. And that's probably not exactly what you are looking for.

You probably want to check for secondary structure interactions such as looping, not for interactions that arise simply from proximity along the primary sequence.
