Question

Bioinformatics definition of TAD boundary


3.6 years ago

sckinta ▴ 730

Can anyone provide a formal definition of TAD boundaries in bioinformatic language?

For example, I got a BED file from a TAD caller (this example is from hiTADs) using 10 kb bins. I subset the BED file by removing all the sub-TADs and got the output below.

chr1    2120000 2340000
chr1    2340000 3790000
chr1    3790000 3840000
chr1    4020000 6050000
chr1    6050000 6750000

I noticed that the TADs are not always contiguous. E.g., there is a 180 kb gap between TAD chr1:3790000-3840000 and TAD chr1:4020000-6050000, which corresponds to a low-interaction region in the Hi-C matrix.
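A minimal sketch of how such gaps could be listed programmatically, assuming the TADs are held as (chrom, start, end) tuples copied from the excerpt above. The function name `find_gaps` is hypothetical and not part of hiTADs or any other TAD caller.

```python
# TAD intervals copied from the BED excerpt above: (chrom, start, end).
tads = [
    ("chr1", 2120000, 2340000),
    ("chr1", 2340000, 3790000),
    ("chr1", 3790000, 3840000),
    ("chr1", 4020000, 6050000),
    ("chr1", 6050000, 6750000),
]


def find_gaps(tads):
    """Return (chrom, prev_end, next_start) for consecutive TADs that do not touch.

    Hypothetical helper for illustration only.
    """
    ordered = sorted(tads)  # sorts by chrom, then start, then end
    gaps = []
    for (chrom_a, _, end_a), (chrom_b, start_b, _) in zip(ordered, ordered[1:]):
        # Only compare neighbours on the same chromosome.
        if chrom_a == chrom_b and start_b > end_a:
            gaps.append((chrom_a, end_a, start_b))
    return gaps


for chrom, start, end in find_gaps(tads):
    print(chrom, start, end, f"{(end - start) // 1000} kb", sep="\t")
```

On the five intervals above this reports a single gap, chr1:3840000-4020000.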

Question: should I consider the "gap" chr1:3840000-4020000 or the "internal bin" next to start/end coordinates or the "external bin" next to start/end coordinates as TAD boundary?

"Internal bins" for the example above, i.e. start to start + 10 kb, or end - 10 kb to end:

chr1    2120000 2130000
chr1    2330000 2340000
chr1    2340000 2350000
chr1    3780000 3790000
chr1    3790000 3800000
chr1    3830000 3840000
chr1    4020000 4030000
chr1    6040000 6050000
chr1    6050000 6060000
chr1    6740000 6750000

"External bins" for the example above, i.e. start - 10 kb to start, or end to end + 10 kb:

chr1    2110000 2120000
chr1    2330000 2340000
chr1    2340000 2350000
chr1    3780000 3790000
chr1    3790000 3800000
chr1    3840000 3850000
chr1    4010000 4020000
chr1    6040000 6050000
chr1    6050000 6060000
chr1    6750000 6760000
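Both candidate bin sets can be derived from the TAD list in a few lines. This is only a sketch under the assumptions stated in the post (10 kb bins, TADs as (chrom, start, end) tuples); the function `edge_bins` and its `internal` flag are hypothetical names chosen for illustration.

```python
BIN_SIZE = 10_000  # bin size used by the TAD caller in this example

# TAD intervals copied from the BED excerpt in the post: (chrom, start, end).
tads = [
    ("chr1", 2120000, 2340000),
    ("chr1", 2340000, 3790000),
    ("chr1", 3790000, 3840000),
    ("chr1", 4020000, 6050000),
    ("chr1", 6050000, 6750000),
]


def edge_bins(tads, bin_size=BIN_SIZE, internal=True):
    """Return the sorted, de-duplicated bins flanking every TAD start and end.

    internal=True  -> the bin just inside each edge
    internal=False -> the bin just outside each edge
    Hypothetical helper for illustration only.
    """
    bins = set()
    for chrom, start, end in tads:
        if internal:
            bins.add((chrom, start, start + bin_size))
            bins.add((chrom, end - bin_size, end))
        else:
            # Clamp at 0 so a TAD starting near the chromosome start stays valid.
            bins.add((chrom, max(0, start - bin_size), start))
            bins.add((chrom, end, end + bin_size))
    return sorted(bins)


for label, flag in (("internal", True), ("external", False)):
    print(label)
    for chrom, start, end in edge_bins(tads, internal=flag):
        print(chrom, start, end, sep="\t")
```

Note that wherever two TADs touch, the internal bin of one is the external bin of its neighbour, so the two sets differ only at the edges facing a gap or the outer ends of the list.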

hiTADs Hi-C TAD • 1.1k views

updated 3.5 years ago by kalavattam ▴ 280 • written 3.6 years ago by sckinta ▴ 730


Can anyone provide a formal definition of TAD boundaries in bioinformatic language?

There is no consensus definition of TAD boundaries among the many programs that call them. The reasons are numerous and can be complicated; this short review provides an excellent introduction: https://doi.org/10.1016/j.jmb.2019.09.026

You can get the gist of the different computational methods and corresponding different definitions from these papers too:

(There's not a consensus definition in biology either.)

Question: should I consider the "gap" chr1:3840000-4020000 or the "internal bin" next to start/end coordinates or the "external bin" next to start/end coordinates as TAD boundary?

I'm not entirely clear on your question, and I am not sure this helps, but if you "pad" (add "slack" to) one side of a TAD boundary, it's common practice to pad the other side too. For example, for the TAD with boundary start 3790000 and boundary end 3840000, padding each boundary by 10 kb on both sides gives:

chr1 3780000 3800000
chr1 3830000 3850000
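The padding above could be sketched as follows, assuming a pad of 10 kb (one bin) on each side of every boundary. The function name `pad_boundaries` and its parameters are hypothetical, chosen only for illustration.

```python
def pad_boundaries(chrom, start, end, pad=10_000):
    """Return two windows, one centred on the TAD start and one on the TAD end.

    Each boundary position is extended by `pad` bp in both directions.
    Hypothetical helper for illustration only.
    """
    return [
        (chrom, max(0, start - pad), start + pad),  # window around the start boundary
        (chrom, max(0, end - pad), end + pad),      # window around the end boundary
    ]


for chrom, start, end in pad_boundaries("chr1", 3790000, 3840000):
    print(chrom, start, end)
```

With a 10 kb pad this treats each boundary as a 20 kb window spanning the internal and external bin together, rather than forcing a choice between them.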

3.5 years ago by kalavattam ▴ 280