Hello,
I have gene expression data from RNA-seq and want to group genes by the TAD they belong to. I have gene coordinates together with expression values in one file and TAD coordinates in another file. I want to intersect the two files so that the resulting gene file gets a new column with the number of the TAD each gene belongs to.
And the next step is to compare gene expression inside and outside each TAD.
Is there already a shared solution to do this?
Thanks!
Have you tried bedtools intersect?
Yes. I ended up sorting both files and running intersectBed with the -wo option, which is equivalent to what you proposed. However, this does not label the TADs with numbers (1, 2, 3, etc.), so any downstream analysis needs an extra step that reads the TAD coordinates and matches them up. So I am afraid there is no ready-made solution for comparing gene expression inside and outside each TAD, and it has to be written manually?
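If the TAD file has no ID column, the -wo output can still be post-processed to attach a TAD number to each gene. Below is a minimal Python sketch. It assumes (hypothetically) a gene file with 5 columns (chrom, start, end, gene name, expression) and a TAD file with 3 columns (chrom, start, end). With -wo, each output line holds the gene columns, then the TAD columns, then the overlap length in bp. The function name `label_wo_output` is made up for illustration.

```python
def label_wo_output(lines, n_gene_cols=5):
    """Attach a TAD number to each gene line of `bedtools intersect -wo` output.

    Assumes the gene columns come first (n_gene_cols of them), followed by
    3 TAD columns (chrom, start, end) and the overlap length appended by -wo.
    TADs are numbered 1, 2, 3, ... in order of first appearance.
    """
    tad_ids = {}
    labelled = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        gene = fields[:n_gene_cols]
        # identify a TAD by its coordinates
        tad_key = tuple(fields[n_gene_cols:n_gene_cols + 3])
        if tad_key not in tad_ids:
            tad_ids[tad_key] = len(tad_ids) + 1
        labelled.append(gene + [str(tad_ids[tad_key])])
    return labelled
```

Two caveats apply. TADs that overlap no gene never appear in the -wo output, so they get no number and the numbering will not match the TAD file's line order. A gene spanning a TAD boundary also appears once per overlapping TAD. Numbering the TAD file itself before intersecting avoids the first problem.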
Can you show how your TADs are saved? For each gene, the intersect command prints the TAD it overlaps, including the ID of that TAD.
I assumed that each of your TADs had an ID. You can add a number to each TAD as follows:
Note that I assume you already have the TADs in a .bed file whose 4th column holds the ID.
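As one possible way to produce such a 4th ID column, here is a minimal Python sketch that numbers the records of a 3-column TAD BED file in file order (TAD1, TAD2, ...). The function name `number_tads` and the "TAD" prefix are hypothetical choices. It assumes tab-separated input and skips blank, `#`, `track` and `browser` lines.

```python
def number_tads(bed_lines, prefix="TAD"):
    """Return BED records with a 4th column holding a running TAD ID."""
    numbered = []
    n = 0
    for line in bed_lines:
        line = line.rstrip("\n")
        # skip blank lines and BED header/comment lines
        if not line or line.startswith(("#", "track", "browser")):
            continue
        n += 1
        chrom, start, end = line.split("\t")[:3]
        numbered.append(f"{chrom}\t{start}\t{end}\t{prefix}{n}")
    return numbered
```

For example, pass an open file handle (`number_tads(open("tads.bed"))`) and write the returned lines to a new .bed file. Intersecting that file with the genes using -wo then carries the TAD ID into every output line.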
Are you familiar with any particular programming language such as R?
The question is whether a solution already exists, so that I don't have to reimplement it. The task seems quite common.
Any language would be fine: Perl, etc.
GenomicRanges in Bioconductor supports this kind of operation, from the simplest to the most complex, although you would still have to roll your own solution on top of it.
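The "inside vs outside each TAD" comparison itself is short once genes and TADs are in memory. The sketch below is plain Python standing in for what one would do in R with GenomicRanges (e.g. `findOverlaps`); it is not that package's API. It assumes BED-style half-open coordinates, genes as (chrom, start, end, name, expression) tuples and TADs as (chrom, start, end, tad_id) tuples. For each TAD it reports the mean expression of overlapping genes and of all other genes.

```python
from statistics import mean


def overlaps(a_start, a_end, b_start, b_end):
    # BED intervals are half-open: [start, end)
    return a_start < b_end and b_start < a_end


def expression_by_tad(genes, tads):
    """Return {tad_id: (mean_inside, mean_outside)}; None marks an empty group.

    genes: iterable of (chrom, start, end, name, expr)
    tads:  iterable of (chrom, start, end, tad_id)
    """
    genes = list(genes)
    result = {}
    for t_chrom, t_start, t_end, tad_id in tads:
        inside, outside = [], []
        for g_chrom, g_start, g_end, _name, expr in genes:
            if g_chrom == t_chrom and overlaps(g_start, g_end, t_start, t_end):
                inside.append(expr)
            else:
                outside.append(expr)
        result[tad_id] = (mean(inside) if inside else None,
                          mean(outside) if outside else None)
    return result
```

The nested loop is O(genes × TADs), which is acceptable for a sketch. For genome-wide data, an interval index (as used by bedtools or GenomicRanges) is the better choice. The inside/outside value lists could also be fed to a statistical test instead of just comparing means.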