Intersect genomic locations with genes
8 months ago

I have a BED file containing genomic locations with A or B compartment annotations. How do I find the genes in these locations from this BED file? I am doing a Hi-C data analysis.

This is my domain file, generated from a compartment analysis of Hi-C data. If anybody has worked with Hi-C data or knows Hi-C data analysis, any help would be appreciated.


fanc Chromatin conformation Hic hic • 858 views

Please do not paste screenshots of plain text content, it is counterproductive. You can copy paste the content directly here (using the code formatting option shown below), or use a GitHub Gist if the content volume exceeds allowed length here.


It exceeds the allowed word limit, but this is the basic format: the genomic locations on each chromosome, followed by whether the region falls into the A or B compartment, followed by an insulation score.


"it exceeds the total words allowed limit"

Yes, but you don't have to paste the whole file. The first 10 rows would have been enough.


My bad, I misunderstood. Here is the file format:

chr1    1   187000000   A   0.05849574892674321 .
chr1    187000001   188000000   B   -0.0012336259664338584  .
chr1    188000001   195000000   A   0.019978724419553655    .
chr1    195000001   196000000   B   -0.01119525268169755    .
chr1    196000001   248956422   A   0.061414677382861445    .
chr2    1   1000000 A   0.010233732397155913    .
chr2    1000001 2000000 B   -0.004427407974132213   .
chr2    2000001 3000000 A   0.03406108282647333 .
chr2    3000001 5000000 B   -0.024879084064948365   .
chr2    5000001 8000000 A   0.031004999249523574    .
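
One detail worth checking before any intersect: the starts above (1, 187000001, ...) look 1-based and inclusive, whereas BED is formally 0-based and half-open. If that reading is correct (it is an assumption about how the file was written), shifting each start down by 1 gives standard BED coordinates. The file names `compartments.txt` and `compartments.bed` are hypothetical, and the sample rows are shortened copies of the ones above.

```shell
# Hypothetical sample mirroring the rows above (tab-separated).
printf 'chr1\t1\t187000000\tA\t0.058\t.\nchr1\t187000001\t188000000\tB\t-0.001\t.\n' > compartments.txt

# Shift 1-based inclusive starts to 0-based starts; ends stay unchanged.
awk 'BEGIN{FS=OFS="\t"} {$2=$2-1; print}' compartments.txt |
sort -k1,1 -k2,2n > compartments.bed
```

If the file already uses 0-based starts, this step should be skipped, since it would shift every interval by one base.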
8 months ago

Get a BED file with the genes, e.g.:

# Keep the GTF "gene" rows, take gene_name from column 9, and write 0-based BED starts.
gunzip -c  in.gtf.gz |\
awk -F '\t' '($3=="gene") {G="."; N=split($9,a,/[; "]+/); for(i=1;i+1<=N;i++){if(a[i]=="gene_name") {G=a[i+1];break;}} printf("%s\t%d\t%d\t%s\n",$1,int($4)-1,$5,G);}'  |\
sort -t $'\t' -k1,1 -k2,2n > genes.bed

Then use bedtools intersect to overlap genes.bed with your compartment file.
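
A minimal sketch of that intersect step, assuming the compartment file is saved as `compartments.bed` (a hypothetical name) and the genes are in `genes.bed`. The tiny inputs are made up so the commands run on their own; `-wa` and `-wb` ask bedtools to write the original entry from each file for every overlap.

```shell
# Tiny hypothetical inputs so the sketch runs on its own.
printf 'chr1\t0\t1000\tA\t0.05\t.\nchr1\t1000\t2000\tB\t-0.01\t.\n' > compartments.bed
printf 'chr1\t100\t200\tGENE1\nchr1\t1500\t1600\tGENE2\n' > genes.bed

# Report each compartment/gene overlap: 6 columns from -a followed by 4 from -b.
bedtools intersect -a compartments.bed -b genes.bed -wa -wb > overlaps.tsv

# Keep the compartment label (column 4) and the gene name (column 10).
cut -f4,10 overlaps.tsv > gene_compartment.tsv
```

A gene that spans a compartment boundary overlaps both domains and so appears once per domain in the output.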


The feature file is not .gz, and simply leaving out the gunzip command and running the rest didn't help either; it only wrote an empty genes.bed file.


"the feature file is not .gz"

This is obviously just an example; I have no way of knowing your environment.

So don't use gunzip!


Yes, but the genes.bed file is empty. Sorry, I'm just a master's student working on my thesis, and this is my first time performing a genomics analysis.
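
An empty genes.bed means no line passed the `$3=="gene"` filter in the awk command. One hedged way to see why is to tabulate the feature types in column 3 of the annotation: some files carry only transcript or exon rows, and a file that is not tab-delimited will not match at all. The file name `annotation.gtf` and its contents below are hypothetical stand-ins for the real feature file.

```shell
# Hypothetical annotation that has no "gene" rows at all.
printf '#!comment\nchr1\tsrc\ttranscript\t11869\t14409\t.\t+\t.\tgene_id "g1"; gene_name "GENE1";\nchr1\tsrc\texon\t11869\t12227\t.\t+\t.\tgene_id "g1"; gene_name "GENE1";\n' > annotation.gtf

# Count feature types in column 3, skipping header lines that start with "#".
awk -F '\t' '!/^#/ {n[$3]++} END {for (t in n) printf("%s\t%d\n", t, n[t])}' annotation.gtf |
sort > feature_counts.tsv
```

If `gene` does not appear in the counts, the filter has nothing to match. In that case, filtering on a feature type that is present (for example `transcript`) may be a workable substitute, at the cost of one row per transcript instead of one per gene.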


Related post on how to get genes.bed:
