2.4 years ago
Raju
Hello folks, I have two data files:
T-Gene.csv
Chr,Start,End,ID
6,38517417,38517437,kgp17152035
6,38517556,38517576,rs4254983
6,38517997,38518017,kgp10250023
6,38519465,38519485,kgp17245206
6,38519751,38519771,kgp8446980
6,38519946,38519966,kgp17382319
6,38520249,38520269,kgp17414348
6,38521434,38521454,kgp2796714
6,38522620,38522640,rs17614684
T-Gene_Links.csv
Gene_ID,Gene_Name,TSS_ID,TSS_Locus,Strand,Max_Expr,RE_Locus,Max_Hist,Distance,Closest_Locus,Closest_TSS,Histone,Correlation,Correlation_P_Value,Distance_P_Value,CnD_P_Value,Q_Value
ENSG00000112164.5,GLP1R,ENST00000373256.4,chr6:39016574-39016574,+,3.06,chr6:39041458-39041477,0.0,24884.0,F,T,H3k27ac,0.43676529,0.04292,0.04979,0.01528,0.9006
ENSG00000112164.5,GLP1R,ENST00000373256.4,chr6:39016574-39016574,+,3.06,chr6:39053087-39053106,0.0,36513.0,F,F,H3k27ac,0.40457163,0.05091,0.07304,0.02452,0.9006
ENSG00000112164.5,GLP1R,ENST00000373256.4,chr6:39016574-39016574,+,3.06,chr6:39049954-39049973,0.0,33380.0,F,F,H3k27ac,0.40299158,0.05134,0.06678,0.02289,0.9006
ENSG00000112164.5,GLP1R,ENST00000373256.4,chr6:39016574-39016574,+,3.06,chr6:39041701-39041720,0.0,25127.0,F,T,H3k27ac,0.39967616,0.05211,0.05027,0.01819,0.9006
ENSG00000112164.5,GLP1R,ENST00000373256.4,chr6:39016574-39016574,+,3.06,chr6:39047953-39047972,0.0,31379.0,F,F,H3k27ac,0.39680426,0.05283,0.06278,0.02225,0.9006
ENSG00000112164.5,GLP1R,ENST00000373256.4,chr6:39016574-39016574,+,3.06,chr6:39049196-39049215,0.0,32622.0,F,F,H3k27ac,0.39410562,0.05358,0.06526,0.02328,0.9006
ENSG00000112164.5,GLP1R,ENST00000373256.4,chr6:39016574-39016574,+,3.06,chr6:39043787-39043806,0.0,27213.0,F,T,H3k27ac,0.38921109,0.055,0.05444,0.0204,0.9006
ENSG00000112164.5,GLP1R,ENST00000373256.4,chr6:39016574-39016574,+,3.06,chr6:39048210-39048229,0.0,31636.0,F,F,H3k27ac,0.38908754,0.05503,0.06329,0.0232,0.9006
ENSG00000112164.5,GLP1R,ENST00000373256.4,chr6:39016574-39016574,+,3.06,chr6:39040699-39040718,0.0,24125.0,F,T,H3k27ac,0.38840919,0.05519,0.04827,0.01846,0.9006
I need to take the range in the RE_Locus column of T-Gene_Links.csv and compare it with the Start and End positions in T-Gene.csv. Whenever the two ranges match, I need to extract the corresponding SNP ID (the ID column of T-Gene.csv).
Convert both files to BED with awk, sort them, and use bedtools intersect.
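If it helps to see what the intersect step is checking, here is a minimal pure-Python sketch of the same overlap test. It is not bedtools itself, just an illustration under a few assumptions: both files use the same coordinate convention with inclusive ends, "match" means any overlap, and the function names (parse_locus, overlapping_snps) are made up for this example. Note that T-Gene.csv writes the chromosome as 6 while RE_Locus uses chr6, so one of them has to be normalised before comparing. Also, the rows posted in the question do not overlap each other (the SNPs sit around 38.5 Mb, the RE_Locus ranges around 39.04 Mb), so the demo adds one hypothetical SNP row and trims the links file to two columns.

```python
import csv
import io


def parse_locus(locus):
    """Split 'chr6:39041458-39041477' into ('6', 39041458, 39041477)."""
    chrom, span = locus.split(":")
    start, end = span.split("-")
    if chrom.startswith("chr"):
        chrom = chrom[3:]  # T-Gene.csv has '6', RE_Locus has 'chr6'
    return chrom, int(start), int(end)


def overlapping_snps(snp_rows, link_rows):
    """Return (RE_Locus, SNP ID) pairs whose ranges overlap.

    Assumption: both ranges are treated as closed intervals [start, end].
    """
    snps = [(r["Chr"], int(r["Start"]), int(r["End"]), r["ID"]) for r in snp_rows]
    hits = []
    for link in link_rows:
        chrom, start, end = parse_locus(link["RE_Locus"])
        for s_chrom, s_start, s_end, snp_id in snps:
            # Two closed intervals overlap when each starts before the other ends.
            if s_chrom == chrom and s_start <= end and start <= s_end:
                hits.append((link["RE_Locus"], snp_id))
    return hits


# First row is from the question; 'snp_demo' is a HYPOTHETICAL row added
# only so that the demo produces a hit.
snp_csv = """Chr,Start,End,ID
6,38517417,38517437,kgp17152035
6,39041460,39041480,snp_demo
"""

# Links file reduced to two columns for brevity; the real file has 17.
links_csv = """Gene_Name,RE_Locus
GLP1R,chr6:39041458-39041477
"""

snp_rows = list(csv.DictReader(io.StringIO(snp_csv)))
link_rows = list(csv.DictReader(io.StringIO(links_csv)))
hits = overlapping_snps(snp_rows, link_rows)
for re_locus, snp_id in hits:
    print(re_locus, snp_id)
```

This nested loop compares every RE_Locus with every SNP, which is fine for small files. For genome-scale inputs the sorted BED plus bedtools intersect route suggested above is likely the better choice.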