command to make a coordinate file from the header
3.0 years ago
harry ▴ 40

I have a file that looks like the one below. Each line has one name before the pipe (|) and one after it. From these names I want to isolate the coordinates and write a file with chromosome, start, and end columns.

exon2_ENST00000559151_15-44409519-44409847|>exon2_ENST00000559151_15-44409519-44409847                                  
exon8_ENST00000264596_4-177353308-177353728|>exon8_ENST00000264596_4-177353308-177353728                                    
exon4_ENST00000261220_12-95494056-95494217|exon6_ENST00000261220_12-95496004-95496098                                   
exon6_ENST00000438023_9-6880012-6880061|exon8_ENST00000438023_9-6893095-6893232                                 
exon5_ENST00000219481_16-410243-410367|exon7_ENST00000219481_16-410972-411076                                   
exon6_ENST00000244230_2-71139795-71139862|exon8_ENST00000244230_2-71144428-71144538                                 
exon2_ENST00000316218_3-123089168-123089294|exon3_ENST00000316218_3-123092355-123092442                                 
exon2_ENST00000309794_17-82459992-82460072|exon5_ENST00000309794_17-82472564-82472698                                   
exon2_ENST00000462685_2-73932598-73932683|exon5_ENST00000462685_2-73958146-73958245 

Take the 3rd row as an example: the coordinates before the pipe are 12-95494056-95494217 and those after it are 12-95496004-95496098. I want a 3-column file in which the 1st column is the chromosome (12), the 2nd column is the lowest number from before the pipe, and the 3rd column is the highest number from after the pipe, i.e. 12 95494056 95496098. The same should be done for every line, giving the chromosome, start, and end columns as below.

15  44409519    44409847
4   177353308   177353728
12  95494056    95496098
9   6880012 6893232
16  410243  411076
2   71139795    71144538
3   123089168   123092442
17  82459992    82472698
2   73932598    73958245

Is it possible to do this with a single command? I searched but couldn't find anything that does this. Thanks in advance.

3.0 years ago
awk -F "[_|-]+" -v OFS="\t" '{print $3,(int($4)<int($9)?int($4):int($9)),(int($5)>int($10)?int($5):int($10))}' < input.txt
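The field numbering this one-liner relies on can be checked with a small sketch. Splitting on runs of `_`, `|` and `-` puts the chromosome in $3, the first exon's coordinates in $4 and $5, and the second exon's in $9 and $10. The variant below is only an illustration under a few assumptions: the function name `coords_from_header` is made up here, the output is tab-separated, and it takes the minimum and maximum over all four coordinates so it does not depend on which exon comes first. `int()` is used so that any trailing blanks on the last field are dropped.

```shell
#!/bin/sh
# coords_from_header is a hypothetical helper name (not from the original answer).
# Reads header lines on stdin; writes chromosome, start, end separated by tabs.
coords_from_header() {
    awk -F '[_|-]+' -v OFS='\t' '
        NF >= 10 {
            # $3 = chromosome, $4-$5 = first exon, $9-$10 = second exon
            lo = int($4); hi = int($4)
            # split on " " trims surrounding blanks, e.g. trailing spaces in $10
            n = split($5 " " $9 " " $10, v, " ")
            for (i = 1; i <= n; i++) {
                x = int(v[i])
                if (x < lo) lo = x
                if (x > hi) hi = x
            }
            print $3, lo, hi
        }'
}
```

For example, `coords_from_header < input.txt > coords.tsv` would write one line per header; for the 3rd row of the question it gives 12, 95494056 and 95496098.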
3.0 years ago
$ awk -F '[_|-]' -v OFS="\t" '{print NR,$3,$4+0,$5+0; print NR,$8,$9+0,$10+0}' test.txt | datamash -g 1,2 min 3 max 4 | cut -f 2-

15  44409519    44409847
4   177353308   177353728
12  95494056    95496098
9   6880012 6893232
16  410243  411076
2   71139795    71144538
3   123089168   123092442
17  82459992    82472698
2   73932598    73958245