Dear all,
could you please help me merge rows by the coordinates in column 2 ($2)? The coordinates form series that increase by one. I want output like this, for example: rows 1 to 4 are merged into one row, 9079811-9079814; the series ends there, so the next series is merged into another row, and so on. For the 3rd column of the input I would like to compute the average.
I wrote a script, but it merges all rows from the first coordinate to the last coordinate; it has no condition that checks for a series.
awk -F'\t' -v OFS="\t" '{print $2,$4,$3,$1}' input | awk '!x[$2]{x[$2]=$1}y[$2]<$1{y[$2]=$1;}x[$2]>$1{x[$2]=$1} {sum+=$3} END{for(i in y)print $1,x[i],y[i],sum/NR,i}' | sort -V -k1,1 > output
INPUT:
chr12 9079811 29 A2M
chr12 9079812 29 A2M
chr12 9079813 29 A2M
chr12 9079814 28 A2M
chr12 9091202 5 A2M
chr12 9091203 5 A2M
chr12 9091204 5 A2M
chr12 9091390 15 A2M
chr12 9091391 15 A2M
chr12 9091392 13 A2M
OUTPUT:
chr12 9079811 9079814 28.75 A2M
chr12 9091202 9091204 5 A2M
chr12 9091390 9091392 14.3 A2M
Why do you _need_ to use awk? Python is better suited, as your code will be more readable.
I think it won't be a problem to do it in Python, but... I am a beginner in this language :-(
All the more reason to use it - how else would you learn?
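For what it's worth, here is a minimal Python sketch of the merge described in the question. It is only one possible approach, under some assumptions: the input is whitespace-separated with four columns (chromosome, position, value, gene), the rows are already sorted by position, and a series is broken whenever the position is not exactly the previous one plus one or the chromosome/gene changes. The names `parse_rows`, `merge_runs`, `summarize` and `SAMPLE` are made up for this example.

```python
# Sample rows copied from the INPUT in the question.
SAMPLE = """\
chr12 9079811 29 A2M
chr12 9079812 29 A2M
chr12 9079813 29 A2M
chr12 9079814 28 A2M
chr12 9091202 5 A2M
chr12 9091203 5 A2M
chr12 9091204 5 A2M
chr12 9091390 15 A2M
chr12 9091391 15 A2M
chr12 9091392 13 A2M
"""


def parse_rows(lines):
    """Turn text lines into (chrom, pos, value, gene) tuples."""
    for line in lines:
        fields = line.split()  # assumption: any whitespace separates columns
        if len(fields) != 4:
            continue  # skip blank or malformed lines
        chrom, pos, value, gene = fields
        yield chrom, int(pos), float(value), gene


def summarize(run):
    """Collapse one run of rows into (chrom, start, end, mean, gene)."""
    chrom, start, _, gene = run[0]
    end = run[-1][1]
    mean = sum(row[2] for row in run) / len(run)
    return chrom, start, end, mean, gene


def merge_runs(rows):
    """Yield one merged row per series of positions growing by one."""
    run = []
    for row in rows:
        chrom, pos, _, gene = row
        if run:
            prev_chrom, prev_pos, _, prev_gene = run[-1]
            # The series ends if the position does not continue by one
            # or if the chromosome/gene differs from the previous row.
            if (chrom, gene) != (prev_chrom, prev_gene) or pos != prev_pos + 1:
                yield summarize(run)
                run = []
        run.append(row)
    if run:
        yield summarize(run)  # do not forget the last series


if __name__ == "__main__":
    for chrom, start, end, mean, gene in merge_runs(parse_rows(SAMPLE.splitlines())):
        print(chrom, start, end, f"{mean:g}", gene, sep="\t")
```

To run it on a real file, pass an open file handle to `parse_rows` instead of `SAMPLE.splitlines()`. On the sample this gives three merged rows with averages 28.75, 5 and roughly 14.33 (the 14.3 in the expected output is that value rounded).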