I've got two sorted tab-delimited files:
input.txt contains chromosome, start position, end position, gene name and strand:
input.txt
10 282035 282125 RNA1 -
10 4134522 4134564 RNA1 -
10 5299783 5299910 RNA2 -
10 5900317 5900359 RNA1 -
ref.txt contains read count, chromosome and location:
ref.txt
1 9 137792944
1 9 137792945
1 10 282074
4 10 282095
4 10 5900329
I want to print a sum of values if certain criteria are met.
Namely:
IF ref$2==input$1
AND
ref$3 falls within the range from input$2 (min) to input$3 (max)
then print input$0 followed by the sum of ref$1 (as input$6); otherwise print zero (as input$6).
So the result should look like this:
10 282035 282125 RNA1 - 5
10 4134522 4134564 RNA1 - 0
10 5299783 5299910 RNA2 - 0
10 5900317 5900359 RNA1 - 4
This is what I came up with:
awk '
NR == FNR {min[NR]=$2; max[NR]=$3; chr[NR]=$1; next}
{
    for (id in min)
        if (($2==chr[NR])&&(min[id] < $3 && $3 < max[id])) {
            print $0, sum+=$1
            break
        }
}
' input.txt ref.txt > output.txt
There's clearly something wrong here, since I don't get any output. Also, I'm still missing "else print zero".
Can somebody help me please?
Thanks for the reply! I don't seem to get it to work though (no output). I think
if($7>=$2 && $7<=$3)
should rather be if($8>=$2 && $8<=$3)
but even that does not solve the issue...
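This is not the code from the reply, just a minimal sketch of one way to get the expected output. It rests on a few assumptions: both files really are tab-delimited, a position equal to the start or end counts as inside the range, and ref.txt is small enough to hold in memory. It reads ref.txt first and then loops over the stored reads for each line of input.txt, so every input line is printed exactly once, with 0 when nothing matches. The attempt in the question prints nothing because chr[NR] is looked up with the record number of the ref.txt line, an index that was never stored, so the chromosome test is never true. The two printf commands only recreate the sample data from the question so the snippet runs on its own.

```shell
# Recreate the sample files from the question (tab-delimited, as stated there).
printf '%s\t%s\t%s\t%s\t%s\n' \
    10 282035 282125 RNA1 - \
    10 4134522 4134564 RNA1 - \
    10 5299783 5299910 RNA2 - \
    10 5900317 5900359 RNA1 - > input.txt
printf '%s\t%s\t%s\n' \
    1 9 137792944 \
    1 9 137792945 \
    1 10 282074 \
    4 10 282095 \
    4 10 5900329 > ref.txt

awk '
BEGIN { FS = OFS = "\t" }   # assumption: tab-delimited input and output
# First file (ref.txt): store count, chromosome and position of every read.
NR == FNR { cnt[FNR] = $1; chr[FNR] = $2; pos[FNR] = $3; n = FNR; next }
# Second file (input.txt): add up the counts of all reads on the same
# chromosome whose position lies in [start, end]; sum stays 0 if none match.
{
    sum = 0
    for (i = 1; i <= n; i++)
        if (chr[i] == $1 && pos[i] >= $2 && pos[i] <= $3)
            sum += cnt[i]
    print $0, sum
}
' ref.txt input.txt > output.txt
```

Note that the file order is swapped compared to the attempt in the question (ref.txt comes first), and that sum is reset for every input line instead of accumulating across the whole run. If the range ends should be excluded, replace >= and <= with > and <.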