match bin number with the coordinate for each chromosome
2.6 years ago
mthm ▴ 50

I couldn't really find a question similar to mine, or at least I didn't know what exactly to search for.

I have binned my chromosomes with a 1 kb window:

chr1    0    1000
chr1    1000    2000
chr1    2000    3000
.
.
chr1    35000    36000
.
.
.
chrx    0    1000
chrx    1000    2000
.
chrx    20000    21000

After some statistical analyses, I have a list of bin numbers, say bin number 2364 through bin number 4576, and now I want to extract the coordinates for these bins. If I use awk 'NR==2364 { print $0 }' file.bed, it counts from line 1 to the end of the file irrespective of the chromosome, but I need the count to restart from 1 for each chromosome name. How should I do that?

chromosome bins coordinates • 822 views
2.6 years ago
mthm ▴ 50

So the simplest answer would be:

awk '$1=="chr1"' file.bed | awk 'NR==2364'

The first awk keeps only the lines for the desired chromosome, so NR in the second awk starts from 1 for that chromosome and we can pick the desired bin for each chromosome separately.
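
For a range of bins such as 2364 to 4576, the same idea can be written as a single awk pass. This is only a minimal sketch: demo_bins.bed is a made-up five-line file so the example can run on its own, and chrom, first and last are illustrative variable names, not anything from the original data.

```shell
# Hypothetical demo input: three 1 kb bins on chr1, then two on chrX.
printf 'chr1\t0\t1000\nchr1\t1000\t2000\nchr1\t2000\t3000\nchrX\t0\t1000\nchrX\t1000\t2000\n' > demo_bins.bed

# n restarts whenever column 1 changes, so n is the per-chromosome bin number.
# Lines are printed when they are on the requested chromosome and n is in [first, last].
bins_out=$(awk -F '\t' -v chrom="chrX" -v first=1 -v last=2 '
    $1 != prev { n = 0; prev = $1 }
    { n++ }
    $1 == chrom && n >= first && n <= last
' demo_bins.bed)
printf '%s\n' "$bins_out"
```

On the real file one would presumably pass something like -v chrom="chr1" -v first=2364 -v last=4576. The sketch assumes the lines of each chromosome are contiguous in the BED file, as they are in the listing above.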

2.6 years ago

Not tested:

awk -F '\t' 'BEGIN{P="";N=0;} {if(P!=$1) {N=0;} N++;P=$1;if(N==2364) print;}' in.bed

Thanks, that works, but it picks the same bin number on every chromosome. Since I have different bin numbers for different chromosomes, I came up with another solution.
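
The other solution is not shown in this reply. As one possible sketch, and not necessarily what was actually used, the wanted bins could be listed in a small lookup file and matched against the per-chromosome counter in one pass. Both file names and their contents here are hypothetical demo data.

```shell
# Hypothetical demo input: three 1 kb bins on chr1, then two on chrX.
printf 'chr1\t0\t1000\nchr1\t1000\t2000\nchr1\t2000\t3000\nchrX\t0\t1000\nchrX\t1000\t2000\n' > demo_bins.bed

# Hypothetical lookup file: chromosome name, then the 1-based bin number wanted on it.
printf 'chr1\t2\nchr1\t3\nchrX\t1\n' > wanted_bins.tsv

picked=$(awk -F '\t' '
    # First file: remember every (chromosome, bin number) pair that is wanted.
    NR == FNR { want[$1 FS $2] = 1; next }
    # Second file: restart the bin counter whenever the chromosome changes.
    $1 != prev { n = 0; prev = $1 }
    { n++ }
    # Print the line if this chromosome and bin number were requested.
    ($1 FS n) in want
' wanted_bins.tsv demo_bins.bed)
printf '%s\n' "$picked"
```

The NR == FNR test is true only while awk is reading the first file, which is what lets one program load the lookup table and then filter the BED file. As with the answers above, this assumes each chromosome's bins are contiguous and in order.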
