You need to prefix the chr variable with the string literal chr. As written, awk compares the first field against bare numbers (1 through 22) rather than chromosome names (chr1 through chr22).
$ for chr in {1..22}; do awk -v chr="chr${chr}" ...
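For illustration, here is a minimal sketch of what a complete loop might look like. The awk program, the input name intervals.bed, and the output file naming are all assumptions, since the command above is abbreviated, and the sample records are made up.

```shell
# Made-up three-record BED file for illustration.
printf 'chr1\t100\t200\nchr2\t300\t400\nchr10\t500\t600\n' > intervals.bed

# seq is used here instead of {1..22} so the loop also runs in shells
# without brace expansion.
for chr in $(seq 1 22); do
    # Prefix the counter with "chr" so awk compares against chromosome names.
    awk -v chr="chr${chr}" '$1 == chr' intervals.bed > "intervals.chr${chr}.bed"
done
```

Chromosomes that do not occur in the input simply produce empty files. Note that the input is read once per chromosome, 22 times in total.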
But BEDOPS offers a much faster and simpler way of splitting BED files with bedextract, which uses a binary search approach on sorted BED files to jump to the start of a chromosome:
$ fn=intervals.bed
$ for chr in $(bedextract --list-chr "${fn}"); do bedextract "${chr}" "${fn}" > "${fn}.${chr}.bed"; done
With awk or grep, you need to re-read through the file linearly on every chromosome, which wastes time, especially on very large, whole-genome scale BED files. Use BEDOPS if your time is valuable to you.
Note that grep chr1 input.bed will return records for chr1, chr10, chr11, etc., and grep chr2 input.bed will return records for chr2, chr20, chr21, and chr22. The output for those search keys will therefore contain more intervals than expected and is likely incorrect.
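To make the over-matching concrete, here is a small sketch on a made-up four-record file (the file contents and the variable names loose and exact are assumptions for illustration):

```shell
# Made-up BED file with chr1, chr10, chr11 and chr2 records.
printf 'chr1\t0\t10\nchr10\t0\t10\nchr11\t0\t10\nchr2\t0\t10\n' > input.bed

# Substring match: counts the chr1, chr10 and chr11 lines.
loose=$(grep -c chr1 input.bed)

# Exact string comparison on the first field only.
exact=$(awk '$1 == "chr1"' input.bed | wc -l | tr -d ' ')

echo "grep chr1: ${loose} records; exact match: ${exact} record(s)"
# prints: grep chr1: 3 records; exact match: 1 record(s)
```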
This works perfectly, thanks. But if there are multiple files, how can parallel be used so that each output file is labeled with the original file name and the chromosome name?
Note that grep chr1 input.bed will return records for chr1, chr10, chr11, etc., and grep chr2 input.bed will return records for chr2, chr20, chr21, and chr22. The output for those search keys will therefore contain more intervals than expected and is likely incorrect.
That is correct, but it can be improved with grep's -w option:
Example bed:
After execution:
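A minimal sketch of the -w behaviour, assuming a small made-up file named example.bed (not the poster's original example):

```shell
# Made-up BED file with chr1, chr10 and chr11 records.
printf 'chr1\t0\t10\nchr10\t0\t10\nchr11\t0\t10\n' > example.bed

# -w only accepts matches that form a whole word, so chr10 and chr11 are skipped.
grep -w chr1 example.bed > example.chr1.bed

cat example.chr1.bed
```

Only the chr1 record is written to example.chr1.bed. One caveat: -w still matches anywhere on the line, so a record that mentions chr1 as a whole word in another column (a name field, say) would also be returned. Comparing the first field directly, for instance with awk '$1 == "chr1"', avoids that.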
This works perfectly, thanks. But if there are multiple files, how can parallel be used so that each output file is labeled with the original file name and the chromosome name?
If the original file name is "test.bed" and you want to include it in the name of every chromosome file that is created, then the command would be:
This would create multiple files, each named with the original file name and the chromosome name.
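One way this could look for several input files is sketched below. This is not necessarily the command the poster had in mind: the helper name split_by_chr, the second input file other.bed, and the <name>.<chromosome>.bed naming scheme are assumptions, and plain background jobs stand in for GNU parallel so that the sketch runs without extra tools.

```shell
# Two made-up input files for illustration.
printf 'chr1\t0\t10\nchr2\t0\t10\n' > test.bed
printf 'chr1\t5\t15\nchrX\t0\t10\n' > other.bed

# Hypothetical helper: split one BED file in a single pass, writing each record
# to <input name without .bed>.<chromosome>.bed based on its first field.
split_by_chr() {
    awk -v prefix="${1%.bed}" '{ out = prefix "." $1 ".bed"; print > out }' "$1"
}

# Start one background job per input file, then wait for all of them to finish.
for f in test.bed other.bed; do
    split_by_chr "$f" &
done
wait
```

This produces test.chr1.bed, test.chr2.bed, other.chr1.bed and other.chrX.bed. With GNU parallel installed and bash as the shell, running export -f split_by_chr followed by parallel split_by_chr ::: test.bed other.bed should be an equivalent replacement for the loop, with the added benefit of limiting the number of simultaneous jobs.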