Question

Splitting bed file chromosome wise into a text file with chromosome name

8.9 years ago

startup_biostar ▴ 20

I have a BED file like the one below, and I would like to split this very large file by chromosome into separate files, each named after its chromosome, e.g. chr1.txt, chr2.txt, etc.

I know I can do this on the command line, but is there a specific tool that does this job? I am new to bioinformatics.

chr1 3000362 3000437 HWI-D00249:1648:BHT7HFBCXX:1:1211:14232:51942 255 +

chr1 3000656 3000731 HWI-D00249:1648:BHT7HFBCXX:2:2111:6651:57733 255 +

I have the following command-line code, but I would prefer a dedicated tool:

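awk '{print $1}' ./temp2.bed | sort -u | while read -r chr
do
    # Isolate the rows for each chromosome and write them to that chromosome's file.
    # ${1} is the first argument passed to the enclosing script, used as a filename prefix.
    sed -n "/^${chr}[[:blank:]]/p" ./temp2.bed > "./BedFiles/${1}_${chr}.txt"
done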

bedtools bed split • 6.7k views

updated 8.9 years ago by Alex Reynolds 36k • written 8.9 years ago by startup_biostar ▴ 20

score 3 · Answer 1 · 2016-08-10

You can use bedextract --list-chr from BEDOPS to build the list of chromosomes much faster than awk or cut | sort | uniq (by several orders of magnitude).

Then you can loop through that list and use bedextract again to quickly split the input into separate files.

First sort the input file with BEDOPS sort-bed, if unsorted. It is faster than Unix sort and you only need to do this once.

$ sort-bed input.unsorted.bed > input.bed
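Since sorting is only needed when the input is unsorted, a quick check beforehand can save time on a large file. The following is a rough sketch using standard Unix sort, under the assumption that the order sort-bed produces is lexicographic by chromosome and then numeric by start and stop. The function name is_bed_sorted is hypothetical, and this check is only an approximation of what BEDOPS itself expects.

```shell
#!/usr/bin/env bash
# is_bed_sorted (hypothetical helper): exit status 0 if the file already
# appears sorted by chromosome (byte order), then start, then stop.
is_bed_sorted() {
    # -c  only checks the order, it does not write sorted output
    # -s  skips the whole-line tie-break, so records with equal keys are accepted
    # LC_ALL=C forces plain byte-order comparison of chromosome names
    LC_ALL=C sort -c -s -k1,1 -k2,2n -k3,3n "$1" 2>/dev/null
}
```

For example, `is_bed_sorted input.unsorted.bed || sort-bed input.unsorted.bed > input.bed` would run sort-bed only when the check fails.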

Then use bedextract to split the input file. In a bash shell, you could do the following, for example:

$ for chr in $(bedextract --list-chr input.bed); do bedextract "$chr" input.bed > "input.$chr.bed"; done

Each input.*.bed file contains the elements for one chromosome. Adjust the filenames and paths in this one-liner if downstream work expects a particular naming pattern, such as chr1.txt.
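If BEDOPS is not installed, the same kind of split can be approximated in a single pass with plain awk. This is a different technique from the bedextract approach above, not a BEDOPS feature, and it makes no claim to match its speed. The function name split_bed_by_chr and the .txt naming are assumptions chosen to match the filenames asked for in the question. The sketch also assumes a modest number of distinct chromosomes, since awk keeps one output file open per chromosome.

```shell
#!/usr/bin/env bash
# split_bed_by_chr (hypothetical helper): write each record of a BED file
# to <outdir>/<chromosome>.txt, keyed on the first column.
split_bed_by_chr() {
    local input="$1"    # path to the BED file (sorted or unsorted)
    local outdir="$2"   # directory that receives the per-chromosome files
    mkdir -p "$outdir"
    # Within one awk run, ">" truncates a file the first time it is opened
    # and appends on later writes, so re-running replaces earlier output.
    # NF > 0 skips blank lines.
    awk -v dir="$outdir" 'NF > 0 { print > (dir "/" $1 ".txt") }' "$input"
}
```

Calling `split_bed_by_chr input.bed BedFiles` would create files such as BedFiles/chr1.txt, each holding the rows for that chromosome in their original order.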