I have a huge file in BED format and I need to extract only the chr22 entries using bedtools. I tried the sort option, but I don't understand how to do it.
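You can either explicitly list the files (the pattern is anchored to the first column, so names such as chr22_random, or "chr22" appearing in another column, are not matched):
grep -h "^chr22[[:space:]]" A.bed B.bed C.bed > Result.bed
or use a wildcard, which matches all the files ending with ".bed" in the current directory. Note that if Result.bed is left over from an earlier run, the wildcard will pick it up as an input too, so delete it first or write the output elsewhere:
grep -h "^chr22[[:space:]]" *.bed > Result.bed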
Don't forget to coordinate sort the BED file afterwards, as many programs require this:
sort -k1,1 -k2,2n Result.bed > Result.sorted.bed
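An alternative to grep is an exact comparison on the first column with awk, combined with the coordinate sort in one pipeline. This is only a minimal sketch, assuming tab-separated input; the two tiny sample files and the output name chr22.sorted.txt are made up for illustration:

```shell
# Work in a scratch directory with two tiny, made-up BED files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'chr2\t100\t200\nchr22\t500\t600\nchr22_random\t5\t50\n' > A.bed
printf 'chr22\t50\t80\nchr1\t10\t20\n' > B.bed

# Keep only rows whose first column is exactly "chr22",
# then sort by chromosome and numerically by start position.
awk -F'\t' '$1 == "chr22"' A.bed B.bed | sort -k1,1 -k2,2n > chr22.sorted.txt

cat chr22.sorted.txt
```

Writing the result to a name that does not end in ".bed" keeps it from being picked up by a later *.bed wildcard.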
If you're not averse to using BEDOPS, generate the sorted union of N BED files with sort-bed, and use bedextract to pull out elements of the chromosome-of-interest from the set union:
$ sort-bed A.bed B.bed ... N.bed > all.bed
$ bedextract chr22 all.bed > chr22.bed
Our BEDOPS bedextract application uses a binary search approach to jump to the start position of the chromosome-of-interest, and so extraction is much faster than grep or awk, which have to waste time reading through the entire file.
For multi-GB, whole-genome scale files, and especially for extraction of elements at the end of a file, using awk or grep to read through the entire file can be (is) a significant waste of time. Even more so if you have to repeat the extraction for other chromosomes.
The output of BEDOPS tools will be sorted, as well, so it will be ready to use for downstream set operations.
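For the repeated-extraction case mentioned above, if you stay with awk rather than BEDOPS, one way to avoid rereading the file for every chromosome is to split it in a single pass. This is not the binary search that bedextract uses, just a rough sketch of a one-pass split; the sample data and the ".split.bed" file names are made up for illustration:

```shell
# Build a tiny, made-up combined BED file in a scratch directory.
workdir=$(mktemp -d)
printf 'chr1\t10\t20\nchr21\t5\t9\nchr22\t50\t80\nchr22\t500\t600\n' > "$workdir/all.bed"

# Single pass over the input: write each row to a file named after
# its first column (chr1.split.bed, chr21.split.bed, ...).
awk -F'\t' -v dir="$workdir" '{ print > (dir "/" $1 ".split.bed") }' "$workdir/all.bed"

ls "$workdir"
```

Each per-chromosome file keeps the row order of the input, so a sorted input gives sorted outputs. With a very large number of distinct sequence names, some awk implementations may run into open-file limits.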
Did you try it in the terminal?
I tried this, but I have 6 files and I need to store the chr22 entries from all of them in one file:
cat fileB.bed fileC.bed fileD.bed > all_chr22_files.bed
but the options below work as well.