Hi all,
Do you know the difference between sortBed and sort? These two give different results:
cat file.sort.bed | uniq | wc -l
cat file.bed | sort | uniq | wc -l
Thanks
Just guessing... sortBed may not break ties once it has sorted by chrom, start, and end. That is, lines with the same coordinates stay in their input order, so identical lines may not end up adjacent and uniq counts them more than once. Unix sort by default breaks ties by falling back to a comparison of the whole line. For example, given this file:
a 1
a 2
a 1
b 1
b 2
b 1
Unix sort without breaking ties (what sortBed might do):
sort -k1,1 -s test.txt | uniq | wc -l
6
Now with default, breaking ties:
sort -k1,1 test.txt | uniq | wc -l
4
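If the goal is only to count distinct lines, the count can be made independent of how ties were ordered. Here is a minimal sketch using the same six-line example; the temporary directory and the variable names are just for illustration:

```shell
#!/bin/sh
# Recreate the six-line example file from the post above.
tmpdir=$(mktemp -d)
printf 'a 1\na 2\na 1\nb 1\nb 2\nb 1\n' > "$tmpdir/test.txt"

# Key-only stable sort: equal keys keep input order, so the two "a 1"
# lines stay separated by "a 2" and uniq cannot collapse them.
stable_count=$(sort -k1,1 -s "$tmpdir/test.txt" | uniq | wc -l | tr -d ' ')

# Whole-line sort with -u: identical lines become adjacent and are merged.
unique_count=$(sort -u "$tmpdir/test.txt" | wc -l | tr -d ' ')

# Order-independent alternative: awk keeps only the first occurrence of each line.
awk_count=$(awk '!seen[$0]++' "$tmpdir/test.txt" | wc -l | tr -d ' ')

echo "stable key sort + uniq: $stable_count"
echo "sort -u:                $unique_count"
echo "awk first-occurrence:   $awk_count"
rm -r "$tmpdir"
```

The same idea should apply to sortBed output: piping it through sort -u (or the awk filter) before wc -l counts each distinct line once, whatever order the ties came out in.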
Use sort-bed to sort BED files on all three relevant fields. It runs on arbitrary BED input (it handles input with headers, for instance, or input with more than six columns) and it runs faster than Unix sort. Then you don't have to worry about these issues!
Linux (GNU) sort now supports natural alphanumeric sorting as well. You have to use the -V (version sort) option. If you want your BED files sorted by chromosome and then by region, use sort -k1,1V -k2,2n in.bed > out.sorted.bed. Without that option, sort compares the chromosome column as plain text, so chr10 sorts before chr2.
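To see the effect of -V, here is a small sketch on a made-up four-interval BED file (the file name and intervals are hypothetical, and -V assumes GNU sort):

```shell
#!/bin/sh
# Hypothetical tab-separated BED file, deliberately out of order.
tmpdir=$(mktemp -d)
printf 'chr10\t100\t200\nchr2\t300\t400\nchr1\t500\t600\nchr1\t50\t150\n' > "$tmpdir/in.bed"

# Plain byte-wise comparison on column 1: chr10 sorts before chr2.
lexical_order=$(LC_ALL=C sort -k1,1 -k2,2n "$tmpdir/in.bed" | cut -f1 | uniq | tr '\n' ' ')

# Version sort (-V, a GNU extension) on column 1: chr1, chr2, chr10.
natural_order=$(LC_ALL=C sort -k1,1V -k2,2n "$tmpdir/in.bed" | cut -f1 | uniq | tr '\n' ' ')

echo "lexical: $lexical_order"
echo "natural: $natural_order"
rm -r "$tmpdir"
```

Which order you want depends on the downstream tool: some expect plain lexical chromosome order, so -V is not always the right choice.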
There are many differences. If you post the exact commands you used, it would be easier to figure out why the results differ for you.
For the first command, I sorted my BED file with sortBed from bedtools, then ran uniq, and then counted the lines:
1) sortBed [OPTIONS] -i <bed/gff/vcf> 2) uniq 3) wc -l
In the second command, I used the BED file directly, ran Unix sort, then uniq, and finally counted the lines.
You probably just ran plain sort rather than sort -k1,1 -k2,2n, which is more similar to bedtools.
A bed file is in general a binary file, and sort or cat on that file directly will probably not give you anything meaningful.
A bed file is not binary.
http://useast.ensembl.org/info/website/upload/bed.html
Oops, sorry, my bad, I did not read the title/post with enough care!
But to justify my answer:
http://pngu.mgh.harvard.edu/~purcell/plink/data.shtml#bed
Whoa! Binary ped. I guess there are at least two beds in genomics then.
Then I would recommend you read https://genome.ucsc.edu/FAQ/FAQformat.html, because a BED file is a text format with three required columns: the first is the chromosome name, and the second and third are the start and end positions of the regions of interest.
This was discussed in a recent thread: Bedtools sortBed | uniq and bash sort | uniq returns different number of lines
Unix sort will not handle any header lines in the BED file. It also won't properly handle variants such as bedGraph or compressed input such as bgzip-ed BED. It needs the correct -n and -V options to sort fields properly as numbers or as text.
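If you do want to stay with Unix sort, one workaround is to decompress and strip the headers yourself before sorting. This is only a sketch: the file name, the track line, and the header patterns (track, browser, #) are assumptions for the example, and it relies on bgzip output being readable by gzip -dc.

```shell
#!/bin/sh
# Hypothetical gzip-compressed BED file with a track header line.
tmpdir=$(mktemp -d)
printf 'track name=example\nchr2\t300\t400\nchr1\t500\t600\nchr1\t50\t150\n' \
  | gzip -c > "$tmpdir/in.bed.gz"

# Decompress to a stream, drop header lines, then sort by chromosome and start.
gzip -dc "$tmpdir/in.bed.gz" \
  | grep -v -E '^(track|browser|#)' \
  | LC_ALL=C sort -k1,1 -k2,2n > "$tmpdir/out.sorted.bed"

first_line=$(head -n 1 "$tmpdir/out.sorted.bed")
line_count=$(wc -l < "$tmpdir/out.sorted.bed" | tr -d ' ')

cat "$tmpdir/out.sorted.bed"
rm -r "$tmpdir"
```

Note that the header lines are dropped here, not carried over to the output; if you need them, you would have to write them back in front of the sorted records yourself.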