Dear all,
I do not know if a tool already exists, but I would like to do the following steps on a tab-delimited file.
The file is as follows:
chr1 0 600 15 Repetitive/CNV 0 . 0 600 245,245,245
chr1 1000 1600 8 Insulator 0 . 1000 1600 10,190,254
chr1 100004000 100005200 2 Weak Promoter 0 . 100004000 100005200 255,105,105
chr1 100005200 100016800 13 Heterochrom/lo 0 . 100005200 100016800 245,245,245
chr1 10001600 10014800 13 Heterochrom/lo 0 . 10001600 10014800 245,245,245
chr1 100016800 100022800 12 Repressed 0 . 100016800 100022800 127,127,127
chr1 100022800 100026800 13 Heterochrom/lo 0 . 100022800 100026800 245,245,245
chr1 100026800 100028600 12 Repressed 0 . 100026800 100028600 127,127,127
chr1 100028600 100037000 13 Heterochrom/lo 0 . 100028600 100037000 245,245,245
chr1 100037000 100046600 12 Repressed 0 . 100037000 100046600 127,127,127
chr1 100046600 100046800 6 Weak Enhancer 0 . 100046600 100046800 255,252,4
chr1 100046800 100047000 2 Weak Promoter 0 . 100046800 100047000 255,105,105
chr1 100047000 100047200 4 Strong Enhancer 0 . 100047000 100047200 250,202,0
chr1 100047200 100047400 6 Weak Enhancer 0 . 100047200 100047400 255,252,4
chr1 100047400 100054200 13 Heterochrom/lo 0 . 100047400 100054200 245,245,245
chr1 100054200 100055000 12 Repressed 0 . 100054200 100055000 127,127,127
chr1 100055000 100087400 13 Heterochrom/lo 0 . 100055000 100087400 245,245,245
chr1 100087400 100087600 6 Weak Enhancer 0 . 100087400 100087600 255,252,4
First, I would like to remove the number and the space before the characterization of the area:
6 Weak Enhancer ---> Weak Enhancer
Second, I would like to count all Weak Enhancer entries, or other identical fields of column 4, and print something like the following:
Weak Enhancer 4
Heterochrom 20
. . .
I tried:
sort 'file.bed' | awk '{print $4}' | uniq -c -D -i
or
sort 'file.bed' | uniq -c -D -i
to no avail. Any help will be highly appreciated.
I should state that I want to do it as easily as possible. I have no real programming skills, so even if OpenOffice can do it, I'm fine with that!
Thank you in advance
Theodore
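One possible way to do both steps in a single pass is sketched below. It assumes the fields really are separated by tabs and that column 4 looks like "6 Weak Enhancer" (or "6_Weak_Enhancer"). The function name count_states is only an illustrative choice, not an existing tool. Note that awk splits on any whitespace by default, so without -F '\t' a label containing spaces would be cut and $4 would hold only the number.

```shell
#!/bin/sh
# Sketch: count how often each state label occurs in column 4 of a BED-like file.
# Assumptions: TAB-separated fields; column 4 is "<number><space or _><label>".
# count_states is a hypothetical helper name.
count_states() {
    awk -F '\t' '
        {
            label = $4
            sub(/^[0-9]+[ _]/, "", label)   # drop the leading number and its separator
            count[label]++                  # tally identical labels
        }
        END {
            # one line per label: label, TAB, number of rows
            for (label in count) printf "%s\t%d\n", label, count[label]
        }
    ' "$1" | LC_ALL=C sort
}

# Example call (file.bed is the file name used in the question):
# count_states file.bed
```

The counts are collected in an awk array, so the input does not need to be sorted first. The final sort only puts the labels in alphabetical order.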
I've run the pipeline, it works great, although I get the following:
it seems as if sed had replaced spaces (\s) with underscore (_)???
It shouldn't do that, at least not unless you changed it to be something like sed 's/ /_/'.
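To make the difference concrete, here is a small sketch of what such sed expressions do to a single label. The sample value is taken from the file in the question. The last expression is just one assumed way of stripping the numeric prefix, not necessarily the one used in the pipeline discussed here.

```shell
#!/bin/sh
# Sketch: compare sed substitutions on one sample label from the question's file.
label='6 Weak Enhancer'

# Without the g flag, only the first space on the line is replaced.
first_only=$(printf '%s\n' "$label" | sed 's/ /_/')

# With the g flag, every space on the line is replaced.
all_spaces=$(printf '%s\n' "$label" | sed 's/ /_/g')

# Removing the leading number and its separator leaves the other spaces alone.
stripped=$(printf '%s\n' "$label" | sed 's/^[0-9]*[ _]//')

printf '%s\n' "$first_only" "$all_spaces" "$stripped"
# prints:
# 6_Weak Enhancer
# 6_Weak_Enhancer
# Weak Enhancer
```

If underscores show up in the output although no such substitution was run, they may already be present in the input file, which can be checked by looking at column 4 of the raw file.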