Question

parsing bed file

8.3 years ago

a.rex ▴ 350

Apologies if a similar question has been posed before - I was unable to find anything helpful.

I have a bed file that looks like the following:

 A    tss     tes     start   end     strand  category        B    tss     tes     start   end     strand  category        pair_direction
chr.1.0.loc.20379  22566   22816   22566   22816   +       non-rep chr.1.0.loc.20380  52719   53494   52719   53494   +       non-rep +

I wish to parse out any rows that contain 'non-rep' in either of the two 'category' columns. Does anyone have suggestions for how to do this, either with a Python script or with other tools?

parse python • 1.8k views

ADD COMMENT • link updated 8.3 years ago by WouterDeCoster 47k • written 8.3 years ago by a.rex ▴ 350

score 1 · Answer 1 · 2016-09-20

8.3 years ago

harold.smith.tarheel ★ 5.0k

To remove the lines containing non-rep:

grep -v "non-rep" file.bed > filtered.bed

To keep those lines, remove '-v' from the command.

EDIT: The first command (with '-v') keeps the header, because the header line does not contain "non-rep". The second (without '-v') drops it for the same reason. See @WouterDeCoster's post for keeping the header.
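Since the question also asks about Python, here is a minimal sketch of a column-aware version of the grep -v filter. Unlike grep, it only looks at the columns named 'category' in the header, so a stray "non-rep" elsewhere in a row would not trigger removal. It assumes whitespace-delimited columns and a header on the first line; the function name drop_non_rep and its parameters are made up for illustration.

```python
import sys


def drop_non_rep(lines, label="non-rep", column_name="category"):
    """Yield the header, then every row with no 'category' column equal to label.

    Assumes whitespace-delimited columns and a header on the first line.
    """
    lines = iter(lines)
    header = next(lines, None)
    if header is None:  # empty input: nothing to yield
        return
    # Positions of every column called "category" (two in the example file).
    cat_idx = [i for i, name in enumerate(header.split()) if name == column_name]
    yield header
    for line in lines:
        fields = line.split()
        # Skip the row if any category column holds the unwanted label.
        if any(i < len(fields) and fields[i] == label for i in cat_idx):
            continue
        yield line


# Hypothetical usage: python drop_non_rep.py file.bed > filtered.bed
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as handle:
        sys.stdout.writelines(drop_non_rep(handle))
```

The column positions are looked up from the header rather than hard-coded, so the sketch should still work if columns are added or reordered, as long as the 'category' names stay the same.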

score 1 · Answer 2 · 2016-09-20

I guess you meant "parse out", i.e. you want to remove all rows containing "non-rep"?

In that case, I would suggest grep -v "non-rep" yourfile.bed (the header is kept, since it does not contain "non-rep").

If you instead want to keep the rows containing "non-rep", a plain grep "non-rep" will lose the header. This can be fixed in bash by doing something like cat <(head -n 1 yourfile.bed) <(grep "non-rep" yourfile.bed) > output.txt
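For anyone who prefers Python to process substitution, the same idea (first line, then every matching line) could be sketched as follows. It does a plain substring match anywhere in the line, like grep; the function name header_plus_matches is hypothetical.

```python
import sys


def header_plus_matches(lines, pattern="non-rep"):
    """Return the first line followed by every later line containing pattern."""
    lines = list(lines)
    if not lines:
        return []
    header, rows = lines[0], lines[1:]
    # Substring match anywhere in the row, as grep would do.
    return [header] + [row for row in rows if pattern in row]


# Hypothetical usage: python header_plus_matches.py yourfile.bed > output.txt
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as handle:
        sys.stdout.writelines(header_plus_matches(handle))
```

One small difference from the shell version: if the header itself happened to contain the pattern, the cat/head/grep command would print it twice, whereas this sketch prints it once.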