parsing bed file
8.3 years ago
a.rex ▴ 350

Apologies if a similar question has been posed before - I was unable to find anything helpful.

I have a bed file that looks like the following:

 A    tss     tes     start   end     strand  category        B    tss     tes     start   end     strand  category        pair_direction
chr.1.0.loc.20379  22566   22816   22566   22816   +       non-rep chr.1.0.loc.20380  52719   53494   52719   53494   +       non-rep +

I wish to parse out any rows that contain 'non-rep' in either of the two 'category' columns. Does anyone have suggestions for how to do this, either with a Python script or with other tools?

parse python • 1.8k views
8.3 years ago

To remove the lines containing non-rep:

grep -v "non-rep" file.bed > filtered.bed

To keep those lines, remove '-v' from the command.

EDIT: The first command will keep the header, but not the second. See @WouterDeCoster's post for keeping the header.

8.3 years ago

I guess you meant that you want to "parse out", i.e. remove, all rows containing "non-rep"?

In that case, I would suggest grep -v "non-rep" yourfile.bed (the header is kept, since it does not itself contain "non-rep").

If you want to keep the rows containing "non-rep" (and the header), a plain grep "non-rep" would drop the header; this could be fixed by doing something like cat <(head -n 1 yourfile.bed) <(grep "non-rep" yourfile.bed) > output.txt
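
Since the question also asked about Python, here is a minimal sketch of the same filtering. It assumes the file is tab-separated with a single header line, and that both columns of interest are named 'category' as in the sample above. The names filter_rows and filter_file are made up for illustration. Unlike grep, it only looks at the two 'category' columns, so a "non-rep" appearing elsewhere in a row would not trigger a match.

```python
import csv


def filter_rows(lines, value="non-rep", keep=False, column="category"):
    """Yield the header, then the rows filtered on every column named `column`.

    keep=False drops rows where any such column equals `value`;
    keep=True keeps only those rows. Assumes tab-separated input
    whose first line is the header.
    """
    reader = csv.reader(lines, delimiter="\t")
    header = next(reader)
    # Positions of all columns with the requested name (two in the sample file).
    idx = [i for i, name in enumerate(header) if name.strip() == column]
    yield header
    for row in reader:
        if not row:
            continue  # skip blank lines
        hit = any(i < len(row) and row[i].strip() == value for i in idx)
        if hit == keep:
            yield row


def filter_file(src_path, dst_path, **kwargs):
    """Read src_path, write the filtered rows (header included) to dst_path."""
    with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
        writer = csv.writer(dst, delimiter="\t", lineterminator="\n")
        writer.writerows(filter_rows(src, **kwargs))
```

For example, filter_file("yourfile.bed", "filtered.bed") would remove the 'non-rep' rows, and filter_file("yourfile.bed", "output.txt", keep=True) would keep only those rows. The header is written in both cases.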
