Apologies if a similar question has been posed before - I was unable to find anything helpful.
I have a bed file that looks like the following:
A tss tes start end strand category B tss tes start end strand category pair_direction
chr.1.0.loc.20379 22566 22816 22566 22816 + non-rep chr.1.0.loc.20380 52719 53494 52719 53494 + non-rep +
I wish to parse out any rows that contain 'non-rep' in either of the two 'category' columns. Does anyone have suggestions for how I could do this, either with a Python script or with other tools?
I assume you meant "parse out", i.e. you want to remove all rows containing "non-rep"?
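In that case, I would suggest grep -v "non-rep" yourfile.bed. The header line does not contain "non-rep", so it is kept in the output.
If you instead want to keep only the rows containing "non-rep", a plain grep "non-rep" yourfile.bed would drop the header. This could be fixed by doing something like cat <(head -n 1 yourfile.bed) <(grep "non-rep" yourfile.bed) > output.txt (the <(...) syntax is process substitution, so it needs bash or a similar shell).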
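Since the question also asked about Python: grep matches "non-rep" anywhere on the line, whereas the sketch below checks only the two 'category' columns. It assumes the file is whitespace-delimited and that the category columns are the 7th and 14th fields (0-based indices 6 and 13), as in the example row. The function name drop_rows and the second data row are made up for illustration.

```python
def drop_rows(lines, token="non-rep", category_cols=(6, 13)):
    """Yield the header, then every row whose category columns are not `token`.

    Assumption: fields are whitespace-delimited and the two 'category'
    columns sit at 0-based positions 6 and 13, as in the posted example.
    """
    for i, line in enumerate(lines):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if i == 0:
            yield line  # always keep the header line
            continue
        # Drop the row if either category column equals the token.
        if any(len(fields) > c and fields[c] == token for c in category_cols):
            continue
        yield line


header = ("A tss tes start end strand category "
          "B tss tes start end strand category pair_direction\n")
# Row taken from the question.
non_rep_row = ("chr.1.0.loc.20379 22566 22816 22566 22816 + non-rep "
               "chr.1.0.loc.20380 52719 53494 52719 53494 + non-rep +\n")
# Hypothetical row, invented only to show a line that survives the filter.
made_up_row = ("chr.1.0.loc.1 100 200 100 200 + rep "
               "chr.1.0.loc.2 300 400 300 400 - rep -\n")

kept = list(drop_rows([header, non_rep_row, made_up_row]))
print("".join(kept), end="")
```

To run it on a real file, pass an open file handle instead of the list, for example with open("yourfile.bed") as handle, and write the yielded lines to an output file. To keep only the "non-rep" rows instead, invert the any(...) test.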