I'm trying to print only the lines in a GTF file whose "tag" field is "appris_principal". If no line for a given gene has that tag, then the lines tagged "appris_candidate_longest" should be selected for that gene instead.
I think I can code it up in Python, but there must be a way to do it in awk?
Why not grep? That might be the easiest and the quickest.

Oh yeah, let's not forget grep. But I'm not sure how to express the condition: if appris_principal doesn't exist in this line, check whether appris_candidate_longest exists. If neither, don't print.
Extract matching lines:
Extract non-matching lines:
... ...
Check for lines that have appris_candidate_longest AND NOT appris_principal:
Honestly, I would just do it in Python. Use csv.DictReader. It shouldn't take more than a dozen lines.
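
For what it's worth, here is a minimal sketch of that Python approach. It is not tested against a real annotation and rests on a few assumptions: the file is a standard 9-column, tab-separated GTF, every relevant line carries a gene_id attribute, and the tags appear as plain text in the attribute column (so a substring check is enough, which also matches variants such as appris_principal_1). The function name select_appris_lines and the column names are made up for illustration. Because the fallback is decided per gene, all lines have to be read before anything can be printed.

```python
import csv
import re

# Column names are an assumption: GTF has no header row, so we supply them.
GTF_COLUMNS = ["seqname", "source", "feature", "start", "end",
               "score", "strand", "frame", "attribute"]
GENE_ID_RE = re.compile(r'gene_id "([^"]+)"')


def select_appris_lines(lines):
    """Return GTF records to keep: per gene, the appris_principal lines,
    or the appris_candidate_longest lines if the gene has no principal."""
    per_gene = {}  # gene_id -> buckets; dicts keep first-seen gene order

    # Skip blank lines and "#" comment/header lines before parsing.
    data_lines = (ln for ln in lines if ln.strip() and not ln.startswith("#"))
    # QUOTE_NONE keeps the double quotes inside the attribute column intact.
    reader = csv.DictReader(data_lines, fieldnames=GTF_COLUMNS,
                            delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        attr = row["attribute"]
        if attr is None:  # fewer than 9 columns: not a usable record
            continue
        match = GENE_ID_RE.search(attr)
        if match is None:
            continue
        buckets = per_gene.setdefault(
            match.group(1), {"principal": [], "candidate": []})
        record = "\t".join(row[col] for col in GTF_COLUMNS)
        if "appris_principal" in attr:
            buckets["principal"].append(record)
        elif "appris_candidate_longest" in attr:
            buckets["candidate"].append(record)

    selected = []
    for buckets in per_gene.values():
        # Prefer principal lines; fall back to candidates; else keep nothing.
        selected.extend(buckets["principal"] or buckets["candidate"])
    return selected
```

To use it, open the GTF with open(path, newline=""), pass the file handle to select_appris_lines, and print each returned record. Note that the output is grouped by gene in order of first appearance, and lines without either tag (including the "gene" feature lines themselves) are dropped.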