I am trying to filter out the benign variants from my TSV file. Two columns carry a pathogenicity verdict: columns 23 and 29 (InterVar_automated and ClinSig, respectively).
The annotation for column 23 is as follows:
Benign
Likely benign
Likely pathogenic
Pathogenic
Uncertain significance
The annotation for column 29 is as follows:
Benign
Likely_benign
Likely_pathogenic
Pathogenic
Uncertain_significance
I cannot use this command:
grep -iv benign 'fileName_or_filePath'
It would also remove a variant that is Likely_benign according to ClinSig but a VUS according to InterVar, and I do not want to miss those.
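To make the over-filtering concrete, here is a minimal sketch on a made-up 29-column tab-separated file. The make_row helper and the toy rows are assumptions for illustration, not taken from the real data. The second row is a VUS in column 23 but Likely_benign in column 29, and grep -iv benign discards it together with the truly benign row, so only the Pathogenic row survives.

```shell
# Hypothetical helper: print one 29-column tab-separated row of placeholders,
# setting only column 23 (InterVar_automated) and column 29 (ClinSig).
make_row() {
  awk -v intervar="$1" -v clinsig="$2" 'BEGIN {
    OFS = "\t"
    for (i = 1; i <= 29; i++) $i = "."
    $23 = intervar
    $29 = clinsig
    print
  }'
}

tmp=$(mktemp -d)
{
  make_row "Benign" "Benign"
  make_row "Uncertain significance" "Likely_benign"
  make_row "Pathogenic" "Pathogenic"
} > "$tmp/toy.tsv"

# -c combined with -v counts the lines that do NOT match,
# i.e. the rows that grep -iv would keep.
kept=$(grep -ivc benign "$tmp/toy.tsv")
echo "rows kept by grep -iv benign: $kept"
```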
I want to use an awk command that says: "I do not need a variant if it is Benign or Likely benign in column 23 AND it is also Benign or Likely_benign in column 29."
How can I do this?
Thank you, Mr. Lindenbaum
I tried this command:
awk '!(($23=="Benign" || $23=="Likely benign") && ($29=="Benign" || $29=="Likely_benign"))' 2-Exonic > 3-NonBenign
But when I get the word count, the results are the same for both files!
1353 2-Exonic
1353 3-NonBenign
Do you know where the problem is?
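One likely cause of the identical counts, offered as a sketch rather than a diagnosis of the actual file: without -F, awk splits on any run of whitespace, so a value such as "Likely benign" occupies two fields and every later column shifts by one. $23 and $29 then no longer hold the expected values and the condition never matches. The one-row toy file below is made up for illustration.

```shell
tmp=$(mktemp -d)

# Build one toy 29-column tab-separated row (placeholder values are assumptions).
awk 'BEGIN {
  OFS = "\t"
  for (i = 1; i <= 29; i++) $i = "."
  $23 = "Likely benign"
  $29 = "Likely_benign"
  print
}' > "$tmp/toy.tsv"

# Default splitting: the space inside "Likely benign" creates an extra field.
default_nf=$(awk '{ print NF }' "$tmp/toy.tsv")
default_29=$(awk '{ print $29 }' "$tmp/toy.tsv")

# Tab splitting: the columns line up with the file layout.
tab_nf=$(awk -F '\t' '{ print NF }' "$tmp/toy.tsv")
tab_29=$(awk -F '\t' '{ print $29 }' "$tmp/toy.tsv")

echo "default: NF=$default_nf, \$29='$default_29'"
echo "tab:     NF=$tab_nf, \$29='$tab_29'"
```

Printing NF for a few rows of the real file with and without -F '\t' is a quick way to check whether this applies.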
Can you try this?
So, this is my command:
awk -F "\t" '!(($23=="Benign" || $23=="Likely benign") && ($29=="Benign" || $29=="Likely_benign"))' 2-Exonic > 3-NonBenign
Here is the word count of my output file: 692 3-NonBenign
But the output still has Benign and Likely benign variants in columns 23 and 29.
We cannot second-guess your data. Please post example rows that are not getting filtered out.
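One way to produce such an example, sketched under the assumption that the file is tab-separated and that columns 23 and 29 are as described in the question: tally the combinations of the two verdicts that remain in the filtered file. The toy file and the make_row helper below are made up for illustration.

```shell
# Hypothetical helper: one 29-column tab-separated row of placeholders,
# with only column 23 and column 29 set.
make_row() {
  awk -v intervar="$1" -v clinsig="$2" 'BEGIN {
    OFS = "\t"
    for (i = 1; i <= 29; i++) $i = "."
    $23 = intervar
    $29 = clinsig
    print
  }'
}

tmp=$(mktemp -d)
{
  make_row "Benign" "Uncertain_significance"
  make_row "Pathogenic" "Pathogenic"
  make_row "Benign" "Uncertain_significance"
} > "$tmp/filtered.tsv"

# Count each (column 23, column 29) pair; output is count, col 23, col 29.
tally=$(awk -F '\t' '
  { n[$23 FS $29]++ }
  END { for (k in n) print n[k] "\t" k }
' "$tmp/filtered.tsv" | sort)

printf '%s\n' "$tally"
```

Any pair in which only one of the two columns is benign is expected to survive a filter written with &&.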
This is an example of Benign in column 23, after using the command.
Use || instead of &&. If Benign and Likely Benign do not occur in any other column, you can do an inverse grep or print only the rows that do not contain these strings (sed/awk).
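A minimal sketch of the difference between the two operators, assuming a tab-separated file and the label spellings listed in the question. The toy rows and the make_row helper are made up. With &&, a row is dropped only when both columns call it benign; with ||, it is dropped when either column does.

```shell
# Hypothetical helper: one 29-column tab-separated row of placeholders,
# with only column 23 and column 29 set.
make_row() {
  awk -v intervar="$1" -v clinsig="$2" 'BEGIN {
    OFS = "\t"
    for (i = 1; i <= 29; i++) $i = "."
    $23 = intervar
    $29 = clinsig
    print
  }'
}

tmp=$(mktemp -d)
{
  make_row "Benign" "Benign"
  make_row "Benign" "Uncertain_significance"
  make_row "Uncertain significance" "Likely_benign"
  make_row "Pathogenic" "Pathogenic"
  make_row "Likely benign" "Likely_benign"
} > "$tmp/toy.tsv"

# Drop a row only if BOTH columns say benign (the command from the thread).
and_kept=$(awk -F '\t' '!(($23=="Benign" || $23=="Likely benign") && ($29=="Benign" || $29=="Likely_benign"))' "$tmp/toy.tsv" | awk 'END { print NR }')

# Drop a row if EITHER column says benign.
or_kept=$(awk -F '\t' '!(($23=="Benign" || $23=="Likely benign") || ($29=="Benign" || $29=="Likely_benign"))' "$tmp/toy.tsv" | awk 'END { print NR }')

echo "kept with &&: $and_kept"
echo "kept with ||: $or_kept"
```

Note that the || version also removes rows that only one source calls benign, such as a VUS in InterVar that is Likely_benign in ClinSig, which is the case the original question wanted to keep. Which operator is right depends on whether those rows should stay.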