Question

awk command

2.8 years ago

Fatemeh Nabizadeh ▴ 10

I am trying to filter out the benign variants from my TSV file. Two columns hold the pathogenicity verdict: column 23 (InterVar_automated) and column 29 (ClinSig).

The annotation for column 23 is as follows:

Benign

Likely benign

Likely pathogenic

Pathogenic

Uncertain significance

The annotation for column 29 is as follows:

Benign

Likely_benign

Likely_pathogenic

Pathogenic

Uncertain_significance

I cannot use this command:

grep -iv benign 'fileName_or_filePath'

because it could drop a variant that is Likely_benign according to ClinSig but is of uncertain significance (VUS) according to InterVar.

I want to use an awk command to say: "I do not need a variant if it is Benign or Likely benign based on column 23, AND also if it is Benign or Likely_benign based on column 29."

How can I do this?

WholeExomeSequencing Linux • 1.6k views

ADD COMMENT • link updated 2.8 years ago by cpad0112 21k • written 2.8 years ago by Fatemeh Nabizadeh ▴ 10

cpad0112 · Answer 1 · 2022-01-27

2.8 years ago

Pierre Lindenbaum 164k

"I do not need a variant if it is Benign or Likely benign based on column 23, AND also if it is Benign or Likely_benign based on column 29."

awk '!(($23=="Benign" || $23=="Likely benign") && ($29=="Benign" || $29=="Likely_benign"))'

ADD COMMENT • link 2.8 years ago by Pierre Lindenbaum 164k

Thank you, Mr. Lindenbaum

I tried this command

awk '!(($23=="Benign" || $23=="Likely benign") && ($29=="Benign" || $29=="Likely_benign"))' 2-Exonic > 3-NonBenign

But when I get the word count, the input and output are the same:

1353 2-Exonic
1353 3-NonBenign

Do you know where the problem is?

ADD REPLY • link 2.8 years ago by Fatemeh Nabizadeh ▴ 10

Can you try this?

$ awk -F "\t" '!(($23=="Benign" || $23=="Likely benign") && ($29=="Benign" || $29=="Likely_benign"))' 2-Exonic
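A minimal sketch of why the separator matters, using made-up three-column rows (the temporary file and its contents are assumptions for illustration, not the OP's data). Without `-F "\t"`, awk splits on any run of blanks, so a value such as `nonsynonymous SNV` or `Likely benign` occupies two fields and every later column number shifts.

```shell
# Hypothetical toy data: 3 tab-separated columns; column 2 may contain a space.
demo=$(mktemp)
printf 'v1\tnonsynonymous SNV\tBenign\n' >  "$demo"
printf 'v2\tstopgain\tBenign\n'          >> "$demo"

# Default splitting (runs of blanks): "nonsynonymous SNV" counts as two fields.
default_nf=$(awk '{print NF}' "$demo" | tr '\n' ' ')

# Tab splitting: every row has exactly 3 fields.
tab_nf=$(awk -F '\t' '{print NF}' "$demo" | tr '\n' ' ')

echo "fields per row, default separator: $default_nf"
echo "fields per row, tab separator:     $tab_nf"
```

With the default separator the first row reports 4 fields and the second 3, so "column 3" is not the same thing in both rows; with the tab separator both rows report 3.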

ADD REPLY • link 2.8 years ago by cpad0112 21k

So, this is my command:

awk -F "\t" '!(($23=="Benign" || $23=="Likely benign") && ($29=="Benign" || $29=="Likely_benign"))' 2-Exonic > 3-NonBenign

Here is the word count of my output file: 692 3-NonBenign

But it still has Benign and Likely benign variants in columns 23 and 29.

ADD REPLY • link 2.8 years ago by Fatemeh Nabizadeh ▴ 10

We cannot second-guess your data. Please post example rows that are not getting filtered out.
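One way to gather that information, sketched on toy data: count how many fields each row has, and tabulate the label pairs that actually occur. In the toy file below, columns 2 and 3 stand in for columns 23 and 29 (an assumption for illustration); on the real file you would substitute `$23`, `$29`, and `2-Exonic`.

```shell
# Hypothetical toy file; columns 2 and 3 stand in for columns 23 and 29.
demo=$(mktemp)
printf 'v1\tBenign\tBenign\n' >  "$demo"
printf 'v2\tBenign\t.\n'      >> "$demo"
printf 'v3\tLikely benign\n'  >> "$demo"   # short row: last column is missing

# How many rows have each field count? Short rows show up here.
field_counts=$(awk -F '\t' '{n[NF]++} END {for (k in n) print k, n[k]}' "$demo" | sort -n)

# Which label pairs occur, and how often?
pairs=$(awk -F '\t' '{print $2 " | " $3}' "$demo" | sort | uniq -c)

echo "$field_counts"
echo "$pairs"
```

If some rows report fewer fields than the column number being tested, that column is empty for those rows and the comparison on it can never match.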

ADD REPLY • link 2.8 years ago by cpad0112 21k

chr1    11826630        11826630        C       T       PASS    hom     exonic  C1orf167        .       nonsynonymous SNV       C1orf167:NM_001010881:exon3:c.C787T:p.P263S     intergenic      AGTRAP;C1orf167 dist=15802;dist=5509    .       .       exonic  ENSG00000215910 .       nonsynonymous SNV       ENSG00000215910:ENST00000433342:exon3:c.C1357T:p.P453S  Benign  .       .       .       .

This is an example of a row with Benign in column 23 that is still present after running the command.

ADD REPLY • link updated 2.8 years ago by cpad0112 21k • written 2.8 years ago by Fatemeh Nabizadeh ▴ 10

The example line has fewer than 29 fields.
The OP's condition drops a row only when both columns (23 and 29) contain one of the listed strings, because the two tests are joined by &&. If you want to drop rows that have these strings in either column, use || instead of &&.

If Benign and Likely benign do not occur in any other column, you can use an inverse grep, or print only the rows that do not contain these strings (sed/awk).
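To make the difference concrete, here is a small sketch on toy data in which columns 2 and 3 stand in for columns 23 and 29 (the file and row names are made up for illustration). With `&&` a row is dropped only when both columns call it benign; with `||` it is dropped when either one does.

```shell
# Hypothetical toy file: row name, stand-in for column 23, stand-in for column 29.
demo=$(mktemp)
printf 'both\tBenign\tLikely_benign\n'                   >  "$demo"
printf 'only23\tLikely benign\tUncertain_significance\n' >> "$demo"
printf 'only29\tUncertain significance\tBenign\n'        >> "$demo"
printf 'neither\tPathogenic\tPathogenic\n'               >> "$demo"

# Drop a row only if BOTH columns call it benign (the condition in the question).
kept_and=$(awk -F '\t' '!(($2=="Benign" || $2=="Likely benign") && ($3=="Benign" || $3=="Likely_benign")) {print $1}' "$demo" | tr '\n' ' ')

# Drop a row if EITHER column calls it benign.
kept_or=$(awk -F '\t' '!(($2=="Benign" || $2=="Likely benign") || ($3=="Benign" || $3=="Likely_benign")) {print $1}' "$demo" | tr '\n' ' ')

echo "kept with &&: $kept_and"
echo "kept with ||: $kept_or"
```

The `&&` filter keeps `only23`, `only29`, and `neither`; the `||` filter keeps only `neither`. A row whose column 29 is missing behaves like `only23` under `&&`, which would explain a Benign value surviving in column 23.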

ADD REPLY • link 2.8 years ago by cpad0112 21k