awk command
2.8 years ago

I am trying to filter the benign variants out of my TSV file. Two columns carry a pathogenicity verdict: column 23 (InterVar_automated) and column 29 (ClinSig).

The annotation for column 23 is as follows:

Benign

Likely benign

Likely pathogenic

Pathogenic

Uncertain significance

The annotation for column 29 is as follows:

Benign

Likely_benign

Likely_pathogenic

Pathogenic

Uncertain_significance

I cannot use this command:

grep -iv benign 'fileName_or_filePath'

This would remove every line containing "benign" anywhere, so I could miss a variant that is Likely_benign based on ClinSig but is a VUS (Uncertain significance) based on InterVar.

I want to use an awk command to say: "I do not need a variant if it is Benign or Likely benign based on column 23, AND also if it is Benign or Likely_benign based on column 29."

How can I do this?

WholeExomeSequencing Linux • 1.6k views

> "I do not need a variant if it is Benign or Likely benign based on column 23, AND also if it is Benign or Likely_benign based on column 29."

awk '!(($23=="Benign" || $23=="Likely benign") && ($29=="Benign" || $29=="Likely_benign"))'

Thank you, Mr. Lindenbaum

I tried this command

awk '!(($23=="Benign" || $23=="Likely benign") && ($29=="Benign" || $29=="Likely_benign"))' 2-Exonic > 3-NonBenign

But when I get the word count, the results are the same for both files!

1353 2-Exonic
1353 3-NonBenign

Do you know where the problem is?


Can you try this? It sets the field separator to a tab:

awk -F "\t" '!(($23=="Benign" || $23=="Likely benign") && ($29=="Benign" || $29=="Likely_benign"))' 2-Exonic
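A likely reason the first attempt changed nothing: by default awk splits fields on any run of whitespace, so a value such as "Likely benign" is counted as two fields and every later column number shifts. A minimal sketch on a made-up two-column row (the row is hypothetical, not taken from the file in question):

```shell
# Hypothetical tab-separated row: a label containing a space, then a second value.
row="$(printf 'Likely benign\tBenign')"

# Default splitting breaks on spaces AND tabs, so the label becomes two fields.
default_nf="$(printf '%s\n' "$row" | awk '{print NF}')"

# With -F "\t" only tabs separate fields, so the label stays in one field.
tab_nf="$(printf '%s\n' "$row" | awk -F "\t" '{print NF}')"

echo "default: $default_nf fields, tab: $tab_nf fields"
```

If the real file contains spaces inside any column before column 23, `$23` and `$29` point at the wrong values unless `-F "\t"` is given.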

So, this is my command:

awk -F "\t" '!(($23=="Benign" || $23=="Likely benign") && ($29=="Benign" || $29=="Likely_benign"))' 2-Exonic > 3-NonBenign

Here is the word count of my output file: 692 3-NonBenign

But the output still has Benign and Likely benign variants in columns 23 and 29.


We cannot second-guess your data. Please post example lines that are not getting filtered out.

chr1    11826630    11826630    C    T    PASS    hom    exonic    C1orf167    .    nonsynonymous SNV    C1orf167:NM_001010881:exon3:c.C787T:p.P263S    intergenic    AGTRAP;C1orf167    dist=15802;dist=5509    .    .    exonic    ENSG00000215910    .    nonsynonymous SNV    ENSG00000215910:ENST00000433342:exon3:c.C1357T:p.P453S    Benign    .    .    .    .

This is an example of Benign in column 23, after using the command.

  1. The number of fields in the example line is less than 29.
  2. As written, the command drops a row only when both columns (23 and 29) contain a benign label, because the two conditions are joined by &&. If you want to drop a row when either column contains one, use || instead of &&.

If Benign and Likely benign do not occur in any other column, you can use an inverse grep, or print only the rows that do not contain these strings (sed/awk).
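The difference between the two operators can be sketched on a small made-up table. Everything here is an assumption for illustration: the `make_row` helper, the variant IDs `v1` to `v5`, and the placeholder `.` values are hypothetical and only mimic a 29-column TSV with verdicts in columns 23 and 29.

```shell
# Hypothetical helper: print one 29-column tab-separated row with
# column 1 = id, column 23 = InterVar verdict, column 29 = ClinSig verdict.
make_row() {
  awk -v id="$1" -v c23="$2" -v c29="$3" 'BEGIN {
    line = id
    for (i = 2; i <= 29; i++) {
      v = "."
      if (i == 23) v = c23
      if (i == 29) v = c29
      line = line "\t" v
    }
    print line
  }'
}

sample="$(
  make_row v1 "Benign" "Benign"
  make_row v2 "Likely benign" "Uncertain_significance"
  make_row v3 "Uncertain significance" "Likely_benign"
  make_row v4 "Pathogenic" "Pathogenic"
  make_row v5 "Likely pathogenic" "."
)"

# Sanity check: count rows that have fewer than 29 tab-separated fields.
short_rows="$(printf '%s\n' "$sample" | awk -F "\t" 'NF < 29 {n++} END {print n+0}')"

# &&: drop a row only when BOTH columns carry a benign label.
and_kept="$(printf '%s\n' "$sample" | awk -F "\t" \
  '!(($23=="Benign" || $23=="Likely benign") && ($29=="Benign" || $29=="Likely_benign")) {print $1}')"

# ||: drop a row when EITHER column carries a benign label.
or_kept="$(printf '%s\n' "$sample" | awk -F "\t" \
  '!(($23=="Benign" || $23=="Likely benign") || ($29=="Benign" || $29=="Likely_benign")) {print $1}')"

echo "rows with fewer than 29 fields: $short_rows"
echo "kept with &&:" $and_kept
echo "kept with ||:" $or_kept
```

With `&&`, rows such as `v2` and `v3` (benign in only one column) stay in the output, which matches the requirement stated in the question; with `||` they are removed as well.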
