Stats on WGS variant calls
5.5 years ago
abbysue ▴ 10

What's the best way to collect some statistics on 4 million variants identified with the GATK pipeline (and annotated with Funcotator)? I'm having trouble parsing the 'Funcotator' annotation, since it packs several pieces of information that I want into one field, separated by pipe characters. I want to know the number of intronic vs. exonic variants, and the number of missense and nonsense variants.

Sorry, here's what one row looks like

chrY    56844194        G       GAT     2390.06 PASS    2       1.00    2       NA      NA      57      3.0103  0.000  [Unknown|hg38|chrY|56844194|56844195|IGR||INS|-|-|AT|g.chrY:56844194_56844195insAT|no_transcript|||||||0.4675|GTATTGTGAGATCTCTGCAC|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||hg38|OREG1947633|Type_%3D_TRANSCRIPTION_%20_FACTOR_%20_BINDING_%20_SITE_%7C_Gene_Symbol_%3D_CTBP2P1_%7C_Gene_ID_%3D_ENSG00000235857_%7C_Gene_Source_%3D_ENSEMBL_%7C_Regulatory_Element_Symbol_%3D_ZNF263_%7C_Regulatory_Element_ID_%3D_ENST00000219069_%7C_Regulatory_Element_Source_%3D_ENSEMBL_%7C_PMID_%3D_18971253_%7C_Dataset_%3D_PAZAR||||||||||||||||||||||||||false|false||false|false||false|false|false||false|false|false|false|false|false|false|false|false|false|false|false|false|false|false|false|false|false|false|false|||false|false||false||false||false|false|false||false|||false|||] 21.00    59.11   NA      true    NA      36.53   199154.00       NA      0.808

Here's a link that explains Funcotator annotations

I'm new to using the terminal, but I imagine this can be done using grep -w to look for a specific string (intron, exon, ...).

EDIT - I solved this with grep -w string file.table > file.txt

WGS variants • 1.0k views
5.5 years ago
reza.jabal ▴ 580

Welcome to the world of bash scripting! Let's assume your Funcotator results are in Funcotator.txt:

For missense count: grep -w 'MISSENSE' Funcotator.txt | wc -l

For nonsense count: grep -w 'NONSENSE' Funcotator.txt | wc -l

For intronic count: grep -w 'INTRON' Funcotator.txt | wc -l

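For exonic count (rough): echo $(( $(grep -vc '^#' Funcotator.txt) - $(grep -cw 'no_transcript' Funcotator.txt) ))

This subtracts the rows with no transcript from all non-header rows. Note that it counts every row assigned to a transcript, which would include intronic variants too, so treat it as an upper bound rather than a true exonic count.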
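If you would rather get every category in one pass instead of running one grep per label, here is a minimal sketch using awk. It assumes the variant classification is the 6th pipe-separated field inside the first [...] block of each row, which is where IGR sits in the sample row from the question; the FUNCOTATION description in your VCF header should list the actual field order, so check it before trusting the numbers. The function name count_classifications is made up for this example.

```shell
# Tally Funcotator variant classifications, one count per data row.
# Assumption: the classification is the 6th pipe-separated field inside the
# first [...] block of each row, as in the sample row above (where it is IGR).
count_classifications() {
    awk '
        /^#/ { next }                          # skip header/comment lines
        match($0, /\[[^]]*\]/) {               # locate the first [...] block
            split(substr($0, RSTART + 1, RLENGTH - 2), field, "|")
            tally[field[6]]++
        }
        END { for (name in tally) print tally[name], name }
    ' "$1" | sort -rn
}

# Usage (hypothetical file name from this answer):
#   count_classifications Funcotator.txt
```

Each output line is a count followed by the label, largest first. Rows carrying more than one bracketed annotation are counted once, by their first annotation only, so the totals are per row rather than per annotation.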
