Question

GWAS summary statistics

7.7 years ago

iti.gupta ▴ 10

Can anyone please explain what GWAS summary statistics are? Where can we get them?

ADD COMMENT • link updated 5.3 years ago by Jeremy Leipzig 23k • written 7.7 years ago by iti.gupta ▴ 10

Hi, have you found any sources of GWAS summary statistics by now, other than the GWAS Catalog?

ADD REPLY • link 6.0 years ago by 2204130116 • 0

Please read the other answers.

ADD REPLY • link 6.0 years ago by Kevin Blighe 89k

score 3 · Answer 1 · 2017-08-31

'GWAS summary statistics' is a bit general. In GWAS, the things that we look at include: major allele, minor allele, minor allele frequency (MAF), missingness per genotype, missingness per individual, etc.

Other metrics that we look at, more advanced ones I'd say, include linkage disequilibrium (LD), variance inflation factor (VIF), runs of homozygosity (ROH), etc.

These provide a broad 'summary' of the data and allow us to appropriately set thresholds for quality control. It would be wrong, for example, to run a statistical test on a genotype with high missingness because the resulting P value would be misleading and could lead to erroneous conclusions from the data.
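As a rough illustration of the kind of per-variant QC summary described above, here is a minimal Python sketch. The genotype values, the variant names, and the thresholds `MAX_MISSING` and `MIN_MAF` are all made up for the example; in practice PLINK would compute these for you, and sensible cut-offs depend on the study.

```python
# Genotypes coded as the count of the alternate allele (0, 1 or 2);
# None marks a missing call. Each list holds one variant across 8 individuals.
# All values are invented for illustration.
genotypes = {
    "snp_a": [0, 1, 2, 0, 1, 0, 0, 1],
    "snp_b": [0, None, None, 1, None, 0, None, 0],
    "snp_c": [0, 0, 0, 0, 0, 0, 1, 0],
}

# Hypothetical QC thresholds, not recommendations.
MAX_MISSING = 0.10
MIN_MAF = 0.10


def variant_summary(calls):
    """Return (missing_rate, maf) for one variant."""
    observed = [g for g in calls if g is not None]
    missing_rate = 1 - len(observed) / len(calls)
    if not observed:
        return missing_rate, None
    # Each individual carries two alleles, hence the factor of 2.
    alt_freq = sum(observed) / (2 * len(observed))
    maf = min(alt_freq, 1 - alt_freq)
    return missing_rate, maf


def passes_qc(calls):
    """True if the variant is within both hypothetical thresholds."""
    missing_rate, maf = variant_summary(calls)
    return maf is not None and missing_rate <= MAX_MISSING and maf >= MIN_MAF


kept = [name for name, calls in genotypes.items() if passes_qc(calls)]
print(kept)
```

Here `snp_b` fails on missingness and `snp_c` on MAF, so neither would go forward to association testing.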

PLINK is usually the 'go-to' program for analysing GWAS data, but there are alternatives. It is also possible to read PLINK data into R and run your own analyses, although for now there are not many packages for doing that.

Further information can be found here: http://zzz.bwh.harvard.edu/plink/summary.shtml

Kevin

score 1 · Answer 2 · 2017-08-31

7.7 years ago

Denise CS ★ 5.2k

The GWAS Catalog keeps a list of papers with summary statistics. The summary stats themselves are available as compressed files from their FTP. These statistics can contain, for example, the odds ratio, beta coefficient, p-value and minor allele frequency.
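Since these files may report either an odds ratio or a beta coefficient, it can help to see how the two relate for a binary trait. The sketch below assumes the beta is on the log-odds scale and that the confidence interval is a 95% interval that is symmetric on that scale; the function name and the input values are made up for illustration.

```python
import math


def log_odds_from_or(odds_ratio, ci_lower, ci_upper, z=1.96):
    """Convert an odds ratio and its 95% CI to (beta, standard error).

    Assumes the CI is symmetric on the log scale, which is how such
    intervals are usually constructed.
    """
    beta = math.log(odds_ratio)
    se = (math.log(ci_upper) - math.log(ci_lower)) / (2 * z)
    return beta, se


# Invented example values: OR = 1.5 with 95% CI 1.2 to 1.875.
beta, se = log_odds_from_or(1.5, 1.2, 1.875)

# Wald z-score and two-sided p-value under a normal approximation.
z_score = beta / se
p_value = math.erfc(abs(z_score) / math.sqrt(2))

print(f"beta={beta:.4f} se={se:.4f} z={z_score:.2f} p={p_value:.2e}")
```

This is only a back-of-the-envelope conversion; when a file already provides the beta and standard error, use those directly.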


score 1 · Answer 3 · 2020-01-09

GWAS summary statistics supply three important pieces of information for each association: SNP, phenotype, and P-value.

This differs from full GWAS data, which would have genotype calls for every individual at every SNP.

Here are the fields and an entry from the NHGRI-EBI GWAS Catalog:

    DATE ADDED TO CATALOG         2008-06-16
    PUBMEDID                      17463249
    FIRST AUTHOR                  Zeggini E
    DATE                          2007-04-26
    JOURNAL                       Science
    LINK                          www.ncbi.nlm.nih.gov/pubmed/17463249
    STUDY                         Replication of genome-wide association signals in UK samples reveals risk loci for type 2 diabetes.
    DISEASE/TRAIT                 Type 2 diabetes
    INITIAL SAMPLE SIZE           1,924 European ancestry cases, 2,938 European ancestry controls
    REPLICATION SAMPLE SIZE       3,757 European ancestry cases, 5,346 European ancestry controls
    REGION                        16q12.2
    CHR_ID                        16
    CHR_POS                       53782363
    REPORTED GENE(S)              FTO
    MAPPED_GENE                   FTO
    UPSTREAM_GENE_ID              (empty)
    DOWNSTREAM_GENE_ID            (empty)
    SNP_GENE_IDS                  ENSG00000140718
    UPSTREAM_GENE_DISTANCE        (empty)
    DOWNSTREAM_GENE_DISTANCE      (empty)
    STRONGEST SNP-RISK ALLELE     rs8050136-A
    SNPS                          rs8050136
    MERGED                        0
    SNP_ID_CURRENT                8050136
    CONTEXT                       intron_variant
    INTERGENIC                    0
    RISK ALLELE FREQUENCY         0.40
    P-VALUE                       7E-14
    PVALUE_MLOG                   13.154901959985743
    P-VALUE (TEXT)                (empty)
    OR or BETA                    1.23
    95% CI (TEXT)                 [1.18-1.32]
    PLATFORM [SNPS PASSING QC]    Affymetrix [393453]
    CNV                           N
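To make the "SNP, phenotype, P-value" idea concrete, here is a small Python sketch that pulls those three pieces out of a tab-separated record. The `excerpt` string is a cut-down version of the entry above with only six of its columns, and it assumes the catalog download is tab-delimited; the threshold of 5e-8 is the conventional genome-wide significance level, not something taken from the catalog itself.

```python
import csv
import io
import math

# Cut-down, tab-separated excerpt of the entry above (subset of columns only).
excerpt = (
    "PUBMEDID\tDISEASE/TRAIT\tSNPS\tP-VALUE\tPVALUE_MLOG\tOR or BETA\n"
    "17463249\tType 2 diabetes\trs8050136\t7E-14\t13.154901959985743\t1.23\n"
)

rows = list(csv.DictReader(io.StringIO(excerpt), delimiter="\t"))
record = rows[0]

# The three core pieces of a summary-statistics record.
summary = {
    "snp": record["SNPS"],
    "phenotype": record["DISEASE/TRAIT"],
    "p_value": float(record["P-VALUE"]),
}

# PVALUE_MLOG is -log10 of the P-value, so it can be recomputed as a check.
mlog = -math.log10(summary["p_value"])

GENOME_WIDE = 5e-8  # conventional threshold, used here only for illustration
significant = summary["p_value"] < GENOME_WIDE

print(summary, round(mlog, 4), significant)
```

For a real download you would pass an open file handle to `csv.DictReader` instead of the in-memory string.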