GWAS summary statistics
7.2 years ago
iti.gupta ▴ 10

Can anyone please explain what GWAS summary statistics are? Where can we get them?


Hi, do you now have access to any GWAS summary statistics other than those in the GWAS Catalog?


Please read the other answers.

7.2 years ago

'GWAS summary statistics' is a fairly general term. In a GWAS, the things we look at include the major allele, the minor allele, minor allele frequency (MAF), missingness per genotype, and missingness per individual.

Other, more advanced metrics include linkage disequilibrium (LD), variance inflation factor (VIF), and runs of homozygosity (ROH).

These provide a broad 'summary' of the data and allow us to appropriately set thresholds for quality control. It would be wrong, for example, to run a statistical test on a genotype with high missingness because the resulting P value would be misleading and could lead to erroneous conclusions from the data.
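As a rough illustration of the simpler metrics above, here is a minimal Python sketch that computes per-variant missingness, per-individual missingness, and MAF from a tiny made-up genotype matrix, then applies QC thresholds. The data, the function name `variant_summary`, and the threshold values are all assumptions for the example, not recommended settings; in practice PLINK reports these quantities itself (for example via `--freq` and `--missing`).

```python
# Genotypes coded as allele counts (0, 1, 2); None marks a missing call.
# Rows are individuals, columns are variants. All values are made up.
genotypes = [
    [0, 1, None],
    [1, 0, None],
    [0, 2, 0],
    [0, 1, None],
]

def variant_summary(calls):
    """Return (missingness, minor allele frequency) for one variant."""
    observed = [g for g in calls if g is not None]
    missingness = 1 - len(observed) / len(calls)
    if not observed:
        return missingness, float("nan")
    # Each observed diploid genotype contributes two alleles.
    freq = sum(observed) / (2 * len(observed))
    # Fold so the reported value is always the minor allele frequency.
    return missingness, min(freq, 1 - freq)

MAX_MISSING = 0.05  # hypothetical threshold
MIN_MAF = 0.01      # hypothetical threshold

# Per-variant summaries (transpose rows into columns first).
summaries = [variant_summary(col) for col in zip(*genotypes)]
keep = [miss <= MAX_MISSING and maf >= MIN_MAF for miss, maf in summaries]

# Per-individual missingness: fraction of variants with no call.
sample_missing = [sum(g is None for g in row) / len(row) for row in genotypes]
```

Here the third variant is missing in three of four individuals, so it fails the missingness threshold and would be dropped before any association test is run.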

PLINK is usually the 'go to' program for analysing GWAS data, but there are alternatives. It is also possible to read PLINK data into R and do your own analyses, although for now few programs support that.

Further information can be found here: http://zzz.bwh.harvard.edu/plink/summary.shtml

Kevin

7.2 years ago
Denise CS ★ 5.2k

This is the list of papers with summary statistics from the GWAS Catalog. The summary statistics themselves are available as compressed files from their FTP site. They can contain, for example, the odds ratio, beta coefficient, p-value, and minor allele frequency.

4.9 years ago

GWAS summary statistics refers to reporting three key pieces of information for each association: SNP, phenotype, and P-value.

This differs from full GWAS data, which would have genotype calls for every individual at every SNP.
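To make the distinction concrete, here is a small Python sketch that collapses made-up individual-level genotype calls for one SNP into a single summary row. The SNP name, phenotype label, and genotype values are hypothetical, and the allelic chi-square test is only a stand-in chosen for brevity; real studies typically fit regression models with covariates.

```python
import math

# Hypothetical individual-level data for one SNP: risk-allele counts (0, 1, 2).
cases = [2, 1, 1, 2, 1, 0, 2, 1]
controls = [0, 1, 0, 1, 0, 0, 1, 0]

def allele_counts(genos):
    """Return (risk alleles, other alleles) for a list of diploid genotypes."""
    risk = sum(genos)
    return risk, 2 * len(genos) - risk

a, b = allele_counts(cases)     # risk / other alleles in cases
c, d = allele_counts(controls)  # risk / other alleles in controls

# Allelic odds ratio from the 2x2 table.
odds_ratio = (a * d) / (b * c)

# Pearson chi-square on the 2x2 allele table (1 degree of freedom).
n = a + b + c + d
chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
# Survival function of a chi-square with 1 df.
p_value = math.erfc(math.sqrt(chi2 / 2))

# The individual calls are now reduced to one row of summary statistics.
summary_row = {
    "SNP": "rs_example",           # hypothetical identifier
    "Phenotype": "example trait",  # hypothetical label
    "OR": odds_ratio,
    "P": p_value,
}
```

Sixteen genotype calls go in and one row comes out, which is why summary statistics can be shared without exposing any individual's data.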

Here are the fields and an entry from the NHGRI-EBI GWAS Catalog:

    DATE ADDED TO CATALOG: 2008-06-16
    PUBMEDID: 17463249
    FIRST AUTHOR: Zeggini E
    DATE: 2007-04-26
    JOURNAL: Science
    LINK: www.ncbi.nlm.nih.gov/pubmed/17463249
    STUDY: Replication of genome-wide association signals in UK samples reveals risk loci for type 2 diabetes.
    DISEASE/TRAIT: Type 2 diabetes
    INITIAL SAMPLE SIZE: 1,924 European ancestry cases, 2,938 European ancestry controls
    REPLICATION SAMPLE SIZE: 3,757 European ancestry cases, 5,346 European ancestry controls
    REGION: 16q12.2
    CHR_ID: 16
    CHR_POS: 53782363
    REPORTED GENE(S): FTO
    MAPPED_GENE: FTO
    UPSTREAM_GENE_ID:
    DOWNSTREAM_GENE_ID:
    SNP_GENE_IDS: ENSG00000140718
    UPSTREAM_GENE_DISTANCE:
    DOWNSTREAM_GENE_DISTANCE:
    STRONGEST SNP-RISK ALLELE: rs8050136-A
    SNPS: rs8050136
    MERGED: 0
    SNP_ID_CURRENT: 8050136
    CONTEXT: intron_variant
    INTERGENIC: 0
    RISK ALLELE FREQUENCY: 0.40
    P-VALUE: 7E-14
    PVALUE_MLOG: 13.154901959985743
    P-VALUE (TEXT):
    OR or BETA: 1.23
    95% CI (TEXT): [1.18-1.32]
    PLATFORM [SNPS PASSING QC]: Affymetrix [393453]
    CNV: N
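
Two of the fields in the entry above are simple transforms of others, which a short Python sketch can show. It assumes the `OR or BETA` value of 1.23 is an odds ratio (the study lists cases and controls), in which case the beta is its natural log. The standard error line is only an approximation that assumes the reported 95% CI is symmetric on the log scale.

```python
import math

# Values copied from the catalog entry above.
p_value = float("7E-14")
odds_ratio = 1.23
ci_low, ci_high = 1.18, 1.32

# PVALUE_MLOG is the negative base-10 log of the P-value.
pvalue_mlog = -math.log10(p_value)

# For a case/control trait, beta is the log odds ratio.
# Only apply this to rows where the column actually holds an OR.
beta = math.log(odds_ratio)

# Approximate standard error of beta, assuming the 95% CI is
# symmetric on the log scale (an assumption, not a catalog field).
se = (math.log(ci_high) - math.log(ci_low)) / (2 * 1.96)
```

The same two `math.log` calls can be applied row by row over a whole file, so converting a full column of ORs to betas is a single pass over the data.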

Hi, I was recently converting the ORs to betas in this data frame, and it felt really time-consuming. I was wondering if you have a better approach for handling this kind of thing? Thanks.


Please post a new Biostars question.
