Can anyone please explain what GWAS summary statistics are, and where we can get them?
'GWAS summary statistics' is a bit general. In GWAS, the things that we look at include: major allele, minor allele, minor allele frequency (MAF), missingness per genotype, missingness per individual, etc.
Other, arguably more advanced, metrics that we look at include linkage disequilibrium (LD), variance inflation factor (VIF), runs of homozygosity (ROH), etc.
These provide a broad 'summary' of the data and allow us to appropriately set thresholds for quality control. It would be wrong, for example, to run a statistical test on a genotype with high missingness because the resulting P value would be misleading and could lead to erroneous conclusions from the data.
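To make the quality-control idea concrete, here is a minimal sketch in plain Python of per-variant missingness and MAF with a simple filter. The genotype coding (0/1/2 allele counts, `None` for a missing call), the SNP names, and the thresholds of 0.05 and 0.01 are all assumptions chosen for illustration, not recommended values. In practice PLINK applies this kind of filter with its `--geno` and `--maf` options.

```python
from typing import Dict, List, Optional, Sequence

# Assumed coding: number of copies of one allele (0, 1 or 2); None = missing call.
Genotype = Optional[int]


def variant_missingness(calls: Sequence[Genotype]) -> float:
    """Fraction of individuals with no genotype call at this variant."""
    return sum(c is None for c in calls) / len(calls)


def minor_allele_frequency(calls: Sequence[Genotype]) -> float:
    """MAF computed from the non-missing calls only."""
    observed = [c for c in calls if c is not None]
    if not observed:
        return 0.0
    freq = sum(observed) / (2 * len(observed))  # each individual carries 2 alleles
    return min(freq, 1.0 - freq)


def passes_qc(calls: Sequence[Genotype],
              max_missing: float = 0.05,   # illustrative threshold
              min_maf: float = 0.01) -> bool:  # illustrative threshold
    """Keep a variant only if it is well genotyped and not (near) monomorphic."""
    return (variant_missingness(calls) <= max_missing
            and minor_allele_frequency(calls) >= min_maf)


# Hypothetical toy data: three variants typed in ten individuals.
genotypes: Dict[str, List[Genotype]] = {
    "snp_a": [0, 1, 2, 1, 0, 1, 0, 2, 1, 0],           # complete, common
    "snp_b": [0, None, None, 1, 0, None, 0, 0, 1, 0],  # 30% missing
    "snp_c": [0] * 10,                                 # monomorphic
}

kept = [name for name, calls in genotypes.items() if passes_qc(calls)]
print(kept)
```

Only `snp_a` survives here: `snp_b` fails on missingness and `snp_c` fails on MAF, so neither would be carried forward to association testing.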
PLINK is usually the 'go to' program for analysing GWAS data, but there are alternatives. It is also possible to read PLINK data into R and do your own analyses, although for now not many programs support that.
Further information can be found here: http://zzz.bwh.harvard.edu/plink/summary.shtml
Kevin
This is the list of papers with summary statistics from the GWAS Catalog. The summary stats themselves are available as a compressed file from their FTP site. These statistics can contain the odds ratio, beta coefficient, p-value and minor allele frequency, for example.
GWAS summary statistics refer to supplying three important pieces of information: SNP, Phenotype, and P-value
This differs from full GWAS data which would have calls for every individual at every SNP
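As a rough illustration of that difference, the sketch below collapses individual-level calls for one SNP into a single summary row. It assumes a case/control phenotype, genotypes coded as 0/1/2 copies of the effect allele, and a basic allelic chi-square test (1 degree of freedom). The function name, SNP ID and data are made up, and real studies would normally use regression with covariates instead.

```python
import math
from typing import Dict, Sequence, Union


def allelic_summary(snp: str, phenotype: str,
                    case_genos: Sequence[int],
                    control_genos: Sequence[int]) -> Dict[str, Union[str, float]]:
    """Reduce per-individual genotypes at one SNP to one summary-statistics row.

    Note: no handling of zero cells or small counts; this is a toy sketch.
    """
    a = sum(case_genos)                  # effect alleles in cases
    b = 2 * len(case_genos) - a          # other alleles in cases
    c = sum(control_genos)               # effect alleles in controls
    d = 2 * len(control_genos) - c       # other alleles in controls
    n = a + b + c + d

    odds_ratio = (a * d) / (b * c)
    # Pearson chi-square statistic for a 2x2 table of allele counts.
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper-tail probability of a chi-square with 1 df, via the error function.
    p_value = math.erfc(math.sqrt(chi2 / 2))

    return {"SNP": snp, "Phenotype": phenotype,
            "OR": odds_ratio, "CHISQ": chi2, "P": p_value}


# Hypothetical individual-level data: 8 cases and 8 controls at one SNP.
cases = [2, 1, 1, 2, 1, 0, 2, 1]
controls = [0, 1, 0, 0, 1, 0, 1, 0]

row = allelic_summary("rsTOY1", "toy disease", cases, controls)
print(row)
```

The sixteen genotype calls are the 'full' data, and the returned dictionary is the summary. Once a study shares only rows like this one, the per-individual calls cannot be recovered from them.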
Here are the fields and an entry from the NHGRI-EBI GWAS Catalog:
DATE ADDED TO CATALOG: 2008-06-16
PUBMEDID: 17463249
FIRST AUTHOR: Zeggini E
DATE: 2007-04-26
JOURNAL: Science
LINK: www.ncbi.nlm.nih.gov/pubmed/17463249
STUDY: Replication of genome-wide association signals in UK samples reveals risk loci for type 2 diabetes.
DISEASE/TRAIT: Type 2 diabetes
INITIAL SAMPLE SIZE: 1,924 European ancestry cases, 2,938 European ancestry controls
REPLICATION SAMPLE SIZE: 3,757 European ancestry cases, 5,346 European ancestry controls
REGION: 16q12.2
CHR_ID: 16
CHR_POS: 53782363
REPORTED GENE(S): FTO
MAPPED_GENE: FTO
UPSTREAM_GENE_ID: (empty)
DOWNSTREAM_GENE_ID: (empty)
SNP_GENE_IDS: ENSG00000140718
UPSTREAM_GENE_DISTANCE: (empty)
DOWNSTREAM_GENE_DISTANCE: (empty)
STRONGEST SNP-RISK ALLELE: rs8050136-A
SNPS: rs8050136
MERGED: 0
SNP_ID_CURRENT: 8050136
CONTEXT: intron_variant
INTERGENIC: 0
RISK ALLELE FREQUENCY: 0.40
P-VALUE: 7E-14
PVALUE_MLOG: 13.154901959985743
P-VALUE (TEXT): (empty)
OR or BETA: 1.23
95% CI (TEXT): [1.18-1.32]
PLATFORM [SNPS PASSING QC]: Affymetrix [393453]
CNV: N
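Two of the fields in that entry are simple transformations of others, which is handy to know when harmonising files from different sources. A small check in Python, using only the values shown in the entry (the variable names are mine): PVALUE_MLOG is -log10 of the p-value, and an odds ratio can be put on the log-odds (beta) scale with a natural log.

```python
import math

# Values copied from the catalog entry for rs8050136.
p_value = 7e-14
odds_ratio = 1.23

# PVALUE_MLOG is the p-value on the -log10 scale (the scale used in Manhattan plots).
pvalue_mlog = -math.log10(p_value)

# For a binary trait, the log of the odds ratio is the effect size on the
# logistic-regression (beta) scale.
log_odds = math.log(odds_ratio)

print(f"-log10(p) = {pvalue_mlog:.4f}")
print(f"ln(OR)    = {log_odds:.4f}")
```

The first value comes out at about 13.1549, which agrees with the PVALUE_MLOG reported in the entry.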
Hi, I wonder if you know of any sources of GWAS summary statistics now, other than the GWAS Catalog?
Please read the other answers.