4.3 years ago
Kumar
Hi, I have exome-seq data from the Ion Torrent platform. I would like to analyze these data for a GWAS. Please suggest pipelines and a workflow.
That is awesome. It is a good starting point for understanding GWAS studies.
Hi Kevin, thank you for your answer. You mentioned a "standard GWAS analysis". Could you elaborate on what a standard GWAS analysis would be? I am pretty new to GWAS analysis.
Hey Manoj, well, it really depends on what you want to do. Usually, people are studying some disease and they want to find inherited genetic variants that are statistically significantly associated with the disease. A typical GWAS is actually just a basic chi-squared test on a 2 x 2 contingency table, comprising the 'mutant' allele and reference allele tallies in the cases and controls. I go through the basic calculations for this here: A: SNP dataset and Z Score (technically, you don't need PLINK).
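To make the 2 x 2 idea concrete, here is a minimal sketch of an allelic chi-squared test in plain Python. The function name and the allele counts are hypothetical and only for illustration. It assumes a Pearson chi-squared test with 1 degree of freedom and no continuity correction, which may differ from what a given tool reports by default.

```python
import math


def allelic_chi_squared(case_alt, case_ref, control_alt, control_ref):
    """Pearson chi-squared test (1 df, no continuity correction) on a
    2 x 2 table of allele tallies: rows = cases/controls, columns = alt/ref."""
    table = [[case_alt, case_ref], [control_alt, control_ref]]
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    total = sum(row_totals)
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column needs a non-zero total")

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count under the null hypothesis of no association
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected

    # Upper-tail probability of a chi-squared variable with 1 df
    p_value = math.erfc(math.sqrt(chi2 / 2.0))

    # Odds ratio of the alt allele in cases versus controls
    if case_ref == 0 or control_alt == 0:
        odds_ratio = math.inf
    else:
        odds_ratio = (case_alt * control_ref) / (case_ref * control_alt)
    return chi2, p_value, odds_ratio


# Hypothetical tallies for one variant: 100 case alleles, 100 control alleles
chi2, p_value, odds_ratio = allelic_chi_squared(30, 70, 15, 85)
print(f"chi2={chi2:.3f}  p={p_value:.4f}  OR={odds_ratio:.2f}")
```

In a real GWAS this test is repeated for every variant, so the p-values need a multiple-testing correction before anything is called significant.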
If you are studying a condition like arthritis, heart disease, or something along these lines, it's imperative that you have a solid study design and that your study is statistically powered so that you can make inferences to the wider population.
Hi Kevin, I am following up again. I am planning to start my GWAS analysis and am wondering which aligner I should use to generate BAM files: HISAT2 or STAR? Subsequently, I would use GATK to generate VCFs.
Hi, my apologies, you have DNA-seq data and you want to identify germline variants in this, correct?
Hi, I have exome-seq data from human fibromyalgia samples and I am looking to find variants in them. I have 60 samples in total.
Okay, then, yes, you can follow the GATK
So, which aligner should I use to generate BAM files before GATK: HISAT2 or STAR?
Neither. Use bwa mem if your reads are >=70bp in length. If shorter, use bowtie2.
I used STAR and kallisto for my RNA-seq data analysis. Is there any specific reason to use bwa or bowtie for exome-seq analysis?
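Yes, the difference relates to the data type: RNA- versus DNA-seq. STAR and HISAT2 are splice-aware aligners built for RNA-seq reads, whereas bwa and bowtie2 are designed for genomic DNA reads.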
Hi Kevin, I now have the VCF files for my samples (fibromyalgia-affected and unaffected). Can you please let me know how to proceed with the GWAS analysis? Do I need to use PLINK? Could you suggest a workflow?
Hi, I ran PLINK on the VCFs, and it generated .bed, .bim, and .fam files with the following command. Please let me know if I am doing this correctly.
Next, I used KING with the following command:
However, it shows the following error: FATAL ERROR - Too many first alleles as the major allele (~35.5%). Please use plink1.9 --make-bed to regenerate the genotype data again.
Furthermore, I tried the following command:
Hi, I have 90 VCF files and would like to use fastSTRUCTURE. Please give me some idea of how to prepare the input file for fastSTRUCTURE, as I am new to this analysis. My initial understanding is that these VCF files need to be merged into one VCF file using BCFtools, and that PLINK should then be used to generate a .bed file from the merged file. Please let me know if this is the correct approach.
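The workflow described in this post (merge with BCFtools, convert with PLINK, then run fastSTRUCTURE) could be sketched roughly as below. This is only an illustration that builds and prints the command lines without running them. All file names, the helper name, and the value of K are hypothetical. It also assumes that the per-sample VCFs are already bgzip-compressed and indexed (which bcftools merge expects) and that structure.py is fastSTRUCTURE's entry script in the working directory.

```python
import shlex


def build_faststructure_commands(vcf_list_file, merged_vcf, plink_prefix, k, out_prefix):
    """Return the three command lines (as argument lists) for a
    merge -> PLINK .bed -> fastSTRUCTURE workflow. Nothing is executed."""
    # 1. Merge per-sample VCFs; the list file holds one VCF path per line
    merge = ["bcftools", "merge", "--file-list", vcf_list_file,
             "-Oz", "-o", merged_vcf]
    # 2. Convert the merged VCF to PLINK binary files (.bed/.bim/.fam)
    to_bed = ["plink", "--vcf", merged_vcf, "--make-bed", "--out", plink_prefix]
    # 3. Run fastSTRUCTURE on the PLINK prefix for a chosen number of populations K
    structure = ["python", "structure.py", "-K", str(k),
                 "--input=" + plink_prefix, "--output=" + out_prefix]
    return [merge, to_bed, structure]


# Hypothetical file names and K, for illustration only
commands = build_faststructure_commands(
    "vcf_list.txt", "merged.vcf.gz", "merged", 3, "fs_out")
for cmd in commands:
    print(shlex.join(cmd))
```

In practice K is usually scanned over a range of values rather than fixed, and variants are often filtered (for example on missingness) before the structure step.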