Question

How to analyze Exome Seq data for GWAS

4.9 years ago

Kumar ▴ 170

Hi, I have exome-seq data from the Ion Torrent platform. I would like to analyze these data for a GWAS. Please suggest pipelines and a workflow.

genome next-gen Exome GWAS • 2.3k views

ADD COMMENT • link updated 4.9 years ago by Kevin Blighe 89k • written 4.9 years ago by Kumar ▴ 170

score 0 · Answer 1 · 2020-08-12

4.9 years ago

Kevin Blighe 89k

Hey Manoj,

You could produce VCFs (or BCFs) and then import these into PLINK, where you could then perform standard GWAS analyses. There are 2 related answers here: Convert vcf phased data to plink

If you have Ion Torrent data, I see no reason why you could not use GATK (a related question, with no answer: NGS preprocessing pipleine for ion torrent data).

Kevin

ADD COMMENT • link 4.9 years ago by Kevin Blighe 89k

That is awesome. It is a good starting point for understanding GWAS studies.

ADD REPLY • link 4.9 years ago by Kumar ▴ 170

Hi Kevin, thank you for your answer. You mentioned "standard GWAS analyses". Could you elaborate on what a standard GWAS analysis would be? I am pretty new to GWAS analysis.

ADD REPLY • link 4.9 years ago by Kumar ▴ 170

Hey Manoj, well, it really depends on what you want to do. Usually, people are studying some disease and they want to find inherited genetic variants that are statistically significantly associated with the disease. A typical GWAS is actually just a basic chi-squared test, run for each variant, on a 2 x 2 contingency table comprising the 'mutant' allele and reference allele tallies in the cases and controls. I go through the basic calculations for this here: A: SNP dataset and Z Score (technically, you don't need PLINK).
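As a rough illustration of the 2 x 2 test described above (a minimal sketch, not the code from the linked answer), the calculation for a single variant could look like this in Python. The function name `allelic_chi2` and the allele tallies are hypothetical, and the test is the plain Pearson chi-squared with 1 degree of freedom and no continuity correction.

```python
import math

def allelic_chi2(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Pearson chi-squared test (1 d.f.) on a 2 x 2 table of allele counts.

    Rows are cases and controls; columns are 'mutant' and reference alleles.
    Returns (chi2, p_value, odds_ratio).
    """
    a, b, c, d = case_alt, case_ref, ctrl_alt, ctrl_ref
    n = a + b + c + d
    # Product of the four marginal totals; zero means an empty row or column.
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column of the table is empty")
    chi2 = n * (a * d - b * c) ** 2 / denom
    # Upper-tail probability of a chi-squared variable with 1 d.f.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    odds_ratio = (a * d) / (b * c) if b * c else math.inf
    return chi2, p_value, odds_ratio

# Hypothetical tallies: 50 cases and 50 controls, i.e. 100 alleles per group.
chi2, p_value, odds_ratio = allelic_chi2(30, 70, 15, 85)
print(f"chi2={chi2:.3f} p={p_value:.4f} OR={odds_ratio:.2f}")
```

In a real GWAS this is repeated for every variant, so the p-values need a multiple-testing threshold rather than the usual 0.05.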

If you are studying a condition like arthritis, heart disease, or something along these lines, it's imperative that you have a solid study design and that your study is statistically powered so that you can make inferences to the wider population.
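To get a feel for what "powered" means here, the following is a rough sketch under a normal approximation to a two-sided comparison of allele frequencies between cases and controls. The function name `allelic_power`, the example frequencies, and the sample sizes are hypothetical; the default `alpha=5e-8` is the conventional genome-wide significance threshold and is an assumption that may not suit an exome-only study.

```python
from statistics import NormalDist

def allelic_power(p_case, p_ctrl, n_cases, n_ctrls, alpha=5e-8):
    """Approximate power to detect an allele-frequency difference.

    Uses a two-proportion z-test on allele counts, where each diploid
    individual contributes two alleles.
    """
    m_case, m_ctrl = 2 * n_cases, 2 * n_ctrls
    variance = p_case * (1 - p_case) / m_case + p_ctrl * (1 - p_ctrl) / m_ctrl
    if variance == 0:
        raise ValueError("allele frequencies of exactly 0 or 1 give zero variance")
    z_effect = abs(p_case - p_ctrl) / variance ** 0.5
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # Chance that the statistic clears the critical value in the direction
    # of the true effect; the negligible opposite tail is ignored.
    return NormalDist().cdf(z_effect - z_crit)

# Hypothetical allele frequencies of 0.30 in cases and 0.15 in controls.
for n in (30, 300, 3000):
    print(n, round(allelic_power(0.30, 0.15, n, n), 4))
```

Under these assumptions, a few dozen samples per group give very little power at genome-wide thresholds, which is why study design matters before any variant calling starts.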

ADD REPLY • link 4.9 years ago by Kevin Blighe 89k

Hi Kevin, I am following up again. I am planning to start my GWAS analysis and am wondering which aligner I should use to generate BAM files, such as HISAT2 or STAR. Subsequently, I would use GATK to generate VCFs.

ADD REPLY • link 4.8 years ago by Kumar ▴ 170

Hi, my apologies, you have DNA-seq data and you want to identify germline variants in this, correct?

ADD REPLY • link 4.8 years ago by Kevin Blighe 89k

Hi, I have exome-seq data from human fibromyalgia samples and I am looking to find variants in them. I have 60 samples in total.

ADD REPLY • link 4.8 years ago by Kumar ▴ 170

Okay, then, yes, you can follow the GATK

ADD REPLY • link 4.8 years ago by Kevin Blighe 89k

So, which aligner should I use to generate BAM files before GATK: HISAT2 or STAR?

ADD REPLY • link 4.8 years ago by Kumar ▴ 170

Neither. Use bwa mem if your reads are >=70bp in length. If shorter, use bowtie2.

ADD REPLY • link 4.8 years ago by Kevin Blighe 89k

I used STAR and kallisto for my RNA-seq data analysis. Is there any specific reason to use bwa or bowtie for exome-seq analysis?

ADD REPLY • link 4.8 years ago by Kumar ▴ 170

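Yes, the difference relates to the data type: RNA-seq versus DNA-seq. STAR and HISAT2 are splice-aware aligners built for RNA-seq reads, which can span exon-exon junctions, and kallisto is a pseudoaligner for transcript quantification. Exome-seq reads come from genomic DNA, so a DNA aligner such as bwa mem or bowtie2 is the appropriate choice.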

ADD REPLY • link 4.8 years ago by Kevin Blighe 89k

Hi Kevin, I now have the VCF files for my samples (fibromyalgia-affected and unaffected). Can you please let me know how to proceed with the GWAS analysis? Do I need to use PLINK for the analysis? Could you suggest a workflow?

ADD REPLY • link 4.8 years ago by Kumar ▴ 170

Hi, I ran PLINK on the VCFs; it generates .bed, .bim, and .fam files with the following command. Please let me know if I am doing this correctly.

./plink --vcf <filename> --keep-allele-order --allow-extra-chr 0 --make-bed --out <filename>

Next, I used KING with the following command:

./king -b <filename.bed> --ibs <out-filename>

However, it shows the following error:

FATAL ERROR - Too many first alleles as the major allele (~35.5%). Please use plink1.9 --make-bed to regenerate the genotype data again.

Furthermore, I tried the following command:

./plink --bfile <filenames> --recode --tab --out <output filename>

ADD REPLY • link 4.8 years ago by Kumar ▴ 170

Hi, I have 90 VCF files. I would like to use fastSTRUCTURE; please give me some idea of how to prepare its input file. I am new to this analysis. My initial understanding is that these VCF files need to be merged into one VCF file using BCFtools, and that PLINK should then be used to generate a .bed file from the merged file. Please let me know if this is the correct way.
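The merge-then-convert route described above could be sketched as follows. This is only an outline under stated assumptions, not a tested pipeline: the helper name `faststructure_prep_commands`, the file names, and the value of K are hypothetical, the per-sample VCFs are assumed to be bgzip-compressed and indexed (which `bcftools merge` requires), and `structure.py` is assumed to be the fastSTRUCTURE entry script in the working directory. The helper only builds and prints the command lines; it does not run them.

```python
import shlex

def faststructure_prep_commands(vcfs, prefix="merged", k=2):
    """Build (but do not run) the commands for VCFs -> PLINK .bed -> fastSTRUCTURE."""
    if len(vcfs) < 2:
        raise ValueError("bcftools merge needs at least two input VCFs")
    merged_vcf = f"{prefix}.vcf.gz"
    # 1. Merge the per-sample VCFs into one multi-sample, bgzipped VCF.
    merge = ["bcftools", "merge", "-Oz", "-o", merged_vcf, *vcfs]
    # 2. Convert the merged VCF to PLINK binary format (.bed/.bim/.fam).
    to_bed = ["plink", "--vcf", merged_vcf, "--make-bed", "--out", prefix]
    # 3. Run fastSTRUCTURE on the .bed prefix with K assumed populations.
    run = ["python", "structure.py", "-K", str(k),
           f"--input={prefix}", f"--output={prefix}_fs"]
    return [merge, to_bed, run]

# Hypothetical input files; a real run would list all 90 VCFs.
for cmd in faststructure_prep_commands(["s1.vcf.gz", "s2.vcf.gz"], k=3):
    print(shlex.join(cmd))
```

One point to check before relying on this: when single-sample VCFs are merged, a site that is absent from a sample's file is recorded as missing for that sample, not as homozygous reference, so the merged genotypes may contain many missing calls.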

ADD REPLY • link 4.4 years ago by Kumar ▴ 170