Hi, I am very new to this area and am taking a class in bioinformatics. For an independent project assignment, I need to do a GWAS, working in a bash terminal. I downloaded all the fastq files I need, trimmed them, and converted them to sam/bam, then vcf, then bed/bim/fam, etc. However, when I tried to run the GWAS in plink, I realized I don't have any phenotype data. There are supposed to be two phenotypes.
Basically there are two groups (phenotypes) of fastq files, each containing 29 samples; let's call them group 1 and group 2. For each group, I converted every fastq to sam and then bam, and merged the 29 bams into one bam. I then combined the two merged bams (one per group) into a single vcf.gz. The plink files I made from that vcf contain no phenotype data.
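For context: in plink's binary fileset the phenotype is stored in the sixth column of the .fam file. With the default case/control coding, 1 means control, 2 means case, and -9 (or 0) means missing. A VCF carries no phenotype, so after conversion that column is typically filled with -9. The minimal Python sketch below (the function name is hypothetical, and it assumes the default codes, i.e. no `--1` or `--missing-phenotype` override) tallies that column, which is a quick way to confirm that plink has nothing to test.

```python
from collections import Counter

# Assumed plink default codes for a case/control phenotype;
# --1 or --missing-phenotype would change these.
CODES = {"1": "control", "2": "case", "-9": "missing", "0": "missing"}


def summarize_fam_phenotypes(fam_lines):
    """Tally the phenotype column (6th field) of plink .fam lines."""
    counts = Counter()
    for line in fam_lines:
        fields = line.split()
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        counts[CODES.get(fields[5], "other")] += 1  # "other" = e.g. a quantitative value
    return counts


if __name__ == "__main__":
    # Hypothetical .fam lines: FID IID father mother sex phenotype
    example = ["S1 S1 0 0 0 -9", "S2 S2 0 0 0 -9"]
    print(dict(summarize_fam_phenotypes(example)))  # → {'missing': 2}
```

With a real file this would be called as `summarize_fam_phenotypes(open("mydata.fam"))`, where `mydata.fam` is a placeholder name. A result containing only `missing` means every sample lacks a phenotype.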
I would really appreciate any help, e.g. which step I might have got wrong, or what I should do to incorporate the phenotype data. Ultimately this is only an assignment, so I don't have to be perfect in every detail (like the QC steps), and I'm afraid I can't follow very complicated code. I just want to get to the end and produce a Manhattan plot or something similar. If there is another pipeline that does this, that's also fine.
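For the final Manhattan plot, the main work is turning plink's association output into plot coordinates: chromosomes are laid end to end along the x axis and each variant's p-value becomes -log10(p) on the y axis. The Python sketch below is one way to do that, assuming a whitespace-delimited results file whose header contains `CHR`, `BP` and `P` columns (as the .assoc file written by plink's `--assoc` does) and numeric chromosome codes. The function name is hypothetical, and variants with `P` reported as `NA` are skipped.

```python
import math


def manhattan_points(assoc_lines):
    """Convert plink association output into (x, -log10 p, chrom) points.

    Assumes a header line naming CHR, BP and P columns and numeric
    chromosome codes; rows with P == "NA" are skipped.
    """
    lines = iter(assoc_lines)
    header = next(lines).split()
    i_chr, i_bp, i_p = header.index("CHR"), header.index("BP"), header.index("P")

    records = []
    for line in lines:
        fields = line.split()
        if len(fields) < len(header) or fields[i_p] == "NA":
            continue  # blank line or untested variant
        p = max(float(fields[i_p]), 1e-300)  # guard against log10(0)
        records.append((int(fields[i_chr]), int(fields[i_bp]), p))
    records.sort()  # order by chromosome, then position

    points, offset, prev_chr, chr_max = [], 0, None, 0
    for chrom, bp, p in records:
        if chrom != prev_chr:
            offset += chr_max  # start this chromosome after the previous one ends
            prev_chr, chr_max = chrom, 0
        chr_max = max(chr_max, bp)
        points.append((offset + bp, -math.log10(p), chrom))
    return points
```

The resulting x and y values can then be passed to any plotting function, for example `matplotlib.pyplot.scatter`, colored by chromosome.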
Cross-post https://bioinformatics.stackexchange.com/questions/20071/urgent-help-needed-with-gwas-and-vcf-files-lacking-phenotype
Please don't put 'urgent' in all caps. Your question is no more important than anyone else's. The error is that you merged the .bam files before variant calling, so the per-sample information was most likely lost and each group was treated as a single sample. I think you should have called variants separately for each sample and then run the GWAS on those variants.
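One way to check this diagnosis is to look at the sample columns of the VCF. In the `#CHROM` header line, the first nine columns are fixed (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO, FORMAT) and every remaining column is one sample. A GWAS needs one column per individual (58 here); a VCF built from the two merged bams will likely show only two. `bcftools query -l file.vcf.gz` prints that list directly. The Python sketch below does the same thing, with hypothetical function names, and reads bgzipped files through the standard `gzip` module.

```python
import gzip


def sample_names_from_header(lines):
    """Return the sample names from the '#CHROM' header line of VCF text lines."""
    for line in lines:
        if line.startswith("#CHROM"):
            # Columns 1-9 are CHROM POS ID REF ALT QUAL FILTER INFO FORMAT;
            # every column after that is one sample.
            return line.rstrip("\n").split("\t")[9:]
        if not line.startswith("#"):
            break  # reached variant records without seeing the header
    raise ValueError("no #CHROM header line found")


def vcf_sample_names(path):
    """Open a plain or gzip/bgzip-compressed VCF and return its sample names."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return sample_names_from_header(handle)
```

For example, `print(len(vcf_sample_names("merged.vcf.gz")))`, where the filename is a placeholder, should print the number of individuals if genotypes were called per sample.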
Sorry for the confusion and the wording, and thank you so much for the response! I see your point, so I will try to create the vcf files for the two groups separately. What should I do after that? Is there a way to run plink with two vcf files, or how should I combine the two vcfs while incorporating the phenotypes?
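One common route, sketched here under assumptions rather than as a tested pipeline, has three steps. First, combine the VCFs into one multi-sample VCF, for example with `bcftools merge` (inputs bgzipped and indexed). Second, convert it with plink. Third, supply the two groups as a case/control phenotype in a separate file passed with `--pheno`. This only works if every individual has its own sample column, which is why the reply suggests calling variants per sample. The phenotype file needs one row per sample with family ID, individual ID and phenotype. The Python sketch below builds those rows from two lists of sample names. It assumes the plink files were made with `--double-id`, so that FID and IID both equal the VCF sample name, and it arbitrarily codes group 1 as control (1) and group 2 as case (2). The function names, sample names and output filename are hypothetical.

```python
def pheno_rows(group1_samples, group2_samples):
    """Build plink phenotype rows 'FID IID PHENO' (group 1 -> 1 = control, group 2 -> 2 = case)."""
    overlap = set(group1_samples) & set(group2_samples)
    if overlap:
        raise ValueError(f"samples listed in both groups: {sorted(overlap)}")
    rows = []
    for code, samples in ((1, group1_samples), (2, group2_samples)):
        for sample in samples:
            # Assumes --double-id was used, so FID == IID == VCF sample name
            rows.append(f"{sample}\t{sample}\t{code}")
    return rows


def write_pheno_file(group1_samples, group2_samples, out_path):
    """Write the rows to a tab-delimited file for plink's --pheno option."""
    with open(out_path, "w") as out:
        out.write("\n".join(pheno_rows(group1_samples, group2_samples)) + "\n")


if __name__ == "__main__":
    # Hypothetical sample names; the real ones come from the VCF header
    g1 = [f"A{i}" for i in range(1, 30)]
    g2 = [f"B{i}" for i in range(1, 30)]
    for row in pheno_rows(g1, g2)[:3]:
        print(row)
```

After `write_pheno_file(g1, g2, "groups.pheno")`, the association test could then be run as `plink --bfile mydata --pheno groups.pheno --assoc --out mygwas`, with placeholder file prefixes. Keeping the phenotypes in a separate file rather than editing the .fam by hand makes it easy to swap which group counts as the case.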