Question

Error attaching phenotype in PLINK (fewer tokens than expected)


3.8 years ago

absoldini ▴ 10

I'm completely new to bioinformatics and not great with computers (I'm a medical doctor). I'm currently working on a GWAS with data from the Human Connectome Project (HCP) and am running into some issues, so please bear with me if my description of the problem isn't optimal.

I'm using PLINK.

I already have the phenotypes from the HCP website in .csv format.

This is what my .fam file looks like:

52259_82122 100004 52259 82122 1 -9
56037_85858 100206 56037 85858 1 -9
51488_81352 100307 51488 81352 2 -9
51730_81594 100408 51730 81594 1 -9
52813_82634 100610 52813 82634 1 -9
51283_52850_81149 101006 51283 81149 2 -9
51969_81833 101107 51969 81833 1 -9
51330_81195 101208 51330 81195 2 -9
52385_82248 101309 52385 82248 1 -9
52198_82061 101410 52198 82061 1 -9

This is what my phenotype .csv file looks like:

Subject,Age_in_Yrs,HasGT,ZygositySR,ZygosityGT,Family_ID,Mother_ID,Father_ID,TestRetestInterval,Race,Ethnicity,Handedness,SSAGA_Employ,SSAGA_Income,SSAGA_Educ,SSAGA_InSchool,SSAGA_Rlshp,SSAGA_MOBorn,Height,Weight,BMI,SSAGA_BMICat,SSAGA_BMICatHeaviest,Blood_Drawn,Hematocrit_1,Hematocrit_2,BPSystolic,BPDiastolic,ThyroidHormone,HbA1C,Hypothyroidism,Hypothyroidism_Onset,Hyperthyroidism,Hyperthyroidism_Onset,OtherEndocrn_Prob,OtherEndocrine_ProbOnset,Menstrual_RegCycles,Menstrual_Explain,Menstrual_AgeBegan,Menstrual_CycleLength,Menstrual_DaysSinceLast,Menstrual_AgeIrreg,Menstrual_AgeStop,Menstrual_MonthsSinceStop,Menstrual_UsingBirthControl,Menstrual_BirthControlCode,FamHist_Moth_Scz,FamHist_Fath_Scz,FamHist_Moth_Dep,FamHist_Fath_Dep,FamHist_Moth_BP,FamHist_Fath_BP,FamHist_Moth_Anx,FamHist_Fath_Anx,FamHist_Moth_DrgAlc,FamHist_Fath_DrgAlc,FamHist_Moth_Alz,FamHist_Fath_Alz,FamHist_Moth_PD,FamHist_Fath_PD,FamHist_Moth_TS,FamHist_Fath_TS,FamHist_Moth_None,FamHist_Fath_None,ASR_Anxd_Raw,ASR_Anxd_Pct,ASR_Witd_Raw,ASR_Witd_T,ASR_Soma_Raw,ASR_Soma_T,ASR_Thot_Raw,ASR_Thot_T,ASR_Attn_Raw,ASR_Attn_T,ASR_Aggr_Raw,ASR_Aggr_T,ASR_Rule_Raw,ASR_Rule_T,ASR_Intr_Raw,ASR_Intr_T,ASR_Oth_Raw,ASR_Crit_Raw,ASR_Intn_Raw,ASR_Intn_T,ASR_Extn_Raw,ASR_Extn_T,ASR_TAO_Sum,ASR_Totp_Raw,ASR_Totp_T,DSM_Depr_Raw,DSM_Depr_T,DSM_Anxi_Raw,DSM_Anxi_T,DSM_Somp_Raw,DSM_Somp_T,DSM_Avoid_Raw,DSM_Avoid_T,DSM_Adh_Raw,DSM_Adh_T,DSM_Inat_Raw,DSM_Hype_Raw,DSM_Antis_Raw,DSM_Antis_T,SSAGA_ChildhoodConduct,SSAGA_PanicDisorder,SSAGA_Agoraphobia,SSAGA_Depressive_Ep,SSAGA_Depressive_Sx,Color_Vision,Eye,EVA_Num,EVA_Denom,Correction,Breathalyzer_Over_05,Breathalyzer_Over_08,Cocaine,THC,Opiates,Amphetamines,MethAmphetamine,Oxycontin,Total_Drinks_7days,Num_Days_Drank_7days,Avg_Weekday_Drinks_7days,Avg_Weekend_Drinks_7days,Total_Beer_Wine_Cooler_7days,Avg_Weekday_Beer_Wine_Cooler_7days,Avg_Weekend_Beer_Wine_Cooler_7days,Total_Malt_Liquor_7days,Avg_Weekday_Malt_Liquor_7days,Avg_Weekend_Malt_Liquor_7days,Total_Wine_7days,Avg_Week
day_Wine_7days,Avg_Weekend_Wine_7days,Total_Hard_Liquor_7days,Avg_Weekday_Hard_Liquor_7days,Avg_Weekend_Hard_Liquor_7days,Total_Other_Alc_7days,Avg_Weekday_Other_Alc_7days,Avg_Weekend_Other_Alc_7days,SSAGA_Alc_D4_Dp_Sx,SSAGA_Alc_D4_Ab_Dx,SSAGA_Alc_D4_Ab_Sx,SSAGA_Alc_D4_Dp_Dx,SSAGA_Alc_12_Drinks_Per_Day,SSAGA_Alc_12_Frq,SSAGA_Alc_12_Frq_5plus,SSAGA_Alc_12_Frq_Drk,SSAGA_Alc_12_Max_Drinks,SSAGA_Alc_Age_1st_Use,SSAGA_Alc_Hvy_Drinks_Per_Day,SSAGA_Alc_Hvy_Frq,SSAGA_Alc_Hvy_Frq_5plus,SSAGA_Alc_Hvy_Frq_Drk,SSAGA_Alc_Hvy_Max_Drinks,Total_Any_Tobacco_7days,Times_Used_Any_Tobacco_Today,Num_Days_Used_Any_Tobacco_7days,Avg_Weekday_Any_Tobacco_7days,Avg_Weekend_Any_Tobacco_7days,Total_Cigarettes_7days,Avg_Weekday_Cigarettes_7days,Avg_Weekend_Cigarettes_7days,Total_Cigars_7days,Avg_Weekday_Cigars_7days,Avg_Weekend_Cigars_7days,Total_Pipes_7days,Avg_Weekday_Pipes_7days,Avg_Weekend_Pipes_7days,Total_Chew_7days,Avg_Weekday_Chew_7days,Avg_Weekend_Chew_7days,Total_Snuff_7days,Avg_Weekday_Snuff_7days,Avg_Weekend_Snuff_7days,Total_Other_Tobacco_7days,Avg_Weekday_Other_Tobacco_7days,Avg_Weekend_Other_Tobacco_7days,SSAGA_FTND_Score,SSAGA_HSI_Score,SSAGA_TB_Age_1st_Cig,SSAGA_TB_DSM_Difficulty_Quitting,SSAGA_TB_DSM_Tolerance,SSAGA_TB_DSM_Withdrawal,SSAGA_TB_Hvy_CPD,SSAGA_TB_Max_Cigs,SSAGA_TB_Reg_CPD,SSAGA_TB_Smoking_History,SSAGA_TB_Still_Smoking,SSAGA_TB_Yrs_Since_Quit,SSAGA_TB_Yrs_Smoked,SSAGA_Times_Used_Illicits,SSAGA_Times_Used_Cocaine,SSAGA_Times_Used_Hallucinogens,SSAGA_Times_Used_Opiates,SSAGA_Times_Used_Sedatives,SSAGA_Times_Used_Stimulants,SSAGA_Mj_Use,SSAGA_Mj_Ab_Dep,SSAGA_Mj_Age_1st_Use,SSAGA_Mj_Times_Used
101208,35,true,NotMZ,DZ,51330_81195,51330,81195,,Black or African Am.,Hispanic/Latino,100,2,8,17,0,1,1,63,133,23.56,1,1,1,37,39,115,76,0.85,5.5,0,,0,,0,,1,,15,2,27,,,,0,,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,2,50,2,51,4,56,2,51,2,50,1,50,1,51,0,50,6,3,8,46,2,38,10,20,41,6,56,4,51,1,51,1,50,1,50,1,0,1,50,0,0,1,1,0,NORMAL,B,20,16,-2.5,false,false,false,false,false,false,false,false,0,0,0.0,0.0,0,0.0,0.0,0,0.0,0.0,0,0.0,0.0,0,0.0,0.0,0,0.0,0.0,0,1,0,1,,,,,,,,,,,,0,0,0,0.0,0.0,0,0.0,0.0,0,0.0,0.0,0,0.0,0.0,0,0.0,0.0,0,0.0,0.0,0,0.0,0.0,,,,,,,,,,0,0,,,0,0,0,0,0,0,0,0,,0

From what I understand so far, I have to attach a phenotype to the .fam file, so I tried the following, using age as an example phenotype:

./plink --bfile genotypefile --pheno phenotype.csv --pheno-name Age_in_Yrs --make-bed --out filename

and this happens:

aldo@dell1:~/Desktop/PLINK$ ./plink --bfile MEGA_Chip --pheno rest.csv --pheno-name Age_In_Years --make-bed --out mergedage
PLINK v1.90b6.21 64-bit (19 Oct 2020)          www.cog-genomics.org/plink/1.9/
(C) 2005-2020 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to mergedage.log.
Options in effect:
  --bfile MEGA_Chip
  --make-bed
  --out mergedage
  --pheno rest.csv
  --pheno-name Age_In_Years

32117 MB RAM detected; reserving 16058 MB for main workspace.
2119803 variants loaded from .bim file.
1141 people (523 males, 618 females) loaded from .fam.
Error: Line 1 of --pheno file has fewer tokens than expected.

So I'm stuck at this error (Line 1 of --pheno file has fewer tokens than expected.). Modifying phenotype.csv is not a problem, since the file is small. However, I can't open the .ped file because it is too big (9.7 GB) and my computer just dies trying to do so.

Somehow, yesterday I managed to modify phenotype.csv in such a way that the error turned into Line 1 of -- (.ped, I think) file has fewer tokens than expected. I seem to have deleted or shifted columns so that they matched (FID IID).

Any help would be appreciated

Thanks! :)

GWAS PLINK • 6.7k views


I think the pheno file needs to be space- or tab-separated, not comma-separated; see the --pheno section of the manual:

--pheno causes phenotype values to be read from the 3rd column of the specified space- or tab-delimited file,

ADD REPLY • link 3.8 years ago by zx8754 12k
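As a rough illustration of the suggestion above, here is a minimal Python sketch that turns a comma-separated phenotype table into a tab-delimited file whose first two columns are FID and IID. It assumes the `Subject` column of the CSV corresponds to the second (IID) column of the .fam file and copies the family ID from there. The function name `build_pheno` is hypothetical, and the subject `999999` and the empty age for `100004` are made up for the example; `-9` is used for missing values because that is PLINK's default missing-phenotype code.

```python
import csv
import io

def build_pheno(fam_text, csv_text, id_column, pheno_column, missing="-9"):
    """Return tab-delimited phenotype text with header FID, IID, <pheno_column>."""
    # Map each individual ID (2nd .fam column) to its family ID (1st column).
    fid_by_iid = {}
    for line in fam_text.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            fid_by_iid[fields[1]] = fields[0]

    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["FID", "IID", pheno_column])
    for row in csv.DictReader(io.StringIO(csv_text)):
        iid = row[id_column].strip()
        if iid not in fid_by_iid:
            continue  # subject is not in the .fam file, so skip it
        # Empty cells become the missing-phenotype code.
        value = (row[pheno_column] or "").strip() or missing
        writer.writerow([fid_by_iid[iid], iid, value])
    return out.getvalue()

# Two .fam lines from the question.
fam_example = """52259_82122 100004 52259 82122 1 -9
51330_81195 101208 51330 81195 2 -9
"""
# Tiny stand-in for the HCP CSV; only the 101208 row comes from the question.
csv_example = """Subject,Age_in_Yrs,HasGT
101208,35,true
999999,40,false
100004,,true
"""

pheno_text = build_pheno(fam_example, csv_example, "Subject", "Age_in_Yrs")
print(pheno_text, end="")
```

With real files, the two text arguments would come from reading the .fam and .csv files, and the returned text would be written to the file passed to `--pheno`. Writing only the columns that are needed also sidesteps values that contain spaces, such as `Black or African Am.`, which would otherwise be split into several tokens.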


Thanks for the reply!

OK, I did what you suggested and converted the pheno file to be tab-delimited. After that, the error changed to Line 1 of --fam file has fewer tokens than expected.

So I decided to also make the .fam file tab-delimited,

and then this happened:

Error: --pheno-name requires the --pheno file to have a header line with first two columns 'FID' and 'IID'

So I edited the .fam and pheno files so that they both have matching FID/IID as the first two columns:

.fam

52259_82122 100004  52259   82122   1   -9
56037_85858 100206  56037   85858   1   -9
51488_81352 100307  51488   81352   2   -9
51730_81594 100408  51730   81594   1   -9
52813_82634 100610  52813   82634   1   -9
51283_52850_81149   101006  51283   81149   2   -9

pheno:

FID IID FS_IntraCranial_Vol FS_BrainSeg_Vol FS_BrainSeg_Vol_No_VentFS_BrainSeg_Vol_No_Vent_Surf FS_LCort_GM_Vol FS_RCort_GM_Vol FS_TotCort_GM_Vol   FS_SubCort_GM_Vol   FS_Total_GM_Vol FS_SupraTentorial_Vol   FS_L_WM_Vol FS_R_WM_Vol FS_Tot_WM_Vol   FS_Mask_Vol a   
0   100004

and now I get this:

Options in effect:
  --bed MEGA_Chip.bed
  --bim MEGA_Chip.bim
  --fam MEGA_Chip.csv
  --make-bed
  --out mergedtry
  --pheno unrestricted.csv
  --pheno-name FS_BrainSeg_Vol_No_Vent

32117 MB RAM detected; reserving 16058 MB for main workspace.
2119803 variants loaded from .bim file.
1141 people (523 males, 618 females) loaded from .fam.
0 phenotype values present after --pheno.
Using 1 thread (no multithreaded calculations invoked).
Before main variant filters, 0 founders and 1141 nonfounders present.
Calculating allele frequencies... done.
Warning: 295883 het. haploid genotypes present (see mergedtry.hh ); many
commands treat these as missing.
Warning: Nonmissing nonmale Y chromosome genotype(s) present; many commands
treat these as missing.
Total genotyping rate is 0.995282.
2119803 variants and 1141 people pass filters and QC.
Note: No phenotypes present.
--make-bed to mergedtry.bed + mergedtry.bim + mergedtry.fam ... done.

The .fam file now has a .csv extension because I changed it to be tab-delimited, but this shouldn't be an issue because I specified it with --fam.

So the issue now is that it's not recognizing any phenotypes.

This is what the resulting .fam file looks like, with -9 for all the missing phenotypes:

aldo@dell1:~/Desktop/PLINK$ head mergedtry.fam
52259_82122 100004 52259 82122 1 -9
56037_85858 100206 56037 85858 1 -9
51488_81352 100307 51488 81352 2 -9
51730_81594 100408 51730 81594 1 -9
52813_82634 100610 52813 82634 1 -9

Thanks again for the help!!!

ADD REPLY • link 3.8 years ago by absoldini ▴ 10
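One possible explanation for "0 phenotype values present", offered as a guess since only the first data row of the pheno file is shown: PLINK matches phenotype rows to samples on both FID and IID, and the pheno row shown starts with FID `0` while the .fam file uses family IDs such as `52259_82122`. The Python sketch below counts how many ID pairs the two files share. The helper `id_pairs` is hypothetical, and the phenotype values in the example are made up.

```python
def id_pairs(text, skip_header=False):
    """Collect (FID, IID) pairs from the first two whitespace-separated columns."""
    lines = text.splitlines()
    if skip_header:
        lines = lines[1:]
    pairs = set()
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:
            pairs.add((fields[0], fields[1]))
    return pairs

# Two .fam lines from the question.
fam_example = """52259_82122 100004 52259 82122 1 -9
56037_85858 100206 56037 85858 1 -9
"""
# Pheno rows with FID 0, as in the post; the volumes are invented.
pheno_example = (
    "FID\tIID\tFS_IntraCranial_Vol\n"
    "0\t100004\t1500000\n"
    "0\t100206\t1400000\n"
)

fam_pairs = id_pairs(fam_example)
pheno_pairs = id_pairs(pheno_example, skip_header=True)

# Samples that agree on both columns are the ones that can receive a phenotype.
shared = fam_pairs & pheno_pairs
# Samples that agree on IID only point to an FID mismatch.
iid_only = {iid for _, iid in fam_pairs} & {iid for _, iid in pheno_pairs}

print(len(shared), "samples match on both FID and IID")
print(len(iid_only), "samples match on IID alone")
```

If the second count is large and the first is zero on the real files, copying the FID column from the .fam file into the pheno file would be the natural next thing to try.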