Question

Cannot remove subjects from Plink files

8.1 years ago

Sheila ▴ 460

Hi! I have a PLINK GWAS data set. All files are in binary format (.bed, .bim, .fam).

Before I get into details, here is my question: How can I subset for subjects I am interested in using Plink?

I want to subset a group of individuals for my analysis. However, when I run the command below, all individuals are removed, as the PLINK log output further down shows. I'm also including an example of the include-subjects file:

PLINK COMMAND:

 plink --bfile /path/to/plink/files  --keep /path/to/includesubjects.txt --make-bed --out /path/to/subsetted/plink/files

EXAMPLE OF TEXT FILE WITH INDIVIDUALS I WANT TO INCLUDE: The '0' in the first column is the family ID (FID). These are unrelated subjects, so I did not assign a family ID. This matches what is in the .fam file.

 0     Subject_001
 0     Subject_002
 0     Subject_003
 0     Subject_004
 0     Subject_005
 0     Subject_006

PLINK OUTPUT:

 Writing this text to log file [ /path/to/log/file.log ]
 Analysis started: Wed Apr 19 15:36:22 2017

 Options in effect:
     --bfile /path/to/plink/files
     --keep /path/to/includesubjects.txt
     --make-bed
     --out /path/to/subsetted/plink/files

 Reading map (extended format) from [ /path/to/plink/files.bim ] 
 5 markers to be included from [ /path/to/plink/files.bim ]
 Reading pedigree information from [ /path/to/plink/files.fam ] 
 940 individuals read from [ /path/to/plink/files.fam ] 
 0 individuals with nonmissing phenotypes
 Assuming a disease phenotype (1=unaff, 2=aff, 0=miss)
 Missing phenotype value is also -9
 0 cases, 0 controls and 940 missing
 559 males, 381 females, and 0 of unspecified sex
 Reading genotype bitfile from [ /path/to/plink/files.bed ] 
 Detected that binary PED file is v1.00 SNP-major mode
 Reading individuals to keep [ path/to/includesubjects.txt ] ... 0 read
 940 individuals removed with --keep option
 Before frequency and genotyping pruning, there are 5 SNPs
 0 founders and 0 non-founders found
 Total genotyping rate in remaining individuals is 0
 0 SNPs failed missingness test ( GENO > 1 )
 0 SNPs failed frequency test ( MAF < 0 )
 After frequency and genotyping pruning, there are 5 SNPs

gwas plink filtering • 6.6k views

ADD COMMENT • link 8.1 years ago by Sheila ▴ 460

The only explanation I can think of is that the formatting (e.g. the spacing) of the keep file differs from what PLINK expects, or that PLINK cannot find the file. Try extracting a few lines from your .fam file, building a new includesubjects.txt from those samples, and running it again.
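One way to test the formatting idea is sketched below. It counts keep-file lines whose FID/IID pair does not appear in the .fam file, before and after stripping carriage returns. This is only an illustration: the file names (toy.fam, toy_keep.txt, toy_keep_clean.txt) and the helper count_unmatched are hypothetical, and Windows line endings are just one possible cause of a mismatch.

```shell
#!/bin/sh
# Toy files (hypothetical): a .fam file and a keep file saved with Windows line endings.
printf '0 Subject_001 0 0 1 -9\n0 Subject_002 0 0 2 -9\n' > toy.fam
printf '0 Subject_001\r\n0 Subject_002\r\n' > toy_keep.txt

# Count lines in the given file whose FID/IID pair is absent from toy.fam.
count_unmatched() {
    awk 'NR==FNR { seen[$1 " " $2]; next }
         !(($1 " " $2) in seen) { n++ }
         END { print n + 0 }' toy.fam "$1"
}

before=$(count_unmatched toy_keep.txt)

# Strip carriage returns, then check again.
tr -d '\r' < toy_keep.txt > toy_keep_clean.txt
after=$(count_unmatched toy_keep_clean.txt)

echo "unmatched before cleaning: $before"
echo "unmatched after cleaning:  $after"
```

If the count is non-zero for a real keep file, the listed IDs do not match the .fam file exactly, which would explain PLINK keeping nobody.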

ADD REPLY • link 8.1 years ago by Floris Brenk ★ 1.0k

Hi Floris! Thanks for your note. I actually figured it out :) I subsetted the existing fam file for the subjects I was interested in and then used that as the input for the --keep command. Thanks for your help though!
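For anyone following along, that approach can be sketched roughly as follows. The exact commands used are not given in this thread, so this is an assumption: wanted_ids.txt (one IID per line), toy.fam and keep.txt are hypothetical names, and the toy data stands in for the real .fam file.

```shell
#!/bin/sh
# Toy .fam file standing in for /path/to/plink/files.fam (hypothetical data).
# Columns: FID IID father mother sex phenotype
printf '0 Subject_001 0 0 1 -9\n0 Subject_002 0 0 2 -9\n0 Subject_003 0 0 1 -9\n' > toy.fam

# One IID per line for the subjects of interest (hypothetical file name).
printf 'Subject_001\nSubject_003\n' > wanted_ids.txt

# Copy FID and IID straight from the .fam file for every wanted IID,
# so the keep file matches the .fam file exactly.
awk 'NR==FNR { want[$1]; next } $2 in want { print $1, $2 }' wanted_ids.txt toy.fam > keep.txt

cat keep.txt

# keep.txt would then be passed to PLINK as in the original command (not run here):
# plink --bfile /path/to/plink/files --keep keep.txt --make-bed --out /path/to/subsetted/plink/files
```

Because the FID and IID columns are taken from the .fam file itself, any spacing or spelling differences in a hand-made list are avoided.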

ADD REPLY • link 8.1 years ago by Sheila ▴ 460

Hi Sheila, could you please explain how you did it (with command lines)? Thanks!

ADD REPLY • link 7.6 years ago by Mr Locuace ▴ 180

Hi Sheila, I am trying to remove patients by ID from my data, starting from the original PED file. I created a .txt file with the family IDs and patient IDs I want to remove in two columns, but it still doesn't work. The analysis seems to run to the end of the process (creating temporary files), at which point a message appears saying: Error: duplicates ID.

My command is:

 ./plink --file name --remove IDlist.txt --out subset2 --make-bed

And my IDlist.txt is:

1 2204
2 1146

I know I have a few duplicates, but I don't understand why their presence prevents the removal.

How did you sort out your problem? Do you mind explaining here?

ADD REPLY • link 7.2 years ago by Ginevra ▴ 10

Hi @Vale, there are a couple of things I'd check/try:

Check that there is a single space, as opposed to a tab, between the FID and IID columns in IDlist.txt.
Rename the duplicate IIDs to something like 2204DUP so that you have unique subject IDs.
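A rough sketch of the second suggestion is below. It first lists duplicated FID/IID pairs, then appends DUP to the IID of a repeated pair. The file names (toy.ped, duplicate_ids.txt, toy_renamed.ped) and the toy genotypes are hypothetical, and the sketch assumes each ID appears at most twice; a third copy would need a numbered suffix instead.

```shell
#!/bin/sh
# Toy PED file (hypothetical data) in which sample "1 2204" appears twice.
printf '1 2204 0 0 1 1 A A\n1 2204 0 0 1 1 A G\n2 1146 0 0 2 1 G G\n' > toy.ped

# List every FID/IID pair that occurs more than once.
awk '{ print $1, $2 }' toy.ped | sort | uniq -d > duplicate_ids.txt
cat duplicate_ids.txt

# Append "DUP" to the IID of the second copy of a pair.
# Note: assigning to $2 makes awk rewrite the line with single spaces as delimiters.
awk '{ key = $1 " " $2; if (count[key]++) $2 = $2 "DUP"; print }' toy.ped > toy_renamed.ped
cat toy_renamed.ped
```

Running the renamed file through the same duplicate check should then print nothing, and the renamed ID (e.g. 2204DUP) can be listed in IDlist.txt if that copy is the one to remove.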

I hope that helps!

ADD REPLY • link 7.2 years ago by Sheila ▴ 460