Subsetting individuals in PLINK when IDs contain underscores "_"
7.2 years ago

I am working with a VCF dataset in which many individuals have IDs that include underscores.

I want to subset them using plink, but I keep getting this error:

Error: More than two instances of '_' in sample ID.

Is there a way for plink to ignore the underscores in those IDs and treat them as a single ID?

Thanks

This is an example of the IDs in my VCF file:

S_Eskimo_Sireniki-1.Sir26
plink • 3.7k views
7.2 years ago
pfs ▴ 280

The PLINK documentation recommends using a character other than an underscore inside IDs. Why not just use sed to change the underscores to another character?

The passage below is from the PLINK documentation (https://www.cog-genomics.org/plink2/input):

"The family and within-family IDs default to 'FAM001' and 'ID001' respectively if you don't provide them. Due to how the PLINK 1 binary fileset format is defined, they cannot contain spaces. Since some PLINK commands merge the family ID and within-family ID with an underscore in their reports, we recommend using another character (such as '~') to separate compound name components. (If you don't have to distinguish between e.g. 'Mac Donald' and 'MacDonald', upper CamelCase will also do.)"
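If sed feels too blunt (a global substitution would also touch underscores in variant IDs or INFO fields), the same idea can be sketched in Python so that only the sample IDs on the #CHROM header line are changed. This is a rough illustration and not an official PLINK recipe: the function name rename_vcf_samples and the in-memory demo VCF are made up for the example, and '~' is simply the replacement character the documentation suggests. It assumes an uncompressed, tab-delimited VCF.

```python
import io

def rename_vcf_samples(lines, old="_", new="~"):
    """Yield VCF lines, replacing `old` with `new` in sample IDs only.

    Hypothetical helper for illustration; only the #CHROM header line
    is modified, every other line is passed through unchanged.
    """
    for line in lines:
        if line.startswith("#CHROM"):
            fields = line.rstrip("\n").split("\t")
            # The first 9 columns are the fixed VCF fields (CHROM..FORMAT);
            # everything after them is a sample ID.
            fixed, samples = fields[:9], fields[9:]
            samples = [s.replace(old, new) for s in samples]
            line = "\t".join(fixed + samples) + "\n"
        yield line

# Tiny made-up VCF using the sample ID from the question.
demo = io.StringIO(
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS_Eskimo_Sireniki-1.Sir26\n"
    "1\t100\trs_1\tA\tG\t.\tPASS\t.\tGT\t0/1\n"
)
result = list(rename_vcf_samples(demo))
print(result[1], end="")
```

For a real file, the same generator could be fed an open file handle and its output written to a new VCF. Any list of individuals used for subsetting would need the renamed IDs as well. It may also be worth checking the PLINK 1.9 documentation for the VCF import flags --double-id, --const-fid and --id-delim, which control how a VCF sample ID is split into family and within-family IDs.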
