Hi everyone,
I'm trying to better understand what to do with the extra chromosomes in my data set (see below). It seems like best practice is to remove them, but I'm having difficulty doing so with PLINK. I tried filtering them out with --chr 1-22, and at another point with --remove, but I haven't had any luck. PLINK always returns an error saying that an extra chromosome was found.
Does anyone have advice on how to remove the extra chromosomes?
Example of the remove file:
head remove.txt
chr1_KI270707v1_random . 0 2277 CAT C
chr1_KI270707v1_random . 0 2310 C T
chr1_KI270707v1_random . 0 2327 C CAT
chr1_KI270707v1_random . 0 2354 G A
Commands used:
plink2 --bfile KHO100 --remove remove.txt --make-bed --out cleanKHO
PLINK v2.00a SSE4.2 (28 Nov 2017)
www.cog-genomics.org/plink/2.0/
(C) 2005-2017 Shaun Purcell, Christopher Chang
GNU General Public License v3
Logging to cleanKHO.log.
Options in effect:
--bfile KHO100
--remove remove.txt
--make-bed
--out cleanKHO
Start time: Sat Dec 18 11:27:28 2021
257931 MB RAM detected; reserving 128965 MB for main workspace.
Using up to 64 threads (change this with --threads).
99 samples (0 females, 0 males, 99 ambiguous; 82 founders) loaded from KHO100.fam.
Error: Invalid chromosome code 'chr1_KI270706v1_random' on line 27781233 of .pvar file.
(Use --allow-extra-chr to force it to be accepted.)
End time: Sat Dec 18 11:27:32 2021
The same error occurs if I try --chr 1-22.
Any advice and help is much appreciated!
Look at the --not-chr option to remove them, or try --allow-extra-chr --not-chr to include those entries.
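To make that concrete, here is a rough sketch of how I would approach it; I have not run it against your data. The error is raised while the variant file is being read, so as far as I can tell the filter never gets a chance to run unless --allow-extra-chr is also given. Note too that --remove expects a list of sample IDs, not variant lines, so it cannot drop contigs. The helper count_chr_codes is a made-up name, just an awk one-liner for checking which chromosome codes a .bim file contains. The file names are the ones from your post.

```shell
# Hypothetical helper (not part of PLINK): count variants per chromosome
# code in a .bim file, to see which extra contigs are present.
count_chr_codes() {
  awk '{ n[$1]++ } END { for (c in n) print c, n[c] }' "$1" | LC_ALL=C sort
}

# Assumes plink2 is on PATH and KHO100.bed/.bim/.fam are in the current directory.
if command -v plink2 >/dev/null 2>&1 && [ -f KHO100.bed ]; then
  count_chr_codes KHO100.bim

  # --allow-extra-chr lets the unplaced contigs be read without an error;
  # --chr 1-22 then keeps only the autosomes.
  plink2 --bfile KHO100 --allow-extra-chr --chr 1-22 --make-bed --out cleanKHO

  # The cleaned file set should now list only codes 1-22.
  count_chr_codes cleanKHO.bim
fi
```

If you would rather name the contigs to drop, --not-chr takes the codes to exclude, but with this many random/unplaced contigs keeping 1-22 is probably less typing.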