plink bim file stops at chrM
2.5 years ago
rturba ▴ 10

Hello! I am in need of some help. I have a VCF file that was generated using a reference genome whose chromosomes are named with Roman numerals: chrI, chrII... chrM, chrV, etc. This means they are sorted alphabetically rather than numerically, so my chromosomes are in a silly order, with chrM listed in the middle, for example (why lord!).
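
To illustrate the ordering problem described above, here is a minimal sketch that sorts a handful of Roman-numeral contig names as plain strings. The short list of names is only an illustrative subset, not the full contig set of the genome in question.

```python
# Illustrative subset of contig names (an assumption, not the real genome list).
names = ["chrI", "chrII", "chrIII", "chrIV", "chrV", "chrIX", "chrM"]

# Plain string sorting compares character by character, so "chrIX" and "chrM"
# both sort before "chrV".
alphabetical = sorted(names)
print(alphabetical)
```

With this subset, chrM ends up between chrIX and chrV, which matches the 1, 2, 3, 4, 9, M order reported later in the thread.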

I've tried renaming them using only single digits and letters (1,2,3... M, X, Y) using bcftools annotate before I generate my bfiles using plink. The issue is that because my chrM was listed somewhere in the middle, when I try to make the bfiles, my BIM file stops when it reaches the M. This is the command I used:

bcftools norm -Ou -m -any $file.vcf.gz |
bcftools norm -Ou -f $ref |
bcftools annotate -Ob -x ID \
  -I +'%CHROM:%POS:%REF:%ALT' |
plink --bcf /dev/stdin \
  --keep-allele-order \
  --const-fid \
  --allow-extra-chr \
  --make-bed \
  --chr-set 24 \
  --out $file
# I also tried --output-chr M

Is there a simple way to address this in the plink command? I'm trying to figure out a way to sort my VCF so the chrM is listed last also, but so far it has been a struggle and I must be thinking about this wrong! Ugh D:
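
One way to approach the renaming step is to generate the old-name/new-name pairs programmatically instead of typing them by hand. The sketch below is a hedged example: `roman_to_int` and `build_rename_map` are hypothetical helper names, the `chr` prefix and the special labels (`M`, `U`) are assumptions taken from the names mentioned in this thread, and the output is meant to be in the two-column format that `bcftools annotate --rename-chrs` reads.

```python
ROMAN_VALUES = {"I": 1, "V": 5, "X": 10, "L": 50}

def roman_to_int(numeral):
    """Convert a Roman numeral such as 'XIX' to an integer (19)."""
    total = 0
    prev = 0
    # Walk right to left: a smaller value before a larger one is subtracted.
    for ch in reversed(numeral):
        value = ROMAN_VALUES[ch]
        if value < prev:
            total -= value
        else:
            total += value
        prev = value
    return total

def build_rename_map(contigs, special=None, prefix="chr"):
    """Return 'old<TAB>new' lines, one per contig.

    `special` maps contigs that are not plain Roman numerals (for example
    chrM or chrUn) to the label they should get; these are checked first,
    because 'M' would otherwise be mistaken for a numeral.
    """
    special = special or {}
    lines = []
    for name in contigs:
        if name in special:
            new_name = special[name]
        elif name.startswith(prefix):
            new_name = str(roman_to_int(name[len(prefix):]))
        else:
            raise ValueError(f"unexpected contig name: {name}")
        lines.append(f"{name}\t{new_name}")
    return lines

if __name__ == "__main__":
    contigs = ["chrI", "chrIX", "chrM", "chrXIX", "chrUn"]
    for line in build_rename_map(contigs, {"chrM": "M", "chrUn": "U"}):
        print(line)
```

The printed lines could be saved to a file and passed to `bcftools annotate --rename-chrs`. Any contig that should become a sex chromosome can be listed in `special` as well; note that a name like chrX parses as the numeral 10 here, not as a sex chromosome.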

plink chr bim • 2.5k views
2.5 years ago

plink 1.x --make-bed automatically sorts the variants in an order that puts chrM at the end.

(This behavior was changed in plink 2.0; you can still sort with --make-bed --sort-vars, but if you don't include --sort-vars in the command line the original VCF order is preserved.)


Hi @chrchang523, thanks for the reply! I'm using plink 1.9 but the variants are not being sorted. At least what I have noticed is that in the BIM file the program gives the chrM the last number, but the order remains the same and the file ends there, so I only have: 1 (chrI), 2 (chrII), 3 (chrIII), 4 (chrIV), 9 (chrIX), M (chrM). My species has 24 chromosomes total (including chrM).


Please post or send me a VCF file that illustrates what you're talking about, along with the plink .log file.


Hi, @chrchang523. I went to check my VCF file and I've noticed that after my chrM I had renamed my chrUn as 0 while my reference file had it named U. I think that instead of skipping this part, the whole thing just stopped there, so I'm re-running this to check. I'm running into --memory issues now, so when I'm done I'll get back here to clarify if the issue still persists.
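
A mismatch like this (one contig named 0 in the VCF but U in the reference) can be caught before running the pipeline by comparing the two sets of names. This is a minimal sketch, assuming the VCF header carries `##contig=<ID=...>` lines and the reference has a `.fai` index whose first tab-separated column is the contig name; the function names are hypothetical.

```python
import re

def vcf_contigs(header_lines):
    """Collect contig IDs from '##contig=<...>' VCF header lines."""
    names = []
    for line in header_lines:
        if line.startswith("##contig="):
            match = re.search(r"[<,]ID=([^,>]+)", line)
            if match:
                names.append(match.group(1))
    return names

def fai_contigs(fai_lines):
    """Collect contig names from the first column of a .fai index."""
    return [line.split("\t")[0] for line in fai_lines if line.strip()]

def contig_mismatches(header_lines, fai_lines):
    """Return (names only in the VCF, names only in the reference)."""
    vcf = set(vcf_contigs(header_lines))
    ref = set(fai_contigs(fai_lines))
    return sorted(vcf - ref), sorted(ref - vcf)

if __name__ == "__main__":
    # Tiny made-up inputs that mimic the situation described above.
    header = [
        "##fileformat=VCFv4.2",
        "##contig=<ID=1,length=100>",
        "##contig=<ID=M,length=50>",
        "##contig=<ID=0,length=10>",
    ]
    fai = ["1\t100\t3\t60\t61", "M\t50\t110\t60\t61", "U\t10\t170\t60\t61"]
    print(contig_mismatches(header, fai))
```

In practice the header lines could come from `bcftools view -h` and the index lines from the reference's `.fai` file.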


OK, so it was my fault! I had chrUn renamed differently in my VCF and my reference file, so I think that is solved. However, now I'm trying to update the FIDs using the --update-ids command and I'm getting the error: Invalid chromosome code '28' on line 40749796 of .bim file. That line is my chrM. What's weird is that I did set --chr-set 24. Hmmmm... now I think I understand the instructions for --chr-set. So I define only the number of autosomes, and the program will recognize the rest automatically as X, Y and M? And will it be treating my data as human, even though I've defined a different set?
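
As far as I understand the plink 1.9 documentation, `--chr-set N` declares N autosomes and then assigns the numeric codes N+1, N+2, N+3 and N+4 to X, Y, XY and MT. The sketch below just spells out that arithmetic; `special_codes` is a hypothetical helper, not part of plink.

```python
def special_codes(autosome_count):
    """Numeric codes plink 1.9 is documented to use after N autosomes."""
    n = autosome_count
    return {"X": n + 1, "Y": n + 2, "XY": n + 3, "MT": n + 4}

if __name__ == "__main__":
    for n in (22, 24):
        print(n, special_codes(n))
```

Under that rule, `--chr-set 24` puts MT at code 28, which would be consistent with chrM appearing as 28 in the .bim file. A plausible reading of the error is that a later command run without the same `--chr-set` falls back to the human default (22 autosomes, MT at 26) and rejects 28, but that is an inference rather than something confirmed in this thread.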


So, how would you advise I treat my chrUn (unassigned)? Currently it's named just as U. Should I assign it a number and treat it as an autosome?


That is what the --allow-extra-chr flag is for.


Awesome! Thank you so much for the help. When I set --chr-set 20 (20 autosomes, excluding chrUn) together with --allow-extra-chr, I was able to run the --update-ids command with no error.


Actually, (sorry @chrchang523, this seems like a never ending issue!), I've just checked my BIM output and there seems to be an issue with chrX. The output is like so:

21  X:2937481:C:T   0   2937481 T   C
21  X:2937493:C:CT  0   2937493 CT  C
21  21:2937504:AT:A 0   2937504 A   AT
21  21:2937731:A:G  0   2937731 G   A
21  21:2937776:C:T  0   2937776 T   C

When I renamed my chromosomes, I did have a chr21 and a chrX. It seems they are being conflated. Is there a way to prevent this?
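
Because the variant IDs in this pipeline were set to CHROM:POS:REF:ALT before plink saw the data, the ID still records the original chromosome name, which makes this kind of conflation easy to scan for. A minimal sketch, assuming that ID format and the usual six whitespace-separated .bim columns; `conflated_rows` is a hypothetical helper.

```python
def conflated_rows(bim_lines):
    """Return (chromosome column, variant ID) for rows where they disagree."""
    rows = []
    for line in bim_lines:
        fields = line.split()
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        chrom, variant_id = fields[0], fields[1]
        # The ID is assumed to start with the original chromosome name.
        id_chrom = variant_id.split(":", 1)[0]
        if id_chrom != chrom:
            rows.append((chrom, variant_id))
    return rows

if __name__ == "__main__":
    sample = [
        "21  X:2937481:C:T   0   2937481 T   C",
        "21  X:2937493:C:CT  0   2937493 CT  C",
        "21  21:2937504:AT:A 0   2937504 A   AT",
    ]
    for chrom, variant_id in conflated_rows(sample):
        print(chrom, variant_id)
```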


That's due to your incorrect use of --chr-set 20.


Hmmmmmmmmmmmm... It's because in this genome chr19 is the X chromosome. I renamed chrXIX to X, so the numbering skips 19 altogether, and I was counting 20 in total. Thanks so much!
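
A quick sanity check for this situation is to ask whether any numeric chromosome name in the data falls in the range that plink reserves for X, Y, XY and MT under a given `--chr-set` value. The sketch below assumes the plink 1.9 convention of codes N+1 to N+4 after N autosomes; `colliding_names` is a hypothetical helper and the name list is reconstructed from the description above (1 to 21 without 19, plus X, M and U).

```python
def colliding_names(contig_names, autosome_count):
    """Map numeric contig names to the special chromosome they would be read as."""
    n = autosome_count
    reserved = {n + 1: "X", n + 2: "Y", n + 3: "XY", n + 4: "MT"}
    hits = {}
    for name in contig_names:
        if name.isdigit() and int(name) in reserved:
            hits[name] = reserved[int(name)]
    return hits

if __name__ == "__main__":
    # Names as described in the thread: 19 is absent because it became X.
    names = [str(i) for i in range(1, 22) if i != 19] + ["X", "M", "U"]
    print(colliding_names(names, 20))
    print(colliding_names(names, 21))
```

With 20 declared autosomes, the name 21 lands on the code for X, which matches the conflation seen in the .bim output. With 21 this particular check reports no collision, though whether that is the right setting for a given analysis is a separate question.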
