Question

Removing specific line number in Plink Map/Ped files

3.3 years ago

zoe.bell • 0

Hello,

I used the Michigan Imputation Server to impute my dataset, which unfortunately produced many apparent duplicates: variants with exactly the same information (down to the alleles) that are listed as two separate rows with different allele frequencies, R2 values, etc.

I converted the results to a ped/map file that has instances of these exact duplicates, down to the allele. For example,

22:17996285:A:ATCTC
22:17996285:A:ATCTC

I need to keep only one of each duplicate pair in order to move on with my analysis. However, the --rm-dup command in Plink 2.0 has no option that lets me choose which one to keep (the force-first mode does not work, because depending on the minor allele frequency I do not always want to keep the first one).

I do know the line numbers of these rows in the map file. Is there a way in Plink to delete by line number, for example deleting the first one in this example, which is line number 11092 in my file?

If not, is there a way to do this manually?

Thank you

map Plink1.9 ped Plink Plink2.0

updated 3.3 years ago by chrchang523 • written 3.3 years ago by zoe.bell

score 0 · Answer 1 · 2022-04-13

You're asking the wrong question.

The Michigan Imputation Server did nothing wrong. You're the one who destroyed your dataset by converting to and from ped/map; this wrecks the distinction between a REF=A, ALT=ATCTC insertion and a REF=ATCTC, ALT=A deletion.
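
To make the collision concrete, here is a minimal Python sketch. It is a toy model, not PLINK's actual code. It relies on two facts about the format: a .map line has only four columns (chromosome, variant ID, genetic distance, base-pair position), and a .ped file stores allele calls without marking which allele is REF. The `VcfRecord` type and the `pedmap_view` helper are hypothetical names introduced for illustration, and the deletion record is an assumed counterpart to the insertion from the question.

```python
from typing import NamedTuple


class VcfRecord(NamedTuple):
    """The VCF fields that matter here (hypothetical helper type)."""
    chrom: str
    pos: int
    ref: str
    alt: str


def pedmap_view(rec: VcfRecord) -> tuple:
    """Toy model of what survives a ped/map round trip.

    The position is kept, but the alleles become an unordered pair,
    because nothing in ped/map records which allele was REF.
    """
    return (rec.chrom, rec.pos, frozenset((rec.ref, rec.alt)))


# The insertion from the question, plus an assumed deletion at the same site.
insertion = VcfRecord("22", 17996285, "A", "ATCTC")
deletion = VcfRecord("22", 17996285, "ATCTC", "A")

# In VCF these are two distinct variants...
print(insertion != deletion)                            # True
# ...but once REF/ALT order is gone they are indistinguishable.
print(pedmap_view(insertion) == pedmap_view(deletion))  # True
```

Under this model, the two rows look like exact duplicates only because the information that told them apart has already been discarded, which is why deleting one of them by line number does not repair the dataset.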

To fix this, you have to delete all uses of the ped/map format in all of your scripts, and then rerun everything. (You'll also need to add --keep-allele-order to any plink 1.9 runs that don't already have it; without that flag, plink 1.9 resets A1 to the minor allele on every run.) You recklessly ignored plink 2.0's --file error message and/or the warning in the --pedmap documentation, and this is the consequence.