Question

Plink Cnv Map Names

12.8 years ago

bob.obr13n ▴ 60

I am using PennCNV and PLINK to perform association studies on our data. I have used the standard pipeline to convert the PennCNV calls to PLINK CNV files (penncnvtoplink.pl followed by plink --cnv-make-map). I have noticed that the marker names generated by PLINK can be ambiguous when two variants of a CNV start at the same base pair location. For example, suppose there are two CNVs, C1 and C2, on chromosome 1: C1 starts at BP 1000 and ends at BP 2000, and C2 starts at BP 1000 and ends at BP 2020. In this case there is obviously a conflict, as PLINK generates the marker name from the BP of the CNV and they both start at the same location. The markers generated for C1 and C2 will be p1-1000 and p1-1001, but there is no way to differentiate between them. Has anyone come across this and resolved it?

My problem is that I am getting a significant association for both of my markers and I need to sort out which one is which.

cnv association plink • 3.4k views

updated 12.8 years ago by Joachim ★ 2.9k • written 12.8 years ago by bob.obr13n ▴ 60

score 1 · Answer 1 · 2012-07-16

Could you be more specific about where you see this problem? Right now, I think you are looking at the summary file, which is tailored for use with SNPs and does not address the segments that you get with CNV data.

Example:

First, I create a CNV file plink.cnv with the following contents:

FID    IID   CHR       BP1       BP2  TYPE   SCORE  SITE
1        1     1      1000      2000     1       0     0
1        2     1      1000      2020     2       0     0
1        3     1      1000      2000     2       0     0
2        4     1      1500      2300     1       0     0
2        5     1      1200      2500     2       0     0

Second, I create a corresponding FAM file plink.fam that looks like:

1 1 0 0 1 1
1 2 0 0 1 2
1 3 0 0 1 1
2 4 0 0 1 2
2 5 0 0 1 1

Third, I create a MAP file using plink:

plink --noweb --cnv-list plink.cnv --cnv-make-map

Fourth, I create a segment list from the data, again using plink:

plink --noweb --allow-no-sex --cfile plink --cnv-seglist

Finally, the generated segment list, plink.cnv.seglist, contains the following information:

Chromosome 1

  p1-1000     +--  
  p1-1200     ||| -
  p1-1500     |||+|
  p1-2000     UU|||
  p1-2001       |||
  p1-2020       A||
  p1-2021        ||
  p1-2300        A|
  p1-2301         |
  p1-2500         U
  p1-2501
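
The marker positions in this listing appear to follow a simple pattern: one dummy marker at each segment start, one at each segment end, and one a single base pair past each end. The Python sketch below reproduces that pattern from the five example rows of plink.cnv. The rule is inferred from the output shown here, not taken from the PLINK source, and `marker_positions` and `marker_names` are hypothetical helper names.

```python
# Example segments from plink.cnv above, reduced to (CHR, BP1, BP2).
SEGMENTS = [
    (1, 1000, 2000),
    (1, 1000, 2020),
    (1, 1000, 2000),
    (1, 1500, 2300),
    (1, 1200, 2500),
]


def marker_positions(segments):
    """Return sorted, unique (chr, bp) breakpoints.

    Assumed rule (inferred from the seglist output, not from PLINK itself):
    each segment contributes its start, its end, and its end + 1.
    """
    points = set()
    for chrom, bp1, bp2 in segments:
        points.update({(chrom, bp1), (chrom, bp2), (chrom, bp2 + 1)})
    return sorted(points)


def marker_names(segments):
    """Format breakpoints in the p<chr>-<bp> style seen in the seglist."""
    return [f"p{chrom}-{bp}" for chrom, bp in marker_positions(segments)]


if __name__ == "__main__":
    for name in marker_names(SEGMENTS):
        print(name)
```

Under that assumption, segments that share a start position collapse into a single start marker, which is why the marker name alone cannot tell C1 from C2.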

I admit that the PLINK-generated markers p1-2001, p1-2021, etc. are a bit confusing, but all the information from your original data is preserved in the seglist file.
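
To work out which CNV a significant marker belongs to, one option is to look the marker position up in the original CNV file. The following Python sketch parses the example plink.cnv contents and groups the individuals whose segment spans a given marker by (BP1, BP2, TYPE). The function names `read_segments` and `segments_at` are hypothetical, and the `p<chr>-<bp>` marker naming is assumed from the seglist output in this answer.

```python
import io
from collections import defaultdict

# Contents of the example plink.cnv file from this answer.
CNV_TEXT = """\
FID    IID   CHR       BP1       BP2  TYPE   SCORE  SITE
1        1     1      1000      2000     1       0     0
1        2     1      1000      2020     2       0     0
1        3     1      1000      2000     2       0     0
2        4     1      1500      2300     1       0     0
2        5     1      1200      2500     2       0     0
"""


def read_segments(handle):
    """Parse a whitespace-delimited .cnv file that starts with a header row."""
    header = handle.readline().split()
    segments = []
    for line in handle:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        row = dict(zip(header, fields))
        segments.append({
            "FID": row["FID"],
            "IID": row["IID"],
            "CHR": row["CHR"],
            "BP1": int(row["BP1"]),
            "BP2": int(row["BP2"]),
            "TYPE": int(row["TYPE"]),
        })
    return segments


def segments_at(segments, marker):
    """Map (BP1, BP2, TYPE) to the (FID, IID) pairs whose segment spans marker.

    The marker is assumed to look like 'p1-1000' (chromosome 1, bp 1000).
    """
    chrom, bp_text = marker[1:].rsplit("-", 1)
    bp = int(bp_text)
    hits = defaultdict(list)
    for seg in segments:
        if seg["CHR"] == chrom and seg["BP1"] <= bp <= seg["BP2"]:
            key = (seg["BP1"], seg["BP2"], seg["TYPE"])
            hits[key].append((seg["FID"], seg["IID"]))
    return dict(hits)


if __name__ == "__main__":
    segs = read_segments(io.StringIO(CNV_TEXT))
    for marker in ("p1-1000", "p1-2020"):
        print(marker, segments_at(segs, marker))
```

For a real data set, `io.StringIO(CNV_TEXT)` would be replaced by an open file handle on the .cnv file. Looking up the shared start marker returns every segment that begins there, while looking up the distinct end positions separates segments such as C1 and C2.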

There are many more examples on this page: http://pngu.mgh.harvard.edu/~purcell/plink/cnvfreqs.shtml

Hope this helps.