I've been looking at UK Biobank data, and it seems the data holds the segment mean l2r (log2 ratio) values for the copy number variation but doesn't actually include the segment start and end positions. Each file covers one chromosome and contains all 500,000 patients. Does anyone know where we might find the actual locations on the chromosome that the values correspond to?
biobank, cnv seem to be tags relevant to your post, whereas genome doesn't really say much. Please add relevant tags so people following those tags would get to see your posts.
Added the tags you mentioned
Well, in addition to Ram's comments, which data are you looking at, exactly? You have provided no links. I can probably just contact the relevant person directly if you let me know where you obtained your data.
Hey Kevin,
I can't exactly give you a link to the data itself, as UK Biobank has a strict policy on how data is given out. However, this is the project website: UK Biobank. A lot of the documentation seems to be about raw sequencing reads, but from my understanding the inferred l2r CNV data is derived from those files.
This is the link to the actual instructions for data download: Resource 664. The data we are using is the CNV log2r data; however, as you can read there, the downloaded files are per chromosome. The issue is that the files hold only the log2r values and give no indication of which portion of the chromosome they come from, which is not very helpful.
Hope that clarifies things.
Thanks in advance!
Edit: I should also mention that the segment means are the log2r values; I've been using the two terms interchangeably.
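To make the problem concrete, here is a minimal sketch of the join being asked about. It assumes, hypothetically, that a per-chromosome marker table exists with PLINK .bim-style columns (chromosome, marker ID, genetic distance, base-pair position, two alleles) and that its rows are in the same order as the rows of the log2r file. Whether UK Biobank provides such a file for this resource is exactly the open question, and `attach_positions` is only an illustrative name.

```python
def attach_positions(l2r_lines, marker_lines):
    """Pair each row of log2r values with the marker position on the same row.

    Assumption (not confirmed for UK Biobank): row i of the log2r file and
    row i of the marker table describe the same marker.
    """
    markers = []
    for line in marker_lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        # Assumed .bim-style layout: chrom, id, genetic distance, bp position, ...
        markers.append((fields[0], fields[1], int(fields[3])))

    # One row per marker, one whitespace-separated value per sample.
    rows = [[float(v) for v in line.split()] for line in l2r_lines if line.strip()]

    if len(rows) != len(markers):
        raise ValueError(
            f"row count mismatch: {len(rows)} log2r rows vs {len(markers)} markers"
        )

    return [
        (chrom, pos, marker_id, values)
        for (chrom, marker_id, pos), values in zip(markers, rows)
    ]


# Toy data only, not real UK Biobank values.
toy_l2r = ["0.10 -0.25", "0.05 0.40"]
toy_markers = ["1 rs1 0 1000 A G", "1 rs2 0 2500 C T"]
annotated = attach_positions(toy_l2r, toy_markers)
```

The row-count check is there because a join by row order silently mislabels every value if the two files are even one line out of step.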