How to understand Instance and Array in UKB phenotype data
4.4 years ago
Shicheng Guo ★ 9.6k

Dear All,

I would like some help understanding "instance" and "array" in the following context:

UKBB: The standard dataset (downloadable directly by researchers) contains a record of all the bulk data-files approved, however only the data-file IDs are present rather than the actual contents of the files themselves. These data-file IDs have the format "F_I_A" where F is the field ID, I is the instance index and A is the array index. Hence 8034_4_2 corresponds to Field 8034, Instance 4, Array 2.
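As a rough illustration of the "F_I_A" format described above, here is a minimal Python sketch that splits such an ID into its three parts. The names `DataFileId` and `parse_data_file_id` are made up for this example and are not part of any UKB tool.

```python
from typing import NamedTuple


class DataFileId(NamedTuple):
    """Hypothetical container for the three parts of an 'F_I_A' ID."""
    field: int
    instance: int
    array: int


def parse_data_file_id(text: str) -> DataFileId:
    """Split an 'F_I_A' identifier into field, instance and array index."""
    parts = text.strip().split("_")
    if len(parts) != 3:
        raise ValueError(f"expected 'F_I_A', got {text!r}")
    field, instance, array = (int(p) for p in parts)
    return DataFileId(field, instance, array)


# The ID quoted in the text: Field 8034, Instance 4, Array 2.
example = parse_data_file_id("8034_4_2")
print(example.field, example.instance, example.array)
```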

Thanks.

UKB phenotype field instance array • 2.5k views
4.4 years ago
Sam ★ 4.8k

Array is for fields that can have multiple entries, e.g. ICD10. It basically breaks the entries down into multiple files. For example, if someone has 20 ICD codes, then those 20 ICD codes will be found in 20 different array files.

Instance relates to repeated measurement. Some of the samples were assessed multiple times, and the instance indicates when the current phenotype was measured. For example, instance 0 is usually used because it is the baseline measurement and has the most data points. Instance 1 is the first follow-up, instance 2 is the second follow-up, and so on. Hope this helps.
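To make the distinction concrete, here is a small sketch that groups one participant's values by instance and orders them by array index. It assumes the values are keyed by "F_I_A" strings as in the question; the `record` dictionary, its field numbers and its values are invented for illustration, and `group_by_instance` is a hypothetical helper.

```python
from collections import defaultdict

# Made-up record for one participant, keyed by 'F_I_A'.
# Field 41270 stands in for a multi-entry field (several array indices),
# field 50 for a single value measured at two instances.
record = {
    "41270_0_1": "E11",
    "41270_0_0": "I10",
    "41270_0_2": "J45",
    "50_0_0": 170.0,
    "50_1_0": 169.5,
}


def group_by_instance(record, field):
    """Return {instance: [values ordered by array index]} for one field."""
    grouped = defaultdict(dict)
    for key, value in record.items():
        f, instance, array = (int(p) for p in key.split("_"))
        if f == field:
            grouped[instance][array] = value
    return {
        instance: [values[a] for a in sorted(values)]
        for instance, values in sorted(grouped.items())
    }


print(group_by_instance(record, 41270))  # one instance, three array entries
print(group_by_instance(record, 50))     # two instances, one entry each
```

In this toy record the array index separates the multiple entries recorded at one visit, while the instance separates the visits themselves.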


Data-Field 5986 has 114 array indices. How can I figure out how these 114 array indices are generated?

5,758,563 items of data are available, covering 95,140 participants. Defined-instances run from 0 to 1, labelled using Instancing 2. Array indices run from 0 to 113. Units of measurement are seconds
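For orientation, the ranges quoted above can be turned into the full list of possible "F_I_A" IDs for this field. This is only arithmetic on the published counts, not something taken from the UKB documentation, and the variable names are my own.

```python
from itertools import product

FIELD = 5986
INSTANCES = range(0, 2)    # defined instances run from 0 to 1
ARRAYS = range(0, 114)     # array indices run from 0 to 113

# Every instance/array combination that could hold a value for this field.
slot_ids = [f"{FIELD}_{i}_{a}" for i, a in product(INSTANCES, ARRAYS)]
print(len(slot_ids), slot_ids[0], slot_ids[-1])

# Counts quoted in the field description.
items, participants = 5_758_563, 95_140
mean_items = items / participants
print(f"mean items per participant: {mean_items:.1f}")
```

The mean number of items per participant comes out well below the number of possible slots, which would be consistent with most participants filling only some of the array indices, but that is an inference from the totals only.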

Thanks Sam!!


I have never worked with this phenotype, so I don't know for sure. Their documentation on this also isn't very clear. If I had to guess, I would go by the notes section, which states: "The Bike Test consists of many phases. A phase is generally divided into number of stages. At various points during the test, called trends, readings about heart rate, workload etc are recorded. This field contains the time spent within the phase of the trend entry."

I'd reckon each array index represents the time spent in each stage of each phase.
