I think it's the main HES data that you want, with 410,309 participants, which I believe is 41270 Diagnoses - ICD10 (this should be a combination of both main diagnoses, 41202, and any secondary diagnoses, 41204).
You can see how 41270 is derived in the Related Data-fields tab for that field.
I think the ICD-9 coded data (41271, approx. 20,000 participants) are from the earlier Scottish data. If you look at the Notes tab for this field in the data showcase, it says: "Please note ICD-9 coded hospital inpatient data are only available for older Scottish hospital records".
It also notes that ICD-9 codes are available for 20,302 participants.
From HospitalEpisodeStatistics.pdf (available through the data showcase), p. 4:
"All of the current UK Biobank linked English and most Welsh hospital data are coded in ICD-10 and OPCS-4. However, because the collection of Scottish data began earlier (in 1981), the earlier Scottish data (those collected prior to 1997) are coded in ICD-9 and OPCS-3, and small number of Welsh records are coded with ICD-9".
So, to answer the question: 41271 is the combined (main and secondary) ICD-9 diagnoses from approx. 20,000 Scottish / Welsh participants.
This does cause problems unless you either exclude participants with ICD-9 codes from the analysis or map the ICD-9 codes you need to ICD-10.
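If you go the exclusion route, something like the rough sketch below might work. It assumes your extract is a wide CSV with an `eid` column and one column per array index named like `41270-0.0`, `41271-0.0`; check this against your own file, as the layout depends on how the data were exported. The toy rows and the helper names (`codes_for_field`, `split_by_icd9`) are made up for illustration.

```python
import csv
import io

# Toy extract in the assumed wide "field-instance.array" layout.
# All participant IDs and codes here are invented for illustration.
TOY_CSV = """eid,41270-0.0,41270-0.1,41271-0.0
1001,I251,E119,
1002,,,4109
1003,I10,,4019
1004,,,
"""

def codes_for_field(row, field_id):
    """Collect the non-empty values from every column of one data-field."""
    prefix = f"{field_id}-"
    return {v for k, v in row.items() if k.startswith(prefix) and v}

def split_by_icd9(rows, icd9_field="41271"):
    """Return (kept, excluded) IDs; anyone with an ICD-9 code is excluded."""
    kept, excluded = [], []
    for row in rows:
        target = excluded if codes_for_field(row, icd9_field) else kept
        target.append(row["eid"])
    return kept, excluded

rows = list(csv.DictReader(io.StringIO(TOY_CSV)))
kept, excluded = split_by_icd9(rows)
print("kept:", kept)
print("excluded:", excluded)
```

Note that participant 1003 is dropped even though they also have ICD-10 codes. If you would rather keep those people, you would need to map their ICD-9 codes to ICD-10 instead.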
Hope that helps