We came across a project in our lab that no one knows exactly how to approach. Since I know a little bit of Python programming, this project was assigned to me.
The data come from a randomised controlled clinical trial with 60 participants. Half (30) are in the control arm and the other half are actual patients (case). In each group, biopsy samples were taken before (pre) and after (post) treatment with drug A, which targets cell type B in the tissue.
In each biopsy sample, the B cells were detected and classified either as “normal” or “malignant”.
Associated metadata for each biopsy are displayed here (I have only included 12 records for this file):
name patient_id arm treatment
111 0 control pre
112 1 control pre
113 2 control pre
121 0 control post
122 1 control post
123 2 control post
211 75 case pre
212 76 case pre
213 77 case pre
221 75 case post
222 76 case post
223 77 case post
name: spreadsheet file name
patient_id: patient identity number
arm: trial arm (‘case’ or ‘control’)
treatment: treatment condition (‘pre’ or ‘post’ treatment)
Other files (I have only included 20 records for each file: 10 normal and 10 malignant) contain cell detection results from a single biopsy, and each row in a spreadsheet represents an individual detected cell:
x: x coordinate
y: y coordinate
label: classified label (‘normal’ or ‘malignant’)
Just showing an example, file 111 looks like this:
x y label
730 724 normal
1962 450 normal
1511 817 normal
1244 455 normal
2529 397 normal
1878 262 normal
2248 369 normal
2007 273 normal
1531 878 normal
1729 834 normal
931 1270 malignant
1282 314 malignant
1630 839 malignant
1543 460 malignant
2493 237 malignant
1311 744 malignant
1999 366 malignant
737 1361 malignant
2252 448 malignant
2620 398 malignant
The rest can be found here, but probably they will not be necessary: https://www.mediafire.com/file/ka7r59kf0swnbnd/OtherFiles.rar/file
I am trying to answer 3 questions here (if you can think of other questions, please let me know):
- identifying post-treatment morphological changes due to the effect of drug A.
- proposing measures to quantify the changes.
- using appropriate statistical analysis tools to provide insight into whether the changes are due to chance or not.
The linchpin is what features you are interested in; can you describe them in plain English? Something like "how many of the cells are malignant" or "how interspersed malignant cells are with normal cells" or "how diffuse malignant cells are from each other"?
I had not thought about these questions.
Are the x and y coordinates important for calculating, e.g., some general area or density of cells, or something like that? Otherwise you have only a few data columns: patient_id, arm, treatment, cells_norm, cells_malignant, cells_total
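To make the per-biopsy columns concrete, here is a minimal sketch that reduces one cell-detection file to a single row of features. It uses only the standard library. The function name `biopsy_features`, the `malignant_fraction` column, and the nearest-neighbour measure are my own suggestions rather than anything specified in the question; the example rows are the first three normal and first three malignant cells of file 111 as posted.

```python
import math
from collections import Counter

def biopsy_features(cells):
    """Summarise one biopsy. `cells` is a list of (x, y, label) tuples."""
    counts = Counter(label for _, _, label in cells)
    cells_norm = counts.get("normal", 0)
    cells_malignant = counts.get("malignant", 0)
    cells_total = cells_norm + cells_malignant

    normal = [(x, y) for x, y, label in cells if label == "normal"]
    malignant = [(x, y) for x, y, label in cells if label == "malignant"]

    # One possible spatial measure (an assumption, not from the question):
    # mean distance from each malignant cell to its nearest normal cell.
    # Smaller values suggest malignant cells are more interspersed.
    if normal and malignant:
        nearest = [min(math.dist(m, n) for n in normal) for m in malignant]
        mean_nn = sum(nearest) / len(nearest)
    else:
        mean_nn = float("nan")

    return {
        "cells_norm": cells_norm,
        "cells_malignant": cells_malignant,
        "cells_total": cells_total,
        "malignant_fraction": cells_malignant / cells_total if cells_total else float("nan"),
        "mean_nn_malignant_to_normal": mean_nn,
    }

# Subset of file 111 from the question (first 3 rows of each label).
cells_111 = [
    (730, 724, "normal"),
    (1962, 450, "normal"),
    (1511, 817, "normal"),
    (931, 1270, "malignant"),
    (1282, 314, "malignant"),
    (1630, 839, "malignant"),
]

features = biopsy_features(cells_111)
print(features)
```

Running this once per file and joining the result with the metadata on `name` would give one row per biopsy. The brute-force nearest-neighbour search is quadratic in the number of cells, so a spatial index would probably be worth it for large biopsies.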
Assuming the control arm patients also have biopsies at two time points, you can estimate the variation due to the fact that the biopsies are taken from different spots and some time apart.
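One way to use the control arm like this, sketched under assumptions: compute the post minus pre change of some feature for each patient, then compare the changes in the case arm against those in the control arm with a permutation test. The helper names `paired_changes` and `permutation_test` are hypothetical, the choice of a permutation test is mine (other tests could be equally reasonable), and the feature values below are made-up placeholders, not results from the trial. Only the patient ids come from the metadata in the question.

```python
import random
from statistics import mean

def paired_changes(values):
    """values maps (patient_id, treatment) -> feature value.
    Returns patient_id -> (post - pre) for patients with both biopsies."""
    changes = {}
    for (pid, treatment), value in values.items():
        if treatment == "post" and (pid, "pre") in values:
            changes[pid] = value - values[(pid, "pre")]
    return changes

def permutation_test(case_changes, control_changes, n_perm=10000, seed=0):
    """Two-sided permutation test on the difference in mean change."""
    rng = random.Random(seed)
    observed = mean(case_changes) - mean(control_changes)
    pooled = list(case_changes) + list(control_changes)
    n_case = len(case_changes)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # randomly reassign arm labels
        diff = mean(pooled[:n_case]) - mean(pooled[n_case:])
        if abs(diff) >= abs(observed) - 1e-12:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)

# Made-up malignant fractions for illustration only (NOT trial data).
control = {(0, "pre"): 0.50, (0, "post"): 0.52,
           (1, "pre"): 0.45, (1, "post"): 0.44,
           (2, "pre"): 0.55, (2, "post"): 0.53}
case = {(75, "pre"): 0.60, (75, "post"): 0.40,
        (76, "pre"): 0.58, (76, "post"): 0.35,
        (77, "pre"): 0.62, (77, "post"): 0.45}

control_changes = list(paired_changes(control).values())
case_changes = list(paired_changes(case).values())
observed, p_value = permutation_test(case_changes, control_changes, n_perm=2000)
print(observed, p_value)
```

The spread of the control-arm changes is the estimate of biopsy-to-biopsy variation mentioned above: a change in the case arm only looks like a drug effect if it is large compared with that spread.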