The Unreachable 90%
The story of an obsession with a single target on the PIMA diabetes dataset, a systematic investigation across 9 sprints, and the Python package that emerged from what remained.
A few months ago, I became fixated on a small target. I wanted to reach 90% accuracy on the PIMA diabetes dataset. It is a 768-row clinical dataset collected before 1990, known to contain label noise. The honest ceiling in the literature is around 85-89%. Still, I wanted to try. I thought, "Do what hasn't been done."
Nine sprints, 40+ experiments, 14 new hybrid techniques. I named all of them: dialectical ML, latent subtype routing, conformal selective classification, and others. I coded each one, ran the experiments, and recorded the results.
Then I ran a systematic test over everything: 50 different seeds crossed with 4 different threshold strategies, 200 experiments in total. Not a single one came close to 90%. The best run reached 0.83 accuracy, and the average was 0.74. The math spoke clearly.
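A sweep of this shape can be sketched in a few lines. This is only an illustration under loud assumptions, not the experiment from the post: the data is a synthetic stand-in generated with `make_classification` (768 rows, 8 features, noisy labels), the model is a plain logistic regression, and the four threshold strategies (`fixed_0.5`, `youden_j`, `max_f1`, `match_prevalence`) are hypothetical choices, since the post does not name the ones it used.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, roc_curve
from sklearn.model_selection import train_test_split

# ASSUMPTION: synthetic stand-in for PIMA (768 rows, 8 features, noisy labels).
X, y = make_classification(n_samples=768, n_features=8, n_informative=5,
                           n_redundant=1, weights=[0.65, 0.35], flip_y=0.15,
                           class_sep=0.8, random_state=0)

def youden_threshold(y_true, p):
    # Maximise TPR - FPR; skip the first ROC point, whose threshold is a sentinel.
    fpr, tpr, thr = roc_curve(y_true, p)
    return thr[1:][np.argmax((tpr - fpr)[1:])]

def max_f1_threshold(y_true, p):
    # Grid search for the cutoff with the best F1 on the validation split.
    grid = np.linspace(0.05, 0.95, 19)
    f1s = [f1_score(y_true, (p >= t).astype(int), zero_division=0) for t in grid]
    return grid[int(np.argmax(f1s))]

# ASSUMPTION: four hypothetical strategies; the post does not list its own.
STRATEGIES = {
    "fixed_0.5": lambda y_true, p: 0.5,
    "youden_j": youden_threshold,
    "max_f1": max_f1_threshold,
    "match_prevalence": lambda y_true, p: np.quantile(p, 1 - y_true.mean()),
}

results = []
for seed in range(50):
    # Hold out a test set, then carve a validation set for threshold selection.
    X_tmp, X_test, y_tmp, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed)
    X_tr, X_val, y_tr, y_val = train_test_split(
        X_tmp, y_tmp, test_size=0.25, stratify=y_tmp, random_state=seed)
    model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    p_val = model.predict_proba(X_val)[:, 1]
    p_test = model.predict_proba(X_test)[:, 1]
    for name, pick in STRATEGIES.items():
        t = pick(y_val, p_val)  # threshold never sees the test split
        acc = accuracy_score(y_test, (p_test >= t).astype(int))
        results.append((seed, name, acc))

accs = np.array([r[2] for r in results])
print(f"runs={len(accs)} mean={accs.mean():.3f} max={accs.max():.3f}")
```

Reporting the mean next to the maximum is the point of the exercise: the gap between them shows how much of a headline score can come from a favorable split alone.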
On the PIMA dataset, 90% was simply not reachable. The reason was not model weakness; it was the data itself. With its label noise and class overlap, the target was statistically out of reach.
At first, there was disappointment. I thought I had lost. Then I realized that proving something is unreachable is different from reaching it, but it is an equally valid scientific result. Many of the "99% accuracy" claims in the literature come from data leakage or seed cherry-picking. My 200 runs showed systematically how far a single lucky seed can sit above the average.
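To make the leakage point concrete, here is a small sketch of one common form of it: selecting features on the full dataset before cross-validation. It is not the audit from the post and does not use PIMA. The data is pure random noise with random labels (an assumption chosen so that any accuracy well above chance can only come from leakage), and the sizes (100 rows, 2000 features, top 20 kept) are arbitrary.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline

# ASSUMPTION: noise features and labels that are independent of them.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2000))
y = rng.integers(0, 2, size=100)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Leaky: features are picked using ALL labels, including future test folds.
X_leaky = SelectKBest(f_classif, k=20).fit_transform(X, y)
leaky_acc = cross_val_score(
    LogisticRegression(max_iter=1000), X_leaky, y, cv=cv).mean()

# Honest: selection lives inside the pipeline, so it is refit per training fold.
honest = make_pipeline(SelectKBest(f_classif, k=20),
                       LogisticRegression(max_iter=1000))
honest_acc = cross_val_score(honest, X, y, cv=cv).mean()

print(f"leaky CV accuracy:  {leaky_acc:.3f}")
print(f"honest CV accuracy: {honest_acc:.3f}")
```

The only difference between the two numbers is where the selection step runs, which is why keeping every fitted preprocessing step inside the cross-validation loop matters.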
Then I looked at what I had left. 14 classifiers, an experiment tracking system, calibration tools, fairness auditing, and clinical reporting. I did not want to throw it away.
Proving that something cannot be done is still a result. Sometimes, something more lasting comes out of it.
I had two options. Either leave everything forgotten in a folder, or turn it into a package that someone else could use. I chose the second one.
That is how clinikit was born. A lightweight, sklearn-compatible toolkit for tabular machine learning.
- 14 hybrid classifiers with a single pip install
- 5 experiment protocols with automatic leakage and sampling checks
- Label-noise analysis with Cleanlab, neighborhood conflict, and leave-one-out (LOO) in one module
- Platt, isotonic, and temperature scaling calibration
- Subgroup fairness and documentation helpers
- Jinja-based structured HTML report generator
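For readers unfamiliar with the three calibration methods named in the list, the sketch below shows what each one does using plain scikit-learn and SciPy. It does not use clinikit's API, which is not shown here. The data is a synthetic stand-in, the base model is a logistic regression, and the temperature-scaling part is a hand-written one-parameter fit, all of which are assumptions for illustration.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import expit
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import brier_score_loss, log_loss
from sklearn.model_selection import train_test_split

# ASSUMPTION: synthetic stand-in data with noisy labels.
X, y = make_classification(n_samples=768, n_features=8, n_informative=5,
                           flip_y=0.15, random_state=0)
X_tr, X_test, y_tr, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

scores = {}

# Platt scaling ("sigmoid") and isotonic regression via scikit-learn.
for name, method in [("platt", "sigmoid"), ("isotonic", "isotonic")]:
    clf = CalibratedClassifierCV(LogisticRegression(max_iter=1000),
                                 method=method, cv=5)
    clf.fit(X_tr, y_tr)
    scores[name] = brier_score_loss(y_test, clf.predict_proba(X_test)[:, 1])

# Temperature scaling by hand: fit one scalar T on a held-out calibration
# split so that sigmoid(logit / T) minimises the log loss.
X_fit, X_cal, y_fit, y_cal = train_test_split(
    X_tr, y_tr, test_size=0.3, stratify=y_tr, random_state=0)
base = LogisticRegression(max_iter=1000).fit(X_fit, y_fit)
z_cal = base.decision_function(X_cal)

def nll(T):
    return log_loss(y_cal, expit(z_cal / T), labels=[0, 1])

T = minimize_scalar(nll, bounds=(0.05, 20.0), method="bounded").x
p_temp = expit(base.decision_function(X_test) / T)
scores["temperature"] = brier_score_loss(y_test, p_temp)

for name, score in scores.items():
    print(f"{name:12s} Brier score = {score:.4f}")
print(f"fitted temperature T = {T:.3f}")
```

Lower Brier scores mean better-calibrated probabilities. On a dataset this small, isotonic regression can overfit, which is one reason to compare several methods instead of picking one by default.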
It is not a new invention. It is an integration of existing techniques, such as Mixture of Experts, conformal prediction, co-training, and label refinement, designed around clinical research workflows.
Reaching the target is not always the most valuable outcome. Sometimes, honestly showing that the target is unreachable and leaving the tools you built for someone else creates a more lasting mark.
Explore clinikit:
