Diabetes Risk Prediction Application - PIMA Dataset

Diabetes risk prediction on the PIMA dataset with an explainable AI approach. Model comparison, SHAP explainability and a decision-support prototype.

Role: Data Product Owner · ML Researcher

Company: Academic

Status: Shipped


Prototype screens: 1 · Data Entry, 2 · Result & SHAP

Tags: Machine Learning · HealthTech · SHAP · Explainable AI · TUBITAK

Overview

The PIMA project was framed as an explainable decision-support prototype that predicts type 2 diabetes risk on the PIMA Indians Diabetes dataset. The goal was not only to build a high-accuracy model but to make the model's decisions explainable in a health context and visible to the user.

Problem

In healthcare, a model that outputs only a score does not give a decision-maker enough reason to trust it. The model also has to show which variables drove a given risk prediction, which factors affect the outcome overall, and how the result should be read in a decision-support context.
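Per-prediction attribution of this kind is what Shapley values provide, and they are the quantity SHAP estimates. The sketch below computes them exactly by enumerating feature coalitions, under a simplifying assumption: a feature left out of a coalition is set to a single baseline value (for example a training-set mean). It is not the `shap` package's API, which typically averages over a background dataset and uses faster approximations; `shapley_values`, `toy_score` and the input values are hypothetical names and numbers for illustration. Exact enumeration is exponential in the number of features, which stays manageable for PIMA's eight inputs.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence


def shapley_values(
    predict: Callable[[Sequence[float]], float],
    x: Sequence[float],
    baseline: Sequence[float],
) -> List[float]:
    """Exact Shapley values for one prediction.

    Simplifying assumption: a feature outside the coalition is replaced by its
    baseline value (a single reference point, not a background dataset).
    """
    n = len(x)

    def value(coalition: frozenset) -> float:
        # Keep the patient's value for coalition members, baseline for the rest.
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions S without i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Hypothetical toy score over (glucose, BMI, age); weights are illustrative only.
def toy_score(z: Sequence[float]) -> float:
    glucose, bmi, age = z
    return 0.02 * glucose + 0.05 * bmi + 0.01 * age


patient = [160.0, 35.0, 50.0]
reference = [120.0, 32.0, 33.0]  # e.g. training-set means (hypothetical values)
contribs = shapley_values(toy_score, patient, reference)
```

The contributions add up to the difference between the patient's score and the baseline score, which is the property that lets a per-feature breakdown be read as an explanation of one prediction.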

System

I organized the project around four parts: data processing, model comparison, explainability and the prototype interface. The scope covered cleaning and preprocessing the PIMA data, comparing models on Accuracy, F1 and ROC-AUC, SHAP-based explainability, and the risk prediction output.
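A minimal sketch of the cleaning and comparison steps, assuming the common PIMA convention that a 0 in Glucose, BloodPressure, SkinThickness, Insulin or BMI marks a missing value and is replaced by a column median. The metric functions are written out by hand to show what each number measures; in practice scikit-learn's `accuracy_score`, `f1_score` and `roc_auc_score` do the same job. Function names, rows and scores here are hypothetical, not the project's code or results.

```python
from statistics import median
from typing import Dict, List, Sequence

# Assumption: in PIMA a 0 in these columns is physiologically impossible,
# so it is treated as a missing value.
ZERO_AS_MISSING = ("Glucose", "BloodPressure", "SkinThickness", "Insulin", "BMI")


def impute_zeros(rows: List[Dict[str, float]], columns=ZERO_AS_MISSING) -> List[Dict[str, float]]:
    """Replace 0 with the median of the non-zero values in each listed column.

    In a real pipeline, fit the medians on the training split only to avoid leakage.
    """
    medians = {}
    for col in columns:
        observed = [r[col] for r in rows if r.get(col, 0) != 0]
        if observed:
            medians[col] = median(observed)
    return [
        {k: medians[k] if (k in medians and v == 0) else v for k, v in r.items()}
        for r in rows
    ]


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    # F1 = 2TP / (2TP + FP + FN); returned as 0 when there are no true positives.
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def compare(y_true: Sequence[int], model_scores: Dict[str, List[float]], threshold: float = 0.5):
    """Score each model's predicted probabilities on the same held-out labels."""
    report = {}
    for name, scores in model_scores.items():
        preds = [1 if s >= threshold else 0 for s in scores]
        report[name] = {
            "accuracy": accuracy(y_true, preds),
            "f1": f1(y_true, preds),
            "roc_auc": roc_auc(y_true, scores),
        }
    return report


# Toy usage with made-up rows and scores (not PIMA results).
rows = [
    {"Pregnancies": 0, "Glucose": 0, "BMI": 30.0},
    {"Pregnancies": 2, "Glucose": 100, "BMI": 0},
    {"Pregnancies": 1, "Glucose": 120, "BMI": 34.0},
]
cleaned = impute_zeros(rows)
report = compare([0, 0, 1, 1], {"model_a": [0.1, 0.4, 0.35, 0.8]})
```

ROC-AUC is computed from the raw scores rather than thresholded labels, so it ranks models independently of the 0.5 cut-off that Accuracy and F1 depend on. Pregnancies is left out of the imputation list because 0 is a valid value there.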

The aim was to present the result as an explainable risk evaluation, not a binary positive or negative call. For the TÜBİTAK 2209-A application I worked on the project summary, original value, aim, methodology, work packages, risk management and budget structure.
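To illustrate what an explainable risk evaluation could look like instead of a yes/no label, the sketch below maps a predicted probability to a risk band and lists the features with the largest attributions (for example SHAP values). The band cut-offs, the attribution values and the `RiskReport` / `risk_report` names are hypothetical; the project's actual thresholds and output format are not described here.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical upper bounds for each band; real cut-offs would need clinical input.
BANDS = ((0.3, "low"), (0.6, "moderate"), (1.0, "high"))


@dataclass
class RiskReport:
    probability: float
    band: str
    top_drivers: List[Tuple[str, float]]

    def summary(self) -> str:
        drivers = ", ".join(f"{name} ({value:+.2f})" for name, value in self.top_drivers)
        return f"Risk: {self.band} ({self.probability:.0%}). Main factors: {drivers}"


def risk_report(probability: float, attributions: Dict[str, float], top_k: int = 3) -> RiskReport:
    """Turn a probability plus per-feature attributions into a readable risk evaluation."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be in [0, 1]")
    band = next(label for upper, label in BANDS if probability <= upper)
    # Rank features by the size of their contribution, whichever direction it pushes.
    drivers = sorted(attributions.items(), key=lambda kv: abs(kv[1]), reverse=True)[:top_k]
    return RiskReport(probability, band, drivers)


# Illustrative attribution values, not model output.
report = risk_report(0.72, {"Glucose": 0.31, "BMI": 0.12, "Age": -0.05, "Insulin": 0.02})
print(report.summary())
```

Signed contributions are kept in the summary so a reader can see which factors raised the risk and which lowered it.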

Outcome

The PIMA project resulted in an explainable AI decision-support prototype that works on health data, rather than a basic classification exercise. Explainability and an output structure designed to earn user trust were placed at the center, alongside model performance.

My role

Project idea, research scope, model comparison approach, SHAP explainability setup, decision-support prototype logic, TÜBİTAK application structure, methodology, work-package and risk-management documentation.