Inherently Explainable Hierarchical Generalized Learning Vector Quantization Models
Although deep learning models and transformers have demonstrated superior performance in identification and classification tasks across various domains, they often function as black-box models. Black-box AI models lack transparency, limiting their use in critical fields like healthcare and finance. Tree-based structures offer clearer, step-by-step decision processes that are inherently explainable, but they typically split on single features, so very deep trees are needed for high-dimensional data with correlated features, undermining their explainability. To address this issue, we propose Hierarchical Generalized Learning Vector Quantization (HGLVQ), a tree-based model suited to high-dimensional and correlated data that performs splits based on which prototype is closest. Explanations of decisions then take the form of a sequence of split decisions of the type: data point x was closest to prototype w, therefore this decision was made. Further, our model merges the flexibility of tree-based approaches with the strengths of prototype-based models, such as interpretability, margin maximization, and robustness. Combining prototype-based models with metric learning and dimensionality reduction techniques can further enhance model performance and facilitate visualizations. Initial experiments on the MNIST (83% accuracy) and UCI Bank (97% accuracy) datasets demonstrate the feasibility of our method, though further research is needed to assess generalization to other domains.
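The routing principle can be illustrated with a minimal sketch. This is not the authors' implementation; the `Node` structure and the plain Euclidean distance are illustrative assumptions. Each inner node holds a set of prototypes, a data point follows the closest prototype downward, and the sequence of winning prototypes forms the explanation.

```python
import numpy as np

# Illustrative sketch (not the paper's exact code): each tree node holds
# prototypes; routing follows the closest prototype until a leaf assigns
# a class label. The recorded path of winning prototypes is the explanation.
class Node:
    def __init__(self, prototypes, children=None, label=None):
        self.prototypes = np.asarray(prototypes)  # shape (k, d)
        self.children = children  # one child per prototype, None at a leaf
        self.label = label        # class label at a leaf

def classify(node, x, trace=None):
    """Route x down the tree; record which prototype won at each split."""
    if node.children is None:
        return node.label, trace or []
    d = np.linalg.norm(node.prototypes - x, axis=1)  # Euclidean distances
    i = int(np.argmin(d))
    return classify(node.children[i], x, (trace or []) + [i])

# Toy 2D example: the root splits on two prototypes, leaves assign classes.
root = Node([[0.0, 0.0], [5.0, 5.0]],
            children=[Node([], label="A"), Node([], label="B")])
label, path = classify(root, np.array([4.5, 5.2]))
print(label, path)  # prints: B [1]
```

The returned `path` is exactly the "data point x was closest to prototype w" chain described above, which stays short when the tree is shallow.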
To enrich explanations of HGLVQ classifications, we investigate different visualization techniques. For image data, we display prototypes as images; for tabular data, we investigate dimensionality reduction to show 2D embeddings of the classification. Depending on the type of data (static context), different visualizations can be used to make sense of prototypes and, hence, understand classification decisions of the model. Importantly, because the resulting trees are fairly small (just depth 2 for MNIST, for example), users are empowered to inspect the global structure of the model, not only single, local decisions.
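A 2D embedding of prototypes for tabular data could, for instance, be obtained with PCA. The following is a minimal sketch under that assumption; the paper does not commit to a specific dimensionality reduction technique, and the prototype matrix here is random placeholder data.

```python
import numpy as np

# Sketch: project high-dimensional prototypes to 2D for visualization.
# PCA via SVD is one possible choice; other reduction methods would work too.
def pca_2d(X):
    Xc = X - X.mean(axis=0)                       # center the data
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:2].T                          # first two principal axes

rng = np.random.default_rng(1)
prototypes = rng.standard_normal((6, 10))  # placeholder: six 10D prototypes
coords = pca_2d(prototypes)
print(coords.shape)  # prints: (6, 2)
```

The resulting 2D coordinates can then be scattered alongside the data to show how prototypes partition the input space.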
We argue that the combination of prototypes and tree-based decisions also opens up opportunities for skilled users to recognize model failures and undesired behaviors, namely if prototypes do not appear to represent a class well, or if trees get too deep to be interpretable. Once such effects are observed, hyperparameters can be adjusted to achieve a better model, yielding a co-constructive loop in which human decisions and machine learning are interleaved, supported by interpretability and explanations. Importantly, domain knowledge of the users plays a crucial role to make sense of and criticize the prototypes and tree structures, as well as define suitable visualizations for prototypes.
Despite the promising approach, ample future work is still needed. The most crucial issue is finding suitable distance metrics for high-dimensional spaces. In such spaces, the standard Euclidean distance oftentimes leads to unsatisfactory results (as evidenced by the relatively low accuracy on MNIST and below 20% accuracy on CIFAR-10). Instead, we need distances that bring points from the same class closer together, pull points from different classes farther apart, and remain computationally efficient, all without compromising interpretability and explainability: points that are close should also look similar to human eyes, and vice versa. Otherwise, explanations may be misleading and adversarial examples may occur. Creating such metrics is a fundamental research problem in machine learning, but prototype-based models may be prime candidates to make progress in this direction.
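One established direction in the LVQ literature is to learn a quadratic metric of the form d(x, w) = (x − w)ᵀ Ωᵀ Ω (x − w), as in matrix variants such as GMLVQ. The sketch below only shows how such a distance is evaluated; the matrix Ω is a random placeholder here, whereas in practice it would be adapted during training together with the prototypes.

```python
import numpy as np

# Sketch of a learned quadratic metric in the spirit of matrix LVQ:
# d(x, w) = || Omega (x - w) ||^2. Omega here is a random placeholder;
# during training it would be adapted so same-class points move closer
# and different-class points move apart.
def learned_distance(x, w, omega):
    diff = omega @ (x - w)   # project the difference vector
    return float(diff @ diff)

rng = np.random.default_rng(0)
omega = rng.standard_normal((2, 4))  # maps 4D inputs to a 2D metric space
x = rng.standard_normal(4)
w = rng.standard_normal(4)
d = learned_distance(x, w, omega)
print(d >= 0.0)  # a valid squared distance is never negative
```

A rectangular Ω, as used here, simultaneously acts as a dimensionality reduction, which ties this line of work back to the visualization goals above.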
As a first step in this direction, we used convolutional neural networks (CNNs) as feature extractors for prototype-based models and considered the distance in the embedding space instead of the pixel space. First experimental results indicate that the accuracy of a classic CNN classifier on MNIST can be matched (98%). On CIFAR-10, the GLVQ variant (73%) does not quite reach the classic CNN (84%), but the gap closes if a classic CNN is trained first and its features are then transferred to GLVQ. These results underscore the potential of integrating deep feature extraction with GLVQ models, particularly when interpretability is important. Yet, further work is necessary to avoid typical pitfalls of deep feature extraction, such as adversarial attacks.
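The embedding-space idea can be sketched as follows. This is an illustrative stand-in, not the authors' pipeline: the feature map `phi` is a fixed random ReLU projection here, whereas in the experiments it would be a trained CNN; the toy "images" and labels are invented for the example.

```python
import numpy as np

# Illustrative sketch: a frozen feature extractor phi maps inputs into an
# embedding space, and prototype distances are computed there instead of
# in pixel space. phi is a stand-in random ReLU projection; in practice
# it would be a (pre)trained CNN.
rng = np.random.default_rng(42)
W = rng.standard_normal((16, 784)) / np.sqrt(784)  # stand-in for a CNN

def phi(x):
    return np.maximum(W @ x, 0.0)  # toy embedding: ReLU projection

def nearest_prototype(x, prototypes, labels):
    z = phi(x)
    dists = [np.linalg.norm(z - phi(p)) for p in prototypes]
    return labels[int(np.argmin(dists))]

# Two toy "images" serving as class prototypes.
protos = [np.zeros(784), np.ones(784)]
print(nearest_prototype(np.full(784, 0.9), protos, ["zero", "one"]))
# prints: one
```

Because the comparison happens in the embedding space, the explanation ("closest to this prototype") inherits the CNN's invariances, which is exactly why adversarial behavior of the feature extractor must be kept in check.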
Based on these results, we conclude that designing a universal model that performs well across all dataset types remains a significant challenge. High-dimensional datasets, such as CIFAR-10, expose the limitations of traditional and even hierarchical LVQ models. Therefore, further research is necessary to develop models that strike a better balance between performance and explainability.
Presentation: Stability of Model Explanations in Interpretable Prototype-based Classification Learning, held at the 3rd TRR 318 Conference: Contextualizing Explanations, 17 June 2025, Bielefeld, Germany.