Valerie Krug, Sebastian Stober

Assessing Intersectional Bias in Representations of Pre-Trained Image Recognition Models

Deep Learning (DL) has proven to be a highly effective tool across numerous fields of application. However, biases in training data can reinforce stereotypes, leading to predictions that are unfair, particularly toward marginalized groups. These biases can affect not only a model's outputs but also its internal representations. The development of DL models frequently relies on transfer learning, where pre-trained models serve as the basis for a new learning task. With this approach, biases from the original models can propagate. Identifying such biases is critical for understanding inequalities that can arise from transfer learning. Furthermore, suitable techniques are necessary to present them to stakeholders in different contexts.

In previous research, we investigated bias in the representations of pre-trained image classifiers, using the FairFace data set for evaluation. Here, we expand on these experiments by investigating a broader range of Convolutional Neural Network (CNN) models, and we account for intersectionality by considering combinations of the sensitive attributes “race”, “age”, and “gender” as provided in the FairFace data set. We use our previously introduced representation analysis technique with visualization as topographic maps, and we investigate the linear separability of intersectional groups with linear classifier probes. The code is publicly available, and more detailed results are provided on arXiv.
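The idea of a linear classifier probe can be sketched as follows: activations from one frozen layer are used as features for a simple linear classifier, and its error indicates how linearly separable the groups are in that layer. The snippet below is a minimal illustration using a closed-form ridge-regression probe; the actual experiments may use a different classifier, and all names here are illustrative.

```python
import numpy as np

def fit_linear_probe(feats, labels, n_classes, l2=1e-3):
    """Fit a linear probe on frozen features via ridge regression
    against one-hot targets (a simple stand-in for a trained probe)."""
    X = np.hstack([feats, np.ones((feats.shape[0], 1))])  # append bias column
    Y = np.eye(n_classes)[labels]                          # one-hot targets
    W = np.linalg.solve(X.T @ X + l2 * np.eye(X.shape[1]), X.T @ Y)
    return W

def probe_predict(W, feats):
    """Predict the group with the highest linear score."""
    X = np.hstack([feats, np.ones((feats.shape[0], 1))])
    return (X @ W).argmax(axis=1)
```

A high probe accuracy on a layer's activations indicates that the intersectional groups are linearly separable there; comparing this error across layers identifies where group information is most accessible.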

Here, we present results for an exemplary ResNet50 layer for which a classifier probe showed the lowest error across the layers. First, we discuss frequent errors made by the probe. Second, we visualize the groups’ representations. None of the frequent errors of the linear classifier (Table 1) is based on a wrongly predicted gender. Wrong predictions mostly differ from the target in both race and age. Age errors are almost always predictions of an adjacent age group, such as classifying “40–49” as either “30–39” or “50–59”. Such errors are understandable because faces do not strictly indicate an age, particularly at the age group boundaries. Most errors for the “race” variable are made for “Latino_Hispanic” groups, which are wrongly predicted as “White”, “Indian”, or “Middle Eastern”.
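The error analysis above amounts to tallying the most common (annotated, predicted) mismatches over intersectional group labels. A minimal sketch, with FairFace-style attribute strings used purely as illustrative examples:

```python
from collections import Counter

def intersectional_label(race: str, age: str, gender: str) -> str:
    """Combine the three sensitive attributes into one group label."""
    return f"{race}|{age}|{gender}"

def frequent_errors(true_groups, pred_groups, top=3):
    """Return the `top` most common (annotated, predicted) mismatches."""
    errors = Counter(
        (t, p) for t, p in zip(true_groups, pred_groups) if t != p
    )
    return errors.most_common(top)
```

Inspecting which attribute differs within each frequent mismatch (race, age, gender, or a combination) yields the kind of breakdown reported in Table 1.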

Table 1: Frequent training set errors of a classifier probe trained on layer 15 of ResNet50. Bold print indicates differences between the annotated and the predicted group.

[Figure: multiple small circular topographic heatmaps, one per combination of age group and race, shown separately for “Female” and “Male”, with a color scale from blue to red.]

Fig. 1: Topographic activation maps for all subgroups in ResNet50 layer 15.

The observed age errors fit the patterns in the activation visualization in Figure 1, because the activations of adjacent age groups are similar. In addition, there appears to be a stronger difference between the age groups “50–59” and “60–69”. We do not find clear race differences, which is consistent with the confusion of race categories. However, we do not find a clear reason why the classifier confuses “Latino_Hispanic” groups particularly often. Finally, the lack of errors in the gender variable is plausible because most groups can be easily distinguished between the “Female” and “Male” categories, particularly for the middle age groups.

In addition to the exemplary ResNet50 layer, we investigated representations in multiple layers of the pre-trained models VGG16, ResNet50, and InceptionV3. Our findings indicate that in ImageNet classifiers, regardless of the architecture, age differences are the most pronounced compared to race and gender. A recent related clustering-based analysis supports this. Note that this is specific to facial images and their features. We suspect that the encoded age differences are related to changes in skin texture, based on previous findings which suggest that models focus on structured patterns. With our visualization, we aim to provide more accessible explanations of DL model representations in non-expert contexts.
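One simple proxy for how “pronounced” an attribute is in a layer's representations is the ratio of between-class to within-class variance, computed per attribute on frozen features. This is a hedged illustration of the idea, not the analysis method used in the paper:

```python
import numpy as np

def attribute_separation(feats, attr_labels):
    """Between-class vs. within-class variance of features for one attribute.
    Higher values mean the attribute is more strongly encoded."""
    attr_labels = np.asarray(attr_labels)
    mu = feats.mean(axis=0)
    between, within = 0.0, 0.0
    for c in np.unique(attr_labels):
        fc = feats[attr_labels == c]   # all samples of one attribute value
        mc = fc.mean(axis=0)
        between += len(fc) * np.sum((mc - mu) ** 2)
        within += np.sum((fc - mc) ** 2)
    return between / within
```

Computing such a score separately for race, age, and gender on the same activations allows a direct comparison of which attribute dominates the representation.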

Presentation “Assessing Intersectional Bias in Representations of Pre-Trained Image Recognition Models” held at the 3rd TRR 318 Conference: Contextualizing Explanations on 18 June 2025 in Bielefeld, Germany.
