Valerie Krug, Sebastian Stober

Assessing Intersectional Bias in Representations of Pre-Trained Image Recognition Models

Deep Learning (DL) has proven to be a highly effective tool across numerous fields of application. However, biases in training data can reinforce stereotypes, leading to predictions that are unfair, particularly toward marginalized groups. These biases can affect not only a model's outputs but also its internal representations. The development of DL models frequently relies on transfer learning, where pre-trained models serve as the basis for a new learning task. With this approach, biases from the original models can propagate. Identifying such biases is critical for understanding inequalities that can arise from transfer learning. Furthermore, suitable techniques are necessary to present them to stakeholders in different contexts.

In previous research, we investigated bias in the representations of pre-trained image classifiers, using the FairFace data set for evaluation. Here, we expand on these experiments by investigating a broader range of Convolutional Neural Network (CNN) models, and we account for intersectionality by considering combinations of the sensitive attributes “race”, “age”, and “gender” as provided in the FairFace data set. We use our previously introduced representation analysis technique with visualization as topographic maps, and we investigate the linear separability of intersectional groups with linear classifier probes. The code is publicly available, and more detailed results are provided on arXiv.
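The idea of a linear classifier probe can be sketched as follows: activations from one frozen layer are used as features for a simple linear classifier, and its error indicates how linearly separable the groups are in that layer. The snippet below is a minimal illustration using a closed-form ridge-regression probe; the actual experiments may use a different classifier, and all names here are illustrative.

```python
import numpy as np

def fit_linear_probe(feats, labels, n_classes, l2=1e-3):
    """Fit a linear probe on frozen features via ridge regression
    against one-hot targets (a simple stand-in for a trained probe)."""
    X = np.hstack([feats, np.ones((feats.shape[0], 1))])  # append bias column
    Y = np.eye(n_classes)[labels]                          # one-hot targets
    W = np.linalg.solve(X.T @ X + l2 * np.eye(X.shape[1]), X.T @ Y)
    return W

def probe_predict(W, feats):
    """Predict the group with the highest linear score."""
    X = np.hstack([feats, np.ones((feats.shape[0], 1))])
    return (X @ W).argmax(axis=1)
```

A high probe accuracy on a layer's activations indicates that the intersectional groups are linearly separable there; comparing this error across layers identifies where group information is most accessible.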

Here, we present results for an exemplary ResNet50 layer for which a classifier probe showed the lowest error across the layers. First, we discuss frequent errors made by the probe. Second, we visualize the groups’ representations. None of the frequent errors of the linear classifier (Table 1) is based on a wrongly predicted gender. Wrong predictions mostly differ from the target in both race and age. Age errors are almost always predictions of an adjacent age group, such as classifying “40–49” as either “30–39” or “50–59”. Such errors are understandable because faces do not strictly indicate an age, particularly at the age group boundaries. Most errors for the “race” variable are made for “Latino_Hispanic” groups, which are wrongly predicted as “White”, “Indian”, or “Middle Eastern”.
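The error analysis above amounts to tallying the most common (annotated, predicted) mismatches over intersectional group labels. A minimal sketch, with FairFace-style attribute strings used purely as illustrative examples:

```python
from collections import Counter

def intersectional_label(race: str, age: str, gender: str) -> str:
    """Combine the three sensitive attributes into one group label."""
    return f"{race}|{age}|{gender}"

def frequent_errors(true_groups, pred_groups, top=3):
    """Return the `top` most common (annotated, predicted) mismatches."""
    errors = Counter(
        (t, p) for t, p in zip(true_groups, pred_groups) if t != p
    )
    return errors.most_common(top)
```

Inspecting which attribute differs within each frequent mismatch (race, age, gender, or a combination) yields the kind of breakdown reported in Table 1.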

Table 1: Frequent training set errors of a classifier probe trained on layer 15 of ResNet50. Bold print indicates differences between the annotated and the predicted group.

[Figure: multiple small circular topographic heatmaps, one per combination of age group and race, shown separately for “Female” and “Male”, with a color scale from blue to red.]

Fig. 1: Topographic activation maps for all subgroups in ResNet50 layer 15.

The observed age errors fit the patterns in the activation visualization in Figure 1, because the activations of adjacent age groups are similar. In addition, there appears to be a stronger difference between the age groups “50–59” and “60–69”. We do not find clear race differences, which is consistent with the confusion of race categories. However, we do not find a clear reason why the classifier confuses “Latino_Hispanic” groups particularly often. Finally, the lack of errors in the gender variable is plausible because most groups can be easily distinguished between the “Female” and “Male” categories, particularly for the middle age groups.

In addition to the exemplary ResNet50 layer, we investigated representations in multiple layers of the pre-trained models VGG16, ResNet50, and InceptionV3. Our findings indicate that in ImageNet classifiers, regardless of the architecture, age differences are the most pronounced compared to race and gender. A recent related clustering-based analysis supports this. Note that this is specific to facial images and their features. We suspect that the encoded age differences are related to changes in skin texture, based on previous findings which suggest that models focus on structured patterns. With our visualization, we aim to provide more accessible explanations of DL model representations in non-expert contexts.
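One simple proxy for how “pronounced” an attribute is in a layer's representations is the ratio of between-class to within-class variance, computed per attribute on frozen features. This is a hedged illustration of the idea, not the analysis method used in the paper:

```python
import numpy as np

def attribute_separation(feats, attr_labels):
    """Between-class vs. within-class variance of features for one attribute.
    Higher values mean the attribute is more strongly encoded."""
    attr_labels = np.asarray(attr_labels)
    mu = feats.mean(axis=0)
    between, within = 0.0, 0.0
    for c in np.unique(attr_labels):
        fc = feats[attr_labels == c]   # all samples of one attribute value
        mc = fc.mean(axis=0)
        between += len(fc) * np.sum((mc - mu) ** 2)
        within += np.sum((fc - mc) ** 2)
    return between / within
```

Computing such a score separately for race, age, and gender on the same activations allows a direct comparison of which attribute dominates the representation.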

Presentation “Assessing Intersectional Bias in Representations of Pre-Trained Image Recognition Models” held at the 3rd TRR 318 Conference: Contextualizing Explanations on 18 June 2025 in Bielefeld, Germany.
