Explainable Text Clustering in the Context of Psychological Research
In psychological research, free text responses are crucial to avoid biasing participants with a pre-defined set of answers. However, the analysis of free text responses is time-consuming, as it requires the development of a codebook, manual coding by multiple annotators, and multiple rounds of revision to build consensus. In most research contexts, especially with larger numbers of participants, such qualitative research methods are infeasible. Hence, automatic methods to process free text responses would be desirable. Existing techniques mostly focus on detecting pre-defined keywords and coding responses accordingly, e.g., LIWC. More recently, transformer-based language models have been used to embed free text responses in a high-dimensional vector space and cluster them according to their pairwise cosine similarity. While such clustering methods have been quite successful, they are not sufficiently explainable to be usable by most psychological researchers, because extensive skills in Python programming, natural language processing, and machine learning would all be required to make sense of the clustering process and adjust its parameters in a responsible fashion. Therefore, we argue that a novel user interface is required which guides researchers step-by-step through the full process and makes all crucial methodological choices explicit to enhance researchers’ autonomy and encourage responsible use—in other words, an explainable interface for free text response clustering. We hope to provide such a tool with SCORES (Semantic Clustering of Open Responses via Embedding Similarity).
SCORES performs the following steps: 1) Researchers provide the free text responses in the form of a CSV table; 2) the responses are embedded using a small language model of their choice (by default intfloat/multilingual-e5-large-instruct, which was the best-performing model below one billion parameters on the MTEB benchmark at the time of writing); 3) the cosine similarities to each response’s nearest neighbors are computed and outliers are removed accordingly; 4) K-Means clustering is performed, where K can be chosen either by the researcher or automatically via quality metrics; 5) very close clusters are merged via agglomerative clustering according to a researcher-set distance threshold; 6) the clustering results are displayed to the researcher for closer inspection to ensure that the clusters make sense.
Importantly, every algorithm and model in this process operates on the same cosine similarities between the same embeddings, thus enhancing explainability. If researchers detect problems, e.g. clusters containing responses that are semantically too different, or multiple clusters covering the same topic, they are encouraged to re-run the process with different settings, resulting in a co-constructive cycle: Researchers explore the data via the first clustering run, make sense of the clusters, then re-run the process to refine the clustering result, and so on. Over time, we hope to raise not only the quality of the clustering at hand but also researchers’ understanding of the underlying clustering algorithms, thereby empowering researchers rather than de-skilling them. Initial feedback from researchers is encouraging, but further research is needed to validate that psychological researchers do indeed understand the tool, can use it responsibly, and are both subjectively and objectively empowered.
Presentation “Explainable Text Clustering in the Context of Psychological Research”, held at the 3rd TRR 318 Conference: Contextualizing Explanations, 18 June 2025, Bielefeld, Germany.