The main goal of this study was to investigate whether a computational analysis of text data from the National Student Survey (NSS) can add value to the existing, manual analysis. The results showed that the computational analysis of the texts from the open questions of the NSS contains information that enriches the results of the standard quantitative analysis of the NSS.
The goal of this study was to test the idea that computationally analysing the open answers of the Fontys National Student Survey (NSS) with a selection of standard text mining methods (Manning & Schütze, 1999) will increase the value of these answers for educational quality assurance. Human effort and analysis time were expected to decrease significantly. The text data (in Dutch) from several years of the Fontys NSS (2013-2018) was provided to Fontys students of the minor Applied Data Science. The analysis was to include topic and sentiment modelling across multiple years of survey data. Comparing multiple years was necessary to capture and visualise any trends that a human investigator might have missed while analysing the data by hand. During data cleaning, all stop words and punctuation were removed, all text was converted to lower case, and names and inappropriate language, such as swear words, were deleted. About 80% of the 24,000 records were manually labelled with sentiment; the remainder was used to validate the algorithms. Next, the machine learning steps were executed: training, testing, analysis of outcomes, and visualisation for better text comprehension. The students aimed to improve classification accuracy by applying multiple sentiment analysis algorithms and topic modelling methods. The models were chosen arbitrarily, with a preference for low model complexity. To keep the study reproducible, open-source tooling was used. One of these tools was based on Latent Dirichlet Allocation (LDA), a generative statistical model that allows sets of observations to be explained by unobserved groups that explain why some parts of the data are similar (Blei, Ng & Jordan, 2003). For topic modelling, Gensim (Řehůřek, 2011) was used, an open-source vector space modelling and topic modelling toolkit implemented in Python.
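The cleaning and splitting steps described above could look roughly like the following sketch. It is not the students' actual code: the stop-word list, the function names `clean` and `split_records`, and the random 80/20 split are assumptions made for illustration, and the removal of names and inappropriate language is left out because it would need dedicated word lists.

```python
import random
import string

# Hypothetical, very small Dutch stop-word list (assumption for illustration);
# a real analysis would use a fuller list.
STOP_WORDS = {"de", "het", "een", "en", "is", "van", "ik", "dat", "te", "zijn"}


def clean(text, stop_words=STOP_WORDS):
    """Lower-case the text, strip ASCII punctuation, and drop stop words."""
    lowered = text.lower()
    no_punct = lowered.translate(str.maketrans("", "", string.punctuation))
    return [tok for tok in no_punct.split() if tok not in stop_words]


def split_records(records, labelled_fraction=0.8, seed=42):
    """Shuffle a copy of the records and split it into two parts.

    The first part stands in for the manually labelled records, the second
    for the remainder held back for validation (assumed to be a random split).
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * labelled_fraction)
    return shuffled[:cut], shuffled[cut:]


if __name__ == "__main__":
    print(clean("De docenten zijn erg behulpzaam!"))
    labelled, held_out = split_records(range(10))
    print(len(labelled), len(held_out))
```

The token lists returned by `clean` are the kind of input a topic modelling toolkit such as Gensim typically expects. Fixing the seed keeps the split reproducible between runs.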
In addition, we noted the absence of pretrained models for the Dutch language. To complete our prototype, a simple user interface was created in Python. This final step integrated our automated text analysis with visualisations of sentiments and topics. Remarkably, all extracted topics are related to themes defined by the NSS. This indicates that, in general, students’ answers are related to topics of interest to educational institutions. The words extracted for each topic are also relevant to that topic. Although most of the results require further interpretation by human experts, they indicate that the computational analysis of the texts from the open questions of the NSS contains information that enriches the results of the standard quantitative analysis of the NSS.
Recent advancements in mobile sensing and wearable technologies create new opportunities to improve our understanding of how people experience their environment. This understanding can inform urban design decisions. Currently, an important urban design issue is the adaptation of infrastructure to increasing cycle and e-bike use. Using data collected from 12 cyclists on a cycle highway between two municipalities in the Netherlands, we coupled location and wearable emotion data at a high spatiotemporal resolution to model and examine relationships between cyclists' emotional arousal (operationalized as skin conductance responses) and visual stimuli from the environment (operationalized as extent of visible land cover type). We specifically took a within-participants multilevel modeling approach to determine relationships between different types of viewable land cover area and emotional arousal, while controlling for speed, direction, distance to roads, and directional change. Surprisingly, our model suggests that ride segments with views of larger natural, recreational, agricultural, and forested areas were more emotionally arousing for participants. Conversely, segments with views of larger developed areas were less arousing. The presented methodological framework, spatial-emotional analyses, and findings from multilevel modeling provide new opportunities for spatial, data-driven approaches to portable sensing and urban planning research. Furthermore, our findings have implications for the design of infrastructure to optimize cycling experiences.
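As an illustration only, a within-participants multilevel model of this kind can be written as a two-level random-intercept model. The abstract does not give the exact specification, so the random-intercept structure, the linear link, and all symbols below are assumptions.

```latex
% Assumed random-intercept sketch: observation i (ride segment) within participant j.
% A_{ij}: emotional arousal (skin conductance response)
% L_{kij}: visible area of land cover type k
% \mathbf{c}_{ij}: controls (speed, direction, distance to roads, directional change)
\begin{align}
  A_{ij} &= \beta_0 + \sum_{k} \beta_k L_{kij}
            + \boldsymbol{\gamma}^{\top} \mathbf{c}_{ij}
            + u_j + \varepsilon_{ij}, \\
  u_j &\sim \mathcal{N}(0, \sigma_u^2), \qquad
  \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2).
\end{align}
```

Under this sketch, a positive \(\beta_k\) for a natural land cover type would correspond to the reported finding that segments with larger views of such areas were more arousing. The participant-level term \(u_j\) absorbs differences in baseline arousal between the 12 cyclists.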