In this line of research, gameplay footage (screen pixels and, optionally, audio) is the sole input for predicting the player's arousal. Deep convolutional neural networks take the frames of the screen as input and learn to predict the self-annotated arousal traces provided by the player, framed either as a classification task or as a preference learning task. The work is innovative in that it uses no footage of the player's body or face for prediction, instead assuming that the player's affective state is embedded in the gameplay context.
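A minimal sketch of the preference-learning variant of this setup, assuming a PyTorch-style implementation: a small convolutional network maps a screen frame to a scalar arousal score, and pairs of frames (or windows) ordered by the annotated arousal trace drive a pairwise RankNet-style loss. The architecture, layer sizes, and input resolution below are illustrative assumptions, not the authors' exact models.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ArousalCNN(nn.Module):
    """Illustrative CNN mapping a gameplay frame to a scalar arousal score."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(64, 1)

    def forward(self, x):
        # x: (batch, 3, H, W) screen pixels -> (batch,) arousal scores
        return self.head(self.features(x).flatten(1)).squeeze(-1)

def preference_loss(score_high, score_low):
    """Pairwise loss for pairs where the first item was annotated as more
    arousing than the second: -log sigmoid(s_high - s_low)."""
    return F.softplus(score_low - score_high).mean()

# Toy usage with random tensors standing in for frame pairs ordered by the
# player's arousal trace (illustrative only).
model = ArousalCNN()
frames_high = torch.randn(8, 3, 96, 96)  # annotated as higher arousal
frames_low = torch.randn(8, 3, 96, 96)   # annotated as lower arousal
loss = preference_loss(model(frames_high), model(frames_low))
loss.backward()
```

The classification framing would instead discretize the arousal trace into levels (e.g. low/high) and replace the pairwise loss with a standard cross-entropy objective over those classes.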
Konstantinos Makantasis, Antonios Liapis and Georgios N. Yannakakis: "The Pixels and Sounds of Emotion: General-Purpose Representations of Arousal in Games," in IEEE Transactions on Affective Computing 14(1), 2023.
Konstantinos Makantasis, Antonios Liapis and Georgios N. Yannakakis: "From Pixels to Affect: A Study on Games and Player Experience," in Proceedings of the International Conference on Affective Computing and Intelligent Interaction, 2019.