1) Knowledge-aware Inference for Speech Recognition in HRI
Previous works on knowledge-aware ASR inference have explored contextual and non-contextual (long-term) knowledge separately, and have experimented only with English [1, 2]. The following directions can extend these works.
• Combining contextual and static knowledge for knowledge-aware inference of a pre-trained speech recognition model for HRI.
• A study to analyze effects of contextual speech recognition in HRI, possibly for a language other than English (e.g., Italian ASR).
[1] P. Pramanick and C. Sarkar, “Can visual context improve automatic speech recognition for an embodied agent?” in Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022, pp. 1946–1957.
[2] ——, “Utilizing prior knowledge to improve automatic speech recognition in human-robot interactive scenarios,” in Companion of the 2023 ACM/IEEE International Conference on Human-Robot Interaction, 2023, pp. 471–475.
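One way to combine the two knowledge sources, as proposed above, is to rescore the n-best hypotheses of a pre-trained recognizer with bonuses for words matching either source. The following sketch is purely illustrative: the word lists, scores, and weights are assumptions, not part of [1, 2].

```python
# Hypothetical sketch: rescoring ASR n-best hypotheses with both
# contextual (e.g., visual scene) and static (long-term) knowledge.
def rescore(nbest, contextual_words, static_words,
            ctx_weight=0.5, static_weight=0.2):
    """Add a bonus to each hypothesis score for every word found in
    the contextual or static knowledge source, then re-rank."""
    rescored = []
    for text, score in nbest:
        words = text.lower().split()
        bonus = (ctx_weight * sum(w in contextual_words for w in words)
                 + static_weight * sum(w in static_words for w in words))
        rescored.append((text, score + bonus))
    # Best (highest combined score) first.
    return sorted(rescored, key=lambda p: p[1], reverse=True)

# Example: visual context disambiguates a near-homophone.
nbest = [("grab the cap", -1.0), ("grab the cup", -1.2)]
context = {"cup", "table"}      # assumed to come from the robot's camera
static = {"grab", "bring"}      # assumed long-term task vocabulary
best = rescore(nbest, context, static)[0][0]  # "grab the cup"
```

In practice the bonus would be applied during beam search rather than post hoc, but the re-ranking view makes the interaction of the two knowledge sources explicit.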
2) Emotional Response as Implicit Feedback for Confidence Estimation of Classifiers
The confidence estimate of a classifier is generally based on how well the input matches the training data distribution. However, the emotional response of a co-located human who observes a robot's prediction can be used to regularize the model's confidence estimate. For example, a positive valence may indicate a correct prediction even when the model is not confident.
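A minimal way to realize this idea is to blend the model's own confidence with a pseudo-confidence derived from the observed valence. The function below is a sketch under assumed conventions: valence in [-1, 1], a single mixing weight, and a linear blend; none of these choices come from prior work.

```python
def regularized_confidence(model_conf, valence, alpha=0.3):
    """Blend a classifier's confidence with implicit human feedback.

    model_conf: the model's own confidence in [0, 1].
    valence:    observed emotional valence in [-1, 1]; positive
                suggests the human judged the prediction correct.
    alpha:      how much the emotional response is trusted (assumed).
    """
    # Map valence from [-1, 1] to a pseudo-confidence in [0, 1].
    feedback_conf = (valence + 1.0) / 2.0
    adjusted = (1.0 - alpha) * model_conf + alpha * feedback_conf
    # Clamp so the result remains a valid probability.
    return min(1.0, max(0.0, adjusted))

# A low-confidence prediction met with a clearly positive reaction
# is nudged upward; a negative reaction would push it down.
adjusted = regularized_confidence(0.4, 0.9)
```

A learned regularizer (e.g., treating valence as a noisy correctness label during calibration) would be the natural next step beyond this fixed linear blend.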