Social, Mobile Audio Spaces
Conversational Engagement

We are developing machine learning techniques to estimate how engaged participants are in an ongoing remote conversation. The goal is to build a communication system that adapts to the current conversational state. One use of these engagement estimates would be to increase or decrease the “richness” of a communication session automatically and seamlessly. For example, if two users are speaking in a push-to-talk (half-duplex audio) session and become highly engaged, the system could automatically switch over to a telephony (full-duplex audio) connection. Similarly, if the two participants become even more engaged in the telephone conversation, the system could then add a video channel.
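The escalation described above can be sketched as a small state machine. This is a minimal illustration only, not the project's implementation: the `Channel` ordering, the `next_channel` and `run_session` helpers, the assumption that engagement arrives as a number in [0, 1], and the two threshold values are all hypothetical choices made for the example.

```python
from enum import IntEnum


class Channel(IntEnum):
    """Communication modes ordered from least to most rich."""
    PUSH_TO_TALK = 0   # half-duplex audio
    TELEPHONY = 1      # full-duplex audio
    VIDEO = 2          # full-duplex audio plus a video channel


def next_channel(current, engagement, upgrade_at=0.7, downgrade_at=0.3):
    """Return the channel to use, given an engagement estimate in [0, 1].

    Both thresholds are illustrative assumptions. Leaving a gap between
    them (hysteresis) keeps the session from flapping between modes when
    the estimate hovers near a single cut-off.
    """
    if engagement >= upgrade_at and current < Channel.VIDEO:
        return Channel(current + 1)
    if engagement <= downgrade_at and current > Channel.PUSH_TO_TALK:
        return Channel(current - 1)
    return current


def run_session(estimates, start=Channel.PUSH_TO_TALK):
    """Apply next_channel to each estimate in turn and return the channel history."""
    history = []
    channel = start
    for estimate in estimates:
        channel = next_channel(channel, estimate)
        history.append(channel)
    return history
```

Calling `run_session` on a sequence of estimates steps the session up or down one mode at a time, so a push-to-talk session has to pass through telephony before video is added.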

This idea, building a system that can seamlessly and automatically shift between different kinds of communication based on engagement, arose from fieldwork. The fieldworker conceived it after observing college students using push-to-talk voice services, which work like walkie-talkie radios but travel over the cellular telephone network. She saw a number of situations where it would have been helpful if a walkie-talkie conversation had simply turned into a telephone conversation, or vice versa. It is not always straightforward for people to decide whether they should be having the informal walkie-talkie kind of interaction or the more committed telephone kind. She reasoned that if the system knew when conversational participants were becoming more or less engaged, it could help them switch to a more appropriate communication channel.

Deciding how “engaged” people are from audio is technically challenging. When people listen to speech and make this judgment, they rely largely on affect, that is, expressions of emotion. There has been considerable research on the relationship between emotion and speech affect, but engagement isn't necessarily the same as emotion; someone can be highly engaged in sad or angry conversations as well as happy ones. However, some affect states, such as those that psychologists call arousal or activation levels, can be estimated directly from acoustic features and are also correlated with user engagement. We developed a multilevel structure to capture the joint behavior of participants in a conversation and to model the influence of individual participants on each other. The two parts of the multilevel structure tie together the previous research on recognizing speech affect with the special characteristics of conversational engagement.
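To make the two-level idea concrete, here is a rough sketch under stated assumptions. It is not the model from the paper: this page does not specify the acoustic features or the learning method, so the RMS-energy arousal proxy, the `influence` coupling weight, and every function name below are hypothetical stand-ins chosen only to show a per-speaker level feeding a joint, mutually influenced level.

```python
import math


def frame_energies(samples, frame_len=160):
    """Split a mono signal into non-overlapping frames; return each frame's RMS energy."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energies.append(math.sqrt(sum(x * x for x in frame) / frame_len))
    return energies


def arousal_proxy(samples, frame_len=160):
    """Lower level: a crude per-speaker arousal score in [0, 1).

    Assumption: mean RMS energy, squashed with x / (1 + x), stands in
    for whatever acoustic features a real system would use.
    """
    energies = frame_energies(samples, frame_len)
    if not energies:
        return 0.0
    mean = sum(energies) / len(energies)
    return mean / (1.0 + mean)


def joint_engagement(scores_a, scores_b, influence=0.3):
    """Upper level: couple two speakers' score sequences.

    Each speaker's state blends their own current score with the other
    speaker's previous state, so one participant can pull the other up
    or down. Returns the mean of the two states at each step.
    """
    if not scores_a or not scores_b:
        return []
    state_a, state_b = scores_a[0], scores_b[0]
    joint = [(state_a + state_b) / 2]
    for a, b in zip(scores_a[1:], scores_b[1:]):
        new_a = (1 - influence) * a + influence * state_b
        new_b = (1 - influence) * b + influence * state_a
        state_a, state_b = new_a, new_b
        joint.append((state_a + state_b) / 2)
    return joint
```

In this sketch, `arousal_proxy` would be computed once per speaker per time window, and the resulting sequences passed to `joint_engagement` to obtain a single conversation-level estimate.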

While there has been a great deal of research on speech recognition and emotion recognition, to our knowledge, this is the first work that attempts to determine user engagement in everyday conversation based on acoustic signals.

For more details, please see our ICSLP paper.

 

Modified: 2005/01/28 02:15:53