Interactive Systems Publications

Proceedings of the 22nd Annual Meeting of the Special Interest Group on Discourse and Dialogue.

  • Haizhou Li, Gina-Anne Levow, Zhou Yu, Chitralekha Gupta, Berrak Sisman, Siqi Cai, David Vandyke, Nina Dethlefs, Yan Wu, Junyi Jessy Li
  • PDF
  • 2021. SIGDIAL.

Domain Transfer for Deep Natural Language Generation from Abstract Meaning Representations.

    Abstract: Stochastic natural language generation systems that are trained from labelled datasets are often domain-specific in their annotation and in their mapping from semantic input representations to lexical-syntactic outputs. As a result, learnt models fail to generalize across domains, heavily restricting their usability beyond single applications. In this article, we focus on the problem of domain adaptation for natural language generation. We show how linguistic knowledge from a source domain, for which labelled data is available, can be adapted to a target domain by reusing training data across domains. As a key to this, we propose to employ abstract meaning representations as a common semantic representation across domains. We model natural language generation as a long short-term memory recurrent neural network encoder-decoder, in which one recurrent neural network learns a latent representation of a semantic input, and a second recurrent neural network learns to decode it to a sequence of words. We show that the learnt representations can be transferred across domains and can be leveraged effectively to improve training on new unseen domains. Experiments in three different domains and with six datasets demonstrate that the lexical-syntactic constructions learnt in one domain can be transferred to new domains and achieve up to 75-100% of the performance of in-domain training. This is based on objective metrics such as BLEU and semantic error rate and a subjective human rating study. Training a policy from prior knowledge from a different domain is consistently better than pure in-domain training by up to 10%. (An illustrative code sketch follows this entry.)
  • 2017. IEEE Computational Intelligence Magazine: Special Issue on Natural Language Generation with Computational Intelligence.
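
The encoder-decoder model summarised in the entry above can be illustrated with a short sketch. This is not the authors' implementation: it assumes PyTorch, uses hypothetical vocabulary sizes and dimensions, and treats the meaning representation as an already linearised token sequence.

```python
# Minimal sketch of the idea described above: one LSTM encodes a linearised meaning
# representation into a latent state, a second LSTM decodes that state into words.
# All names, sizes and the random inputs below are hypothetical.
import torch
import torch.nn as nn

class Seq2SeqGenerator(nn.Module):
    def __init__(self, mr_vocab, word_vocab, emb=64, hidden=128):
        super().__init__()
        self.enc_emb = nn.Embedding(mr_vocab, emb)
        self.dec_emb = nn.Embedding(word_vocab, emb)
        self.encoder = nn.LSTM(emb, hidden, batch_first=True)
        self.decoder = nn.LSTM(emb, hidden, batch_first=True)
        self.out = nn.Linear(hidden, word_vocab)

    def forward(self, mr_ids, word_ids):
        _, state = self.encoder(self.enc_emb(mr_ids))             # latent representation of the semantic input
        dec_out, _ = self.decoder(self.dec_emb(word_ids), state)  # decode with teacher forcing
        return self.out(dec_out)                                  # per-step scores over the word vocabulary

model = Seq2SeqGenerator(mr_vocab=50, word_vocab=200)
logits = model(torch.randint(0, 50, (2, 10)), torch.randint(0, 200, (2, 12)))
print(logits.shape)  # torch.Size([2, 12, 200])
```

In the domain-transfer setting described above, the encoder would be shared across domains so that representations learnt from source-domain data can be reused when training on a new target domain.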

Natural language-based presentation of cognitive stimulation to people with dementia in assistive technology: a pilot study.

  • Dethlefs, N.
  • Milders, M.
  • Cuayáhuitl, H.
  • Al-Salkini, T.
  • Douglas, D.
  • PDF
    Abstract: Currently, an estimated 36 million people worldwide are affected by Alzheimer’s disease or related dementias. In the absence of a cure, non-pharmacological interventions, such as cognitive stimulation, which slow down the rate of deterioration, can benefit people with dementia and their caregivers. Such interventions have been shown to improve well-being and slow down the rate of cognitive decline. It has further been shown that cognitive stimulation in interaction with a computer is as effective as with a human. However, the need to operate a computer often represents a difficulty for the elderly and stands in the way of widespread adoption. A possible solution to this obstacle is to provide a spoken natural language interface that allows people with dementia to interact with the cognitive stimulation software in the same way as they would interact with a human caregiver. This makes the assistive technology accessible to users regardless of their technical skills and provides a fully intuitive user experience. This article describes a pilot study that evaluated the feasibility of computer-based cognitive stimulation through a spoken natural language interface. A prototype software was evaluated with 23 users, including healthy elderly people and people with dementia. Feedback was overwhelmingly positive.
  • 2017. Informatics for Health and Social Care.

Extrinsic vs Intrinsic Evaluation of Natural Language Generation for Spoken Dialogue Systems and Social Robotics.

  • Hastie, H.
  • Cuayáhuitl, H.
  • Dethlefs, N.
  • Keizer, S.
  • Liu, X.
  • Link to book
    Abstract: [Book abstract] In the past 10 years, very few published studies include some kind of extrinsic evaluation of an NLG component in an end-to-end system, be it for phone or mobile-based dialogues or social robotic interaction. This may be attributed to the fact that these types of evaluations are very costly to set up and run for a single component. The question therefore arises whether there is anything to be gained over and above intrinsic quality measures obtained in off-line experiments. In this article, we describe a case study of evaluating two variants of an NLG surface realiser and show that there are significant differences in both extrinsic measures and intrinsic measures. These differences can be used to inform further iterations of component and system development.
  • 2016. In Jokinen, Kristiina and Wilcock, Graham (eds.) Dialogues with Social Robots – Enablements, Analyses, and Evaluation. Berlin: Springer Lecture Notes in Electrical Engineering (LNEE). ISBN 978-981-10-2584-6.

Information Density and Overlaps in Spoken Dialogue.

  • Dethlefs, N.
  • Hastie, H.
  • Cuayáhuitl, H.
  • Yu, Y.
  • Rieser, V.
  • Lemon, O.
  • PDF
    Abstract: Incremental dialogue systems are often perceived as more responsive and natural because they are able to address phenomena of turn-taking and overlapping speech, such as backchannels or barge-ins. Previous work in this area has often identified distinctive prosodic features, or features relating to syntactic or semantic completeness, as marking appropriate places of turn-taking. In a separate strand of work, psycholinguistic studies have established a connection between information density and prominence in language—the less expected a linguistic unit is in a particular context, the more likely it is to be linguistically marked. This has been observed across linguistic levels, including the prosodic, which plays an important role in predicting overlapping speech. In this article, we explore the hypothesis that information density (ID) also plays a role in turn-taking. Specifically, we aim to show that humans are sensitive to the peaks and troughs of information density in speech, and that overlapping speech at ID troughs is perceived as more acceptable than overlaps at ID peaks. To test our hypothesis, we collect human ratings for three models of generating overlapping speech based on features of: (1) prosody and semantic or syntactic completeness, (2) information density, and (3) both types of information. Results show that over 50% of users preferred the version using both types of features, followed by a preference for information density features alone. This indicates a clear human sensitivity to the effects of information density in spoken language and provides a strong motivation to adopt this metric for the design, development and evaluation of turn-taking modules in spoken and incremental dialogue systems. (An illustrative code sketch follows this entry.)
  • 2016. Computer Speech and Language 37, pp. 82–97.
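
Information density in the sense used above is per-word surprisal under a language model: the less predictable a word, the higher its ID. A minimal sketch, assuming only the Python standard library and a toy bigram model (the corpus and the resulting numbers are invented):

```python
# Per-word surprisal (information density): ID(w_i) = -log2 P(w_i | previous word),
# here estimated from a toy corpus with add-one smoothing. Troughs (low surprisal)
# would be the points where an overlap is predicted to be least disruptive.
import math
from collections import Counter

corpus = "the train leaves at five the train arrives at six".split()
unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))
vocab = len(unigrams)

def surprisal(prev, word):
    p = (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab)
    return -math.log2(p)

utterance = "the train leaves at six".split()
for prev, word in zip(utterance, utterance[1:]):
    print(f"{word:>8}: {surprisal(prev, word):.2f} bits")
```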

Why bother? Is evaluation of NLG in an end-to-end Spoken Dialogue System worth it?

  • Hastie, H.
  • Dethlefs, N.
  • Cuayáhuitl, H.
  • Keizer, S.
  • Liu, X.
  • PDF
    Abstract: In the past 10 years, only around 15% of published conference papers include some kind of extrinsic evaluation of an NLG component in an end-to-end system. These types of evaluations are costly to set up and run, so is it worth it? Is there anything to be gained over and above intrinsic quality measures obtained in off-line experiments? In this paper, we describe a case study of evaluating two variants of an NLG surface realiser and show that there are significant differences in both extrinsic measures and intrinsic measures. These significant differences would need to be factored into future iterations of the component and therefore, we conclude that extrinsic evaluations are worthwhile.
  • 2016. In Proceedings of the International Workshop on Spoken Dialogue Systems (IWSDS). Ivalo, Finland.

Hierarchical Reinforcement Learning for Situated Language Generation.

    Abstract: Natural Language Generation systems in interactive settings often face a multitude of choices, given that the communicative effect of each utterance they generate depends crucially on the interplay between its physical circumstances, addressee and interaction history. This is particularly true in interactive and situated settings. In this paper we present a novel approach for situated Natural Language Generation in dialogue that is based on hierarchical reinforcement learning and learns the best utterance for a context by optimisation through trial and error. The model is trained from human–human corpus data and learns particularly to balance the trade-off between efficiency and detail in giving instructions: the user needs to be given sufficient information to execute their task, but without exceeding their cognitive load. We present results from simulation and a task-based human evaluation study comparing two different versions of hierarchical reinforcement learning: One operates using a hierarchy of policies with a large state space and local knowledge, and the other additionally shares knowledge across generation subtasks to enhance performance. Results show that sharing knowledge across subtasks achieves better performance than learning in isolation, leading to smoother and more successful interactions that are better perceived by human users.
  • 2015. Natural Language Engineering 21, pp 391–435. Cambridge University Press.

Proceedings of the 4th International Workshop on Machine Learning for Interactive Systems. Co-located with the International Conference on Machine Learning (ICML), Lille, France.

  • Cuayáhuitl, H.
  • Dethlefs, N.
  • Frommberger, L.
  • van Otterlo, M.
  • Pietquin, O.
  • Link to proceedings
    Abstract: Learning systems or robots that interact with their environment by perceiving, acting or communicating often face a challenge in how to bring these different concepts together. This challenge arises because core concepts are typically studied within their respective communities, such as the computer vision, robotics and natural language processing communities, among others. A commonality across communities is the use of machine learning techniques and algorithms. In this way, machine learning is crucial in the development of truly intelligent systems, not just by providing techniques and algorithms, but also by acting as a unifying factor across communities, encouraging communication, discussion and exchange of ideas. [...]
  • 2015. Proceedings in Journal of Machine Learning Research (JMLR): Workshop and Conference Proceedings.

Cluster-Based Prediction of User Ratings for Stylistic Surface Realisation.

  • Dethlefs, N.
  • Cuayáhuitl, H.
  • Hastie, H.
  • Rieser, V.
  • Lemon, O.
  • PDF
    Abstract: Surface realisations typically depend on their target style and audience. A challenge in estimating a stylistic realiser from data is that humans vary significantly in their subjective perceptions of linguistic forms and styles, leading to almost no correlation between ratings of the same utterance. We address this problem in two steps. First, we estimate a mapping function between the linguistic features of a corpus of utterances and their human style ratings. Users are partitioned into clusters based on the similarity of their ratings, so that ratings for new utterances can be estimated, even for new, unknown users. In a second step, the estimated model is used to re-rank the outputs of a number of surface realisers to produce stylistically adaptive output. Results confirm that the generated styles are recognisable to human judges and that predictive models based on clusters of users lead to better rating predictions than models based on an average population of users. (An illustrative code sketch follows this entry.)
  • 2014. In Proceedings of the Conference of the European Chapter of the Association for Computational Linguistics (EACL). Gothenburg, Sweden.
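
The two steps described above (cluster users by the similarity of their ratings, then predict ratings for new utterances and users from the cluster) can be sketched roughly as follows. This assumes NumPy and scikit-learn; the rating matrix is random and purely illustrative, not the study's data.

```python
# Cluster-based rating prediction: partition users by rating similarity with k-means,
# then predict a new user's missing ratings from the mean profile of the nearest cluster.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
ratings = rng.integers(1, 6, size=(30, 12)).astype(float)   # 30 users x 12 rated utterances

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(ratings)
cluster_means = np.vstack([ratings[km.labels_ == k].mean(axis=0) for k in range(3)])

# A new user has rated only the first 4 utterances: assign them to the nearest
# cluster on those items and use the cluster mean to predict the rest.
partial = rng.integers(1, 6, size=4).astype(float)
nearest = int(np.argmin(((cluster_means[:, :4] - partial) ** 2).sum(axis=1)))
predicted = cluster_means[nearest, 4:]
print(nearest, np.round(predicted, 2))
```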

A Semi-Supervised Clustering Approach for Semantic Slot Labelling.

    Abstract: Work on training semantic slot labellers for use in Natural Language Processing applications has typically either relied on large amounts of labelled input data, or has assumed entirely unlabelled inputs. The former technique tends to be costly to apply, while the latter is often not as accurate as its supervised counterpart. Here, we present a semi-supervised learning approach that automatically labels the semantic slots in a set of training data and aims to strike a balance between the dependence on labelled data and prediction accuracy. The essence of our algorithm is to cluster clauses based on a similarity function that combines lexical and semantic information. We present experiments that compare different similarity functions for both our semi-supervised setting and a fully unsupervised baseline. While semi-supervised learning expectedly outperforms unsupervised learning, our results show that (1) this effect can be observed based on very few training data instances and that increasing the size of the training data does not lead to better performance, and (2) that lexical and semantic information contribute differently in different domains so that clustering based on both types of information offers the best generalisation. (An illustrative code sketch follows this entry.)
  • 2014. In Proceedings of the International Conference on Machine Learning and Applications (ICMLA). Detroit, USA.
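
The core of the approach above is a similarity function over clauses that mixes lexical and semantic information, seeded with a small amount of labelled data. A minimal sketch using only the Python standard library; the seed labels, clauses, weights and the crude stand-in for semantic similarity are hypothetical and far simpler than the paper's functions.

```python
# Semi-supervised slot labelling sketch: assign each unlabelled clause to the most
# similar labelled seed, where similarity combines lexical overlap with a crude
# "semantic" overlap over content words. All data and weights are invented.

seeds = {  # a few labelled clauses (the semi-supervision)
    "i want a cheap italian restaurant": "inform(food, pricerange)",
    "what is the phone number": "request(phone)",
}
unlabelled = [
    "give me the phone number please",
    "looking for a cheap chinese place",
]

def lexical_sim(a, b):
    ta, tb = set(a.split()), set(b.split())
    return len(ta & tb) / len(ta | tb)

CONTENT = {"cheap", "italian", "chinese", "restaurant", "place", "phone", "number"}

def semantic_sim(a, b):
    # Stand-in for a real semantic similarity (e.g. over word classes or embeddings).
    ta, tb = set(a.split()) & CONTENT, set(b.split()) & CONTENT
    return len(ta & tb) / max(len(ta | tb), 1)

def combined_sim(a, b, w=0.5):
    return w * lexical_sim(a, b) + (1 - w) * semantic_sim(a, b)

for clause in unlabelled:
    best = max(seeds, key=lambda s: combined_sim(clause, s))
    print(f"{clause!r} -> {seeds[best]}")
```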

Training a Statistical Surface Realiser from Automatic Slot Labelling.

    Abstract: Training a statistical surface realiser typically relies on labelled training data or parallel data sets, such as corpora of paraphrases. The procedure for obtaining such data for new domains is not only time-consuming, but it also restricts the incorporation of new semantic slots during an interaction, i.e. using an online learning scenario for automatically extended domains. Here, we present an alternative approach to statistical surface realisation from unlabelled data through automatic semantic slot labelling. The essence of our algorithm is to cluster clauses based on a similarity function that combines lexical and semantic information. Annotations need to be reliable enough to be utilised within a spoken dialogue system. We compare different similarity functions and evaluate our surface realiser—trained from unlabelled data—in a human rating study. Results confirm that a surface realiser trained from automatic slot labels can lead to outputs of comparable quality to outputs trained from human-labelled inputs.
  • 2014. In Proceedings of the IEEE Workshop on Spoken Language Technology (SLT). South Lake Tahoe, USA.

The PARLANCE Mobile App for Interactive Search in English and Mandarin.

  • Hastie, H.
  • Aufaure, M.
  • Alexopoulos, P.
  • Bouchard, H.
  • Cuayáhuitl, H.
  • Dethlefs, N.
  • Gasic, M.
  • Henderson, J.
  • Lemon, O.
  • Liu, X.
  • Mika, P.
  • Ben Mustapha, N.
  • Potter, T.
  • Rieser, V.
  • Thomson, B.
  • Tsiakoulis, P.
  • Vanrompay, Y.
  • Villa-Terrazas, B.
  • Yazdani, M.
  • Young, S.
  • Yu, Y.
  • PDF
    Abstract: We demonstrate a mobile application in English and Mandarin to test and evaluate components of the Parlance dialogue system for interactive search under real-world conditions.
  • 2014. In Proceedings of the Annual Meeting of the Special Interest Group on Discourse and Dialogue (SIGdial).

Non-Strict Hierarchical Reinforcement Learning for Interactive Systems and Robots.

    Abstract: Conversational systems and robots that use reinforcement learning for policy optimization in large domains often face the problem of limited scalability. This problem has been addressed either by using function approximation techniques that estimate the approximate true value function of a policy or by using a hierarchical decomposition of a learning task into subtasks. We present a novel approach for dialogue policy optimization that combines the benefits of both hierarchical control and function approximation and that allows flexible transitions between dialogue subtasks to give human users more control over the dialogue. To this end, each reinforcement learning agent in the hierarchy is extended with a subtask transition function and a dynamic state space to allow flexible switching between subdialogues. In addition, the subtask policies are represented with linear function approximation in order to generalize the decision making to situations unseen in training. Our proposed approach is evaluated in an interactive conversational robot that learns to play quiz games. Experimental results, using simulation and real users, provide evidence that our proposed approach can lead to more flexible (natural) interactions than strict hierarchical control and that it is preferred by human users. (An illustrative code sketch follows this entry.)
  • 2014. ACM Transactions on Interactive Intelligent Systems. Vol. 4, No. 4.
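
Two ingredients described in the entry above, linear function approximation of Q-values and a subtask transition function for non-strict hierarchical control, can be sketched as follows. This is a heavily reduced illustration rather than the authors' system; the feature layout, subtasks and the quiz-domain trigger are hypothetical.

```python
# Each subtask has its own learner whose Q-values are a linear function of state
# features; a separate transition function allows jumping between subdialogues
# instead of strictly returning control up the hierarchy.
import numpy as np

class SubtaskAgent:
    def __init__(self, n_features, actions, alpha=0.05, gamma=0.99):
        self.weights = {a: np.zeros(n_features) for a in actions}
        self.alpha, self.gamma = alpha, gamma

    def q(self, features, action):
        return float(self.weights[action] @ features)

    def update(self, f, a, reward, f_next):
        target = reward + self.gamma * max(self.q(f_next, b) for b in self.weights)
        self.weights[a] += self.alpha * (target - self.q(f, a)) * f

def subtask_transition(state):
    # Non-strict control: if the user asks for a game mid-greeting, control
    # switches to the "play_game" subdialogue immediately.
    return "play_game" if state.get("user_requested_game") else None

agents = {"greet": SubtaskAgent(4, ["ask_name", "confirm"]),
          "play_game": SubtaskAgent(4, ["ask_question", "give_feedback"])}
f = np.array([1.0, 0.0, 1.0, 0.0])
agents["greet"].update(f, "ask_name", reward=1.0, f_next=f)
print(subtask_transition({"user_requested_game": True}))  # -> play_game
```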

Introduction to the Special Issue on Machine Learning for Multiple Modalities in Interactive Systems and Robots.

  • Cuayáhuitl, H.
  • Frommberger, L.
  • Dethlefs, N.
  • Raux, A.
  • Marge, M.
  • Zender, H.
  • Link to article
    Abstract: This special issue highlights research articles that apply machine learning to robots and other systems that interact with users through more than one modality, such as speech, gestures, and vision. For example, a robot may coordinate its speech with its actions, taking into account (audio-)visual feedback during their execution. Machine learning provides interactive systems with opportunities to improve performance not only of individual components but also of the system as a whole. However, machine learning methods that encompass multiple modalities of an interactive system are still relatively hard to find. The articles in this special issue represent examples that contribute to filling this gap.
  • 2014. ACM Transactions on Interactive Intelligent Systems (ACM-TiiS).

Proceedings of the Third Workshop on Machine Learning for Interactive Systems (MLIS-2014): Bridging the Gap Between Perception, Action and Communication.

  • Cuayáhuitl, H.
  • Frommberger, L.
  • Dethlefs, N.
  • van Otterlo, M.
  • PDF
    Abstract: The AAAI-14 Workshop program was held Sunday and Monday, July 27–28, 2014, at the Québec City Convention Centre in Québec, Canada. The AAAI-14 workshop program included 15 workshops covering a wide range of topics in artificial intelligence. The titles of the workshops were Artificial Intelligence and Robotics; Artificial Intelligence Applied to Assistive Technologies and Smart Environments; Cognitive Computing for Augmented Human Intelligence; Computer Poker and Imperfect Information; Discovery Informatics; Incentives and Trust in Electronic Communities; Intelligent Cinematography and Editing; Machine Learning for Interactive Systems: Bridging the Gap Between Perception, Action, and Communication; Modern Artificial Intelligence for Health Analytics; Multiagent Interaction Without Prior Coordination; Multidisciplinary Workshop on Advances in Preference Handling; Semantic Cities — Beyond Open Data to Models, Standards, and Reasoning; Sequential Decision Making with Big Data; Statistical Relational AI; and the World Wide Web and Public Health Intelligence. This article presents short summaries of those events.
  • 2014. Co-located with the 28th Conference on Artificial Intelligence (AAAI), Quebec City, Canada.

A Joint Learning Approach for Situated Language Generation.

    Abstract: [Book abstract] An informative and comprehensive overview of the state-of-the-art in natural language generation (NLG) for interactive systems, this guide serves to introduce graduate students and new researchers to the field of natural language processing and artificial intelligence, while inspiring them with ideas for future research. Detailing the techniques and challenges of NLG for interactive applications, it focuses on the research into systems that model collaborativity and uncertainty, are capable of being scaled incrementally, and can engage with the user effectively. A range of real-world case studies is also included. The book and the accompanying website feature a comprehensive bibliography, and refer the reader to corpora, data, software and other resources for pursuing research on natural language generation and interactive systems, including dialog systems, multimodal interfaces and assistive technologies. It is an ideal resource for students and researchers in computational linguistics, natural language processing and related fields.
  • 2014. In Amanda Stent and Srinivas Bangalore (eds.) Natural Language Generation in Interactive Systems. Cambridge University Press.

Getting to Know Users: Accounting for the Variability in User Ratings.

  • Dethlefs, N.
  • Cuayáhuitl, H.
  • Hastie, H.
  • Rieser, V.
  • Lemon, O.
  • PDF
    Abstract: Evaluations of dialogue systems and language generators often rely on subjective user ratings to assess output quality and performance. Humans however vary in their preferences so that estimating an accurate prediction model is difficult. Using a method that clusters utterances based on their linguistic features and ratings (Dethlefs et al., 2014), we discuss the possibility of obtaining user feedback implicitly during an interaction. This approach promises better predictions of user preferences through continuous re-estimation.
  • 2014. Poster paper in the Workshop on the Semantics and Pragmatics of Dialogue (SemDial). Edinburgh, Scotland.

Two Alternative Frameworks for Deploying Spoken Dialogue Systems to Mobile Platforms for Evaluation “in the Wild”.

  • Hastie, H.
  • Aufaure, M.
  • Alexopoulos, P.
  • Bouchard, H.
  • Cuayáhuitl, H.
  • Dethlefs, N.
  • Gasic, M.
  • Henderson, J.
  • Lemon, O.
  • Liu, X.
  • Mika, P.
  • Potter, T.
  • Rieser, V.
  • Tsiakoulis, P.
  • Vanrompay, Y.
  • Villa-Terrazas, B.
  • Yazdani, M.
  • Young, S.
  • Yu, Y.
  • PDF
    Abstract: We demonstrate two alternative frameworks for testing and evaluating spoken dialogue systems on mobile devices for use “in the wild”. We firstly present a spoken dialogue system that uses third party ASR (Automatic Speech Recognition) and TTS (Text-To-Speech) components and then present an alternative using audio compression to allow for entire systems with home-grown ASR/TTS to be plugged in directly. Some advantages and drawbacks of both are discussed.
  • 2014. Poster paper in the Workshop on the Semantics and Pragmatics of Dialogue (SemDial). Edinburgh, Scotland.

Conditional Random Fields for Responsive Surface Realisation Using Global Features.

    Abstract: Surface realisers in spoken dialogue systems need to be more responsive than conventional surface realisers. They need to be sensitive to the utterance context as well as robust to partial or changing generator inputs. We formulate surface realisation as a sequence labelling task and combine the use of conditional random fields (CRFs) with semantic trees. Due to their extended notion of context, CRFs are able to take the global utterance context into account and are less constrained by local features than other realisers. This leads to more natural and less repetitive surface realisation. It also allows generation from partial and modified inputs and is therefore applicable to incremental surface realisation. Results from a human rating study confirm that users are sensitive to this extended notion of context and assign ratings that are significantly higher (up to 14%) than those for taking only local context into account. (An illustrative code sketch follows this entry.)
  • 2013. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (ACL). Sofia, Bulgaria.
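
The sequence-labelling formulation above can be sketched with a linear-chain CRF toolkit. The following assumes the third-party sklearn-crfsuite package; the toy training pair, slots and features (including the simple whole-utterance feature standing in for global context) are hypothetical, and the paper's actual feature set is much richer.

```python
# Surface realisation as sequence labelling: each semantic slot position is labelled
# with a phrase, and features may describe the whole utterance, not only the local slot.
import sklearn_crfsuite

def features(slots, i):
    return {
        "slot": slots[i],                      # local semantic slot
        "prev_slot": slots[i - 1] if i > 0 else "<s>",
        "all_slots": " ".join(slots),          # a simple global (utterance-level) feature
        "position": str(i),
    }

slots = ["inform", "food=italian", "area=centre"]
X = [[features(slots, i) for i in range(len(slots))]]
y = [["there is", "an italian restaurant", "in the centre"]]

crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=50, all_possible_transitions=True)
crf.fit(X, y)
print(crf.predict(X))  # recovers the phrase sequence for the (seen) toy input
```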

Hierarchical Joint Learning for Natural Language Generation.

    Abstract: Natural Language Generation (NLG) systems typically face an uncertainty regarding the best utterance to communicate to a user in a given context given that the effect of a single utterance depends crucially on the interplay between its physical environment, pragmatic circumstances, addressee and interaction history. NLG system designers have traditionally used a pipeline architecture to divide the generation process into the distinct stages of content selection, utterance planning and surface realisation to choose the semantics, organisation and realisation of an utterance. Unfortunately, this sequential model does not account for the interdependencies that exist among these stages, which in practice has been manifest in inefficient instruction giving and an increased cognitive load for the user. This thesis will advocate a joint optimisation framework for situated NLG that is based on Hierarchical Reinforcement Learning combined with graphical models and will learn the best utterance for a given context by optimising its behaviour through a trial and error search. The joint model considers decisions at different NLG stages in interdependence with each other and thereby produces more context-sensitive utterances than is possible when considering decisions in isolation. To enhance the human-likeness of the model, two augmentations will be made. We will introduce the notion of a Hierarchical Information State to support the systematic pre-specification of prior knowledge and human preferences for content selection. Graphical models—Hidden Markov Models and Bayesian Networks—will then be integrated as generation spaces to encourage natural surface realisation by balancing the proportion of alignment and variation. Results from a human evaluation study show that the hierarchical learning agent learns a robust generation policy that adapts to new circumstances and users flexibly, leading to smooth and successful interactions. In terms of the comparison between a joint and an isolated optimisation, results indicate that a jointly optimised system achieves higher user satisfaction and task success and is better perceived by human users than its isolated counterpart. To demonstrate the domain-independence and generalisability of the hierarchical joint optimisation framework, an additional study will be presented that transfers the model to a new, but related, domain: the generation of route instructions in a real navigation scenario using a situated dialogue system for indoor navigation. Results confirm that the NLG policy can be applied to new domains with limited effort and contribute to high task success and user satisfaction.
  • 2013. IOS Press / AKA Publishing. In the series Dissertations on Artificial Intelligence, Volume 340. ISBN 978-1-61499-115-1. Amsterdam / Berlin.

Hierarchical Joint Learning for Natural Language Generation.

  • 2013. PhD Thesis. University of Bremen, Faculty of Linguistics, Germany.

Impact of ASR N-Best Information on Bayesian Dialogue Act Recognition.

    Abstract: A challenge in dialogue act recognition is the mapping from noisy user inputs to dialogue acts. In this paper we describe an approach for re-ranking dialogue act hypotheses based on Bayesian classifiers that incorporate dialogue history and Automatic Speech Recognition (ASR) N-best information. We report results based on the Let’s Go dialogue corpora that show (1) that including ASR N-best information results in improved dialogue act recognition performance (+7% accuracy), and (2) that competitive results can be obtained from as early as the first system dialogue act, reducing the need to wait for subsequent system dialogue acts. (An illustrative code sketch follows this entry.)
  • 2013. In Proceedings of the 14th Annual Meeting of the Special Interest Group on Discourse and Dialogue (SIGdial). Metz, France.
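
A rough sketch of the kind of combination described above, using only the Python standard library: a prior over user dialogue acts conditioned on the last system act (the dialogue history) is combined with evidence accumulated over the whole ASR N-best list rather than only the top hypothesis. The prior, keyword model and N-best list are invented and far simpler than the paper's Bayesian classifiers.

```python
prior = {  # P(user_act | last system act), hypothetical numbers
    "inform(destination)": 0.6,
    "request(schedule)": 0.2,
    "bye": 0.2,
}
keywords = {  # crude stand-in for a learnt observation model
    "inform(destination)": {"to", "forbes", "airport"},
    "request(schedule)": {"when", "next", "bus"},
    "bye": {"goodbye", "thanks"},
}
nbest = [("to forbes avenue", 0.5), ("the forbes airport", 0.3), ("goodbye", 0.2)]

def act_given_hyp(hyp):
    # P(act | hypothesis) from smoothed keyword counts, normalised over acts.
    raw = {act: sum(w in kw for w in hyp.split()) + 0.1 for act, kw in keywords.items()}
    z = sum(raw.values())
    return {act: v / z for act, v in raw.items()}

scores = {act: 0.0 for act in prior}
for hyp, conf in nbest:                      # weight each hypothesis by its ASR confidence
    p = act_given_hyp(hyp)
    for act in scores:
        scores[act] += conf * p[act] * prior[act]

z = sum(scores.values())
for act, s in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{act:<22} {s / z:.2f}")          # inform(destination) is ranked first here
```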

Proceedings of the Young Researcher’s Roundtable on Spoken Dialogue Systems.

  • El Asri, L.
  • Dethlefs, N.
  • Henderson, M.
  • Kennington, C.
  • Mitchell, C.
  • Schütte, N.
  • Villalba, M.
  • Baheux, D.
  • PDF
    Abstract: We are delighted to welcome you to the Ninth Young Researchers’ Roundtable on Spoken Dialogue Systems in Metz, France. YRRSDS is a yearly event that began in 2005 in Lisbon, followed by Pittsburgh, Antwerp, Columbus, London, Tokyo, Portland, and Seoul. The aim of the workshop is to promote the networking of students, postdocs, and junior researchers working in research related to spoken dialogue systems in academia and industry. The workshop provides an open forum where participants can discuss their research interests, current work, and future plans.
  • 2013. Co-located with the 14th Annual Meeting of the Special Interest Group on Discourse and Dialogue (SIGdial). Metz, France.

Proceedings of the Second Workshop on Machine Learning for Interactive Systems (MLIS'2013): Bridging the Gap Between Perception, Action and Communication.

  • Cuayáhuitl, H.
  • Frommberger, L.
  • Dethlefs, N.
  • van Otterlo, M.
  • Link to proceedings
    Abstract: Intelligent systems or robots that interact with their environment by perceiving, acting or communicating often face a challenge in how to bring these different concepts together. One of the main reasons for this challenge is the fact that the core concepts in perception, action and communication are typically studied by different communities: the computer vision, robotics and natural language processing communities, among others, without much interchange between them. Learning systems that encompass perception, action and communication in a unified and principled way are still rare. As machine learning lies at the core of these communities, it can act as a unifying factor in bringing the communities closer together. Unifying these communities is highly important for understanding how state-of-the-art approaches from different disciplines can be combined (and applied) to form generally interactive intelligent systems. MLIS-2013 aims to bring researchers from multiple disciplines together that are in some way or another affected by the gap between perception, action and communication. Our goal is to provide a forum for interdisciplinary discussion that allows researchers to look at their work from new perspectives that go beyond their core community and develop new interdisciplinary collaborations.
  • 2013. Co-located with the 23rd International Joint Conference on Artificial Intelligence (IJCAI). Beijing, China.

Machine Learning for Interactive Systems and Robots: A Brief Introduction.

  • Cuayáhuitl, H.
  • van Otterlo, M.
  • Dethlefs, N.
  • Frommberger, L.
  • PDF
    Abstract: Research on interactive systems and robots, i.e. interactive machines that perceive, act and communicate, has applied a multitude of different machine learning frameworks in recent years, many of which are based on a form of reinforcement learning (RL). In this paper, we will provide a brief introduction to the application of machine learning techniques in interactive learning systems. We identify several dimensions along which interactive learning systems can be analyzed. We argue that while many applications of interactive machines seem different at first sight, sufficient commonalities exist in terms of the challenges faced. By identifying these commonalities between (learning) approaches, and by taking interdisciplinary approaches towards the challenges, we anticipate more effective design and development of sophisticated machines that perceive, act and communicate in complex, dynamic and uncertain environments.
  • 2013. In Proceedings of the 2nd Workshop on Machine Learning for Interactive Systems (MLIS-2013): Bridging the Gap between Perception, Action and Communication. ACM International Conference Proceedings Series. Co-located with IJCAI. Beijing, China.

Barge-in Effects in Bayesian Dialogue Act Recognition and Simulation.

    Abstract: Dialogue act recognition and simulation are traditionally considered separate processes. Here, we argue that both can be fruitfully treated as interleaved processes within the same probabilistic model, leading to a synchronous improvement of performance in both. To demonstrate this, we train multiple Bayes Nets that predict the timing and content of the next user utterance. A specific focus is on providing support for barge-ins. We describe experiments using the Let’s Go data that show an improvement in classification accuracy (+5%) in Bayesian dialogue act recognition involving barge-ins using partial context compared to using full context. Our results also indicate that simulated dialogues with user barge-in are more realistic than simulations without barge-in events.
  • 2013. In Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), Olomouc, Czech Republic.

Demonstration of the PARLANCE System: A Data-Driven, Incremental, Spoken Dialogue System for Interactive Search.

  • Hastie, H.
  • Aufaure, M.
  • Alexopoulos, P.
  • Cuayáhuitl, H.
  • Dethlefs, N.
  • Gasic, M.
  • Henderson, J.
  • Lemon, O.
  • Liu, X.
  • Mika, P.
  • Mustapha, N.
  • Rieser, V.
  • Thomson, B.
  • Tsiakoulis, P.
  • Vanrompay, Y.
  • Villazon-Terrazas, B.
  • Young, S.
  • PDF
    Abstract: The Parlance system for interactive search processes dialogue at a micro-turn level, displaying dialogue phenomena that play a vital role in human spoken conversation. These dialogue phenomena include more natural turn-taking through rapid system responses, generation of backchannels, and user barge-ins. The Parlance demonstration system differs from other incremental systems in that it is data-driven with an infrastructure that scales well.
  • 2013. In Proceedings of the 14th Annual Meeting of the Special Interest Group on Discourse and Dialogue (SIGdial). Metz, France.

Optimising Incremental Dialogue Decisions Using Information Density for Interactive Systems.

    Abstract: Incremental processing allows system designers to address several discourse phenomena that have previously been somewhat neglected in interactive systems, such as backchannels or barge-ins, but that can enhance the responsiveness and naturalness of systems. Unfortunately, prior work has focused largely on deterministic incremental decision making, rendering system behaviour less flexible and adaptive than is desirable. We present a novel approach to incremental decision making that is based on Hierarchical Reinforcement Learning to achieve an interactive optimisation of Information Presentation (IP) strategies, allowing the system to generate and comprehend backchannels and barge-ins, by employing the recent psycholinguistic hypothesis of information density (ID) (Jaeger, 2010). Results in terms of average rewards and a human rating study show that our learnt strategy outperforms several baselines that are not sensitive to ID by more than 23%.
  • 2012. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP-CoNLL). Jeju, South Korea.

Optimising Incremental Generation for Spoken Dialogue Systems: Reducing the Need for Fillers.

    Abstract: Recent studies have shown that incremental systems are perceived as more reactive, natural, and easier to use than non-incremental systems. However, previous work on incremental NLG has not employed recent advances in statistical optimisation using machine learning. This paper combines the two approaches, showing how the update, revoke and purge operations typically used in incremental approaches can be implemented as state transitions in a Markov Decision Process. We design a model of incremental NLG that generates output based on micro-turn interpretations of the user’s utterances and is able to optimise its decisions using statistical machine learning. We present a proof-of-concept study in the domain of Information Presentation (IP), where a learning agent faces the trade-off of whether to present information as soon as it is available (for high reactiveness) or else to wait until input ASR hypotheses are more reliable. Results show that the agent learns to avoid long waiting times, fillers and self-corrections, by re-ordering content based on its confidence. (An illustrative code sketch follows this entry.)
  • 2012. In Proceedings of the 7th International Conference on Natural Language Generation (INLG). Chicago, IL, USA.
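
The trade-off described above can be illustrated with a toy Markov Decision Process optimised by tabular Q-learning. This sketch uses only the Python standard library; its states, rewards and probabilities are invented for illustration and are not taken from the study.

```python
# The agent repeatedly chooses between presenting now and waiting for a more reliable
# ASR hypothesis; presenting on a poor hypothesis incurs a self-correction penalty.
import random

random.seed(0)
actions = ["present", "wait"]
Q = {(conf, a): 0.0 for conf in ("low", "high") for a in actions}
alpha, gamma, epsilon = 0.1, 0.9, 0.2

def step(conf, action):
    if action == "wait":
        return "high", -1        # waiting costs a little time but improves the hypothesis
    if conf == "high" or random.random() < 0.4:
        return None, 10          # successful presentation ends the episode
    return None, -10             # presenting too early forces a self-correction

for _ in range(5000):
    conf = "low"
    while conf is not None:
        a = random.choice(actions) if random.random() < epsilon else max(actions, key=lambda x: Q[(conf, x)])
        nxt, r = step(conf, a)
        future = 0.0 if nxt is None else max(Q[(nxt, b)] for b in actions)
        Q[(conf, a)] += alpha * (r + gamma * future - Q[(conf, a)])
        conf = nxt

print({k: round(v, 1) for k, v in Q.items()})  # the learnt values favour waiting at low confidence
```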

Hierarchical Dialogue Policy Learning Using Flexible State Transitions and Linear Function Approximation.

    Abstract: Conversational agents that use reinforcement learning for policy optimization in large domains often face the problem of limited scalability. This problem can be addressed either by using function approximation techniques that estimate an approximate true value function, or by using a hierarchical decomposition of a learning task into subtasks. In this paper, we present a novel approach for dialogue policy optimization that combines the benefits of hierarchical control with function approximation. The approach incorporates two concepts to allow flexible switching between sub-dialogues, extending current hierarchical reinforcement learning methods. First, hierarchical tree-based state representations initially represent a compact portion of the possible state space and are then dynamically extended in real time. Second, we allow state transitions across sub-dialogues to allow non-strict hierarchical control. Our approach is integrated, and tested with real users, in a robot dialogue system that learns to play Quiz games.
  • 2012. In Proceedings of the 24th International Conference on Computational Linguistics (COLING). System Demonstrations. Mumbai, India.

Hierarchical Multiagent Reinforcement Learning for Coordinating Verbal and Nonverbal Actions in Robots.

    Abstract: This paper proposes an approach for learning to coordinate verbal and non-verbal behaviours in interactive robots. It is based on a hierarchy of multiagent reinforcement learners executing verbal and non-verbal actions in parallel. Our approach is evaluated in a conversational humanoid robot that learns to play Quiz games. First experimental results show evidence that the proposed multiagent approach can outperform hand-coded coordinated behaviours.
  • 2012. In Proceedings of the 1st Workshop on Machine Learning for Interactive Systems (MLIS’2012): Bridging the Gap Between Language, Motor Control and Vision. Co-located with ECAI. Montpellier, France.

Towards Optimising Modality Allocation for Multimodal Output Generation in Incremental Dialogue.

    Abstract: Recent work on incremental processing in interactive systems has demonstrated that incremental systems can gain higher responsiveness and naturalness than their non-incremental counterparts and are better perceived by human users. This paper presents a first investigation, based on a proof-of-concept study, into how multimodal information presentation in incremental dialogue systems can contribute towards more efficient and smooth interactions. In particular, we focus on how a combination of verbal and non-verbal output generation can help to reduce the need for self-corrections in a system that has to deal with continuous updates of input hypotheses. We suggest using Reinforcement Learning to optimise the multimodal output allocation of a system, i.e. the idea that for every context, there is a combination of modalities which adequately communicates the communicative goal.
  • 2012. In Proceedings of the 1st Workshop on Machine Learning for Interactive Systems (MLIS’2012): Bridging the Gap Between Language, Motor Control and Vision. Co-located with ECAI. Montpellier, France.

Proceedings of the First Workshop on Machine Learning for Interactive Systems (MLIS’2012): Bridging the Gap Between Language, Motor Control and Vision.

  • Cuayáhuitl, H.
  • Frommberger, L.
  • Dethlefs, N.
  • Sahli, H.
  • PDF
    Abstract: Intelligent interactive agents that are able to communicate with the world through more than one channel of communication face a number of research questions, for example: how to coordinate them in an effective manner? This is especially important given that perception, action and interaction can often be seen as mutually related disciplines that affect each other. We believe that machine learning plays and will keep playing an important role in interactive systems. Machine Learning provides an attractive and comprehensive set of computer algorithms for making interactive systems more adaptive to users and the environment and has been a central part of research in the disciplines of interaction, motor control and computer vision in recent years. This workshop aims to bring researchers together that have an interest in more than one of these disciplines and who have explored frameworks which can offer a more unified perspective on the capabilities of sensing, acting and interacting in intelligent systems and robots.
  • 2012. Co-located with the 20th European Conference on Artificial Intelligence (ECAI). Montpellier, France.

Dialogue Systems Using Online Learning: Beyond Empirical Methods.

    Abstract: We discuss a change of perspective for training dialogue systems, which requires a shift from traditional empirical methods to online learning methods. We motivate the application of online learning, which provides the benefit of improving the system’s behaviour continuously, often after each turn or dialogue, rather than after hundreds of dialogues. We describe the requirements and advances for dialogue systems with online learning, and speculate on the future of these kinds of systems.
  • 2012. In Proceedings of the Workshop on Future Directions and Needs in the Spoken Dialogue Community: Tools and Data (SDCTD). Co-located with NAACL-HLT. Montréal, Canada.

Incremental Spoken Dialogue Systems: Tools and Data.

    Abstract: Strict turn-taking models of dialogue do not accurately model human incremental processing, where users can process partial input and plan partial utterances in parallel. We discuss the current state of the art in incremental systems and propose tools and data required for further advances in the field of Incremental Spoken Dialogue Systems.
  • 2012. In Proceedings of the Workshop on Future Directions and Needs in the Spoken Dialogue Community: Tools and Data (SDCTD). Co-located with NAACL-HLT. Montréal, Canada.

Optimising Incremental Generation for Information Presentation of Mobile Search Results.

    Abstract: This abstract discusses a proof-of-concept study in incremental Natural Language Generation (NLG) in the domain of Information Presentation for Spoken Dialogue Systems. The work presented is part of the FP7 EC Parlance project (http://www.parlance-project.eu). The goal of Parlance is to develop personalised, mobile, interactive, hyper-local search through speech. Recent trends in Information Retrieval are towards incremental, interactive search and we argue that spoken dialogue systems can provide a truly natural medium for this type of interactive search. This is particularly attractive for people on the move, who have their hands and eyes busy.
  • 2012. Presentation at Symposium: Influencing People with Information (SIPI). Aberdeen, Scotland.

Spatially-Aware Dialogue Control Using Hierarchical Reinforcement Learning.

    Abstract: This article addresses the problem of scalable optimization for spatially-aware dialogue systems. These kinds of systems must perceive, reason, and act about the spatial environment where they are embedded. We formulate the problem in terms of Semi-Markov Decision Processes and propose a hierarchical reinforcement learning approach to optimize subbehaviors rather than full behaviors. Because of the vast number of policies that are required to control the interaction in a dynamic environment (e.g., a dialogue system assisting a user to navigate in a building from one location to another), our learning approach is based on two stages: (a) the first stage learns low-level behavior, in advance; and (b) the second stage learns high-level behavior, in real time. For such a purpose we extend an existing algorithm in the literature of reinforcement learning in order to support reusable policies and therefore to perform fast learning. We argue that our learning approach makes the problem feasible, and we report on a novel reinforcement learning dialogue system that performs a joint optimization between dialogue and spatial behaviors. Our experiments, using simulated and real environments, are based on a text-based dialogue system for indoor navigation. Experimental results in a realistic environment reported an overall user satisfaction result of 89%, which suggests that our proposed approach is attractive for its application in real interactions as it combines fast learning with adaptive and reasonable behavior.
  • 2011. ACM Transactions on Speech and Language Processing (Special Issue on Machine Learning for Robust and Adaptive Spoken Dialogue Systems). Vol. 7, No. 3, pp. 1-26.

Generation of Adaptive Route Descriptions in Urban Environments.

    Abstract: This paper addresses the automatic generation of adaptive and cognitively adequate verbal route descriptions. Current automatic route descriptions suffer from a lack of adaptivity to the principles people employ in wayfinding communication, as well as to particular users’ information needs. We enhance adaptivity and cognitive adequacy by supplementing verbal route descriptions with salient geographic features, applying natural language generation techniques for linguistic realization. We also take users’ familiarity with an area into account. We present an architecture for navigational assistance operating on human cognitive and linguistic principles and report an evaluative user study that confirms the usefulness of our approach.
  • 2011. Spatial Cognition and Computation. Vol. 11, No. 2, pp. 153-177.

Hierarchical Reinforcement Learning and Hidden Markov Models for Task-Oriented Natural Language Generation.

    Abstract: Surface realisation decisions in language generation can be sensitive to a language model, but also to decisions of content selection. We therefore propose the joint optimisation of content selection and surface realisation using Hierarchical Reinforcement Learning (HRL). To this end, we suggest a novel reward function that is induced from human data and is especially suited for surface realisation. It is based on a generation space in the form of a Hidden Markov Model (HMM). Results in terms of task success and human-likeness suggest that our unified approach performs better than greedy or random baselines.
  • 2011. In Proceedings of the 49th Annual Conference of the Association for Computational Linguistics (ACL-HLT). Short Papers. Portland, OR, USA.

Optimizing Situated Dialogue Management in Unknown Environments.

    Abstract: We present a conversational learning agent that helps users navigate through complex and challenging spatial environments. The agent exhibits adaptive behaviour by learning spatially-aware dialogue actions while the user carries out the navigation task. To this end, we use Hierarchical Reinforcement Learning with relational representations to efficiently optimize dialogue actions tightly-coupled with spatial ones, and Bayesian networks to model the user’s beliefs of the navigation environment. Since these beliefs are continuously changing, we induce the agent’s behaviour in real time. Experimental results, using simulation, are encouraging by showing efficient adaptation to the user’s navigation knowledge, specifically to the generated route and the intermediate locations to negotiate with the user.
  • 2011. In Proceedings of INTERSPEECH. Florence, Italy.

The Bremen System for the GIVE-2.5 Challenge.

    Abstract: This paper presents the Bremen system for the GIVE-2.5 challenge. It is based on decision trees learnt from new annotations of the GIVE corpus augmented with manually specified rules. Surface realisation is based on context-free grammars. The paper will address advantages and shortcomings of the approach and discuss how the present system can serve as a baseline for a future evaluation with an improved version using hierarchical reinforcement learning with graphical models.
  • 2011. In Proceedings of the 13th European Workshop on Natural Language Generation (ENLG). Generation Challenges Session. Nancy, France.

Position Paper in the Young Researchers’ Roundtable on Spoken Dialogue Systems (YRRSDS).

    Abstract: My research interests involve context-sensitive, or adaptive, Natural Language Generation (NLG) for situated dialogue systems, especially for spoken interaction. Context-sensitive situated dialogue systems are typically required to adapt flexibly to dynamic changes of (a) properties of the situation or the spatial setting, such as visible objects, or the complexity of the environment, (b) properties of the user, such as their prior knowledge, goals, beliefs, and general information need, and (c) the dialogue history. In this context, I am mainly interested in applying Reinforcement Learning (RL) with hierarchical control and prior knowledge in several contexts of rather large-scale systems for complex domains. I have also recently looked into the joint optimisation of different system behaviours for interdependent decision making between them.
  • 2011. Portland, OR, USA.

Route Instructions in Map-Based and Human-Based Dialogue: A Comparative Analysis.

  • Tenbrink, T.
  • Ross, R.
  • Thomas, K.
  • Dethlefs, N.
  • Andonova, E.
  • Link to article
    Abstract: When conveying information about spatial situations and goals, speakers adapt flexibly to their addressee in order to reach the communicative goal efficiently and effortlessly. Our aim is to equip a dialogue system with the abilities required for such a natural, adaptive dialogue. In this paper we investigate the strategies people use to convey route information in relation to a map by presenting two parallel studies involving human–human and human–computer interaction. We compare the instructions given to a human interaction partner with those given to a dialogue system which reacts by basic verbal responses and dynamic visualization of the route in the map. The language produced by human route givers is analyzed with respect to a range of communicative as well as cognitively crucial features, particularly perspective choice and references to locations across levels of granularity. Results reveal that speakers produce systematically different instructions with respect to these features, depending on the nature of the interaction partner, human or dialogue system. Our further analysis of clarification and reference resolution strategies produced by human route followers provides insights into dialogue strategies that future systems should be equipped with.
  • 2010. Journal of Visual Languages and Computing. Vol. 21, No. 5, pp. 292-309.

Evaluating Task Success in a Dialogue System for Indoor Navigation.

  • Dethlefs, N.
  • Cuayáhuitl, H.
  • Richter, K.-F.
  • Andonova, E.
  • Bateman, J.
  • PDF
    Abstract: In this paper we address the assessment of dialogue systems for indoor wayfinding. Based on the PARADISE evaluation framework we propose and evaluate several task success metrics for such a purpose. According to correlation and multiple linear regression analyses, we found that task success metrics that penalise difficulty in wayfinding are more informative of system performance than a success/failure binary task success metric.
  • 2010. In Proceedings of the 14th Workshop on the Semantics and Pragmatics of Dialogue (SemDial-PozDial). Poznan, Poland.

Generating Adaptive Route Instructions Using Hierarchical Reinforcement Learning.

  • Cuayáhuitl, H.
  • Dethlefs, N.
  • Frommberger, L.
  • Richter, K.-F.
  • Bateman, J.
  • PDF
    Abstract: We present a learning approach for efficiently inducing adaptive behaviour of route instructions. For such a purpose we propose a two-stage approach to learn a hierarchy of wayfinding strategies using hierarchical reinforcement learning. Whilst the first stage learns low-level behaviour, the second stage focuses on learning high-level behaviour. In our proposed approach, only the latter is to be applied at runtime in user-machine interactions. Our experiments are based on an indoor navigation scenario for a building that is complex to navigate. We compared our approach with flat reinforcement learning and a fully-learnt hierarchical approach. Our experimental results show that our proposed approach learns significantly faster than the baseline approaches. In addition, the learnt behaviour is shown to adapt to the type of user and structure of the spatial environment. This approach is attractive to automatic route giving since it combines fast learning with adaptive behaviour.
  • 2010. In Proceedings of the 7th International Conference on Spatial Cognition (Spatial Cognition VII). Portland, OR, USA.

The Dublin-Bremen System for the GIVE-2 Challenge.

    Abstract: This paper describes the Dublin-Bremen GIVE-2 generation system. Our main approach focused on abstracting over the low-level behaviour of the baseline agent and guiding the user with more high-level navigation information. For this purpose, we provided the user with (a) high-level action commands, (b) lookahead information, and (c) a “patience” period after they left the intended path to allow exploration. We describe a number of problems that our system encountered during the evaluation due to some of our initial assumptions not holding, and address several means by which we could achieve better performance in the future.
  • 2010. Poster presentation at the 6th International Conference on Natural Language Generation (INLG). Dublin, Ireland.

A Dialogue System for Indoor Wayfinding Using Text-Based Natural Language.

  • Cuayáhuitl, H.
  • Dethlefs, N.
  • Richter, K.-F.
  • Tenbrink, T.
  • Bateman, J.
  • PDF
    Abstract: We present a dialogue system that automatically generates indoor route instructions in German when asked about locations, using text-based natural language input and output. The challenging task in this system is to provide the user with a compact set of accurate and comprehensible instructions. We describe our approach based on high-level instructions. The system is described with four main modules: natural language understanding, dialogue management, route instruction generation and natural language generation. We report an evaluation with users unfamiliar with the system — using the PARADISE evaluation framework — in a real environment and naturalistic setting. We present results with high user satisfaction, and discuss future directions for enhancing this kind of system with more sophisticated and intuitive interaction.
  • 2010. International Journal of Computational Linguistics and Applications. Vol. 1, No. 2, pp. 285-304. Poster presented at the 11th Conference on Intelligent Text Processing and Computational Linguistics (CICLing).