DR NINA DETHLEFS

Short Bio

I am a Senior Lecturer in Computer Science at the University of Hull, Yorkshire, UK, where I lead the Big Data Analytics Research group. I am also currently Director of Research for Computer Science and Technology and Aura CDT Theme Lead for "Big data, sensors and digitalisation for the offshore environment". Before coming to Hull, I was a Research Fellow in the Interaction Lab at Heriot-Watt University, Edinburgh. I have a PhD in Computational Linguistics from the University of Bremen, Germany.

Research Interests

My research interests lie at the intersection of machine learning and natural language processing (NLP), particularly in the areas of data-to-text and natural language generation (NLG), interactive systems, assistive technologies, and domain transfer and adaptability for data analytics in a wider AI context. I have spent the last few years working with neural networks as a primary algorithm family, but have previously worked with graphical models, clustering and reinforcement learning. Most recently I have become interested in applying AI and NLP in "useful" contexts such as mental health and the environment, particularly towards sustainability. I am interested in the digitalisation of the offshore wind industry to make wind turbines more reliable. I am also interested in the effects of human activity on water quality and in the forecasting of natural events such as floods. When I have time, I also do some research in digital conservation using AI and text classification.

Publications

Transparent Deep Learning and Transductive Transfer Learning: A New Dimension for Wind Energy Research.

    Abstract: Wind turbines suffer from operational inconsistencies due to a variety of factors, ranging from environmental changes to intrinsic anomalies in specific components such as the gearbox, generator or pitch system. Condition monitoring of wind turbines has been a critical research area in the last decade, wherein Supervisory Control & Data Acquisition (SCADA) data is used to analyse the operational behaviour of the turbine and predict any incipient faults to prevent catastrophic losses caused by unexpected failures. Machine learning models have formed a large part of the data-analytics based methods used for learning from historical failures through supervised learning, but they lack the ability to learn with little labelled data or, for that matter, with no labelled faults in a different domain. Deep learning has shown immense success in areas where time-series data is to be modelled. In this paper, we propose a hybrid deep learning model combining a long short-term memory network (LSTM) with XGBoost, a decision tree-based classifier, providing the benefits of accuracy through deep learning and transparency through traditional decision trees. Our study shows that transfer learning allows us to make predictions with increasing accuracy on unseen data, which is useful for simulations of new operations, new wind farms or other cases of non-available training data. This can help reduce downtime of turbines through predictive maintenance, by predicting incipient faults, or support corrective maintenance, by assisting engineers and technicians to analyse the root causes behind a failure, thus contributing to the reliability and uptake of wind energy as a sustainable and promising domain.
  • 2019. In WindEurope Offshore, Copenhagen, Denmark.
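    The hybrid architecture described above pairs a sequence model with a tree ensemble. The following is a minimal, illustrative sketch of that general pattern (an LSTM encoder whose learnt features feed an XGBoost classifier); the data shapes, layer sizes and training setup are assumptions for the example, not details from the paper.

```python
# Minimal sketch (not the paper's code) of a hybrid LSTM + XGBoost pipeline:
# an LSTM learns a compact representation of SCADA time series, and a
# gradient-boosted tree classifier makes the (more transparent) final decision.
import numpy as np
from tensorflow.keras import layers, models
from xgboost import XGBClassifier

TIMESTEPS, FEATURES = 144, 10          # e.g. 24h of 10-minute SCADA samples (assumed shape)

# 1. LSTM encoder trained on the supervised fault labels.
encoder_input = layers.Input(shape=(TIMESTEPS, FEATURES))
hidden = layers.LSTM(64)(encoder_input)                 # latent representation of the sequence
output = layers.Dense(1, activation="sigmoid")(hidden)
lstm_model = models.Model(encoder_input, output)
lstm_model.compile(optimizer="adam", loss="binary_crossentropy")

X_train = np.random.rand(256, TIMESTEPS, FEATURES)      # placeholder SCADA windows
y_train = np.random.randint(0, 2, 256)                  # placeholder fault labels
lstm_model.fit(X_train, y_train, epochs=2, verbose=0)

# 2. Re-use the learnt representation as input to an interpretable tree ensemble.
feature_extractor = models.Model(encoder_input, hidden)
Z_train = feature_extractor.predict(X_train, verbose=0)
tree = XGBClassifier(n_estimators=100, max_depth=4)
tree.fit(Z_train, y_train)
print(tree.predict(Z_train[:5]))
```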

Bidirectional Dilated LSTM with Attention for Fine-grained Emotion Classification in Tweets.

    Abstract: We propose a novel approach for fine-grained emotion classification in tweets using a Bidirectional Dilated LSTM (BiDLSTM) with attention. Conventional LSTM architectures can face problems when classifying long sequences, which is problematic for tweets, where crucial information is often attached to the end of a sequence, e.g. an emoticon. We show that by adding a bidirectional layer, dilations and an attention mechanism to a standard LSTM, our model overcomes these problems and is able to maintain complex data dependencies over time. We present experiments with two datasets, the 2018 WASSA Implicit Emotions Shared Task and a new dataset of 240,000 tweets. Our BiDLSTM with attention achieves a test accuracy of up to 81.97%, outperforming competitive baselines by up to 10.52% on both datasets. Finally, we evaluate our data against a human benchmark on the same task.
  • To appear. In Proceedings of AAAI-2020 Workshop on Affective Content Analysis. New York, USA.
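    As a rough illustration of the kind of architecture involved, the sketch below builds a bidirectional LSTM with a simple additive attention layer in Keras. It omits the dilation mechanism and uses invented vocabulary size, sequence length and class count, so it should be read as a generic baseline rather than the published model.

```python
# Sketch (assumed hyperparameters, no dilations) of a bidirectional LSTM with
# additive attention for tweet-level emotion classification.
import tensorflow as tf
from tensorflow.keras import layers, models

VOCAB, MAXLEN, CLASSES = 20000, 50, 6   # assumed vocabulary size, tweet length, emotion classes

inputs = layers.Input(shape=(MAXLEN,))
x = layers.Embedding(VOCAB, 128)(inputs)
h = layers.Bidirectional(layers.LSTM(64, return_sequences=True))(x)

# Additive attention: score each timestep, softmax the scores, and take the
# weighted sum of hidden states so late tokens (e.g. a final emoticon) can dominate.
scores = layers.Dense(1, activation="tanh")(h)
weights = layers.Softmax(axis=1)(scores)
context = layers.Lambda(lambda t: tf.reduce_sum(t[0] * t[1], axis=1))([h, weights])

outputs = layers.Dense(CLASSES, activation="softmax")(context)
model = models.Model(inputs, outputs)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.summary()
```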

Natural Language Generation for Operations and Maintenance in Wind Turbines.

    Abstract: Wind energy is one of the fastest-growing sustainable energy sources in the world but relies crucially on efficient and effective operations and maintenance to generate sufficient amounts of energy and reduce downtime of wind turbines and associated costs. Machine learning has been applied to fault prediction in wind turbines, but these predictions have not been supported with suggestions on how to avert and fix faults. We present a data-to-text generation system using transformers to produce event descriptions from SCADA data capturing the operational status of turbines and proposing maintenance strategies. Experiments show that our model learns feature representations that correspond to expert judgements. In making a contribution to the reliability of wind energy, we hope to encourage organisations to switch to sustainable energy sources and help combat climate change.
  • 2019. In NeurIPS 2019 Workshop on Tackling Climate Change with Machine Learning. Vancouver, Canada.

Dilated LSTM with ranked units for classification of suicide notes.

    Abstract: Recent statistics in suicide prevention show that people are increasingly posting their last words online, and with the unprecedented availability of textual data from social media platforms, researchers have the opportunity to analyse such data. Furthermore, psychological studies have shown that our state of mind can manifest itself in the linguistic features we use to communicate. In this paper, we investigate whether it is possible to automatically identify suicide notes from other types of social media blogs in a document-level classification task. We also present a learning model for modelling long sequences, achieving an F1-score of 0.84 against baselines of 0.53 and 0.80 (best competing model). Finally, we show through visualisations which features the learning model identifies.
  • 2019. In AI for Social Good workshop at NeurIPS (2019), Vancouver, Canada.

Dilated LSTM with attention for classification of suicide notes.

    Abstract: In this paper we present a dilated LSTM with attention mechanism for document-level classification of suicide notes, last statements and depressed notes. We achieve an accuracy of 87.34% compared to competitive baselines of 80.35% (Logistic Model Tree) and 82.27% (Bi-directional LSTM with Attention). Furthermore, we provide an analysis of both the grammatical and thematic content of suicide notes, last statements and depressed notes. We find that the use of personal pronouns, cognitive processes and references to loved ones are most important. Finally, we show through visualisations of attention weights that the Dilated LSTM with attention is able to identify the same distinguishing features across documents as the linguistic analysis.
  • 2019. In Proceedings of the Tenth International Workshop on Health Text Mining and Information Analysis (LOUHI 2019) at EMNLP. Hong Kong.

Cross-dialectal speech processing

    Abstract: Despite advances in technology, language diversity remains a challenge to the speech processing community, but there is also an opportunity to rise to this challenge through research and innovation. Pluricentric languages play an important role in such work, particularly where these languages are better resourced. Dedicated researchers across several decades have steadily contributed resources for some language varieties, increasing general availability of a range of data archives...
  • 2019. INTERSPEECH Satellite Workshop on Pluricentric Languages in Speech Technology, Graz, Austria.

Modularity Within Artificial Gene Regulatory Networks

    Abstract: Modularity is a feature found in biological systems where it is common for functionally related processes to evolve into individually discrete units. Such traits are prevalent in prokaryotic genomes. This work aims to understand to what extent artificial gene regulatory networks (AGRNs), which take inspiration from gene regulation in nature, will self-divide into modular, task-specific sub-networks consisting of multiple interacting nodes when solving multiple complex tasks. To investigate this, we evolve AGRNs to solve three different tasks with differing dynamics simultaneously and evaluate the network structure. From this we aim to build an understanding of whether modularity in AGRNs is fundamental to solving multiple tasks and what effect the nature of the tasks being solved has on modularity within the networks.
  • 2019. IEEE Congress on Evolutionary Computation, Wellington, New Zealand.

A Deep Learning Approach Towards Prediction of Faults in Wind Turbines.

    Abstract: With the rising costs of conventional sources of energy, the world is moving towards sustainable energy sources including wind energy. Wind turbines consist of several electrical and mechanical components and experience an enormous amount of irregular loads, making their operational behaviour at times inconsistent. Operations and Maintenance (O&M) is a key factor in monitoring such inconsistent behaviour of the turbines in order to predict and prevent any incipient faults which may occur in the near future.
  • 2019. Extended Abstract in Northern Lights Deep Learning Workshop (NLDL), Tromso, Norway.

Evolutionary Constraint in Artificial Gene Regulatory Networks.

    Abstract: Evolutionary processes such as convergent evolution and rapid adaptation suggest that there are constraints on how organisms evolve. Without constraint, such processes would most likely not be possible in the time frame in which they are seen. This paper investigates how artificial gene regulatory networks (GRNs), a connectionist architecture designed for computational problem solving, may too be constrained in their evolutionary pathway. To understand this further, GRNs are applied to two different computational tasks and the way their underlying genes evolve over time is observed. From this, rules about how often genes are evolved and how this correlates with their connectivity within the GRN are deduced. By generating and applying these rules, we can build an understanding of how GRNs are constrained in their evolutionary path, and build measures to exploit this to improve evolutionary performance and speed.
  • 2018. In Proceedings of the 18th Annual UK Workshop on Computational Intelligence, Nottingham, UK. Volume 840 of the Advances in Intelligent Systems and Computing.

Unsupervised suicide note classification.

    Abstract: With the greater availability of linguistic data from public social media platforms and the advancements of natural language processing, a number of opportunities have arisen for researchers to analyse this type of data. Research efforts have mostly focused on detecting the polarity of textual data, evaluating whether there is positive, negative or sometimes neutral content. The use of neural networks in particular has recently yielded significant results in polarity detection experiments. In this paper we present a more fine-grained approach to detecting sentiment in textual data, particularly analysing a corpus of suicide notes, depressive notes and love notes. We achieve a classification accuracy of 71.76% when classifying based on text and sentiment features, and an accuracy of 69.41% when using the words present in the notes alone. We discover that while emotions in all three datasets overlap, each of them has a unique ‘emotion profile’ which allows us to draw conclusions about the potential mental state that it reflects. Using the emotion sequences only, we achieve an accuracy of 75.29%. The results from unannotated data, while worse than the other models, nevertheless represent an encouraging step towards being able to flag potentially harmful social media posts online and in real time. We provide a high-level corpus analysis of the data sets in order to demonstrate the grammatical and emotional differences.
  • 2018. In Proceedings of the 7th KDD Workshop on Issues of Sentiment Discovery and Opinion Mining (WISDOM), co-located with the Knowledge Discovery and Data Mining (KDD), London, UK.

Domain Transfer for Deep Natural Language Generation from Abstract Meaning Representations.

    Abstract: Stochastic natural language generation systems that are trained from labelled datasets are often domain-specific in their annotation and in their mapping from semantic input representations to lexical-syntactic outputs. As a result, learnt models fail to generalize across domains, heavily restricting their usability beyond single applications. In this article, we focus on the problem of domain adaptation for natural language generation. We show how linguistic knowledge from a source domain, for which labelled data is available, can be adapted to a target domain by reusing training data across domains. As a key to this, we propose to employ abstract meaning representations as a common semantic representation across domains. We model natural language generation as a long short-term memory recurrent neural network encoder-decoder, in which one recurrent neural network learns a latent representation of a semantic input, and a second recurrent neural network learns to decode it to a sequence of words. We show that the learnt representations can be transferred across domains and can be leveraged effectively to improve training on new unseen domains. Experiments in three different domains and with six datasets demonstrate that the lexical-syntactic constructions learnt in one domain can be transferred to new domains and achieve up to 75-100% of the performance of in-domain training. This is based on objective metrics such as BLEU and semantic error rate and a subjective human rating study. Training a policy from prior knowledge from a different domain is consistently better than pure in-domain training by up to 10%.
  • 2017. IEEE Computational Intelligence Magazine: Special Issue on Natural Language Generation with Computational Intelligence.

Transparency Of Execution Using Epigenetic Networks.

    Abstract: This paper describes how the recurrent connectionist architecture epiNet, which is capable of dynamically modifying its topology, is able to provide a form of transparent execution. EpiNet, which is inspired by eukaryotic gene regulation in nature, is able to break its own architecture down into sets of smaller interacting networks. This allows for autonomous complex task decomposition, and by analysing these smaller interacting networks, it is possible to provide a real world understanding of why specific decisions have been made. We expect this work to be useful in fields where the risk of improper decision making is high, such as medical simulations, diagnostics and financial modelling. To test this hypothesis we apply epiNet to two data sets within UCI’s machine learning repository, each of which requires a specific set of behaviours to solve. We then perform analysis on the overall functionality of epiNet in order to deduce the underlying rules behind its functionality and in turn provide transparency of execution.
  • 2017. In Proceedings of the European Conference on Artificial Life (ECAL), Lyon, France.

Deep text generation - Using hierarchical decomposition to mitigate the effect of rare data points.

    Abstract: Deep learning has recently been adopted for the task of natural language generation (NLG) and shown remarkable results. However, learning can go awry when the input dataset is too small or not well balanced with regards to the examples it contains for various input sequences. This is relevant to naturally occurring datasets, such as many that were not prepared for the task of natural language processing but scraped off the web and originally prepared for a different purpose. As a mitigation to the problem of unbalanced training data, we therefore propose to decompose a large natural language dataset into several subsets that “talk about” the same thing. We show that the decomposition helps to focus each learner’s attention during training. Results from a proof-of-concept study show 73% faster learning over a flat model and better results.
  • 2017. In Proceedings of Language, Data and Knowledge (LDK), Galway, Ireland. Proceedings in: Springer Lecture Notes in Computer Science (LNCS).
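    The general decomposition idea can be illustrated in a few lines: cluster the training texts into topically coherent subsets, then train one model per subset. The sketch below uses scikit-learn with toy data and a plain classifier standing in for a generator; it is an assumption about the overall pattern, not the paper's system.

```python
# Decompose a corpus into subsets that "talk about" the same thing, then train
# one small model per subset instead of a single flat model.
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression  # stand-in for a per-cluster model

texts = ["the wind was strong today", "rain expected tomorrow",
         "sunny spells in the afternoon", "gale force winds overnight"]
labels = [0, 1, 1, 0]                                 # placeholder targets

vectoriser = TfidfVectorizer()
X = vectoriser.fit_transform(texts)

# 1. Cluster the corpus into topically coherent subsets.
assignments = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# 2. Train one model per subset.
per_cluster_models = {}
for c in set(assignments):
    idx = [i for i, a in enumerate(assignments) if a == c]
    y_c = [labels[i] for i in idx]
    if len(set(y_c)) > 1:                             # guard: toy clusters may be single-class
        per_cluster_models[c] = LogisticRegression().fit(X[idx], y_c)
print(assignments, list(per_cluster_models))
```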

DEFIne: A Fluent Interface DSL for Deep Learning Applications.

    Abstract: Recent years have seen a surge of interest in deep learning models that outperform other machine learning algorithms on benchmarks across many disciplines. Most existing deep learning libraries facilitate the development of neural nets by providing a mathematical framework that helps users implement their models more efficiently. This still represents a substantial investment of time and effort, however, when the intention is to compare a range of competing models quickly for a specific task. We present DEFIne, a fluent interface DSL for the specification, optimisation and evaluation of deep learning models. The fluent interface is implemented through method chaining. DEFIne is embedded in Python and is built on top of its most popular deep learning libraries, Keras and Theano. It extends these with common operations for data pre-processing and representation as well as visualisation of datasets and results. We test our framework on three benchmark tasks from different domains: heart disease diagnosis, hand-written digit recognition and weather forecast generation. Results in terms of accuracy, runtime and lines of code show that our DSL achieves equivalent accuracy and runtime to state-of-the-art models, while requiring only about 10 lines of code per application.
  • 2017. In Proceedings of the 2nd International Workshop on Real World Domain Specific Languages (RWDSL), co-located with the International Symposium on Code Generation and Optimisation (CGO’17). Austin, Texas. In: ACM Digital Library, International Conference Proceedings Series (ICPS).
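    The core mechanism mentioned in the abstract, a fluent interface via method chaining, is easy to show in isolation. The class and method names below are hypothetical placeholders for illustration only and are not DEFIne's actual API.

```python
# Illustrative sketch of a fluent interface built through method chaining:
# each setter records a configuration choice and returns self so calls can chain.
class Experiment:
    def __init__(self):
        self.config = {}

    def data(self, path):
        self.config["data"] = path              # hypothetical option, not DEFIne's API
        return self

    def model(self, kind, layers=2, units=64):
        self.config["model"] = dict(kind=kind, layers=layers, units=units)
        return self

    def optimise(self, metric="accuracy"):
        self.config["metric"] = metric
        return self

    def run(self):
        print("running with", self.config)      # a real DSL would build and train the model here
        return self

# One chained statement specifies the whole experiment:
Experiment().data("heart_disease.csv").model("lstm", layers=3).optimise("accuracy").run()
```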

Natural language-based presentation of cognitive stimulation to people with dementia in assistive technology: a pilot study.

  • Dethlefs, N.
  • Milders, M.
  • Cuayáhuitl, H.
  • Al-Salkini, T.
  • Douglas, D.
    Abstract: Currently, an estimated 36 million people worldwide are affected by Alzheimer’s disease or related dementias. In the absence of a cure, non-pharmacological interventions, such as cognitive stimulation, which slow down the rate of deterioration can benefit people with dementia and their caregivers. Such interventions have been shown to improve well-being and slow down the rate of cognitive decline. It has further been shown that cognitive stimulation in interaction with a computer is as effective as with a human. However, the need to operate a computer often represents a difficulty for the elderly and stands in the way of widespread adoption. A possible solution to this obstacle is to provide a spoken natural language interface that allows people with dementia to interact with the cognitive stimulation software in the same way as they would interact with a human caregiver. This makes the assistive technology accessible to users regardless of their technical skills and provides a fully intuitive user experience. This article describes a pilot study that evaluated the feasibility of computer-based cognitive stimulation through a spoken natural language interface. A prototype software was evaluated with 23 users, including healthy elderly people and people with dementia. Feedback was overwhelmingly positive.
  • 2017. Informatics for Health and Social Care.

Extrinsic vs Intrinsic Evaluation of Natural Language Generation for Spoken Dialogue Systems and Social Robotics.

  • Hastie, H.
  • Cuayáhuitl, H.
  • Dethlefs, N.
  • Keizer, S.
  • Liu, X.
    Abstract: [Book abstract] In the past 10 years, very few published studies include some kind of extrinsic evaluation of an NLG component in an end-to-end system, be it for phone or mobile-based dialogues or social robotic interaction. This may be attributed to the fact that these types of evaluations are very costly to set up and run for a single component. The question therefore arises whether there is anything to be gained over and above intrinsic quality measures obtained in off-line experiments. In this article, we describe a case study of evaluating two variants of an NLG surface realiser and show that there are significant differences in both extrinsic measures and intrinsic measures. These differences can be used to inform further iterations of component and system development.
  • 2016. In Jokinen, Kristiina and Wilcock, Graham (eds.) Dialogues with Social Robots – Enablements, Analyses, and Evaluation. Berlin: Springer Lecture Notes in Electrical Engineering (LNEE). ISBN 978-981-10-2584-6.

Automatic Identification of Suicide Notes from Linguistic and Sentiment Features.

    Abstract: Psychological studies have shown that our state of mind can manifest itself in the linguistic features we use to communicate. Recent statistics in suicide prevention show that young people are increasingly posting their last words online. In this paper, we investigate whether it is possible to automatically identify suicide notes and discern them from other types of online discourse based on analysis of sentiments and linguistic features. Using supervised learning, we show that our model achieves an accuracy of 86.6%, outperforming previous work on a similar task by over 4%.
  • 2016. In Proceedings of The 10th Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities (LaTeCH), co-located with ACL-2016. Berlin, Germany.

Information Density and Overlaps in Spoken Dialogue.

  • Dethlefs, N.
  • Hastie, H.
  • Cuayáhuitl, H.
  • Yu, Y.
  • Rieser, V.
  • Lemon, O.
    Abstract: Incremental dialogue systems are often perceived as more responsive and natural because they are able to address phenomena of turn-taking and overlapping speech, such as backchannels or barge-ins. Previous work in this area has often identified distinctive prosodic features, or features relating to syntactic or semantic completeness, as marking appropriate places of turn-taking. In a separate strand of work, psycholinguistic studies have established a connection between information density and prominence in language—the less expected a linguistic unit is in a particular context, the more likely it is to be linguistically marked. This has been observed across linguistic levels, including the prosodic, which plays an important role in predicting overlapping speech. In this article, we explore the hypothesis that information density (ID) also plays a role in turn-taking. Specifically, we aim to show that humans are sensitive to the peaks and troughs of information density in speech, and that overlapping speech at ID troughs is perceived as more acceptable than overlaps at ID peaks. To test our hypothesis, we collect human ratings for three models of generating overlapping speech based on features of: (1) prosody and semantic or syntactic completeness, (2) information density, and (3) both types of information. Results show that over 50% of users preferred the version using both types of features, followed by a preference for information density features alone. This indicates a clear human sensitivity to the effects of information density in spoken language and provides a strong motivation to adopt this metric for the design, development and evaluation of turn-taking modules in spoken and incremental dialogue systems.
  • 2016. Computer Speech and Language 37, pp. 82–97.
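    For readers unfamiliar with the measure, information density is standardly quantified as the surprisal of a linguistic unit given its preceding context (stated here in its usual form; the paper's exact estimation procedure may differ):

```latex
\mathrm{ID}(w_i) = -\log_2 P\!\left(w_i \mid w_1, \ldots, w_{i-1}\right)
```

    Peaks then correspond to unexpected (high-surprisal) units and troughs to predictable ones, which is the distinction the rating study above relies on.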

Why bother? Is evaluation of NLG in an end-to-end Spoken Dialogue System worth it?

  • Hastie, H.
  • Dethlefs, N.
  • Cuayáhuitl, H.
  • Keizer, S.
  • Liu, X.
    Abstract: In the past 10 years, only around 15% of published conference papers include some kind of extrinsic evaluation of an NLG component in an end-to-end system. These types of evaluations are costly to set up and run, so is it worth it? Is there anything to be gained over and above intrinsic quality measures obtained in off-line experiments? In this paper, we describe a case study of evaluating two variants of an NLG surface realiser and show that there are significant differences in both extrinsic measures and intrinsic measures. These significant differences would need to be factored into future iterations of the component and therefore, we conclude that extrinsic evaluations are worthwhile.
  • 2016. In Proceedings of the International Workshop on Spoken Dialogue Systems (IWSDS). Ivalo, Finland.

Hierarchical Reinforcement Learning for Situated Language Generation.

    Abstract: Natural Language Generation systems in interactive settings often face a multitude of choices, given that the communicative effect of each utterance they generate depends crucially on the interplay between its physical circumstances, addressee and interaction history. This is particularly true in interactive and situated settings. In this paper we present a novel approach for situated Natural Language Generation in dialogue that is based on hierarchical reinforcement learning and learns the best utterance for a context by optimisation through trial and error. The model is trained from human–human corpus data and learns particularly to balance the trade-off between efficiency and detail in giving instructions: the user needs to be given sufficient information to execute their task, but without exceeding their cognitive load. We present results from simulation and a task-based human evaluation study comparing two different versions of hierarchical reinforcement learning: One operates using a hierarchy of policies with a large state space and local knowledge, and the other additionally shares knowledge across generation subtasks to enhance performance. Results show that sharing knowledge across subtasks achieves better performance than learning in isolation, leading to smoother and more successful interactions that are better perceived by human users.
  • 2015. Natural Language Engineering 21, pp 391–435. Cambridge University Press.

Proceedings of the 4th International Workshop on Machine Learning for Interactive Systems. Co-located with the International Conference on Machine Learning (ICML), Lille, France.

  • Cuayáhuitl, H.
  • Dethlefs, N.
  • Frommberger, L.
  • van Otterlo, M.
  • Pietquin, O.
    Abstract: Learning systems or robots that interact with their environment by perceiving, acting or communicating often face a challenge in how to bring these different concepts together. This challenge arises because core concepts are typically studied within their respective communities, such as the computer vision, robotics and natural language processing communities, among others. A commonality across communities is the use of machine learning techniques and algorithms. In this way, machine learning is crucial in the development of truly intelligent systems, not just by providing techniques and algorithms, but also by acting as a unifying factor across communities, encouraging communication, discussion and exchange of ideas. [...]
  • 2015. Proceedings in Journal of Machine Learning Research (JMLR): Workshop and Conference Proceedings.

Cluster-Based Prediction of User Ratings for Stylistic Surface Realisation.

  • Dethlefs, N.
  • Cuayáhuitl, H.
  • Hastie, H.
  • Rieser, V.
  • Lemon, O.
    Abstract: Surface realisations typically depend on their target style and audience. A challenge in estimating a stylistic realiser from data is that humans vary significantly in their subjective perceptions of linguistic forms and styles, leading to almost no correlation between ratings of the same utterance. We address this problem in two steps. First, we estimate a mapping function between the linguistic features of a corpus of utterances and their human style ratings. Users are partitioned into clusters based on the similarity of their ratings, so that ratings for new utterances can be estimated, even for new, unknown users. In a second step, the estimated model is used to re-rank the outputs of a number of surface realisers to produce stylistically adaptive output. Results confirm that the generated styles are recognisable to human judges and that predictive models based on clusters of users lead to better rating predictions than models based on an average population of users.
  • 2014. In Proceedings of the Conference of the European Chapter of the Association for Computational Linguistics (EACL). Gothenburg, Sweden.
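    The first step of the approach above, grouping users by the similarity of their ratings and predicting from the cluster mean, can be sketched in a few lines. The data, cluster count and prediction rule below are assumptions for illustration, not the paper's exact method.

```python
# Cluster users by rating similarity, then predict a new user's rating for an
# utterance from the mean rating of the closest cluster.
import numpy as np
from sklearn.cluster import KMeans

# rows = users, columns = utterances, values = style ratings on a 1-5 scale (toy data)
ratings = np.array([[5, 4, 1, 2],
                    [4, 5, 2, 1],
                    [1, 2, 5, 4],
                    [2, 1, 4, 5]], dtype=float)

clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit(ratings)
centroids = clusters.cluster_centers_          # per-cluster mean rating for each utterance

def predict(new_user_ratings, utterance_index):
    """Assign a new user to the nearest cluster and read off its mean rating."""
    c = clusters.predict(new_user_ratings.reshape(1, -1))[0]
    return centroids[c, utterance_index]

print(predict(np.array([5, 4, 1, 1], dtype=float), 2))   # predicted rating for utterance 3
```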

A Semi-Supervised Clustering Approach for Semantic Slot Labelling.

    Abstract: Work on training semantic slot labellers for use in Natural Language Processing applications has typically either relied on large amounts of labelled input data, or has assumed entirely unlabelled inputs. The former technique tends to be costly to apply, while the latter is often not as accurate as its supervised counterpart. Here, we present a semi-supervised learning approach that automatically labels the semantic slots in a set of training data and aims to strike a balance between the dependence on labelled data and prediction accuracy. The essence of our algorithm is to cluster clauses based on a similarity function that combines lexical and semantic information. We present experiments that compare different similarity functions for both our semi-supervised setting and a fully unsupervised baseline. While semi-supervised learning expectedly outperforms unsupervised learning, our results show that (1) this effect can be observed based on very few training data instances and that increasing the size of the training data does not lead to better performance, and (2) that lexical and semantic information contribute differently in different domains so that clustering based on both types of information offers the best generalisation.
  • 2014. In Proceedings of the International Conference on Machine Learning and Applications (ICMLA). Detroit, USA.
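    A clause-similarity function combining lexical and semantic information, as described above, might look like the sketch below. The particular features (Jaccard overlap, embedding cosine) and the weighting are assumptions for the example rather than the similarity functions compared in the paper.

```python
# Combine a lexical overlap score with an embedding-based semantic score.
import numpy as np

def lexical_sim(a_tokens, b_tokens):
    """Jaccard overlap between the word sets of two clauses."""
    a, b = set(a_tokens), set(b_tokens)
    return len(a & b) / len(a | b) if a | b else 0.0

def semantic_sim(a_vec, b_vec):
    """Cosine similarity between pre-computed clause embeddings."""
    return float(np.dot(a_vec, b_vec) / (np.linalg.norm(a_vec) * np.linalg.norm(b_vec)))

def clause_similarity(a_tokens, b_tokens, a_vec, b_vec, alpha=0.5):
    """Weighted combination; alpha trades lexical against semantic information."""
    return alpha * lexical_sim(a_tokens, b_tokens) + (1 - alpha) * semantic_sim(a_vec, b_vec)

# Toy usage with random stand-in embeddings:
rng = np.random.default_rng(0)
print(clause_similarity(["book", "a", "table"], ["reserve", "a", "table"],
                        rng.normal(size=50), rng.normal(size=50)))
```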

Training a Statistical Surface Realiser from Automatic Slot Labelling.

    Abstract: Training a statistical surface realiser typically relies on labelled training data or parallel data sets, such as corpora of paraphrases. The procedure for obtaining such data for new domains is not only time-consuming, but it also restricts the incorporation of new semantic slots during an interaction, i.e. using an online learning scenario for automatically extended domains. Here, we present an alternative approach to statistical surface realisation from unlabelled data through automatic semantic slot labelling. The essence of our algorithm is to cluster clauses based on a similarity function that combines lexical and semantic information. Annotations need to be reliable enough to be utilised within a spoken dialogue system. We compare different similarity functions and evaluate our surface realiser—trained from unlabelled data—in a human rating study. Results confirm that a surface realiser trained from automatic slot labels can lead to outputs of comparable quality to outputs trained from human-labelled inputs.
  • 2014. In Proceedings of the IEEE Workshop on Spoken Language Technology (SLT). South Lake Tahoe, USA.

The PARLANCE Mobile App for Interactive Search in English and Mandarin.

  • Hastie, H.
  • Aufaure, M.
  • Alexopoulos, P.
  • Bouchard, H.
  • Cuayáhuitl, H.
  • Dethlefs, N.
  • Gasic, M.
  • Henderson, J.
  • Lemon, O.
  • Liu, X.
  • Mika, P.
  • Ben Mustapha, N.
  • Potter, T.
  • Rieser, V.
  • Thomson, B.
  • Tsiakoulis, P.
  • Vanrompay, Y.
  • Villa-Terrazas, B.
  • Yazdani, M.
  • Young, S.
  • Yu, Y.
    Abstract: We demonstrate a mobile application in English and Mandarin to test and evaluate components of the Parlance dialogue system for interactive search under real-world conditions.
  • 2014. In Proceedings of the Annual Meeting of the Special Interest Group on Discourse and Dialogue (SIGdial).

Non-Strict Hierarchical Reinforcement Learning for Interactive Systems and Robots.

    Abstract: Conversational systems and robots that use reinforcement learning for policy optimization in large domains often face the problem of limited scalability. This problem has been addressed either by using function approximation techniques that estimate the approximate true value function of a policy or by using a hierarchical decomposition of a learning task into subtasks. We present a novel approach for dialogue policy optimization that combines the benefits of both hierarchical control and function approximation and that allows flexible transitions between dialogue subtasks to give human users more control over the dialogue. To this end, each reinforcement learning agent in the hierarchy is extended with a subtask transition function and a dynamic state space to allow flexible switching between subdialogues. In addition, the subtask policies are represented with linear function approximation in order to generalize the decision making to situations unseen in training. Our proposed approach is evaluated in an interactive conversational robot that learns to play quiz games. Experimental results, using simulation and real users, provide evidence that our proposed approach can lead to more flexible (natural) interactions than strict hierarchical control and that it is preferred by human users.
  • 2014. ACM Transactions on Interactive Intelligent Systems. Vol. 4, No. 4.
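    One ingredient of the approach above, representing a policy with linear function approximation, is a standard technique and can be sketched generically. The state features, action set, reward and learning rates below are invented placeholders, not the dialogue system's actual design.

```python
# Generic Q-learning with linear function approximation: Q(s, a) = w_a . phi(s).
import numpy as np

N_FEATURES, N_ACTIONS = 8, 3
ALPHA, GAMMA, EPSILON = 0.05, 0.95, 0.1
weights = np.zeros((N_ACTIONS, N_FEATURES))        # one weight vector per action

def q_value(features, action):
    return float(weights[action] @ features)

def choose_action(features):
    if np.random.rand() < EPSILON:                 # epsilon-greedy exploration
        return np.random.randint(N_ACTIONS)
    return int(np.argmax([q_value(features, a) for a in range(N_ACTIONS)]))

def update(features, action, reward, next_features, done):
    """One temporal-difference step on the linear weights."""
    target = reward if done else reward + GAMMA * max(
        q_value(next_features, a) for a in range(N_ACTIONS))
    td_error = target - q_value(features, action)
    weights[action] += ALPHA * td_error * features

# Toy interaction: random transitions just to show the update being applied.
for _ in range(100):
    s = np.random.rand(N_FEATURES)
    a = choose_action(s)
    update(s, a, reward=np.random.rand(), next_features=np.random.rand(N_FEATURES), done=False)
```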

Context-Sensitive Natural Language Generation: From Knowledge-Driven to Data-Driven Techniques.

    Abstract: Context-sensitive Natural Language Generation is concerned with the automatic generation of system output that is in several ways adaptive to its target audience or the situational circumstances of its production. In this article, I will provide an overview of the most popular methods that have been applied to context-sensitive generation. A particular focus will be on the shift from knowledge-driven to data-driven approaches that has been witnessed in the last decade. While this shift has offered powerful new methods for large-scale adaptivity and flexible output generation, purely data-driven approaches still struggle to reach the linguistic depth of their knowledge-driven predecessors. Bridging the gap between both types of approaches is therefore an important future research direction.
  • 2014. Language and Linguistics Compass, Vol. 8(3), pp. 99–115.

Introduction to the Special Issue on Machine Learning for Multiple Modalities in Interactive Systems and Robots.

  • Cuayáhuitl, H.
  • Frommberger, L.
  • Dethlefs, N.
  • Raux, A.
  • Marge, M.
  • Zender, H.
    Abstract: This special issue highlights research articles that apply machine learning to robots and other systems that interact with users through more than one modality, such as speech, gestures, and vision. For example, a robot may coordinate its speech with its actions, taking into account (audio-)visual feedback during their execution. Machine learning provides interactive systems with opportunities to improve performance not only of individual components but also of the system as a whole. However, machine learning methods that encompass multiple modalities of an interactive system are still relatively hard to find. The articles in this special issue represent examples that contribute to filling this gap.
  • 2014. ACM Transactions on Interactive Intelligent Systems (ACM-TiiS).

Proceedings of the Second Workshop on Machine Learning for Interactive Systems (MLIS-2014): Bridging the Gap Between Perception, Action and Communication.

  • Cuayáhuitl, H.
  • Frommberger, L.
  • Dethlefs, N.
  • van Otterlo, M.
    Abstract: The AAAI-14 Workshop program was held Sunday and Monday, July 27–28, 2014, at the Québec City Convention Centre in Québec, Canada. The AAAI-14 workshop program included 15 workshops covering a wide range of topics in artificial intelligence. The titles of the workshops were Artificial Intelligence and Robotics; Artificial Intelligence Applied to Assistive Technologies and Smart Environments; Cognitive Computing for Augmented Human Intelligence; Computer Poker and Imperfect Information; Discovery Informatics; Incentives and Trust in Electronic Communities; Intelligent Cinematography and Editing; Machine Learning for Interactive Systems: Bridging the Gap Between Perception, Action, and Communication; Modern Artificial Intelligence for Health Analytics; Multiagent Interaction Without Prior Coordination; Multidisciplinary Workshop on Advances in Preference Handling; Semantic Cities — Beyond Open Data to Models, Standards, and Reasoning; Sequential Decision Making with Big Data; Statistical Relational AI; and the World Wide Web and Public Health Intelligence. This article presents short summaries of those events.
  • 2014. Co-located with the 28th Conference on Artificial Intelligence (AAAI), Quebec City, Canada.

A Joint Learning Approach for Situated Language Generation.

    Abstract: [Book abstract] An informative and comprehensive overview of the state-of-the-art in natural language generation (NLG) for interactive systems, this guide serves to introduce graduate students and new researchers to the field of natural language processing and artificial intelligence, while inspiring them with ideas for future research. Detailing the techniques and challenges of NLG for interactive applications, it focuses on the research into systems that model collaborativity and uncertainty, are capable of being scaled incrementally, and can engage with the user effectively. A range of real-world case studies is also included. The book and the accompanying website feature a comprehensive bibliography, and refer the reader to corpora, data, software and other resources for pursuing research on natural language generation and interactive systems, including dialog systems, multimodal interfaces and assistive technologies. It is an ideal resource for students and researchers in computational linguistics, natural language processing and related fields.
  • 2014. In Amanda Stent and Srinivas Bangalore (eds.) Natural Language Generation in Interactive Systems. Cambridge University Press.

Getting to Know Users: Accounting for the Variability in User Ratings.

  • Dethlefs, N.
  • Cuayáhuitl, H.
  • Hastie, H.
  • Rieser, V.
  • Lemon, O.
    Abstract: Evaluations of dialogue systems and language generators often rely on subjective user ratings to assess output quality and performance. Humans however vary in their preferences so that estimating an accurate prediction model is difficult. Using a method that clusters utterances based on their linguistic features and ratings (Dethlefs et al., 2014), we discuss the possibility of obtaining user feedback implicitly during an interaction. This approach promises better predictions of user preferences through continuous re-estimation.
  • 2014. Poster paper in the Workshop on the Semantics and Pragmatics of Dialogue (SemDial). Edinburgh, Scotland.

Two Alternative Frameworks for Deploying Spoken Dialogue Systems to Mobile Platforms for Evaluation “in the Wild”.

  • Hastie, H.
  • Aufaure, M.
  • Alexopoulos, P.
  • Bouchard, H.
  • Cuayáhuitl, H.
  • Dethlefs, N.
  • Gasic, M.
  • Henderson, J.
  • Lemon, O.
  • Liu, X.
  • Mika, P.
  • Potter, T.
  • Rieser, V.
  • Tsiakoulis, P.
  • Vanrompay, Y.
  • Villa-Terrazas, B.
  • Yazdani, M.
  • Young, S.
  • Yu, Y.
    Abstract: We demonstrate two alternative frameworks for testing and evaluating spoken dialogue systems on mobile devices for use “in the wild”. We firstly present a spoken dialogue system that uses third party ASR (Automatic Speech Recognition) and TTS (Text-To-Speech) components and then present an alternative using audio compression to allow for entire systems with home-grown ASR/TTS to be plugged in directly. Some advantages and drawbacks of both are discussed.
  • 2014. Poster paper in the Workshop on the Semantics and Pragmatics of Dialogue (SemDial). Edinburgh, Scotland.

Conditional Random Fields for Responsive Surface Realisation Using Global Features.

    Abstract: Surface realisers in spoken dialogue systems need to be more responsive than conventional surface realisers. They need to be sensitive to the utterance context as well as robust to partial or changing generator inputs. We formulate surface realisation as a sequence labelling task and combine the use of conditional random fields (CRFs) with semantic trees. Due to their extended notion of context, CRFs are able to take the global utterance context into account and are less constrained by local features than other realisers. This leads to more natural and less repetitive surface realisation. It also allows generation from partial and modified inputs and is therefore applicable to incremental surface realisation. Results from a human rating study confirm that users are sensitive to this extended notion of context and assign ratings that are significantly higher (up to 14%) than those for taking only local context into account.
  • 2013. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (ACL). Sofia, Bulgaria.
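    Casting surface realisation as sequence labelling with a linear-chain CRF can be illustrated with the general-purpose sklearn-crfsuite library. The slot features and toy training pairs below are invented for the example; this is not the paper's realiser or feature set.

```python
# Linear-chain CRF over semantic slot sequences, labelling each slot with a
# word/phrase realisation (toy illustration using sklearn-crfsuite).
import sklearn_crfsuite

def features(slots, i):
    """Features for position i of a semantic input: current slot plus simple global context."""
    return {
        "slot": slots[i],
        "prev_slot": slots[i - 1] if i > 0 else "<s>",
        "next_slot": slots[i + 1] if i < len(slots) - 1 else "</s>",
        "n_slots": len(slots),              # a crude 'global' feature of the whole input
    }

# Toy training pairs: semantic slot sequences -> phrase realisations.
X_train = [[features(s, i) for i in range(len(s))]
           for s in [["inform", "food", "area"], ["inform", "food", "price"]]]
y_train = [["there is a", "restaurant", "in the centre"],
           ["there is a", "restaurant", "that is cheap"]]

crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1, max_iterations=50)
crf.fit(X_train, y_train)
print(crf.predict([[features(["inform", "food", "area"], i) for i in range(3)]]))
```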

Hierarchical Joint Learning for Natural Language Generation.

    Abstract: Natural Language Generation (NLG) systems typically face an uncertainty regarding the best utterance to communicate to a user in a given context given that the effect of a single utterance depends crucially on the interplay between its physical environment, pragmatic circumstances, addressee and interaction history. NLG system designers have traditionally used a pipeline architecture to divide the generation process into the distinct stages of content selection, utterance planning and surface realisation to choose the semantics, organisation and realisation of an utterance. Unfortunately, this sequential model does not account for the interdependencies that exist among these stages, which in practice has been manifest in inefficient instruction giving and an increased cognitive load for the user. This thesis will advocate a joint optimisation framework for situated NLG that is based on Hierarchical Reinforcement Learning combined with graphical models and will learn the best utterance for a given context by optimising its behaviour through a trial and error search. The joint model considers decisions at different NLG stages in interdependence with each other and thereby produces more context-sensitive utterances than is possible when considering decisions in isolation. To enhance the human-likeness of the model, two augmentations will be made. We will introduce the notion of a Hierarchical Information State to support the systematic pre-specification of prior knowledge and human preferences for content selection. Graphical models—Hidden Markov Models and Bayesian Networks—will then be integrated as generation spaces to encourage natural surface realisation by balancing the proportion of alignment and variation. Results from a human evaluation study show that the hierarchical learning agent learns a robust generation policy that adapts to new circumstances and users flexibly leading to smooth and successful interactions. In terms of the comparison between a joint and an isolated optimisation, results indicate that a jointly optimised system achieves higher user satisfaction and task success and is better perceived by human users than its isolated counterpart. To demonstrate the domain-independence and generalisability of the hierarchical joint optimisation framework, an additional study will be presented that transfers the model to a new, but related, domain: the generation of route instructions in a real navigation scenario using a situated dialogue system for indoor navigation. Results confirm that the NLG policy can be applied to new domains with limited effort and contribute to high task success and user satisfaction.
  • 2013. IOS Press / AKA Publishing. In Series Dissertations on Artificial Intelligence, Volume 340. ISBN 978-1-61499-115-1. Amsterdam / Berlin.

Hierarchical Joint Learning for Natural Language Generation.

    Abstract: identical to the book edition listed above.
  • 2013. PhD Thesis. University of Bremen, Faculty of Linguistics, Germany.

Impact of ASR N-Best Information on Bayesian Dialogue Act Recognition.

    Abstract: A challenge in dialogue act recognition is the mapping from noisy user inputs to dialogue acts. In this paper we describe an approach for re-ranking dialogue act hypotheses based on Bayesian classifiers that incorporate dialogue history and Automatic Speech Recognition (ASR) N-best information. We report results based on the Let’s Go dialogue corpora that show (1) that including ASR N-best information results in improved dialogue act recognition performance (+7% accuracy), and (2) that competitive results can be obtained from as early as the first system dialogue act, reducing the need to wait for subsequent system dialogue acts.
  • 2013. In Proceedings of the 14th Annual Meeting of the Special Interest Group on Discourse and Dialogue (SIGdial). Metz, France.

Proceedings of the Young Researcher’s Roundtable on Spoken Dialogue Systems.

  • El Asri, L.
  • Dethlefs, N.
  • Henderson, M.
  • Kennington, C.
  • Mitchell, C.
  • Schütte, N.
  • Villalba, M.
  • Baheux, D.
    Abstract: We are delighted to welcome you to the Ninth Young Researchers’ Roundtable on Spoken Dialogue Systems in Metz, France. YRRSDS is a yearly event that began in 2005 in Lisbon, followed by Pittsburgh, Antwerp, Columbus, London, Tokyo, Portland, and Seoul. The aim of the workshop is to promote the networking of students, post docs, and junior researchers working in research related to spoken dialogue systems in academia and industry. The workshop provides an open forum where participants can discuss their research interests, current work, and future plans.
  • 2013. Co-located with the 14th Annual Meeting of the Special Interest Group on Discourse and Dialogue (SIGdial). Metz, France.

Proceedings of the Second Workshop on Machine Learning for Interactive Systems (MLIS‘2013): Bridging the Gap Between Perception, Action and Communication.

  • Cuayáhuitl, H.
  • Frommberger, L.
  • Dethlefs, N.
  • van Otterlo, M.
    Abstract: Intelligent systems or robots that interact with their environment by perceiving, acting or communicating often face a challenge in how to bring these different concepts together. One of the main reasons for this challenge is the fact that the core concepts in perception, action and communication are typically studied by different communities: the computer vision, robotics and natural language processing communities, among others, without much interchange between them. Learning systems that encompass perception, action and communication in a unified and principled way are still rare. As machine learning lies at the core of these communities, it can act as a unifying factor in bringing the communities closer together. Unifying these communities is highly important for understanding how state-of-the-art approaches from different disciplines can be combined (and applied) to form generally interactive intelligent systems. MLIS-2013 aims to bring researchers from multiple disciplines together that are in some way or another affected by the gap between perception, action and communication. Our goal is to provide a forum for interdisciplinary discussion that allows researchers to look at their work from new perspectives that go beyond their core community and develop new interdisciplinary collaborations.
  • 2013. Co-located with the 23rd International Joint Conference on Artificial Intelligence (IJCAI). Beijing, China.

Machine Learning for Interactive Systems and Robots: A Brief Introduction.

  • Cuayáhuitl, H.
  • van Otterlo, M.
  • Dethlefs, N.
  • Frommberger, L.
    Abstract: Research on interactive systems and robots, i.e. interactive machines that perceive, act and communicate, has applied a multitude of different machine learning frameworks in recent years, many of which are based on a form of reinforcement learning (RL). In this paper, we will provide a brief introduction to the application of machine learning techniques in interactive learning systems. We identify several dimensions along which interactive learning systems can be analyzed. We argue that while many applications of interactive machines seem different at first sight, sufficient commonalities exist in terms of the challenges faced. By identifying these commonalities between (learning) approaches, and by taking interdisciplinary approaches towards the challenges, we anticipate more effective design and development of sophisticated machines that perceive, act and communicate in complex, dynamic and uncertain environments.
  • 2013. In Proceedings of the 2nd Workshop on Machine Learning for Interactive Systems (MLIS-2013): Bridging the Gap between Perception, Action and Communication. ACM International Conference Proceedings Series, 2013. Co-located with IJCAI. Beijing, China.

Barge-in Effects in Bayesian Dialogue Act Recognition and Simulation.

    Abstract: Dialogue act recognition and simulation are traditionally considered separate processes. Here, we argue that both can be fruitfully treated as interleaved processes within the same probabilistic model, leading to a synchronous improvement of performance in both. To demonstrate this, we train multiple Bayes Nets that predict the timing and content of the next user utterance. A specific focus is on providing support for barge-ins. We describe experiments using the Let’s Go data that show an improvement in classification accuracy (+5%) in Bayesian dialogue act recognition involving barge-ins using partial context compared to using full context. Our results also indicate that simulated dialogues with user barge-in are more realistic than simulations without barge-in events.
  • 2013. In Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), Olomouc, Czech Republic.

Demonstration of the PARLANCE System: A Data-Driven, Incremental, Spoken Dialogue System for Interactive Search.

  • Hastie, H.
  • Aufaure, M.
  • Alexopoulos, P.
  • Cuayáhuitl, H.
  • Dethlefs, N.
  • Gasic, M.
  • Henderson, J.
  • Lemon, O.
  • Liu, X.
  • Mika, P.
  • Mustapha, N.
  • Rieser, V.
  • Thomson, B.
  • Tsiakoulis, P.
  • Vanrompay, Y.
  • Villazon-Terrazas, B.
  • Young, S.
    Abstract: The Parlance system for interactive search processes dialogue at a micro-turn level, displaying dialogue phenomena that play a vital role in human spoken conversation. These dialogue phenomena include more natural turn-taking through rapid system responses, generation of backchannels, and user barge-ins. The Parlance demonstration system differs from other incremental systems in that it is data-driven with an infrastructure that scales well.
  • 2013. In Proceedings of the 14th Annual Meeting of the Special Interest Group on Discourse and Dialogue (SIGdial). Metz, France.

Optimising Incremental Dialogue Decisions Using Information Density for Interactive Systems.

    Abstract: Incremental processing allows system designers to address several discourse phenomena that have previously been somewhat neglected in interactive systems, such as backchannels or barge-ins, but that can enhance the responsiveness and naturalness of systems. Unfortunately, prior work has focused largely on deterministic incremental decision making, rendering system behaviour less flexible and adaptive than is desirable. We present a novel approach to incremental decision making that is based on Hierarchical Reinforcement Learning to achieve an interactive optimisation of Information Presentation (IP) strategies, allowing the system to generate and comprehend backchannels and barge-ins, by employing the recent psycholinguistic hypothesis of information density (ID) (Jaeger, 2010). Results in terms of average rewards and a human rating study show that our learnt strategy outperforms several baselines that are not sensitive to ID by more than 23%.
  • 2012. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP-CoNLL). Jeju, South Korea.

Optimising Incremental Generation for Spoken Dialogue Systems: Reducing the Need for Fillers.

    Abstract: Recent studies have shown that incremental systems are perceived as more reactive, natural, and easier to use than non-incremental systems. However, previous work on incremental NLG has not employed recent advances in statistical optimisation using machine learning. This paper combines the two approaches, showing how the update, revoke and purge operations typically used in incremental approaches can be implemented as state transitions in a Markov Decision Process. We design a model of incremental NLG that generates output based on micro-turn interpretations of the user’s utterances and is able to optimise its decisions using statistical machine learning. We present a proof-of-concept study in the domain of Information Presentation (IP), where a learning agent faces the trade-off of whether to present information as soon as it is available (for high reactiveness) or else to wait until input ASR hypotheses are more reliable. Results show that the agent learns to avoid long waiting times, fillers and self-corrections, by re-ordering content based on its confidence.
  • 2012. In Proceedings of the 7th International Conference on Natural Language Generation (INLG). Chicago, IL, USA.
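
A minimal sketch of the idea that the incremental operations update, revoke and purge can be treated as state transitions: the generation state tracks the current input hypothesis, its confidence and whether content has already been presented, and a policy (here a hand-written stand-in for the learnt one) decides between presenting and waiting. The state layout, threshold and action names are assumptions of this sketch.

    def transition(state, operation, hypothesis=None, confidence=0.0):
        """Apply an incremental operation to a (hypothesis, confidence, presented) state."""
        hyp, conf, presented = state
        if operation == "update":    # a new or refined input hypothesis arrives
            return (hypothesis, confidence, presented)
        if operation == "revoke":    # the current hypothesis is withdrawn
            return (None, 0.0, presented)
        if operation == "purge":     # already-presented content must be retracted (self-correction)
            return (hyp, conf, False)
        raise ValueError(f"unknown operation: {operation}")

    def policy(state, threshold=0.7):
        """Stand-in for the learnt policy: present only once the hypothesis is reliable."""
        hyp, conf, presented = state
        if hyp is not None and conf >= threshold and not presented:
            return "present"
        return "wait"

    state = (None, 0.0, False)
    for op, hyp, conf in [("update", "italian food", 0.4),
                          ("update", "indian food", 0.8),
                          ("revoke", None, 0.0),
                          ("update", "indian food, cheap", 0.9)]:
        state = transition(state, op, hyp, conf)
        print(f"{op:7s} -> {state} | action: {policy(state)}")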

Hierarchical Dialogue Policy Learning Using Flexible State Transitions and Linear Function Approximation.

    Abstract: Conversational agents that use reinforcement learning for policy optimization in large domains often face the problem of limited scalability. This problem can be addressed either by using function approximation techniques that approximate the true value function, or by using a hierarchical decomposition of a learning task into subtasks. In this paper, we present a novel approach for dialogue policy optimization that combines the benefits of hierarchical control with function approximation. The approach incorporates two concepts to allow flexible switching between sub-dialogues, extending current hierarchical reinforcement learning methods. First, hierarchical tree-based state representations initially represent a compact portion of the possible state space and are then dynamically extended in real time. Second, we allow state transitions across sub-dialogues to enable non-strict hierarchical control. Our approach is integrated, and tested with real users, in a robot dialogue system that learns to play Quiz games.
  • 2012. In Proceedings of the 24th International Conference on Computational Linguistics (COLING). System Demonstrations. Mumbai, India.
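
A minimal sketch of the function-approximation half of this kind of approach: Q(s, a) is represented as a dot product w · φ(s, a) and updated by one-step Q-learning on a toy slot-filling dialogue. The features, environment and reward values are invented for illustration; the hierarchical control and dynamically extended state trees described in the abstract are not shown.

    import random
    import numpy as np

    ACTIONS = ["ask_slot", "confirm", "present_info"]

    def features(state, action):
        """Hypothetical features: slot-filled and confirmed flags, a bias, and a one-hot action."""
        slot_filled, confirmed = state
        return np.array([float(slot_filled), float(confirmed), 1.0]
                        + [1.0 if a == action else 0.0 for a in ACTIONS])

    def q_value(w, state, action):
        return float(np.dot(w, features(state, action)))

    def step(state, action):
        """Toy environment: fill the slot, confirm it, then present the result (reward 1)."""
        slot_filled, confirmed = state
        if action == "ask_slot" and not slot_filled:
            return (True, confirmed), 0.0, False
        if action == "confirm" and slot_filled and not confirmed:
            return (True, True), 0.0, False
        if action == "present_info" and slot_filled and confirmed:
            return state, 1.0, True
        return state, -0.1, False    # unhelpful action

    w = np.zeros(6)
    alpha, gamma, epsilon = 0.1, 0.95, 0.2
    for _ in range(500):
        state, done = (False, False), False
        for _step in range(20):
            if done:
                break
            action = (random.choice(ACTIONS) if random.random() < epsilon
                      else max(ACTIONS, key=lambda a: q_value(w, state, a)))
            next_state, reward, done = step(state, action)
            target = reward + (0.0 if done else gamma * max(q_value(w, next_state, a) for a in ACTIONS))
            w += alpha * (target - q_value(w, state, action)) * features(state, action)
            state = next_state

    print({a: round(q_value(w, (True, True), a), 2) for a in ACTIONS})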

Comparing HMMs and Bayesian Networks for Surface Realisation.

    Abstract: Natural Language Generation (NLG) systems often use a pipeline architecture for sequential decision making. Recent studies, however, have shown that treating NLG decisions jointly rather than in isolation can improve the overall performance of systems. We present a joint learning framework based on Hierarchical Reinforcement Learning (HRL) which uses graphical models for surface realisation. Our focus will be on a comparison of Bayesian Networks and HMMs in terms of user satisfaction and naturalness. While the former perform best in isolation, the latter present a scalable alternative within joint systems.
  • 2012. In Proceedings of the 12th Conference of the North American Chapter of the Association for Computational Linguistics (NAACL-HLT). Montréal, Canada.
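
A minimal sketch of using an HMM as a generation space for surface realisation: hidden states are semantic acts, emissions are candidate surface phrases, and a Viterbi-style score ranks alternative realisations. All probabilities and labels are invented for illustration; the Bayesian-network variant compared in the paper is not shown.

    import math

    states = ["GREET", "INFORM_FOOD", "INFORM_AREA"]
    start = {"GREET": 0.6, "INFORM_FOOD": 0.3, "INFORM_AREA": 0.1}
    trans = {
        "GREET":       {"GREET": 0.05, "INFORM_FOOD": 0.70, "INFORM_AREA": 0.25},
        "INFORM_FOOD": {"GREET": 0.05, "INFORM_FOOD": 0.15, "INFORM_AREA": 0.80},
        "INFORM_AREA": {"GREET": 0.10, "INFORM_FOOD": 0.60, "INFORM_AREA": 0.30},
    }
    emit = {
        "GREET":       {"hello": 0.7, "welcome": 0.3},
        "INFORM_FOOD": {"it serves italian food": 0.6, "it is an italian place": 0.4},
        "INFORM_AREA": {"in the centre": 0.5, "centrally located": 0.5},
    }

    def log_score(phrases):
        """Best log-probability over all hidden state paths (Viterbi recursion)."""
        vit = {s: math.log(start[s] * emit[s].get(phrases[0], 1e-9)) for s in states}
        for phrase in phrases[1:]:
            vit = {s: max(vit[p] + math.log(trans[p][s]) for p in states)
                      + math.log(emit[s].get(phrase, 1e-9))
                   for s in states}
        return max(vit.values())

    candidates = [
        ["hello", "it serves italian food", "in the centre"],
        ["hello", "in the centre", "it serves italian food"],
    ]
    for cand in sorted(candidates, key=log_score, reverse=True):
        print(f"{log_score(cand):7.2f}  {' | '.join(cand)}")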

Hierarchical Multiagent Reinforcement Learning for Coordinating Verbal and Nonverbal Actions in Robots.

    Abstract: This paper proposes an approach for learning to coordinate verbal and non-verbal behaviours in interactive robots. It is based on a hierarchy of multiagent reinforcement learners executing verbal and non-verbal actions in parallel. Our approach is evaluated in a conversational humanoid robot that learns to play Quiz games. First experimental results show evidence that the proposed multiagent approach can outperform hand-coded coordinated behaviours.
  • 2012. In Proceedings of the 1st Workshop on Machine Learning for Interactive Systems (MLIS’2012): Bridging the Gap Between Language, Motor Control and Vision. Co-located with ECAI. Montpellier, France.

Towards Optimising Modality Allocation for Multimodal Output Generation in Incremental Dialogue.

    Abstract: Recent work on incremental processing in interactive systems has demonstrated that incremental systems can gain higher responsiveness and naturalness than their non-incremental counterparts and are better perceived by human users. This paper presents a first investigation, based on a proof-of-concept study, into how multimodal information presentation in incremental dialogue systems can contribute towards more efficient and smooth interactions. In particular, we focus on how a combination of verbal and non-verbal output generation can help to reduce the need for self-corrections in a system that has to deal with continuous updates of input hypotheses. We suggest using Reinforcement Learning to optimise the multimodal output allocation of a system, i.e. the idea that for every context, there is a combination of modalities which adequately communicates the communicative goal.
  • 2012. In Proceedings of the 1st Workshop on Machine Learning for Interactive Systems (MLIS’2012): Bridging the Gap Between Language, Motor Control and Vision. Co-located with ECAI. Montpellier, France.

Proceedings of the First Workshop on Machine Learning for Interactive Systems (MLIS’2012): Bridging the Gap Between Language, Motor Control and Vision.

  • Cuayáhuitl, H.
  • Frommberger, L.
  • Dethlefs, N.
  • Sahli, H.
    Abstract: Intelligent interactive agents that are able to communicate with the world through more than one channel of communication face a number of research questions, for example: how can these channels be coordinated in an effective manner? This is especially important given that perception, action and interaction can often be seen as mutually related disciplines that affect each other. We believe that machine learning plays, and will keep playing, an important role in interactive systems. Machine learning provides an attractive and comprehensive set of algorithms for making interactive systems more adaptive to users and the environment, and has been a central part of research in the disciplines of interaction, motor control and computer vision in recent years. This workshop aims to bring together researchers who have an interest in more than one of these disciplines and who have explored frameworks which can offer a more unified perspective on the capabilities of sensing, acting and interacting in intelligent systems and robots.
  • 2012. Co-located with the 20th European Conference on Artificial Intelligence (ECAI). Montpellier, France.

Dialogue Systems Using Online Learning: Beyond Empirical Methods.

    Abstract: We discuss a change of perspective for training dialogue systems, which requires a shift from traditional empirical methods to online learning methods. We motivate the application of online learning, which provides the benefit of improving the system’s behaviour continuously, often after each turn or dialogue rather than after hundreds of dialogues. We describe the requirements and advances for dialogue systems with online learning, and speculate on the future of these kinds of systems.
  • 2012. In Proceedings of the Workshop on Future Directions and Needs in the Spoken Dialogue Community: Tools and Data (SDCTD). Co-located with NAACL-HLT. Montréal, Canada.

Incremental Spoken Dialogue Systems: Tools and Data.

    Abstract: Strict turn-taking models of dialogue do not accurately model human incremental processing, where users can process partial input and plan partial utterances in parallel. We discuss the current state of the art in incremental systems and propose tools and data required for further advances in the field of Incremental Spoken Dialogue Systems.
  • 2012. In Proceedings of the Workshop on Future Directions and Needs in the Spoken Dialogue Community: Tools and Data (SDCTD). Co-located with NAACL-HLT. Montréal, Canada.

Optimising Incremental Generation for Information Presentation of Mobile Search Results.

    Abstract: This abstract discusses a proof-of-concept study in incremental Natural Language Generation (NLG) in the domain of Information Presentation for Spoken Dialogue Systems. The work presented is part of the FP7 EC Parlance project (http://www.parlance-project.eu). The goal of Parlance is to develop personalised, mobile, interactive, hyper-local search through speech. Recent trends in Information Retrieval are towards incremental, interactive search, and we argue that spoken dialogue systems can provide a truly natural medium for this type of interactive search. This is particularly attractive for people on the move, who have their hands and eyes busy.
  • 2012. Presentation at Symposium: Influencing People with Information (SIPI). Aberdeen, Scotland.

Spatially-Aware Dialogue Control Using Hierarchical Reinforcement Learning.

    Abstract: This article addresses the problem of scalable optimization for spatially-aware dialogue systems. These kinds of systems must perceive, reason, and act about the spatial environment where they are embedded. We formulate the problem in terms of Semi-Markov Decision Processes and propose a hierarchical reinforcement learning approach to optimize subbehaviors rather than full behaviors. Because of the vast number of policies that are required to control the interaction in a dynamic environment (e.g., a dialogue system assisting a user to navigate in a building from one location to another), our learning approach is based on two stages: (a) the first stage learns low-level behavior, in advance; and (b) the second stage learns high-level behavior, in real time. For such a purpose we extend an existing algorithm in the literature of reinforcement learning in order to support reusable policies and therefore to perform fast learning. We argue that our learning approach makes the problem feasible, and we report on a novel reinforcement learning dialogue system that performs a joint optimization between dialogue and spatial behaviors. Our experiments, using simulated and real environments, are based on a text-based dialogue system for indoor navigation. Experimental results in a realistic environment reported an overall user satisfaction result of 89%, which suggests that our proposed approach is attractive for its application in real interactions as it combines fast learning with adaptive and reasonable behavior.
  • 2011. ACM Transactions on Speech and Language Processing (Special Issue on Machine Learning for Robust and Adaptive Spoken Dialogue Systems). Vol. 7, No. 3, pp. 1-26.
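
A minimal sketch of the two-stage idea described above: low-level sub-behaviours are fixed in advance (here hand-coded stand-ins for policies learnt offline), and only the high-level choice among them is learnt online from whole-subtask rewards. The subtasks, contexts and reward numbers are invented for illustration.

    import random

    SUBTASKS = ["guide_via_lift", "guide_via_stairs"]

    def run_subtask(name, user_with_luggage):
        """Stand-in for a pre-learnt low-level policy; returns the cumulative reward it earns."""
        if name == "guide_via_lift":
            return 8.0 if user_with_luggage else 4.0
        return 1.0 if user_with_luggage else 7.0    # stairs: faster, but poor with luggage

    Q = {(ctx, sub): 0.0 for ctx in (True, False) for sub in SUBTASKS}
    alpha, epsilon = 0.2, 0.2
    for _ in range(2000):
        ctx = random.random() < 0.5                 # context: is the user carrying luggage?
        sub = (random.choice(SUBTASKS) if random.random() < epsilon
               else max(SUBTASKS, key=lambda s: Q[(ctx, s)]))
        reward = run_subtask(sub, ctx)
        Q[(ctx, sub)] += alpha * (reward - Q[(ctx, sub)])   # one high-level decision per episode

    for ctx in (True, False):
        print("luggage" if ctx else "no luggage", "->",
              max(SUBTASKS, key=lambda s: Q[(ctx, s)]))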

Generation of Adaptive Route Descriptions in Urban Environments.

    Abstract: This paper addresses the automatic generation of adaptive and cognitively adequate verbal route descriptions. Current automatic route descriptions suffer from a lack of adaptivity to the principles people employ in wayfinding communication, as well as to particular users’ information needs. We enhance adaptivity and cognitive adequacy by supplementing verbal route descriptions with salient geographic features, applying natural language generation techniques for linguistic realization. We also take users’ familiarity with an area into account. We present an architecture for navigational assistance operating on human cognitive and linguistic principles and report an evaluative user study that confirms the usefulness of our approach.
  • 2011. Spatial Cognition and Computation. Vol. 11, No. 2, pp. 153-177.

Hierarchical Reinforcement Learning and Hidden Markov Models for Task-Oriented Natural Language Generation.

    Abstract: Surface realisation decisions in language generation can be sensitive to a language model, but also to decisions of content selection. We therefore propose the joint optimisation of content selection and surface realisation using Hierarchical Reinforcement Learning (HRL). To this end, we suggest a novel reward function that is induced from human data and is especially suited for surface realisation. It is based on a generation space in the form of a Hidden Markov Model (HMM). Results in terms of task success and human-likeness suggest that our unified approach performs better than greedy or random baselines.
  • 2011. In Proceedings of the 49th Annual Conference of the Association for Computational Linguistics (ACL-HLT). Short Papers. Portland, OR, USA.

Optimizing Situated Dialogue Management in Unknown Environments.

    Abstract: We present a conversational learning agent that helps users navigate through complex and challenging spatial environments. The agent exhibits adaptive behaviour by learning spatially-aware dialogue actions while the user carries out the navigation task. To this end, we use Hierarchical Reinforcement Learning with relational representations to efficiently optimize dialogue actions tightly-coupled with spatial ones, and Bayesian networks to model the user’s beliefs of the navigation environment. Since these beliefs are continuously changing, we induce the agent’s behaviour in real time. Experimental results, using simulation, are encouraging, showing efficient adaptation to the user’s navigation knowledge, specifically to the generated route and the intermediate locations to negotiate with the user.
  • 2011. In Proceedings of INTERSPEECH. Florence, Italy.
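
A minimal sketch of the Bayesian user-modelling component: a single binary belief ("the user knows this part of the building") updated by Bayes' rule from noisy dialogue observations. The observation model and its probabilities are invented for illustration and are far simpler than the networks used in the paper.

    # Assumed observation model: P(observation | user knows the area) vs. P(observation | does not)
    P_OBS = {
        "follows_instruction": {True: 0.8, False: 0.4},
        "asks_clarification":  {True: 0.2, False: 0.6},
    }

    def update(belief, observation):
        """One step of Bayes' rule for the binary 'user knows the area' variable."""
        knows = P_OBS[observation][True] * belief
        not_knows = P_OBS[observation][False] * (1.0 - belief)
        return knows / (knows + not_knows)

    belief = 0.5    # uninformative prior
    for obs in ["asks_clarification", "asks_clarification", "follows_instruction"]:
        belief = update(belief, obs)
        print(f"{obs:20s} -> P(user knows area) = {belief:.2f}")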

Optimising Natural Language Generation Decision Making for Situated Dialogue.

    Abstract: Natural language generators are faced with a multitude of different decisions during their generation process. We address the joint optimisation of navigation strategies and referring expressions in a situated setting with respect to task success and human-likeness. To this end, we present a novel, comprehensive framework that combines supervised learning, Hierarchical Reinforcement Learning and a hierarchical Information State. A human evaluation shows that our learnt instructions are rated similar to human instructions, and significantly better than the supervised learning baseline.
  • 2011. In Proceedings of the 12th Annual Meeting on Discourse and Dialogue (SIGdial). Portland, OR, USA.

Combining Hierarchical Reinforcement Learning and Bayesian Networks for Natural Language Generation.

    Abstract: Language generators in situated domains face a number of content selection, utterance planning and surface realisation decisions, which can be strictly interdependent. We therefore propose to optimise these processes in a joint fashion using Hierarchical Reinforcement Learning. To this end, we induce a reward function for content selection and utterance planning from data using the PARADISE framework, and suggest a novel method for inducing a reward function for surface realisation from corpora. It is based on generation spaces represented as Bayesian Networks. Results in terms of task success and human-likeness suggest that our unified approach performs better than a baseline optimised in isolation or a greedy or random baseline. It receives human ratings close to those of human authors.
  • 2011. In Proceedings of the 13th European Workshop on Natural Language Generation (ENLG). Nancy, France.

The Bremen System for the GIVE-2.5 Challenge.

    Abstract: This paper presents the Bremen system for the GIVE-2.5 challenge. It is based on decision trees learnt from new annotations of the GIVE corpus augmented with manually specified rules. Surface realisation is based on context-free grammars. The paper will address advantages and shortcomings of the approach and discuss how the present system can serve as a baseline for a future evaluation with an improved version using hierarchical reinforcement learning with graphical models.
  • 2011. In Proceedings of the 13th European Workshop on Natural Language Generation (ENLG). Generation Challenges Session. Nancy, France.

Position Paper in the Young Researchers’ Roundtable on Spoken Dialogue Systems (YRRSDS).

    Abstract: My research interests involve context-sensitive, or adaptive, Natural Language Generation (NLG) for situated dialogue systems, especially for spoken interaction. Context-sensitive situated dialogue systems are typically required to adapt flexibly to dynamic changes of (a) properties of the situation or the spatial setting, such as visible objects, or the complexity of the environment, (b) properties of the user, such as their prior knowledge, goals, beliefs, and general information need, and (c) the dialogue history. In this context, I am mainly interested in applying Reinforcement Learning (RL) with hierarchical control and prior knowledge in several contexts of rather large-scale systems for complex domains. I have also recently looked into the joint optimisation of different system behaviours for interdependent decision making between them.
  • 2011. Portland, OR, USA.

Hierarchical Reinforcement Learning for Adaptive Text Generation.

    Abstract: We present a novel approach to natural language generation (NLG) that applies hierarchical reinforcement learning to text generation in the wayfinding domain. Our approach aims to optimise the integration of NLG tasks that are inherently different in nature, such as decisions of content selection, text structure, user modelling, referring expression generation (REG), and surface realisation. It also aims to capture existing interdependencies between these areas. We apply hierarchical reinforcement learning to learn a generation policy that captures these interdependencies, and that can be transferred to other NLG tasks. Our experimental results, obtained in a simulated environment, show that the learnt wayfinding policy outperforms a baseline policy that takes reasonable actions but without optimisation.
  • 2010. In Proceedings of the 6th International Conference on Natural Language Generation (INLG). Dublin, Ireland.

Route Instructions in Map-Based and Human-Based Dialogue: A Comparative Analysis.

  • Tenbrink, T.
  • Ross, R.
  • Thomas, K.
  • Dethlefs, N.
  • Andonova, E.
    Abstract: When conveying information about spatial situations and goals, speakers adapt flexibly to their addressee in order to reach the communicative goal efficiently and effortlessly. Our aim is to equip a dialogue system with the abilities required for such a natural, adaptive dialogue. In this paper we investigate the strategies people use to convey route information in relation to a map by presenting two parallel studies involving human–human and human–computer interaction. We compare the instructions given to a human interaction partner with those given to a dialogue system which reacts with basic verbal responses and dynamic visualization of the route in the map. The language produced by human route givers is analyzed with respect to a range of communicative as well as cognitively crucial features, particularly perspective choice and references to locations across levels of granularity. Results reveal that speakers produce systematically different instructions with respect to these features, depending on the nature of the interaction partner, human or dialogue system. Our further analysis of clarification and reference resolution strategies produced by human route followers provides insights into dialogue strategies that future systems should be equipped with.
  • 2010. Journal of Visual Languages and Computing. Vol. 21, No. 5, pp. 292-309.

Evaluating Task Success in a Dialogue System for Indoor Navigation.

  • Dethlefs, N.
  • Cuayáhuitl, H.
  • Richter, K.-F.
  • Andonova, E.
  • Bateman, J.
    Abstract: In this paper we address the assessment of dialogue systems for indoor wayfinding. Based on the PARADISE evaluation framework we propose and evaluate several task success metrics for such a purpose. According to correlation and multiple linear regression analyses, we found that task success metrics that penalise difficulty in wayfinding are more informative of system performance than a success/failure binary task success metric.
  • 2010. In Proceedings of the 14th Workshop on the Semantics and Pragmatics of Dialogue (SemDial-PozDial). Poznan, Poland.
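
A minimal sketch of the PARADISE-style analysis described above: regress user satisfaction on a task success metric plus a dialogue cost (number of turns), and compare how much variance a binary metric explains against a graded metric that penalises wayfinding difficulty. All data points are fabricated purely to make the script runnable.

    import numpy as np

    # Toy dialogues: [binary success, graded success in [0, 1], turns, user satisfaction 1-5]
    data = np.array([
        [1, 1.0,  8, 4.8], [1, 0.7, 14, 3.9], [1, 0.5, 20, 3.1], [1, 0.9, 10, 4.5],
        [0, 0.3, 22, 2.2], [0, 0.1, 25, 1.8], [1, 0.6, 16, 3.4], [0, 0.2, 24, 2.0],
    ])
    satisfaction, turns = data[:, 3], data[:, 2]

    def r_squared(success_metric):
        """Fit satisfaction ~ success + turns + bias by least squares and return R^2."""
        X = np.column_stack([success_metric, turns, np.ones(len(data))])
        coeffs, *_ = np.linalg.lstsq(X, satisfaction, rcond=None)
        ss_res = np.sum((satisfaction - X @ coeffs) ** 2)
        ss_tot = np.sum((satisfaction - satisfaction.mean()) ** 2)
        return 1.0 - ss_res / ss_tot

    print("R^2, binary task success:", round(r_squared(data[:, 0]), 3))
    print("R^2, graded task success:", round(r_squared(data[:, 1]), 3))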

Generating Adaptive Route Instructions Using Hierarchical Reinforcement Learning.

  • Cuayáhuitl, H.
  • Dethlefs, N.
  • Frommberger, L.
  • Richter, K.-F.
  • Bateman, J.
    Abstract: We present a learning approach for efficiently inducing adaptive behaviour for route instructions. For such a purpose we propose a two-stage approach to learn a hierarchy of wayfinding strategies using hierarchical reinforcement learning. Whilst the first stage learns low-level behaviour, the second stage focuses on learning high-level behaviour. In our proposed approach, only the latter is to be applied at runtime in user-machine interactions. Our experiments are based on an indoor navigation scenario for a building that is complex to navigate. We compared our approach with flat reinforcement learning and a fully-learnt hierarchical approach. Our experimental results show that our proposed approach learns significantly faster than the baseline approaches. In addition, the learnt behaviour adapts to the type of user and the structure of the spatial environment. This approach is attractive for automatic route giving since it combines fast learning with adaptive behaviour.
  • 2010. In Proceedings of the 7th International Conference on Spatial Cognition (Spatial Cognition VII). Portland, OR, USA.

The Dublin-Bremen System for the GIVE-2 Challenge.

    Abstract: This paper describes the Dublin-Bremen GIVE-2 generation system. Our main approach focused on abstracting over the low-level behaviour of the baseline agent and guiding the user with more high-level navigation information. For this purpose, we provided the user with (a) high-level action commands, (b) lookahead information, and (c) a “patience” period after they left the intended path to allow exploration. We describe a number of problems that our system encountered during the evaluation due to some of our initial assumptions not holding, and address several means by which we could achieve better performance in the future.
  • 2010. Poster presentation at the 6th International Conference on Natural Language Generation (INLG). Dublin, Ireland.

A Dialogue System for Indoor Wayfinding Using Text-Based Natural Language.

  • Cuayáhuitl, H.
  • Dethlefs, N.
  • Richter, K.-F.
  • Tenbrink, T.
  • Bateman, J.
    Abstract: We present a dialogue system that automatically generates indoor route instructions in German when asked about locations, using text-based natural language input and output. The challenging task in this system is to provide the user with a compact set of accurate and comprehensible instructions. We describe our approach based on high-level instructions. The system comprises four main modules: natural language understanding, dialogue management, route instruction generation and natural language generation. We report an evaluation with users unfamiliar with the system, using the PARADISE evaluation framework, in a real environment and naturalistic setting. We present results with high user satisfaction, and discuss future directions for enhancing this kind of system with more sophisticated and intuitive interaction.
  • 2010. International Journal of Computational Linguistics and Applications. Vol. 1, No. 2, pp. 285-304. Poster presented at the 11th Conference on Intelligent Text Processing and Computational Linguistics (CICLing).