Natural Language Processing Publications

One-Vs-Rest Neural Network English Grapheme Segmentation: A Linguistic Perspective.

    Hide/Show Full Abstract Grapheme-to-Phoneme (G2P) correspondences form foundational frameworks of tasks such as text-to-speech (TTS) synthesis or automatic speech recognition. The G2P process involves taking words in their written form and generating their pronunciation. In this paper, we critique the status quo definition of a grapheme, currently a forced alignment process relating a single character to either a phoneme or a blank unit, that underlies the majority of modern approaches. We develop a linguisticallymotivated redefinition from simple concepts such as vowel and consonant count and word length and offer a proof-of-concept implementation based on a multi-binary neural classification task. Our model achieves competitive results with a 31.86% Word Error Rate on a standard benchmark, while generating linguistically meaningful grapheme segmentations.
  • Proceedings of the 28th Conference on Computational Natural Language Learning (CoNLL)., Miami, USA.

Understanding Slang with LLMs: Modelling Cross-Cultural Nuances through Paraphrasing.

    Hide/Show Full Abstract In the realm of social media discourse, the integration of slang enriches communication, reflecting the sociocultural identities of users. This study investigates the capability of large language models (LLMs) to paraphrase slang within climate-related tweets from Nigeria and the UK, with a focus on identifying emotional nuances. Using DistilRoBERTa as the baseline model, we observe its limited comprehension of slang. To improve cross-cultural understanding, we gauge the effectiveness of leading LLMs: ChatGPT 4, Gemini, and LLaMA3 in slang paraphrasing. While ChatGPT 4 and Gemini demonstrate comparable effectiveness in slang paraphrasing, LLaMA3 shows less coverage, with all LLMs exhibiting limitations in coverage, especially of Nigerian slang. Our findings underscore the necessity for culturally-sensitive LLM development in emotion classification, particularly in non-anglocentric regions.
  • Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP), Miami, USA.

Using Large Language Models to Recommend Repair Actions for Offshore Wind Maintenance.

    Hide/Show Full Abstract The Offshore Wind (OSW) industry is experiencing significant expansion, resulting in increased Operations & Maintenance (O&M) costs. Intelligent alarm systems offer the prospect of swift detection of component failures and process anomalies, enabling timely and precise interventions that could yield reductions in resource expenditure, as well as scheduled and unscheduled downtime. This paper introduces an innovative approach to tackle this challenge by capitalising on Large Language Models (LLMs). We present a specialised conversational agent that incorporates statistical techniques to calculate distances between sentences for the detection and filtering of hallucinations and unsafe output. This potentially enables improved interpretation of alarm sequences and the generation of safer repair action recommendations by the agent. Preliminary findings are presented with the approach applied to ChatGPT-4 generated test sentences. The limitation of using ChatGPT-4 and the potential for enhancement of this agent through re-training with specialised OSW datasets are discussed.
  • 2024 J. Phys.: Conf. Ser. 2875 012025.

SafeLLM: Domain-Specific Safety Monitoring for Large language Models: A Case Study for Offshore Wind Maintenance.

    Hide/Show Full Abstract The Offshore Wind (OSW) industry is experiencing significant expansion, resulting in increased Operations & Maintenance (O&M) costs. Intelligent alarm systems offer the prospect of swift detection of component failures and process anomalies, enabling timely and precise interventions that could yield reductions in resource expenditure, as well as scheduled and unscheduled downtime. This paper introduces an innovative approach to tackle this challenge by capitalising on Large Language Models (LLMs). We present a specialised conversational agent that incorporates statistical techniques to calculate distances between sentences for the detection and filtering of hallucinations and unsafe output. This potentially enables improved interpretation of alarm sequences and the generation of safer repair action recommendations by the agent. Preliminary findings are presented with the approach applied to ChatGPT-4 generated test sentences. The limitation of using ChatGPT-4 and the potential for enhancement of this agent through re-training with specialised OSW datasets are discussed.
  • 2024 Preprint.

Understanding Slang with LLMs: Modelling Cross-Cultural Nuances through Paraphrasing.

    Hide/Show Full Abstract In the realm of social media discourse, the integration of slang enriches communication, reflecting the sociocultural identities of users. This study investigates the capability of large language models (LLMs) to paraphrase slang within climate-related tweets from Nigeria and the UK, with a focus on identifying emotional nuances. Using DistilRoBERTa as the baseline model, we observe its limited comprehension of slang. To improve cross-cultural understanding, we gauge the effectiveness of leading LLMs ChatGPT 4, Gemini, and LLaMA3 in slang paraphrasing. While ChatGPT 4 and Gemini demonstrate comparable effectiveness in slang paraphrasing, LLaMA3 shows less coverage, with all LLMs exhibiting limitations in coverage, especially of Nigerian slang. Our findings underscore the necessity for culturally-sensitive LLM development in emotion classification, particularly in non-anglocentric regions.
  • 2024 Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), Miami, Florida, US.

One-Vs-Rest Neural Network English Grapheme Segmentation: A Linguistic Perspective.

    Hide/Show Full Abstract Grapheme-to-Phoneme (G2P) correspondences form foundational frameworks of tasks such as text-to-speech (TTS) synthesis or automatic speech recognition. The G2P process involves taking words in their written form and generating their pronunciation. In this paper, we critique the status quo definition of grapheme, currently a forced alignment process relating a single character to either a phoneme or a blank unit, that underlies the majority of modern approaches. We develop a linguistically-motivated redefinition from simple concepts such as vowel and consonant count and word length and offer a proof-of-concept implementation based on a multi-binary neural classification task. Our model achieves state-of-the-art results with a 31.86% Word Error Rate on a standard benchmark, while generating linguistically meaningful grapheme segmentations.
  • 2024 Proceedings of the SIGNLL Conference on Computational Natural Language Learning (CoNLL), Miami, Florida, US.

Safety Monitoring for Large Language Models: A Case Study of Offshore Wind Maintenance .

    Hide/Show Full Abstract It has been forecasted that a quarter of the world’s energy usage will be supplied from Offshore Wind (OSW) by 2050 (Smith 2023). Given that up to one third of Levelised Cost of Energy (LCOE) arises from Operations and Maintenance (O&M), the motive for cost reduction is enormous. In typical OSW farms hundreds of alarms occur within a single day, making manual O&M planning without automated systems costly and difficult. Increased pressure to ensure safety and high reliability in progressively harsher environments motivates the exploration of Artificial Intelligence (AI) and Machine Learning (ML) systems as aids to the task. We recently introduced a specialised conversational agent trained to interpret alarm sequences from Supervisory Control and Data Acquisition (SCADA) and recommend comprehensible repair actions (Walker et al. 2023). Building on recent advancements on Large Language Models (LLMs), we expand on this earlier work, fine tuning LLAMA (Touvron 2018), using available maintenance records from EDF Energy. An issue presented by LLMs is the risk of responses containing unsafe actions, or irrelevant hallucinated procedures. This paper proposes a novel framework for safety monitoring of OSW, combining previous work with additional safety layers. Generated responses of this agent are being filtered to prevent raw responses endangering personnel and the environment. The algorithm represents such responses in embedding space to quantify dissimilarity to pre-defined unsafe concepts using the Empirical Cumulative Distribution Function (ECDF). A second layer identifies hallucination in responses by exploiting probability distributions to analyse against stochastically generated sentences. Combining these layers, the approach finetunes individual safety thresholds based on categorised concepts, providing a unique safety filter. The proposed framework has potential to utilise the O&M planning for OSW farms using state-of-the-art LLMs as well as equipping them with safety monitoring that can increase technology acceptance within the industry.
  • 2024 Proc. of the Safety Critical Systems Symposium SSS'24, Bristol, UK

User Engagement Triggers in Social Media Discourse on Biodiversity Conservation.

    Hide/Show Full Abstract Studies in digital conservation have increasingly used social media in recent years as a source of data to understand the interactions between humans and nature, model and monitor biodiversity, and analyse online discourse about the conservation of species. Current approaches to digital conservation are for the most part purely frequentist, i.e. focused on easily trackable and quantiiable features, or purely qualitative, which allows a deeper level of interpretation, but is less scalable. Our approach aims to evaluate the applicability of recent advances in deep learning in combination with semi-automatic analysis. We present a multimodal neural learning framework that experiments with diferent combinations of linguistic and visual features and metadata of tweets to predict user engagement from a function of likes and retweets. Experimental results show that text is the single most efective modality for prediction when a large amount of training data is available. For smaller datasets, drawing information from multiple modalities can boost performance. Notably, we ind a negative efect of large pre-trained language models when dealing with substantially unbalanced datasets. A qualitative analysis into the triggers of user engagement with tweets reveals that it emerges from a combination of online discourse topic and sentiment, and is often ampliied by user activity, e.g. when content originates from an inluencer account. We ind clear evidence of existing sub-communities around speciic topics, including animal photography and sightings, illegal wildlife trade and trophy hunting, deforestation and destruction of nature and climate change and action in a broader sense.
  • 2024 ACM Transactions on Social Computing

BDA at SemEval-2024 Task 4: Detection of Persuasion in Memes Across Languages with Ensemble Learning and External Knowledge.

    Hide/Show Full Abstract This paper outlines our multimodal ensemble learning system for identifying persuasion tech- niques in memes. We contribute an approach which utilises the novel inclusion of consistent named visual entities extracted using Google Vision API’s as an external knowledge source, joined to our multimodal ensemble via late fu- sion. As well as detailing our experiments in ensemble combinations, fusion methods and data augmentation, we explore the impact of including external data and summarise post- evaluation improvements to our architecture based on analysis of the task results.
  • 2024 SEMEVAL 2024 Shared Task on "Multilingual Detection of Persuasion Techniques in Memes", at the Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), Mexico City, Mexico

Towards Interactive Anomaly Detection using Natural Language.

    Hide/Show Full Abstract When training models for visual anomaly detection, typically, a dataset is collected and then annotated offline. Even if collecting raw data is relatively cheap, annotations are expensive, especially if they require human expertise. We therefore propose a novel interactive learning framework that combines active learning with natural language interaction to minimise the amount of annotated training data and allow for refined human expert feedback that may be leveraged in the learning pro- cess. In our initial experiments on wind turbine drone images, we demonstrate the effectiveness of active learning for anomaly detection when using ground truth la- bels, and assess the impact on learning when collecting labels from ‘experts’ versus ‘non-experts’ using our dialogue system. In addition to anomaly labels with confi- dence scores, we collect and analyse natural language explanations, which may be used to improve both anomaly detection performance and explainability.
  • 2024 The 14th International Workshop on Spoken Dialogue Systems Technology, Sapporo, Japan

Linguistic Pattern Analysis in the Climate Change-Related Tweets from UK and Nigeria.

    Hide/Show Full Abstract To understand the global trends of human opinion on climate change in specific geographical areas, this research proposes a framework to analyse linguistic features and cultural differences in climate-related tweets. Our study combines transformer networks with linguistic feature analysis to address small dataset limitations and gain insights into cultural differences in tweets from the UK and Nigeria. Our study found that Nigerians use more leadership language and informal words in discussing climate change on Twitter compared to the UK, as these topics are treated as an issue of salience and urgency. In contrast, the UK’s discourse about climate change on Twitter is characterised by using more formal, logical, and longer words per sentence compared to Nigeria. Also, we confirm the geographical identifiability of tweets through a classification task using DistilBERT, which achieves 83% of accuracy.
  • 2023 Proceedings of the CLASP Conference on Learning with Small Data (LSD), Gothenburg, Sweden

Real-time social media sentiment analysis for rapid impact assessment of floods.

    Hide/Show Full Abstract Traditional approaches to flood modelling mostly rely on hydrodynamic physical simulations. While these simulations can be accurate, they are computationally expensive and prohibitively so when thinking about real-time prediction based on dynamic environmental conditions. Alternatively, social media platforms such as Twitter are often used by people to communicate during a flooding event, but discovering which tweets hold useful information is the key challenge in extracting information from posts in real time. In this article, we present a novel model for flood forecasting and monitoring that makes use of a transformer network that assesses the severity of a flooding situation based on sentiment analysis of the multimodal inputs (text and images). We also present an experimental comparison of a range of state-of-the-art deep learning methods for image processing and natural language processing. Finally, we demonstrate that information induced from tweets can be used effectively to visualise fine-grained geographical flood-related information dynamically and in real-time.
  • 2023 Computers & Geosciences

This new conversational AI model can be your friend, philosopher, and guide ... and even your worst enemy.

    Hide/Show Full Abstract We explore the recently released ChatGPT model, one of the most powerful conversational AI models that has ever been developed. This opinion provides a perspective on its strengths and weaknesses and a call to action for the AI community (including academic researchers and industry) to work together on preventing potential misuse of such powerful AI models in our everyday lives.
  • 2023 Patterns Volume 4, Issue 1, Opinion Article

RELATE: Generating a linguistically inspired Knowledge Graph for fine-grained emotion classification.

    Hide/Show Full Abstract Several existing resources are available for sentiment analysis (SA) tasks that are used for learning sentiment specific embedding (SSE) representations. These resources are either large, common-sense knowledge graphs (KG) that cover a limited amount of polarities/emotions or they are smaller in size, such as lexicons, which require costly human annotation and cover fine-grained emotions. Therefore using knowledge resources to learn SSE representations is either limited by the low coverage of polarities/emotions or the overall size of a resource. In this paper, we first introduce a new directed KG called ‘RELATE’, which is built to overcome both the issue of low coverage of emotions and the issue of scalability. RELATE is the first KG of its size to cover Ekman’s six basic emotions that are directed towards entities. It is based on linguistic rules to incorporate the benefit of semantics without relying on costly human annotation. The performance of ‘RELATE’ is evaluated by learning SSE representations using a Graph Convolutional Neural Network (GCN).
  • 2022 13th Language Resources and Evaluation Conference (LREC).

Towards Contextually Sensitive Analysis of Memes: Meme Genealogy and Knowledge Base.

    Hide/Show Full Abstract As online communication grows, memes have con- tinued to evolve and circulate as succinct multi- modal forms of communication. However, compu- tational approaches applied to meme-related lack the same depth and contextual sensitivity of non- computational approaches and struggle to interpret intra-modal dynamics and referentiality. This re- search proposes to a ‘meme genealogy’ of key fea- tures and relationships between memes to inform a knowledge base constructed from meme-specific online sources and embed connotative meaning and contextual information in memes. The proposed methods provide a basis to train contextually sensi- tive computational models for analysing memes and applications in automated meme annotation.
  • 2022 IJCAI Doctoral Consortium.

Using Multimodal Data and AI to Dynamically Map Flood Risks.

    Hide/Show Full Abstract Classical measurements and modelling that underpin present flood warning and alert systems are based on fixed and spa- tially restricted static sensor networks. Computationally ex- pensive physics-based simulations are often used that can’t react in real-time to changes in environmental conditions. We want to explore contemporary artificial intelligence (AI) for predicting flood risk in real time by using a diverse range of data sources. By combining heterogeneous data sources, we aim to nowcast rapidly changing flood conditions and gain a greater understanding of urgent humanitarian needs.
  • 2022 AAAI Doctoral Consortium (AAAI-DC).

Proceedings of the 22nd Annual Meeting of the Special Interest Group on Discourse and Dialogue.

  • Haizhou Li, Gina-Anne Levow, Zhou Yu, Chitralekha Gupta, Berrak Sisman, Siqi Cai, David Vandyke, Nina Dethlefs, Yan Wu, Junyi Jessy
  • PDF
  • 2021 SIGDIAL.

A divide-and-conquer approach to neural natural language generation from structured data.

    Hide/Show Full Abstract Current approaches that generate text from linked data for complex real-world domains can face problems including rich and sparse vocabularies as well as learning from examples of long varied sequences. In this article, we propose a novel divide-and-conquer approach that automatically induces a hierarchy of “generation spaces” from a dataset of semantic concepts and texts. Generation spaces are based on a notion of similarity of partial knowledge graphs that represent the domain and feed into a hierarchy of sequence-to-sequence or memory-to-sequence learners for concept-to-text generation. An advantage of our approach is that learning models are exposed to the most relevant examples during training which can avoid bias towards majority samples. We evaluate our approach on two common benchmark datasets and compare our hierarchical approach against a flat learning setup. We also conduct a comparison between sequence-to-sequence and memory-to-sequence learning models. Experiments show that our hierarchical approach overcomes issues of data sparsity and learns robust lexico-syntactic patterns, consistently outperforming flat baselines and previous work by up to 30%. We also find that while memory-to-sequence models can outperform sequence-to-sequence models in some cases, the latter are generally more stable in their performance and represent a safer overall choice.
  • 2021. Neurocomputing 433, 300-309.

Hierarchical Multiscale Recurrent Neural Networks for Detecting Suicide Notes.

    Hide/Show Full Abstract Recent statistics in suicide prevention show that people are increasingly posting their last words online and with the unprecedented availability of textual data from social media platforms researchers have the opportunity to analyse such data. Furthermore, psychological studies have shown that our state of mind can manifest itself in the linguistic features we use to communicate. In this paper, we investigate whether it is possible to automatically identify suicide notes from other types of social media blogs in two document-level classification tasks. The first task aims to identify suicide notes from depressed and blog posts in a balanced dataset, whilst the second experiment looks at how well suicide notes can be classified when there is a vast amount of neutral text data, which makes the task more applicable to real-world scenarios. Furthermore we perform a linguistic analysis using LIWC (Linguistic Inquiry and Word Count). We present a learning model for modelling long sequences in two experiment series. We achieve an f1-score of 88.26% over the baselines of 0.60 in experiment 1 and 96.1% over the baseline in experiment 2. Finally, we show through visualisations which features the learning model identifies, these include emotions such as love and personal pronouns.
  • 2021. IEEE Transactions on Affective Computing.

A Dual Transformer Model for Intelligent Decision Support for Maintenance of Wind Turbines.

    Hide/Show Full Abstract Wind energy is one of the fastest-growing sustainable energy sources in the world but relies crucially on efficient and effective operations and maintenance to generate sufficient amounts of energy and reduce downtime of wind turbines and associated costs. Machine learning has been applied to fault prediction in wind turbines, but these predictions have not been supported with suggestions on how to avert and fix faults. We present a data-to-text generation system utilising transformers for generating corrective maintenance strategies for faults using SCADA data capturing the operational status of turbines. We achieve this in two stages: a first stage identifies faults based on SCADA input features and their relevance. A second stage performs content selection for the language generation task and creates maintenance strategies based on phrase-based natural language templates. Experiments show that our dual transformer model achieves an accuracy of up to 96.75% for alarm prediction and up to 75.35% for its choice of maintenance strategies during content-selection. A qualitative analysis shows that our generated maintenance strategies are promising. We make our human- authored maintenance templates publicly available, and include a brief video explaining our approach.
  • 2020 International Joint Conference on Neural Networks (IJCNN).

Natural Language Generation for Operations and Maintenance in Wind Turbines.

    Hide/Show Full Abstract Wind energy is one of the fastest-growing sustainable energy sources in the world but relies crucially on efficient and effective operations and maintenance to generate sufficient amounts of energy and reduce downtime of wind turbines and associated costs. Machine learning has been applied to fault prediction in wind turbines, but these predictions have not been supported with suggestions on how to avert and fix faults. We present a data-to-text generation system using transformers to produce event descriptions from SCADA data capturing the operational status of turbines and proposing maintenance strategies. Experiments show that our model learns feature representations that correspond to expert judgements. In making a contribution to the reliability of wind energy, we hope to encourage organisations to switch to sustainable energy sources and help combat climate change.
  • 2019. In NeurIPS 2019 Workshop on Tackling Climate Change with Machine Learning. Vancouver, Canada.

Hybrid approaches to fine-grained emotion detection in social media data.

    Hide/Show Full Abstract This paper states the challenges in fine-grained target- dependent Sentiment Analysis for social media data using recurrent neural networks. Firstly, we outline the problem statement and give a brief overview of related work in the area. Then we outline progress and results achieved to date, a brief research plan and future directions of this work.
  • To appear. In AAAI-2020 Doctoral Consortium. New York, USA.

Bidirectional Dilated LSTM with Attention for Fine-grained Emotion Classification in Tweets.

    Hide/Show Full Abstract We propose a novel approach for fine-grained emotion classification in tweets using a Bidirectional Dilated LSTM (BiDLSTM) with attention. Conventional LSTM architectures can face problems when classifying long sequences, which is problematic for tweets, where crucial information is often attached to the end of a sequence, e.g. an emoticon. We show that by adding a bidirectional layer, dilations and attention mechanism to a standard LSTM, our model overcomes these problems and is able to maintain complex data dependencies over time. We present experiments with two datasets, the 2018 WASSA Implicit Emotions Shared Task and a new dataset of 240,000 tweets. Our BiDLSTM with attention achieves a test accuracy of up to 81.97% outperforming competitive baselines by up to 10.52% on both datasets. Finally, we evaluate our data against a human benchmark on the same task.
  • To appear. In Proceedings of AAAI-2020 Workshop on Affective Content Analysis. New York, USA.

Dilated LSTM with ranked units for classification of suicide notes.

    Hide/Show Full Abstract Recent statistics in suicide prevention show that people are increasingly posting their last words online and with the unprecedented availability of textual data from social media platforms researchers have the opportunity to analyse such data. Furthermore, psychological studies have shown that our state of mind can manifest itself in the linguistic features we use to communicate. In this paper, we investigate whether it is possible to automatically identify suicide notes from other types of social media blogs in a document-level classification task. Also, we present a learning model for modelling long sequences, achieving an f1-score of 0.84 over the baselines of 0.53 and 0.80 (best competing model). Finally, we also show through visualisations which features the learning model identifies.
  • 2019. In AI for Social Good workshop at NeurIPS (2019), Vancouver, Canada.

Dilated LSTM with attention for Classification of suicide notes.

    Hide/Show Full Abstract In this paper we present a dilated LSTM with attention mechanism for document-level classification of suicide notes, last statements and depressed notes. We achieve an accuracy of 87.34% compared to competitive baselines of 80.35% (Logistic Model Tree) and 82.27% (Bi-directional LSTM with Attention). Furthermore, we provide an analysis of both the grammatical and thematic content of suicide notes, last statements and depressed notes. We find that the use of personal pronouns, cognitive processes and references to loved ones are most important. Finally, we show through visualisations of attention weights that the Dilated LSTM with attention is able to identify the same distinguishing features across documents as the linguistic analysis.
  • 2019. In Proceedings of the Tenth International Workshop on Health Text Mining and Information Analysis (LOUHI 2019) at EMNLP. Hong Kong.

Cross-dialectal speech processing

    Hide/Show Full Abstract Despite advances in technology, language diversity remains a challenge to the speech processing community, but there is also an opportunity to rise to this challenge through research and innovation. Pluricentric languages play an important role in such work, particularly where these languages are better resourced. Dedicated researchers across several decades, have steadily contributed resources for some language varieties, increasing general availability of a range of data archives...
  • 2019. INTERSPEECH Satellite Workshop on Pluricentric Languages in Speech Technology, Graz, Austria.

Unsupervised suicide note classification.

    Hide/Show Full Abstract With the greater availability of linguistic data from public social media platforms and the advancements of natural language processing, a number of opportunities have arisen for researchers to analyse this type of data. Research efforts have mostly focused on detecting the polarity of textual data, evaluating whether there is positive, negative or sometimes neutral content. Especially the use of neural networks has recently yielded significant results in polarity detection experiments. In this paper we present a more fine-grained approach to detecting sentiment in textual data, particularly analysing a corpus of suicide notes, depressive notes and love notes. We achieve a classification accuracy of 71.76% when classifying based on text and sentiment features, and an accuracy of 69.41% when using the words present in the notes alone. We discover that while emotions in all three datasets overlap, each of them has a unique ‘emotion profile’ which allows us to draw conclusions about the potential mental state that is reflects. Using the emotion sequences only, we achieve an accuracy of 75.29%. The results from unannotated data, while worse than the other models, nevertheless represent an encouraging step towards being able to flag potentially harmful social media posts online and in real time. We provide a high-level corpus analysis of the data sets in order to demonstrate the grammatical and emotional differences.
  • 2018. In Proceedings of the 7th KDD Workshop on Issues of Sentiment Discovery and Opinion Mining (WISDOM), co-located with the Knowledge Discovery and Data Mining (KDD), London, UK.

Domain Transfer for Deep Natural Language Generation from Abstract Meaning Representations.

    Hide/Show Full Abstract Stochastic natural language generation systems that are trained from labelled datasets are often domain-specific in their annotation and in their mapping from semantic input representations to lexical-syntactic outputs. As a result, learnt models fail to generalize across domains, heavily restricting their usability beyond single applications. In this article, we focus on the problem of domain adaptation for natural language generation. We show how linguistic knowledge from a source domain, for which labelled data is available, can be adapted to a target domain by reusing training data across domains. As a key to this, we propose to employ abstract meaning representations as a common semantic representation across domains. We model natural language generation as a long short- term memory recurrent neural network encoder-decoder, in which one recurrent neural network learns a latent representation of a semantic input, and a second recurrent neural network learns to decode it to a sequence of words. We show that the learnt representations can be transferred across domains and can be leveraged effectively to improve training on new unseen domains. Experiments in three different domains and with six datasets demonstrate that the lexical-syntactic constructions learnt in one domain can be transferred to new domains and achieve up to 75-100% of the performance of in-domain training. This is based on objective metrics such as BLEU and semantic error rate and a subjective human rating study. Training a policy from prior knowledge from a different domain is consistently better than pure in-domain training by up to 10%.
  • 2017. IEEE Computational Intelligence Magazine: Special Issue on Natural Language Generation with Computational Intelligence.

Deep text generation - Using hierarchical decomposition to mitigate the effect of rare data points.

    Hide/Show Full Abstract Deep learning has recently been adopted for the task of natural language generation (NLG) and shown remarkable results. However, learning can go awry when the input dataset is too small or not well balanced with regards to the examples it contains for various input sequences. This is relevant to naturally occurring datasets such as many that were not prepared for the task of natural language processing but scraped off the web and originally prepared for a different purpose. As a mitigation to the problem of unbalanced training data, we therefore propose to decompose a large natural language dataset into several subsets that “talk about” the same thing. We show that the decomposition helps to focus each learner’s attention during training. Results from a proof-of-concept study show 73% times faster learning over a flat model and better results.
  • 2017. In Proceedings of Language, Data and Knowledge (LDK), Galway, Ireland. Proceedings in: Springer Lecture Notes in Computer Science (LNCS).

Natural language-based presentation of cognitive stimulation to people with dementia in assistive technology: a pilot study.

  • Dethlefs, N.
  • Milders, M.
  • Cuayáhuitl, H.
  • Al-Salkini, T.
  • Douglas, D.
  • PDF
    Hide/Show Full Abstract Currently, an estimated 36 million people worldwide are affected by Alzheimer’s disease or related dementias. In the absence of a cure, non-pharmacological interventions, such as cognitive stimulation, which slow down the rate of deterioration can benefit people with dementia and their caregivers. Such interven- tions have shown to improve well-being and slow down the rate of cognitive decline. It has further been shown that cognitive stimulation in interaction with a computer is as effective as with a human. However, the need to operate a computer often repre- sents a difficulty for the elderly and stands in the way of widespread adoption. A possible solution to this obstacle is to provide a spoken natural language interface that allows people with dementia to interact with the cognitive stimulation software in the same way as they would interact with a human caregiver. This makes the assistive technology accessible to users regardless of their technical skills and provides a fully intuitive user experience. This article describes a pilot study that evaluated the feasibility of computer-based cognitive stimulation through a spoken natural language interface. A prototype software was evaluated with 23 users, including healthy elderly people and people with dementia. Feedback was overwhelmingly positive.
  • 2017. Informatics for Health and Social Care.

Extrinsic vs Intrinsic Evaluation of Natural Language Generation for Spoken Dialogue Systems and Social Robotics.

  • Hastie, H.
  • Cuayáhuitl, H.
  • Dethlefs, N.
  • Keizer, S.
  • Liu, X.
  • Link to book
    Hide/Show Full Abstract [Book abstract] In the past 10 years, very few published studies include some kind of extrinsic evaluation of an NLG component in an end-to-end-system, be it for phone or mobile-based dialogues or social robotic interaction. This may be attributed to the fact that these types of evaluations are very costly to set-up and run for a single component. The question therefore arises whether there is anything to be gained over and above intrinsic quality measures obtained in off-line experiments? In this article, we describe a case study of evaluating two variants of an NLG surface realiser and show that there are significant differences in both extrinsic measures and intrinsic measures. These differences can be used to inform further iterations of component and system development.
  • 2016. In Jokinen, Kristiina and Wilcock, Graham (eds.) Dialogues with Social Robots – Enablements, Analyses, and Evaluation. Berlin: Springer Lecture Notes in Electrical Engineering (LNEE). ISBN 978-981-10-2584-6.

Automatic Identification of Suicide Notes from Linguistic and Sentiment Features.

    Hide/Show Full Abstract Psychological studies have shown that our state of mind can manifest itself in the linguistic features we use to communicate. Recent statistics in suicide prevention show that young people are increasingly posting their last words online. In this paper, we investigate whether it is possible to automatically identify suicide notes and discern them from other types of online discourse based on analysis of sentiments and linguistic features. Using supervised learning, we show that our model achieves an accuracy of 86.6%, outperforming previous work on a similar task by over 4%.
  • 2016. In Proceedings of The 10th Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities (LaTeCH), co-located with ACL-2016. Berlin, Germany.

Information Density and Overlaps in Spoken Dialogue.

  • Dethlefs, N.
  • Hastie, H.
  • Cuayáhuitl, H.
  • Yu, Y.
  • Rieser, V.
  • Lemon, O.
  • PDF
    Hide/Show Full Abstract Incremental dialogue systems are often perceived as more responsive and natural because they are able to address phenomena of turn-taking and overlapping speech, such as backchannels or barge-ins. Previous work in this area has often identified distinctive prosodic features, or features relating to syntactic or semantic completeness, as marking appropriate places of turn-taking. In a separate strand of work, psycholinguistic studies have established a connection between information density and prominence in language—the less expected a linguistic unit is in a particular context, the more likely it is to be linguistically marked. This has been observed across linguistic levels, including the prosodic, which plays an important role in predicting overlapping speech. In this article, we explore the hypothesis that information density (ID) also plays a role in turn-taking. Specifically, we aim to show that humans are sensitive to the peaks and troughs of information density in speech, and that over-lapping speech at ID troughs is perceived as more acceptable than overlaps at ID peaks. To test our hypothesis, we collect human ratings for three models of generating overlapping speech based on features of: (1) prosody and semantic or syntactic completeness, (2) information density, and (3) both types of information. Results show that over 50% of users preferred the version using both types of features, followed by a preference for information density features alone. This indicates a clear human sensitivity to the effects of information density in spoken language and provides a strong motivation to adopt this metric for the design, development and evaluation of turn-taking modules in spoken and incremental dialogue systems.
  • 2016. Computer Speech and Language 37, pp. 82–97.

Why bother? Is evaluation of NLG in an end-to-end Spoken Dialogue System worth it?

  • Hastie, H.
  • Dethlefs, N.
  • Cuayáhuitl, H.
  • Keizer, S.
  • Liu, X.
  • PDF
    Hide/Show Full Abstract In the past 10 years, only around 15% of published conference papers include some kind of extrinsic evaluation of an NLG component in an end-to-end system. These types of evaluations are costly to set-up and run, so is it worth it? Is there anything to be gained over and above intrinsic quality measures obtained in off-line experiments? In this paper, we describe a case study of evaluating two variants of an NLG surface realiser and show that there are significant differences in both extrinsic measures and intrinsic measures. These significant differences would need to be factored into future iterations of the component and therefore, we con- clude that extrinsic evaluations are worthwhile.
  • 2016. In Proceedings of the International Workshop on Spoken Dialogue Systems (IWSDS). Ivalo, Finland.

Hierarchical Reinforcement Learning for Situated Language Generation.

    Hide/Show Full Abstract Natural Language Generation systems in interactive settings often face a multitude of choices, given that the communicative effect of each utterance they generate depends crucially on the interplay between its physical circumstances, addressee and interaction history. This is particularly true in interactive and situated settings. In this paper we present a novel approach for situated Natural Language Generation in dialogue that is based on hierarchical reinforcement learning and learns the best utterance for a context by optimisation through trial and error. The model is trained from human–human corpus data and learns particularly to balance the trade-off between efficiency and detail in giving instructions: the user needs to be given sufficient information to execute their task, but without exceeding their cognitive load. We present results from simulation and a task-based human evaluation study comparing two different versions of hierarchical reinforcement learning: One operates using a hierarchy of policies with a large state space and local knowledge, and the other additionally shares knowledge across generation subtasks to enhance performance. Results show that sharing knowledge across subtasks achieves better performance than learning in isolation, leading to smoother and more successful interactions that are better perceived by human users.
  • 2015. Natural Language Engineering 21, pp 391–435. Cambridge University Press.

Cluster-Based Prediction of User Ratings for Stylistic Surface Realisation.

  • Dethlefs, N.
  • Cuayáhuitl, H.
  • Hastie, H.
  • Rieser, V.
  • Lemon, O.
  • PDF
    Hide/Show Full Abstract Surface realisations typically depend on their target style and audience. A challenge in estimating a stylistic realiser from data is that humans vary significantly in their subjective perceptions of linguistic forms and styles, leading to almost no correlation between ratings of the same utterance. We address this problem in two steps. First, we estimate a mapping function between the linguistic features of a corpus of utterances and their human style ratings. Users are partitioned into clusters based on the similarity of their ratings, so that ratings for new utterances can be estimated, even for new, unknown users. In a second step, the estimated model is used to re-rank the outputs of a number of surface realisers to produce stylistically adaptive output. Results confirm that the generated styles are recognisable to human judges and that predictive models based on clusters of users lead to better rating predictions than models based on an average population of users.
  • 2014. In Proceedings of the Conference of the European Chapter of the Association for Computational Linguistics (EACL). Gothenburg, Sweden.

A Semi-Supervised Clustering Approach for Semantic Slot Labelling.

    Hide/Show Full Abstract Work on training semantic slot labellers for use in Natural Language Processing applications has typically either relied on large amounts of labelled input data, or has assumed entirely unlabelled inputs. The former technique tends to be costly to apply, while the latter is often not as accurate as its supervised counterpart. Here, we present a semi-supervised learning approach that automatically labels the semantic slots in a set of training data and aims to strike a balance between the dependence on labelled data and prediction accuracy. The essence of our algorithm is to cluster clauses based on a similarity function that combines lexical and semantic information. We present experiments that compare different similarity functions for both our semi-supervised setting and a fully unsupervised baseline. While semi-supervised learning expectedly outperforms unsupervised learning, our results show that (1) this effect can be observed based on very few training data instances and that increasing the size of the training data does not lead to better performance, and (2) that lexical and semantic information contribute differently in different domains so that clustering based on both types of information offers the best generalisation.
  • 2014. In Proceedings of the International Conference on Machine Learning and Applications (ICMLA). Detroit, USA.

Training a Statistical Surface Realiser from Automatic Slot Labelling.

    Hide/Show Full Abstract Training a statistical surface realiser typically relies on labelled training data or parallel data sets, such as corpora of paraphrases. The procedure for obtaining such data for new domains is not only time-consuming, but it also restricts the incorporation of new semantic slots during an interaction, i.e. using an online learning scenario for automatically extended domains. Here, we present an alternative approach to statistical surface realisation from unlabelled data through automatic semantic slot labelling. The essence of our algorithm is to cluster clauses based on a similarity function that combines lexical and semantic information. Annotations need to be reliable enough to be utilised within a spoken dialogue system. We compare different similarity functions and evaluate our surface realiser—trained from unlabelled data—in a human rating study. Results confirm that a surface realiser trained from automatic slot labels can lead to outputs of comparable quality to outputs trained from human-labelled inputs.
  • 2014. In Proceedings of the IEEE Workshop on Spoken Language Technology (SLT). South Lake Tahoe, USA.

The PARLANCE Mobile App for Interactive Search in English and Mandarin.

  • Hastie, H.
  • Aufaure, M.
  • Alexopoulos, P.
  • Bouchard, H.
  • Cuayáhuitl, H.
  • Dethlefs, N.
  • Gasic, M.
  • Henderson, J.
  • Lemon, O.
  • Liu, X.
  • Mika, P.
  • Ben Mustapha, N.
  • Potter, T.
  • Rieser, V.
  • Thomson, B.
  • Tsiakoulis, P.
  • Vanrompay, Y.
  • Villa-Terrazas, B.
  • Yazdani, M.
  • Young, S.
  • Yu, Y.
  • PDF
    Hide/Show Full Abstract We demonstrate a mobile application in English and Mandarin to test and evaluate components of the Parlance dialogue system for interactive search under real-world conditions.
  • 2014. In Proceedings of the Annual Meeting of the Special Interest Group on Discourse and Dialogue (SIGdial).

Non-Strict Hierarchical Reinforcement Learning for Interactive Systems and Robots.

    Hide/Show Full Abstract Conversational systems and robots that use reinforcement learning for policy optimization in large domains often face the problem of limited scalability. This problem has been addressed either by using function approximation techniques that estimate the approximate true value function of a policy or by using a hierarchical decomposition of a learning task into subtasks. We present a novel approach for dialogue policy optimization that combines the benefits of both hierarchical control and function approximation and that allows flexible transitions between dialogue subtasks to give human users more control over the dialogue. To this end, each reinforcement learning agent in the hierarchy is extended with a subtask transition function and a dynamic state space to allow flexible switching between subdialogues. In addition, the subtask policies are represented with linear function approximation in order to generalize the decision making to situations unseen in training. Our proposed approach is evaluated in an interactive conversational robot that learns to play quiz games. Experimental results, using simulation and real users, provide evidence that our proposed approach can lead to more flexible (natural) interactions than strict hierarchical control and that it is preferred by human users.
  • 2014. ACM Transactions on Interactive Intelligent Systems. Vol. 4, No. 4.

Context-Sensitive Natural Language Generation: From Knowledge-Driven to Data-Driven Techniques.

    Hide/Show Full Abstract Context-sensitive Natural Language Generation is concerned with the automatic generation of system output that is in several ways adaptive to its target audience or the situational circumstances of its production. In this article, I will provide an overview of the most popular methods that have been applied to context-sensitive generation. A particular focus will be on the shift from knowledge-driven to data- driven approaches that has been witnessed in the last decade. While this shift has offered powerful new methods for large-scale adaptivity and flexible output generation, purely data-driven approaches still struggle to reach the linguistic depth of their knowledge-driven predecessors. Bridging the gap between both types of approaches is therefore an important future research direction.
  • 2014. Language and Linguistics Compass, Vol. 8(3), pp. 99–115.

A Joint Learning Approach for Situated Language Generation.

    Hide/Show Full Abstract [Book abstract] An informative and comprehensive overview of the state-of-the-art in natural language generation (NLG) for interactive systems, this guide serves to introduce graduate students and new researchers to the field of natural language processing and artificial intelligence, while inspiring them with ideas for future research. Detailing the techniques and challenges of NLG for interactive applications, it focuses on the research into systems that model collaborativity and uncertainty, are capable of being scaled incrementally, and can engage with the user effectively. A range of real-world case studies is also included. The book and the accompanying website feature a comprehensive bibliography, and refer the reader to corpora, data, software and other resources for pursuing research on natural language generation and interactive systems, including dialog systems, multimodal interfaces and assistive technologies. It is an ideal resource for students and researchers in computational linguistics, natural language processing and related fields.
  • 2014. In Amanda Stent and Srinivas Bangalore (eds.) Natural Language Generation in Interactive Systems. Cambridge University Press.

Getting to Know Users: Accounting for the Variability in User Ratings.

  • Dethlefs, N.
  • Cuayáhuitl, H.
  • Hastie, H.
  • Rieser, V.
  • Lemon, O.
  • PDF
    Hide/Show Full Abstract Evaluations of dialogue systems and language generators often rely on subjective user ratings to assess output quality and performance. Humans however vary in their preferences so that estimating an accurate prediction model is difficult. Using a method that clusters utterances based on their linguistic features and ratings (Dethlefs et al., 2014), we discuss the possibility of obtaining user feedback implicitly during an interaction. This approach promises better predictions of user preferences through continuous re-estimation.
  • 2014. Poster paper in the Workshop on the Semantics and Pragmatics of Dialogue (SemDial). Edinburgh, Scotland.

Two Alternative Frameworks for Deploying Spoken Dialogue Systems to Mobile Platforms for Evaluation “in the Wild”.

  • Hastie, H.
  • Aufaure, M.
  • Alexopoulos, P.
  • Bouchard, H.
  • Cuayáhuitl, H.
  • Dethlefs, N.
  • Gasic, M.
  • Henderson, J.
  • Lemon, O.
  • Liu, X.
  • Mika, P.
  • Potter, T.
  • Rieser, V.
  • Tsiakoulis, P.
  • Vanrompay, Y.
  • Villa-Terrazas, B.
  • Yazdani, M.
  • Young, S.
  • Yu, Y.
  • PDF
    Hide/Show Full Abstract We demonstrate two alternative frameworks for testing and evaluating spoken dialogue systems on mobile devices for use “in the wild”. We firstly present a spoken dialogue system that uses third party ASR (Automatic Speech Recognition) and TTS (Text-To-Speech) components and then present an alternative using audio compression to allow for entire systems with home-grown ASR/TTS to be plugged in directly. Some advantages and drawbacks of both are discussed.
  • 2014. Poster paper in the Workshop on the Semantics and Pragmatics of Dialogue (SemDial). Edinburgh, Scotland.

Conditional Random Fields for Responsive Surface Realisation Using Global Features.

    Hide/Show Full Abstract Surface realisers in spoken dialogue systems need to be more responsive than conventional surface realisers. They need to be sensitive to the utterance context as well as robust to partial or changing generator inputs. We formulate surface realisation as a sequence labelling task and combine the use of conditional random fields (CRFs) with semantic trees. Due to their extended notion of context, CRFs are able to take the global utterance context into account and are less constrained by local features than other realisers. This leads to more natural and less repetitive surface realisation. It also allows generation from partial and modified inputs and is therefore applicable to incremental surface realisation. Results from a human rating study confirm that users are sensitive to this extended notion of context and assign ratings that are significantly higher (up to 14%) than those for taking only local context into account.
  • 2013. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (ACL). Sofia, Bulgaria.

Hierarchical Joint Learning for Natural Language Generation.

    Hide/Show Full Abstract Natural Language Generation (NLG) systems typically face an uncertainty regarding the best utterance to communicate to a user in a given context given that the effect of a single utterance depends crucially on the interplay between its physical environment, pragmatic circumstances, addressee and interaction history. NLG system designers have traditionally used a pipeline architecture to divide the generation process into the distinct stages of content selection, utterance planning and surface realisation to choose the semantics, organisation and realisation of an utterance. Unfortunately, this sequential model does not account for the interdependencies that exist among these stages, which in practice has been manifest in inefficient instruction giving and an increased cognitive load for the user. This thesis will advocate a joint optimisation framework for situated NLG that is based on Hierarchical Reinforcement Learning combined with graphical models and will learn the best utterance for a given context by optimising its behaviour through a trial and error search. The joint model considers decisions at different NLG stages in interdependence with each other and thereby produces more context-sensitive utterances than is possible when considering decisions in isolation. To enhance the human-likeness of the model, two augmentations will be made. We will introduce the notion of a Hierarchical Information State to support the systematic pre-specification of prior knowledge and human preferences for content selection. Graphical models—Hidden Markov Models and Bayesian Networks—will then be integrated as generation spaces to encourage natural surface realisation by balancing the proportion of alignment and variation. Results from a human evaluation study show that the hierarchical learning agent learns a robust generation policy that adapts to new circumstances and users flexibly leading to smooth and successful interactions. In terms of the comparison between a joint and an isolated optimisation, results indicate that a jointly optimised system achieves higher user satisfaction and task success and is better perceived by human users than its isolated counterpart. To demonstrate the domain-independence and generalisabilty of the hierarchical joint optimisation framework, an additional study will be presented that transfers the model to a new, but related, domain: the generation of route instructions in a real navigation scenario using a situated dialogue system for indoor navigation. Results confirm that the NLG policy can be applied to new domains with limited effort and contribute to high task success and user satisfaction.
  • 2013. IOS Press / AKA Publishing. In Series Dissertations on Artificial Intelligence, Volume 340. ISBN 978-1-61499-115-1. Amsterdam / Berlin.

Hierarchical Joint Learning for Natural Language Generation.

    Hide/Show Full Abstract Natural Language Generation (NLG) systems typically face an uncertainty regarding the best utterance to communicate to a user in a given context given that the effect of a single utterance depends crucially on the interplay between its physical environment, pragmatic circumstances, addressee and interaction history. NLG system designers have traditionally used a pipeline architecture to divide the generation process into the distinct stages of content selection, utterance planning and surface realisation to choose the semantics, organisation and realisation of an utterance. Unfortunately, this sequential model does not account for the interdependencies that exist among these stages, which in practice has been manifest in inefficient instruction giving and an increased cognitive load for the user. This thesis will advocate a joint optimisation framework for situated NLG that is based on Hierarchical Reinforcement Learning combined with graphical models and will learn the best utterance for a given context by optimising its behaviour through a trial and error search. The joint model considers decisions at different NLG stages in interdependence with each other and thereby produces more context-sensitive utterances than is possible when considering decisions in isolation. To enhance the human-likeness of the model, two augmentations will be made. We will introduce the notion of a Hierarchical Information State to support the systematic pre-specification of prior knowledge and human preferences for content selection. Graphical models—Hidden Markov Models and Bayesian Networks—will then be integrated as generation spaces to encourage natural surface realisation by balancing the proportion of alignment and variation. Results from a human evaluation study show that the hierarchical learning agent learns a robust generation policy that adapts to new circumstances and users flexibly leading to smooth and successful interactions. In terms of the comparison between a joint and an isolated optimisation, results indicate that a jointly optimised system achieves higher user satisfaction and task success and is better perceived by human users than its isolated counterpart. To demonstrate the domain-independence and generalisabilty of the hierarchical joint optimisation framework, an additional study will be presented that transfers the model to a new, but related, domain: the generation of route instructions in a real navigation scenario using a situated dialogue system for indoor navigation. Results confirm that the NLG policy can be applied to new domains with limited effort and contribute to high task success and user satisfaction.
  • 2013. PhD Thesis. University of Bremen, Faculty of Linguistics, Germany.

Impact of ASR N-Best Information on Bayesian Dialogue Act Recognition.

    Hide/Show Full Abstract A challenge in dialogue act recognition is the mapping from noisy user inputs to dialogue acts. In this paper we describe an approach for re-ranking dialogue act hypotheses based on Bayesian classifiers that incorporate dialogue history and Automatic Speech Recognition (ASR) N-best information. We report results based on the Let’s Go dialogue corpora that show (1) that including ASR N-best information results in improved dialogue act recognition performance (+7% accuracy), and (2) that competitive results can be obtained from as early as the first system dialogue act, reducing the need to wait for subsequent system dialogue acts.
  • 2013. In Proceedings of the 14th Annual Meeting of the Special Interest Group on Discourse and Dialogue (SIGdial). Metz, France.

Proceedings of the Young Researcher’s Roundtable on Spoken Dialogue Systems.

  • El Asri, L.
  • Dethlefs, N.
  • Henderson, M.
  • Kennington, C.
  • Mitchell, C.
  • Schütte, N.
  • Villalba, M.
  • Baheux, D.
  • PDF
    Hide/Show Full Abstract We are delighted to welcome you to the Ninth Young Researchers’ Roundtable on Spoken Dialogue Systems in Metz, France. YRRSDS is a yearly event that began in 2005 in Lisbon, followed by Pittsburgh, Antwerp, Columbus, London, Tokyo, Portland, and Seoul. The aim of the workshop is to promote the networking of students, post docs, and junior researchers working in research related to spoken dialogue systems in academia and industry. The workshop provides an open forum where participants can discuss their research interests, current work, and future plans.
  • 2013. Co-located with the 14th Annual Meeting of the Special Interest Group on Discourse and Dialogue (SIGdial). Metz, France.

Barge-in Effects in Bayesian Dialogue Act Recognition and Simulation.

    Hide/Show Full Abstract Dialogue act recognition and simulation are traditionally considered separate processes. Here, we argue that both can be fruitfully treated as interleaved processes within the same probabilistic model, leading to a synchronous improvement of performance in both. To demonstrate this, we train multiple Bayes Nets that predict the timing and content of the next user utterance. A specific focus is on providing support for barge-ins. We describe experiments using the Let’s Go data that show an improvement in classification accuracy (+5%) in Bayesian dialogue act recognition involving barge-ins using partial context compared to using full context. Our results also indicate that simulated dialogues with user barge-in are more realistic than simulations without barge-in events.
  • 2013. In Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), Olomouc, Czech Republic.

Demonstration of the PARLANCE System: A Data-Driven, Incremental, Spoken Dialogue System for Interactive Search.

  • Hastie, H.
  • Aufaure, M.
  • Alexopoulos, P.
  • Cuayáhuitl, H.
  • Dethlefs, N.
  • Gasic, M.
  • Henderson, J.
  • Lemon, O.
  • Liu, X.
  • Mika, P.
  • Mustapha, N.
  • Rieser, V.
  • Thomson, B.
  • Tsiakoulis, P.
  • Vanrompay, Y.
  • Villazon-Terrazas, B.
  • Young, S.
  • PDF
    Hide/Show Full Abstract The Parlance system for interactive search processes dialogue at a micro-turn level, displaying dialogue phenomena that play a vital role in human spoken conversation. These dialogue phenomena include more natural turn-taking through rapid system responses, generation of backchannels, and user barge-ins. The Parlance demonstration system differentiates from other incremental systems in that it is data-driven with an infrastructure that scales well.
  • 2013. In Proceedings of the 14th Annual Meeting of the Special Interest Group on Discourse and Dialogue (SIGdial). Metz, France.

Optimising Incremental Dialogue Decisions Using Information Density for Interactive Systems.

    Hide/Show Full Abstract Abstract Incremental processing allows system designers to address several discourse phenomena that have previously been somewhat neglected in interactive systems, such as backchannels or barge-ins, but that can enhance the responsiveness and naturalness of systems. Unfortunately, prior work has focused largely on deterministic incremental decision making, rendering system behaviour less flexible and adaptive than is desirable. We present a novel approach to incremental decision making that is based on Hierarchical Reinforcement Learning to achieve an interactive optimisation of Information Presentation (IP) strategies, allowing the system to generate and comprehend backchannels and barge-ins, by employing the recent psycholinguistic hypothesis of information density (ID) (Jaeger, 2010). Results in terms of average rewards and a human rating study show that our learnt strategy outperforms several baselines that are not sensitive to ID by more than 23%.
  • 2012. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP-CoNLL). Jeju, South Korea.

Optimising Incremental Generation for Spoken Dialogue Systems: Reducing the Need for Fillers.

    Hide/Show Full Abstract Recent studies have shown that incremental systems are perceived as more reactive, natural, and easier to use than non-incremental systems. However, previous work on incremental NLG has not employed recent advances in statistical optimisation using machine learning. This paper combines the two approaches, showing how the update, revoke and purge operations typically used in incremental approaches can be implemented as state transitions in a Markov Decision Process. We design a model of incremental NLG that generates output based on micro-turn interpretations of the user’s utterances and is able to optimise its decisions using statistical machine learning. We present a proof-of-concept study in the domain of Information Presentation (IP), where a learning agent faces the trade-off of whether to present information as soon as it is available (for high reactiveness) or else to wait until input ASR hypotheses are more reliable. Results show that the agent learns to avoid long waiting times, fillers and self-corrections, by re-ordering content based on its confidence.
  • 2012. In Proceedings of the 7th International Conference on Natural Language Generation (INLG). Chicago, IL, USA.

Comparing HMMs and Bayesian Networks for Surface Realisation.

    Hide/Show Full Abstract Natural Language Generation (NLG) systems often use a pipeline architecture for sequential decision making. Recent studies however have shown that treating NLG decisions jointly rather than in isolation can improve the overall performance of systems. We present a joint learning framework based on Hierarchical Reinforcement Learning (HRL) which uses graphical models for surface realisation. Our focus will be on a comparison of Bayesian Networks and HMMs in terms of user satisfaction and naturalness. While the former perform best in isolation, the latter present a scalable alternative within joint systems.
  • 2012. In Proceedings of the 12th Conference of the North American Chapter of the Association for Computational Linguistics (NAACL-HLT). Montréal, Canada.

Dialogue Systems Using Online Learning: Beyond Empirical Methods.

    Hide/Show Full Abstract We discuss a change of perspective for training dialogue systems, which requires a shift from traditional empirical methods to online learning methods. We motivate the application of online learning, which provides the benefit of improving the system’s behaviour continuously often after each turn or dialogue rather than after hundreds of dialogues. We describe the requirements and advances for dialogue systems with online learning, and speculate on the future of these kinds of systems.
  • 2012. In Proceedings of the Workshop on Future Directions and Needs in the Spoken Dialogue Community: Tools and Data (SDCTD). Co-located with NAACL-HLT. Montréal, Canada.

Incremental Spoken Dialogue Systems: Tools and Data.

    Hide/Show Full Abstract Strict-turn taking models of dialogue do not accurately model human incremental processing, where users can process partial input and plan partial utterances in parallel. We discuss the current state of the art in incremental systems and propose tools and data required for further advances in the field of Incremental Spoken Dialogue Systems.
  • 2012. In Proceedings of the Workshop on Future Directions and Needs in the Spoken Dialogue Community: Tools and Data (SDCTD). Co-located with NAACL-HLT. Montréal, Canada.

Optimising Incremental Generation for Information Presentation of Mobile Search Results.

    Hide/Show Full Abstract This abstract discusses a proof-of-concept study in incremental Natural Language Generation (NLG) in the domain of Information Presentation for Spoken Dialogue Systems. The work presented is part of the FP7 EC Parlance project (http://www.parlance- project.eu). The goal of Parlance is to develop personalised, mobile, interactive, hyper-local search through speech. Recent trends in Information Retrieval are towards incremental, interactive search and we argue that spoken dialogue systems can provide a truly natural medium for this type of interactive search. This is particularly attractive for people on the move, who have their hands and eyes busy.
  • 2012. Presentation at Symposium: Influencing People with Information (SIPI). Aberdeen, Scotland.

Spatially-Aware Dialogue Control Using Hierarchical Reinforcement Learning.

    Hide/Show Full Abstract This article addresses the problem of scalable optimization for spatially-aware dialogue systems. These kinds of systems must perceive, reason, and act about the spatial environment where they are embedded. We formulate the problem in terms of Semi-Markov Decision Processes and propose a hierarchical reinforcement learning approach to optimize subbehaviors rather than full behaviors. Because of the vast number of policies that are required to control the interaction in a dynamic environment (e.g., a dialogue system assisting a user to navigate in a building from one location to another), our learning approach is based on two stages: (a) the first stage learns low-level behavior, in advance; and (b) the second stage learns high-level behavior, in real time. For such a purpose we extend an existing algorithm in the literature of reinforcement learning in order to support reusable policies and therefore to perform fast learning. We argue that our learning approach makes the problem feasible, and we report on a novel reinforcement learning dialogue system that performs a joint optimization between dialogue and spatial behaviors. Our experiments, using simulated and real environments, are based on a text-based dialogue system for indoor navigation. Experimental results in a realistic environment reported an overall user satisfaction result of 89%, which suggests that our proposed approach is attractive for its application in real interactions as it combines fast learning with adaptive and reasonable behavior.
  • 2011. ACM Transactions on Speech and Language Processing (Special Issue on Machine Learning for Robust and Adaptive Spoken Dialogue Systems). Vol. 7, No. 3, pp. 1-26.

Hierarchical Reinforcement Learning and Hidden Markov Models for Task-Oriented Natural Language Generation.

    Hide/Show Full Abstract Surface realisation decisions in language generation can be sensitive to a language model, but also to decisions of content selection. We therefore propose the joint optimisation of content selection and surface realisation using Hierarchical Reinforcement Learning (HRL). To this end, we suggest a novel reward function that is induced from human data and is especially suited for surface realisation. It is based on a generation space in the form of a Hidden Markov Model (HMM). Results in terms of task success and human-likeness suggest that our unified approach performs better than greedy or random baselines.
  • 2011. In Proceedings of the 49th Annual Conference of the Association for Computational Linguistics (ACL-HLT). Short Papers. Portland, OR, USA.

Optimizing Situated Dialogue Management in Unknown Environments.

    Hide/Show Full Abstract We present a conversational learning agent that helps users navigate through complex and challenging spatial environments. The agent exhibits adaptive behaviour by learning spatially-aware dialogue actions while the user carries out the navigation task. To this end, we use Hierarchical Reinforcement Learning with relational representations to efficiently optimize dialogue actions tightly-coupled with spatial ones, and Bayesian networks to model the user’s beliefs of the navigation environment. Since these beliefs are continuously changing, we induce the agent’s behaviour in real time. Experimental results, using simulation, are encouraging by showing efficient adaptation to the user’s navigation knowledge, specifically to the generated route and the intermediate locations to negotiate with the user.
  • 2011. In Proceedings of INTERSPEECH. Florence, Italy.

Optimising Natural Language Generation Decision Making for Situated Dialogue.

    Hide/Show Full Abstract Natural language generators are faced with a multitude of different decisions during their generation process. We address the joint optimisation of navigation strategies and referring expressions in a situated setting with respect to task success and human-likeness. To this end, we present a novel, comprehensive framework that combines supervised learning, Hierarchical Reinforcement Learning and a hierarchical Information State. A human evaluation shows that our learnt instructions are rated similar to human instructions, and significantly better than the supervised learning baseline.
  • 2011. In Proceedings of the 12th Annual Meeting on Discourse and Dialogue (SIGdial). Portland, OR, USA.

Combining Hierarchical Reinforcement Learning and Bayesian Networks for Natural Language Generation.

    Hide/Show Full Abstract Language generators in situated domains face a number of content selection, utterance planning and surface realisation decisions, which can be strictly interdependent. We therefore propose to optimise these processes in a joint fashion using Hierarchical Reinforcement Learning. To this end, we induce a reward function for content selection and utterance planning from data using the PARADISE framework, and suggest a novel method for inducing a reward function for surface realisation from corpora. It is based on generation spaces represented as Bayesian Networks. Results in terms of task success and human-likeness suggest that our unified approach performs better than a baseline optimised in isolation or a greedy or random baseline. It receives human ratings close to human authors.
  • 2011. In Proceedings of the 13th European Workshop on Natural Language Generation (ENLG). Nancy, France.

The Bremen System for the GIVE-2.5 Challenge.

    Hide/Show Full Abstract This paper presents the Bremen system for the GIVE-2.5 challenge. It is based on decision trees learnt from new annotations of the GIVE corpus augmented with manually specified rules. Surface realisation is based on context-free grammars. The paper will address advantages and shortcomings of the approach and discuss how the present system can serve as a baseline for a future evaluation with an improved version using hierarchical reinforcement learning with graphical models.
  • 2011. In Proceedings of the 13th European Workshop on Natural Language Generation (ENLG). Generation Challenges Session. Nancy, France.

Position Paper in the Young Researchers’ Roundtable on Spoken Dialogue Systems (YRRSDS).

    Hide/Show Full Abstract My research interests involve context-sensitive, or adaptive, Natural Language Generation (NLG) for situated dialogue systems, especially for spoken interaction. Context-sensitive situated dialogue systems are typically required to adapt flexibly to dynamic changes of (a) properties of the situation or the spatial setting, such as visible objects, or the complexity of the environment, (b) properties of the user, such as their prior knowledge, goals, beliefs, and general information need, and (c) the dialogue history. In this context, I am mainly interested in applying Reinforcement Learning (RL) with hierarchical control and prior knowledge in several contexts of rather large-scale systems for complex domains. I have also recently looked into the joint optimisation of different system behaviours for interdependent decision making between them.
  • 2011. Portland, OR, USA.

Hierarchical Reinforcement Learning for Adaptive Text Generation.

    Hide/Show Full Abstract We present a novel approach to natural language generation (NLG) that applies hierarchical reinforcement learning to text generation in the wayfinding domain. Our approach aims to optimise the integration of NLG tasks that are inherently different in nature, such as decisions of content selection, text structure, user modelling, referring expression generation (REG), and surface realisation. It also aims to capture existing interdependencies between these areas. We apply hierarchical reinforcement learning to learn a generation policy that captures these interdependencies, and that can be transferred to other NLG tasks. Our experimental results—in a simulated environment—show that the learnt wayfinding policy outperforms a baseline policy that takes reasonable actions but without optimization.
  • 2010. In Proceedings of the 6th International Conference on Natural Language Generation (INLG). Dublin, Ireland.

Route Instructions in Map-Based and Human-Based Dialogue: A Comparative Analysis.

  • Tenbrink, T.
  • Ross, R.
  • Thomas, K.
  • Dethlefs, N.
  • Andonova, E.
  • Link to article
    Hide/Show Full Abstract When conveying information about spatial situations and goals, speakers adapt flexibly to their addressee in order to reach the communicative goal efficiently and effortlessly. Our aim is to equip a dialogue system with the abilities required for such a natural, adaptive dialogue. In this paper we investigate the strategies people use to convey route information in relation to a map by presenting two parallel studies involving human–human and human–computer interaction. We compare the instructions given to a human interaction partner with those given to a dialogue system which reacts by basic verbal responses and dynamic visualization of the route in the map. The language produced by human route givers is analyzed with respect to a range of communicative as well as cognitively crucial features, particularly perspective choice and references to locations across levels of granularity. Results reveal that speakers produce systematically different instructions with respect to these features, depending on the nature of the interaction partner, human or dialogue system. Our further analysis of clarification and reference resolution strategies produced by human route followers provides insights into dialogue strategies that future systems should be equipped with.
  • 2010. Journal of Visual Languages and Computing. Vol. 21, No. 5, pp. 292-309.

Evaluating Task Success in a Dialogue System for Indoor Navigation

  • Dethlefs, N.
  • Cuayáhuitl, H.
  • Richter, K.-F.
  • Andonova, E.
  • Bateman, J.
  • PDF
    Hide/Show Full Abstract In this paper we address the assessment of dialogue systems for indoor wayfinding. Based on the PARADISE evaluation framework we propose and evaluate several task success metrics for such a purpose. According to correlation and multiple linear regression analyses, we found that task success metrics that penalise difficulty in wayfinding are more informative of system performance than a success/failure binary task success metric.
  • 2010. In Proceedings of the 14th Workshop on the Semantics and Pragmatics of Dialogue (SemDial-PozDial). Poznan, Poland.

Generating Adaptive Route Instructions Using Hierarchical Reinforcement Learning

  • Cuayáhuitl, H.
  • Dethlefs, N.
  • Frommberger, L.
  • Richter, K.-F.
  • Bateman, J.
  • PDF
    Hide/Show Full Abstract We present a learning approach for efficiently inducing adaptive behaviour of route instructions. For such a purpose we propose a two-stage approach to learn a hierarchy of wayfinding strategies using hierarchical reinforcement learning. Whilst the first stage learns low-level behaviour, the second stage focuses on learning high-level behaviour. In our proposed approach, only the latter is to be applied at runtime in user-machine interactions. Our experiments are based on an indoor navigation scenario for a building that is complex to navigate. We compared our approach with flat reinforcement learning and a fully-learnt hierarchical approach. Our experimental results show that our proposed approach learns significantly faster than the baseline approaches. In addition, the learnt behaviour shows to adapt to the type of user and structure of the spatial environment. This approach is attractive to automatic route giving since it combines fast learning with adaptive behaviour.
  • 2010. In Proceedings of the 7th International Conference on Spatial Cognition **(Spatial Cognition VII)**. Portland, OR, USA.

The Dublin-Bremen System for the GIVE2-Challenge

    Hide/Show Full Abstract This paper describes the Dublin-Bremen GIVE-2 generation system. Our main approach focused on abstracting over the low-level behaviour of the baseline agent and guide the user by more high-level navigation information. For this purpose, we provided the user with (a) high-level action commands, (b) lookahead information, and (c) a “patience” period after they left the intended path to allow exploration. We describe a number of problems that our system encountered during the evaluation due to some of our initial assumptions not holding, and address several means by which we could achieve better performance in the future.
  • 2010. Poster presentation at the 6th International Conference on Natural Language Generation (INLG). Dublin, Ireland.

A Dialogue System for Indoor Wayfinding Using Text-Based Natural Language

  • Cuayáhuitl, H.
  • Dethlefs, N.
  • Richter, K.-F.
  • Tenbrink, T.
  • Bateman, J.
  • PDF
    Hide/Show Full Abstract We present a dialogue system that automatically generates indoor route instructions in German when asked about locations, using text-based natural language input and output. The challenging task in this system is to provide the user with a compact set of accurate and comprehensible instructions. We describe our approach based on high-level instructions. The system is described with four main modules: natural language understanding, dialogue management, route instruction generation and natural language generation. We report an evaluation with users unfamiliar with the system — using the PARADISE evaluation framework — in a real environment and naturalistic setting. We present results with high user satisfaction, and discuss future directions for enhancing this kind of system with more sophisticated and intuitive interaction.
  • 2010. International Journal of Computational Linguistics and Applications. Vol. 1, No. 2, pp. 285-304. Posted presented at the 11th Conference on Intelligent Text Processing and Computational Linguistics (CICLing).