In this WP, we use the anticipatory models and representations for natural language understanding tasks, such as the recognition of events and actions in text, their participating entities and coreferents, and their spatial and temporal relations.
A first challenge was the design and evaluation of efficient storage and retrieval of the enriched representations. For example, we propose an incremental retrieval scheme for continuous representations of textual facts, triggered by words and phrases in both input questions (in a question answering context) and previously gathered facts [5]. The retrieval is refined by jointly contextualizing and processing the initially retrieved continuous representations, where semantic roles and coreferences can play a role. Coreferences also play a role in another work, in which we define mechanisms inspired by human memory, rehearsal and anticipation, to improve the selection and storage of information in a memory network, so that it can later be retrieved more efficiently and effectively for answering questions [8]. Cosine similarity and other distance metrics are often used to retrieve semantically similar items (text or images) based on their continuous representations. In [6] we test the sensitivity of these metrics to common transformations such as centering, both for image and text embeddings produced by popular off-the-shelf feature extractors.
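The sensitivity of cosine similarity to a transformation like centering can be illustrated with a toy example. The 2-D "embeddings" below are invented for illustration; they are not the feature-extractor embeddings evaluated in [6]. The sketch shows that mean-centering a small collection of vectors can change which item is ranked most similar to a query.

```python
# Toy demonstration that cosine-based semantic match is sensitive to
# mean-centering. The 2-D vectors are illustrative assumptions only.
from math import sqrt

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))

def center(vectors):
    """Subtract the per-dimension mean of the collection from each vector."""
    dims = len(vectors[0])
    mean = [sum(v[d] for v in vectors) / len(vectors) for d in range(dims)]
    return [[v[d] - mean[d] for d in range(dims)] for v in vectors]

query, a, b = [10.0, 11.0], [20.0, 22.0], [10.0, 12.0]

# Before centering: a is exactly parallel to the query and ranks first.
raw_a, raw_b = cosine(query, a), cosine(query, b)

# After centering the collection, the ranking flips: b ranks first.
qc, ac, bc = center([query, a, b])
cen_a, cen_b = cosine(qc, ac), cosine(qc, bc)

print(f"raw:      sim(q,a)={raw_a:.4f}  sim(q,b)={raw_b:.4f}")
print(f"centered: sim(q,a)={cen_a:.4f}  sim(q,b)={cen_b:.4f}")
```

Because cosine similarity ignores vector norms but not the shared offset of a collection, removing that offset can reorder nearest neighbours, which is the kind of effect [6] measures systematically.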
In [3] we learn matrix representations of objects in a novel way: taking inspiration from human memory, the representations exploit distances to different contextual reference frames. These representations can be used for highly efficient and configurable semantic retrieval, among other applications.
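A hedged sketch of the general idea follows: an object is represented by matrices of distances between its feature vectors and the anchor points of several contextual reference frames. The feature vectors, frame names, and the Euclidean distance choice are illustrative assumptions, not the exact construction of [3].

```python
# Illustrative sketch: matrix representations built from distances to
# reference frames. All vectors and frame names below are hypothetical.
from math import dist  # Euclidean distance (Python >= 3.8)

# Hypothetical 2-D feature vectors for one object.
object_features = [(0.2, 0.9), (0.7, 0.3)]

# Two hypothetical reference frames, each a small set of anchor points.
reference_frames = {
    "frame_colour": [(0.0, 1.0), (1.0, 0.0)],
    "frame_shape": [(0.5, 0.5), (0.9, 0.9)],
}

# One matrix per frame: rows index object features, columns index anchors,
# and each cell stores the distance of a feature to an anchor.
representation = {
    name: [[dist(f, anchor) for anchor in anchors] for f in object_features]
    for name, anchors in reference_frames.items()
}

for name, matrix in representation.items():
    print(name, [[round(x, 3) for x in row] for row in matrix])
```

Because each frame contributes a separate, interpretable block, such a representation can be decomposed or recombined per frame at retrieval time, which is one way to read the "configurable" retrieval claim above.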
The next line of work is concerned with using the enriched models to parse both spatial and temporal information. To start, we evaluate to what extent multimodal pretrained transformers, which can be seen as visually enriched language models, parse and retain the semantic roles and relations present in input text and/or images [7]. These roles and relations may be explicit or implicit in the input text. We find that current visually enriched language models do not yet adequately parse and retain the underlying semantic structures in their continuous representations.
We give an overview of current methods that parse temporal information from text and of methods that reason with temporal relations [1]. We propose and evaluate a method to parse probabilistic absolute event timelines, covering events whose temporal information is explicit as well as implicit in the text [2]. Finally, we also focus on parsing spatial relations instead of temporal ones [4]: we propose a method to decode 2D spatial arrangements from input sentences by finetuning pretrained text transformers, which works for both implicitly and explicitly mentioned arrangements.
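The notion of a probabilistic absolute timeline can be sketched minimally: each event carries lower and upper bounds on its start time, and events with only implicit temporal cues get wider bounds. The clinical events, dates, and bound values below are invented for illustration and are not taken from [2].

```python
# Minimal sketch of a probabilistic absolute event timeline with
# lower/upper information bounds. All events and dates are hypothetical.
from dataclasses import dataclass
from datetime import date

@dataclass
class TimelineEvent:
    name: str
    earliest: date  # lower information bound on the start time
    latest: date    # upper information bound on the start time

    def midpoint(self) -> date:
        """Point estimate: the middle of the admissible interval."""
        return self.earliest + (self.latest - self.earliest) / 2

    def uncertainty_days(self) -> int:
        return (self.latest - self.earliest).days

# "admission" is explicitly dated in the (hypothetical) report; "fever onset"
# is only implicit ("a few days before admission"), so its bounds are wider.
events = [
    TimelineEvent("admission", date(2020, 3, 10), date(2020, 3, 10)),
    TimelineEvent("fever onset", date(2020, 3, 5), date(2020, 3, 9)),
]

timeline = sorted(events, key=lambda e: e.midpoint())
for e in timeline:
    print(f"{e.name:12s} ~{e.midpoint()}  ±{e.uncertainty_days()} days")
```

Ordering events by the midpoints of their intervals while keeping the interval widths around preserves the uncertainty that a single point-valued timeline would discard.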
We will integrate the methods developed in this WP with those of the next WP (WP 5, on inference and learning of new knowledge) into a story demonstrator that translates stories into events in a visual, virtual world.
# | Year | Title | Authors | Venue | Description |
---|---|---|---|---|---|
1 | 2019 | A survey on temporal reasoning for temporal information extraction from text | Leeuwenberg, Artuur and Moens, Marie-Francine | JAIR 2019 | This article presents a comprehensive survey of the research from the past decades on temporal reasoning for automatic temporal information extraction from text, providing a case study on the integration of symbolic reasoning with machine learning-based information extraction systems. |
2 | 2020 | Towards Extracting Absolute Event Timelines from English Clinical Reports | Leeuwenberg, Artuur and Moens, Marie-Francine | IEEE | An approach that extracts more complete temporal information for all events and obtains probabilistic absolute event timelines by modeling temporal uncertainty with information bounds. |
3 | 2020 | Structured (De)composable Representations Trained with Neural Networks | Spinks, Graham and Moens, Marie-Francine | | An end-to-end deep learning technique to learn structured and composable representations. |
4 | 2020 | Decoding Language Spatial Relations to 2D Spatial Arrangements | Radevski, Gorjan and Collell, Guillem and Moens, Marie-Francine and Tuytelaars, Tinne | EMNLP 2020 | We propose Spatial-Reasoning Bert (SR-Bert) for the problem of multimodal spatial understanding by decoding a set of language-expressed spatial relations to a set of 2D spatial arrangements in a multi-object and multi-relationship setting. |
5 | 2020 | Autoregressive Reasoning over Chains of Facts with Transformers | Ruben Cartuyvels, Graham Spinks, Marie-Francine Moens | COLING 2020 | An iterative inference algorithm for multi-hop explanation regeneration, that retrieves relevant factual evidence in the form of text snippets, given a natural language question and its answer. |
6 | 2021 | How Do Simple Transformations of Text and Image Features Impact Cosine-based Semantic Match | Collell, Guillem and Moens, Marie-Francine | ECIR 2021 | We investigate the impact of transformations on semantic distances between embeddings produced by common language models and image CNNs. |
7 | 2022 | Finding Structural Knowledge in Multimodal-BERT | Milewski, Victor and de Lhoneux, Miryam and Moens, Marie-Francine | ACL 2022 | We introduce scene trees, by mapping the linguistic dependency tree on top of image regions, to investigate whether BERT learns structures over the image regions. |
8 | 2023 | A Memory Model for Question Answering from Streaming Data Supported by Rehearsal and Anticipation of Coreference Information | Vladimir Araujo, Alvaro Soto, Marie-Francine Moens | ACL 2023 | Drawing inspiration from human mechanisms, we propose a memory model that performs rehearsal and anticipation while processing inputs to memorize important information for solving question answering tasks from streaming data. |