WP3 focuses on the development of innovative algorithms that capture the intricate relational structure of language and the geometric and appearance facets of visual data. The emphasis is on anticipatory, brain-inspired representations, with the objective of creating more robust and informative representations for both visual and textual data.
Text representation learning has a long research history; with CALCULUS we add to the discussion in this line of research [6, 10], contribute evaluation benchmarks [13], and make algorithmic contributions, especially with regard to anticipatory, brain-inspired models. Specifically, several of our publications integrate predictive coding theory into the pretraining of neural networks [1, 4, 9, 14], with the aim of making representations predictive of future stimuli. While those works are concerned with textual stimuli only, we argue that a powerful language representation also needs to be competent at predicting visual stimuli, and to learn to navigate the environment of a confined world [3]. For example, our autonomous agent LAD (Layout-aware Dreamer) [15] imagines the destination of its goal when deciding on the next action.
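To make the idea of an anticipatory objective concrete, the following is a minimal sketch of a predictive-coding-style auxiliary loss: a linear head predicts the representation of the next sentence from the current one, and an InfoNCE-style contrastive loss rewards predictions that are closer to the true next representation than to other sentences in the batch. All dimensions, names, and the linear head are illustrative assumptions, not the architectures published in [1, 4, 9, 14].

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (not taken from the papers).
d_model = 16   # encoder hidden size
n_pairs = 8    # (current sentence, next sentence) pairs in a batch

# Stand-ins for encoder outputs: h[i] encodes sentence i,
# h_next[i] encodes the sentence that follows it.
h = rng.standard_normal((n_pairs, d_model))
h_next = rng.standard_normal((n_pairs, d_model))

# A linear "prediction head" maps the current representation to a
# guess of the next one; training would update W to minimise the loss.
W = rng.standard_normal((d_model, d_model)) * 0.1
pred = h @ W

# Contrastive loss: the predicted vector should score higher (dot
# product) against the true next representation than against the
# other sentences in the batch.
logits = pred @ h_next.T                      # (n_pairs, n_pairs)
logits -= logits.max(axis=1, keepdims=True)   # numerical stability
log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
loss = -np.mean(np.diag(log_probs))           # InfoNCE over the batch
print(float(loss))
```

Minimising this loss alongside the usual masked-language-modelling objective is the general pattern; the cited works differ in what is predicted (segments, discourse units) and how the predictor is parameterised.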
Beyond this work, visual-linguistic representation learning is central to the goals of CALCULUS: a burgeoning area of research focuses on the intersection of text and image data, aiming to ground text representations in visual stimuli. In this paradigm, textual semantics are not only inferred from text but also anchored in associated images, creating a more holistic understanding. This research aligns with the overarching aim of WP3: to establish robust and contextually aware representations from both language and visual data, ultimately resulting in more intelligent and predictive systems. Indeed, we show that learning visually grounded representations improves performance even on tasks that only involve text [7]. Furthermore, we investigate whether multimodal representations have learned visual structures analogous to linguistic structures [12], discover the underlying causal structure of the data [11], and learn to map verbal descriptions of objects’ spatial relationships to the image [5], with practical applications such as self-driving [2, 8].
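As a toy illustration of the mapping task in [5], the sketch below turns verbal spatial relations into a 2D arrangement by greedy offset placement. This is purely illustrative: the relation vocabulary, the `arrange` function, and the grid placement are assumptions for exposition; the published model (SR-Bert) learns this mapping with a neural network in a multi-object, multi-relationship setting.

```python
# Offsets on an integer grid for a tiny, assumed relation vocabulary.
OFFSETS = {
    "left-of": (-1, 0),
    "right-of": (1, 0),
    "above": (0, 1),
    "below": (0, -1),
}

def arrange(triples):
    """Assign (x, y) positions so each (subject, relation, object)
    triple is satisfied relative to an already-placed object."""
    pos = {}
    for subj, rel, obj in triples:
        if obj not in pos:
            pos[obj] = (0, 0)          # first mention anchors the scene
        dx, dy = OFFSETS[rel]
        ox, oy = pos[obj]
        pos[subj] = (ox + dx, oy + dy)
    return pos

scene = arrange([
    ("cup", "left-of", "plate"),
    ("fork", "above", "plate"),
])
print(scene)  # {'plate': (0, 0), 'cup': (-1, 0), 'fork': (0, 1)}
```

A greedy symbolic placer like this breaks down once relations conflict or are expressed in free-form language, which is precisely why the learned, multimodal approach of [5] is needed.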
A third research path in this anticipatory, brain-inspired context is closely aligned with neuroscientific methods and insights, exploring the connection between human brain activity and machine-learned language representations. These works use neural encoding techniques to measure how well the activations of artificial neural networks correspond with human neural processes during language processing. Various sentence embedding models and fine-tuning approaches are being investigated for their effectiveness in replicating patterns found in human brain activity [16, 17, 18].
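The core of such a neural encoding analysis can be sketched in a few lines: fit a ridge regression from network activations to voxel responses on training stimuli, then score the fit by the per-voxel Pearson correlation on held-out stimuli. The data here are synthetic and all sizes are illustrative assumptions; the cited studies use real fMRI recordings and pre-trained language model activations.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (not taken from the cited studies).
n_stim, d_model, n_voxels = 40, 12, 5

# X: network activations per stimulus sentence; Y: synthetic "voxel"
# responses generated from a hidden linear map plus noise, so a good
# encoding model should recover a high correlation.
X = rng.standard_normal((n_stim, d_model))
B_true = rng.standard_normal((d_model, n_voxels))
Y = X @ B_true + 0.1 * rng.standard_normal((n_stim, n_voxels))

# Split stimuli into train and held-out test sets.
X_tr, X_te = X[:30], X[30:]
Y_tr, Y_te = Y[:30], Y[30:]

# Ridge regression, closed form: B = (X'X + lam*I)^-1 X'Y.
lam = 1.0
B = np.linalg.solve(X_tr.T @ X_tr + lam * np.eye(d_model), X_tr.T @ Y_tr)
Y_hat = X_te @ B

def pearson(a, b):
    """Column-wise Pearson correlation between two matrices."""
    a = a - a.mean(axis=0)
    b = b - b.mean(axis=0)
    return (a * b).sum(axis=0) / np.sqrt(
        (a ** 2).sum(axis=0) * (b ** 2).sum(axis=0)
    )

# Neural fit: per-voxel correlation on held-out stimuli.
r = pearson(Y_hat, Y_te)
print(np.round(r, 3))
```

Comparing this held-out correlation across embedding models or fine-tuning regimes is what allows the studies above to ask which representations best account for brain activity.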
# | Year | Title | Authors | Venue | Description |
---|---|---|---|---|---|
1 | 2019 | Improving Natural Language Understanding through Anticipation-Enriched Representations. | Cornille, Nathan and Moens, Marie-Francine | HBP 2019 | Poster with first idea for internal-self-prediction objective for BERT, presented at Human Brain Project workshop in Glasgow. |
2 | 2020 | Giving Commands to a Self-driving Car: A Multimodal Reasoner for Visual Grounding | Deruyttere, Thierry and Collell, Guillem and Moens, Marie-Francine | | A new spatial memory module and a spatial reasoner for the Visual Grounding task. We focus on integrating the regions of a Region Proposal Network into a new multi-step reasoning model. |
3 | 2020 | Learning Grammar in Confined Worlds | Spinks, Graham and Cartuyvels, Ruben and Moens, Marie-Francine | LNEE | In this position paper we argue that modern machine learning approaches fail to adequately address how grammar and common sense should be learned. We advocate for experiments with the use of abstract, confined world environments where agents interact, with the emphasis on learning world models. |
4 | 2020 | Improving Language Understanding in Machines through Anticipation. | Cornille, Nathan and Collell, Guillem and Moens, Marie-Francine | NAISys 2020 | Poster that reflects on some of the issues with an internal contrastive objective that aims to improve representation learning. |
5 | 2020 | Decoding Language Spatial Relations to 2D Spatial Arrangements | Radevski, Gorjan and Collell, Guillem and Moens, Marie-Francine and Tuytelaars, Tinne | EMNLP 2020 | We propose Spatial-Reasoning Bert (SR-Bert) for the problem of multimodal spatial understanding by decoding a set of language-expressed spatial relations to a set of 2D spatial arrangements in a multi-object and multi-relationship setting. |
6 | 2021 | Discrete and continuous representations and processing in deep learning: Looking forward | Ruben Cartuyvels, Graham Spinks, Marie-Francine Moens | AI Open | A position paper that reflects on the role of discrete and continuous representations and processing in the deep learning era. |
7 | 2021 | Visual Grounding Strategies for Text-Only Natural Language Processing | Sileo, Damien | | Conception, categorization and strategies to leverage multimodal pretraining for text-only tasks. |
8 | 2021 | Giving Commands to a Self-Driving Car: How to Deal with Uncertain Situations? | Deruyttere, Thierry and Milewski, Victor and Moens, Marie-Francine | | When a command is given to a self-driving car, it can have ambiguous solutions. A method to resolve this through visual and textual means is proposed. |
9 | 2021 | Augmenting BERT-style Models with Predictive Coding to Improve Discourse-level Representations | Araujo, Vladimir and Villa, Andres and Mendoza, Marcelo and Moens, Marie-Francine and Soto, Alvaro | EMNLP 2021 | We propose to use ideas from predictive coding theory to augment BERT-style language models with a mechanism that allows them to learn suitable discourse-level representations. |
10 | 2023 | A Brief Overview of Universal Sentence Representation Methods: A Linguistic View. | Li, Ruiqi and Moens, Marie-Francine | Accepted for upcoming issue. | |
11 | 2022 | Critical Analysis of Deconfounded Pretraining to Improve Visio-Linguistic Models | Cornille, Nathan and Laenen, Katrien and Moens, Marie-Francine | | We critically analyze a recent technique that uses the toolbox of causality to improve OOD performance, elucidating to what extent it actually finds confounders, under what assumptions it performs deconfounding, and whether the reported OOD performance is actually linked to the causal tools. |
12 | 2022 | Finding Structural Knowledge in Multimodal-BERT | Milewski, Victor and de Lhoneux, Miryam and Moens, Marie-Francine | ACL 2022 | We introduce scene trees, mapping the linguistic dependency tree on top of image regions, to investigate whether BERT learns structures over the image regions. |
13 | 2022 | Evaluation Benchmarks for Spanish Sentence Representations | Vladimir Araujo, Andrés Carvallo, Souvik Kundu, José Cañete, Marcelo Mendoza, Robert E. Mercer, Felipe Bravo-Marquez, Marie-Francine Moens, Alvaro Soto | LREC 2022 | A new benchmark for Spanish sentence representations. |
14 | 2023 | Learning Sentence-Level Representations with Predictive Coding | Araujo, Vladimir and Moens, Marie-Francine and Soto, Alvaro | | This work explores how to improve sentence-level representations of pre-trained models by borrowing ideas from predictive coding theory. |
15 | 2023 | Layout-aware Dreamer for Embodied Visual Referring Expression Grounding | Li, Mingxiao and Wang, Zehao and Tuytelaars, Tinne and Moens, Marie-Francine | AAAI-23 | We have designed an autonomous agent called Layout-aware Dreamer (LAD), including two novel modules, the Layout Learner and the Goal Dreamer, to mimic a human's cognitive decision process. |
16 | 2023 | Fine-tuned vs. Prompt-tuned Supervised Representations: Which Better Account for Brain Language Representations? | Jingyuan Sun and Marie-Francine Moens | IJCAI 2023 | Investigating various supervised methods and their correlation with how brains represent language. |
17 | 2023 | Investigating Neural Fit Approaches for Sentence Embedding Model Paradigms | Helena Balabin, Antonietta Gabriella Liuzzi, Jingyuan Sun, Patrick Dupont, Rik Vandenberghe, Marie-Francine Moens | ECAI 2023 | We analyze the link (i.e., neural fit) between functional MRI data and pre-trained language models using different brain networks, neural fit approaches and sentence modeling paradigms. |
18 | 2023 | Tuning In to Neural Encoding: Linking Human Brain and Artificial Supervised Representations of Language | Jingyuan Sun, Xiaohan Zhang and Marie-Francine Moens | ECAI 2023 | Linking human brain and supervised ANN representations of the Chinese language. |
19 | 2023 | Causal Factor Disentanglement for Few-Shot Domain Adaptation in Video Prediction | Cornille, Nathan and Sun, Jingyuan and Laenen, Katrien and Moens, Marie-Francine | | We evaluate whether we can use Causal Factor Disentanglement to isolate parameters that model different causal mechanisms, and subsequently adapt more quickly in response to a Sparse Mechanism Shift. |