Continual Robot Learning

Continual Robot Learning using Self-Supervised Task Inference

We present a novel approach to task inference for continual, multi-task robot learning. Our approach allows a humanoid robot to infer the intended behavior behind an unlabeled and incomplete visual demonstration and to execute the inferred tasks more efficiently than in the provided demonstrations.

The results show that our approach outperforms recent multi-task learning baselines, with the difference being more pronounced in the challenging continual learning setting. Our approach is also shown to generalize to unseen tasks based on a single demonstration in one-shot task generalization experiments.

Link to the paper:

Map-based Experience Replay: A Memory-Efficient Solution to Catastrophic Forgetting in Reinforcement Learning

How can we overcome the challenge of catastrophic forgetting in reinforcement learning without sacrificing memory efficiency? Our latest research proposes a novel solution: Map-based Experience Replay. By simulating state transitions and supporting state abstraction, our approach achieves a memory reduction of 40-80% compared to reinforcement learning with standard experience replay while maintaining comparable performance. Check out our paper to learn more about the cognitive architecture that underlies our approach and the implications of our findings for continual robot learning.
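The core idea of map-based replay can be illustrated with a minimal sketch. This is an assumption-laden toy version, not the paper's implementation: states are abstracted here by simple rounding, transitions are stored as edges of a map, and replay samples simulated transitions from that map instead of storing every experienced transition.

```python
import random
from collections import defaultdict

class MapBasedReplay:
    """Toy sketch of a map-based replay buffer (not the paper's code).

    States are abstracted (here: rounded) into map nodes; transitions
    are stored once as edges between nodes. Replay samples simulated
    transitions from the map, so memory grows with the number of
    distinct abstract transitions rather than with total experience.
    """

    def __init__(self, precision=1):
        self.precision = precision
        # edges[node][action] -> list of (next_node, reward, done)
        self.edges = defaultdict(lambda: defaultdict(list))

    def abstract(self, state):
        # State abstraction: round each dimension to merge similar states.
        return tuple(round(s, self.precision) for s in state)

    def add(self, state, action, reward, next_state, done):
        self.edges[self.abstract(state)][action].append(
            (self.abstract(next_state), reward, done))

    def sample(self, batch_size):
        # Simulate transitions by sampling edges from the map.
        batch = []
        nodes = list(self.edges)
        for _ in range(batch_size):
            node = random.choice(nodes)
            action = random.choice(list(self.edges[node]))
            next_node, reward, done = random.choice(self.edges[node][action])
            batch.append((node, action, reward, next_node, done))
        return batch
```

Because nearby states collapse into one node, repeatedly visiting similar states adds no new memory, which is where the reduction over a standard per-transition buffer comes from.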

Link to the paper:

Link to the code on GitHub:

Behavior Self-Organization Supports Task Inference for Continual Robot Learning

We propose an unsupervised task inference approach for continual, multi-task robot learning, inspired by goal-directed imitation learning, a cognitive process by which humans can infer a task by observing a demonstration of the desired behavior. 

Our approach learns a behavior embedding space by self-organizing visual demonstrations of behaviors. Task inference is performed by finding the nearest behavior embedding to a given demonstration. This embedding is then used, together with the environment state, as input to a multi-task policy trained with reinforcement learning to optimize performance over tasks.
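The nearest-embedding inference step can be sketched as follows. This is a simplified illustration: the `embed` function here is a hypothetical stand-in (a mean over frame features), whereas the paper learns the embedding space by self-organizing visual demonstrations.

```python
import math

def embed(demonstration):
    # Stand-in for the learned behavior embedding: here simply the mean
    # of the demonstration's frame features. In the actual approach the
    # embedding space is learned by self-organization.
    dim = len(demonstration[0])
    return [sum(frame[d] for frame in demonstration) / len(demonstration)
            for d in range(dim)]

def infer_task(demonstration, behavior_embeddings):
    # Task inference: find the known behavior embedding nearest to the
    # embedding of the (possibly incomplete) demonstration.
    query = embed(demonstration)
    return min(behavior_embeddings,
               key=lambda task: math.dist(query, behavior_embeddings[task]))
```

Because inference is a nearest-neighbor lookup in the embedding space, no task exploration is needed at test time.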

Unlike prior methods, our approach makes no assumptions about task distribution or policy architecture and requires no task exploration at test time to infer tasks. Our experiments demonstrate superior generalization performance and convergence speed compared to the state of the art, with both concurrent and sequential task presentations.

Link to the paper:

Multimodal Robot Learning

Chat with the Environment: Interactive Multimodal Perception Using Large Language Models

We address the question of how to ground LLMs in multimodal sensory input. In particular, we investigate how GPT-3 can be used as a high-level task planner in situations that require ambiguity resolution.

To tackle this challenge, we propose Matcha, a multimodal interactive agent augmented with LLMs, and evaluate it on the task of uncovering latent object properties. Experimental results suggest that our agent performs interactive multimodal perception reasonably well by taking advantage of the commonsense knowledge residing in the LLM, and generalizes easily to various scenarios by virtue of its modularity and flexibility.
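The overall control flow of such an interactive agent can be sketched as a simple loop. This is a hypothetical skeleton, not Matcha's implementation: `llm_plan` stands in for the LLM call and `perceive` for the robot's multimodal feedback modules.

```python
def interactive_perception(llm_plan, perceive, actions, max_steps=5):
    """Hypothetical sketch of an LLM-driven interactive perception loop.

    A language model (stubbed here as `llm_plan`) selects the next
    epistemic action, the robot executes it, and the resulting
    multimodal feedback is fed back to the model until it decides it
    has enough information to answer.
    """
    history = []
    for _ in range(max_steps):
        action = llm_plan(history, actions)
        if action == "answer":
            break
        history.append((action, perceive(action)))
    return history
```

The modularity comes from the fact that the planner, the action repertoire, and the perception modules are independent components that can be swapped without changing the loop.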

Link to the paper:

Link to the code on GitHub:

Impact Makes a Sound and Sound Makes an Impact: Sound Guides Representations and Explorations

We show that unsupervised exploration in multimodal environments leads to fast adaptation to new tasks. This is realized through two stages: 1) self-supervised representation learning and 2) task-specific fine-tuning. In the first stage, the agent is encouraged to learn a policy that improves its crossmodal predictions using an intrinsic visual-auditory reward. In the second stage, the learned policy is fine-tuned on downstream tasks using the pretrained visual representations from the first stage.
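One plausible form of such a visual-auditory intrinsic reward can be sketched as follows; the exact reward used in the paper may differ, so treat the squared-error formulation below as an illustrative assumption.

```python
def crossmodal_intrinsic_reward(predict_audio, visual_obs, audio_obs):
    # Reward the agent with its current crossmodal prediction error:
    # how badly the auditory features are predicted from the visual
    # features. High error marks interactions (e.g. impacts) whose
    # sound the agent cannot yet predict, steering exploration there.
    predicted = predict_audio(visual_obs)
    return sum((p - a) ** 2 for p, a in zip(predicted, audio_obs))
```

As the crossmodal predictor improves, already-understood interactions stop paying reward, pushing the agent toward novel sound-producing behavior.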

Link to the paper:

Link to the code on GitHub:

Metacognitive Robot Learning

Improving robot dual-system motor learning with intrinsically motivated meta-control and latent-space experience imagination

We found that the learning progress of a world model, computed locally in self-organized regions of a learned latent space, provides a spatially and temporally local estimate of the reliability of the model's predictions. This estimate is used to arbitrate between model-based and model-free decisions and to compute an adaptive prediction horizon for model predictive control and experience imagination.
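A minimal sketch of these two ingredients, under simplifying assumptions (learning progress as the error decrease over a sliding window, and a horizon that scales linearly with the reliability estimate; the paper's exact formulas may differ):

```python
def local_learning_progress(error_history):
    # Learning progress in one latent-space region: the decrease of the
    # model's prediction error from the older half to the more recent
    # half of a sliding window of errors recorded in that region.
    half = len(error_history) // 2
    older = sum(error_history[:half]) / half
    recent = sum(error_history[half:]) / (len(error_history) - half)
    return max(0.0, older - recent)

def adaptive_horizon(reliability, max_horizon=10):
    # Plan further ahead in regions where the model is reliable;
    # fall back to a single step where it is not.
    return 1 + min(max_horizon - 1, int(reliability * (max_horizon - 1)))
```

Because both quantities are computed per latent-space region, the controller can trust the model in well-learned parts of the state space while remaining cautious elsewhere.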

Our approach improves the efficiency of learning visuomotor control both in simulation and in the real world. Policy networks trained in simulation with our approach are shown to perform well on the physical robot via simple simulation-to-real transfer, without fine-tuning of the policy parameters.

Check out our 2-min video summary here

Link to the paper:

Link to the code on GitHub:

Curious Meta-Controller: Adaptive Alternation between Model-Based and Model-Free Control in Deep Reinforcement Learning

In this work, we show that using a curiosity feedback based on prediction learning progress to arbitrate between model-based and model-free decisions accelerates learning pixel-level control policies. 
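One plausible reading of the arbitration rule can be sketched as a threshold on learning progress; the threshold value and the exact decision rule below are illustrative assumptions, not the paper's formulation.

```python
def arbitrate(learning_progress, threshold=0.05):
    # Curiosity feedback: while the world model is still improving
    # rapidly (high prediction learning progress), its predictions are
    # worth exploiting and the model-based controller acts; once
    # progress stalls, control falls back to the model-free policy.
    return "model-based" if learning_progress > threshold else "model-free"
```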

Link to the paper:

Intrinsically Motivated Robot Learning


Deep Intrinsically Motivated Continuous Actor-Critic for Efficient Robotic Visuomotor Skill Learning

This work demonstrates that spatially and temporally local learning progress in a growing ensemble of local world models provides an effective intrinsic reward, enabling directed exploration for vision-based grasp learning on a developmental humanoid robot. The work also suggests that training a small actor network on low-dimensional feature representations learned for self-reconstruction and reward prediction leads to fast and stable learning performance.

Link to the paper:

Efficient Intrinsically Motivated Robotic Grasping with Learning-Adaptive Imagination in Latent Space

Inspired by human mental simulation of motor behavior and its role in skill acquisition, we show that:

(1) The sample efficiency of learning vision-based robotic grasping can be greatly improved by performing experience imagination in a learned latent space and using the imagined data for training grasping policies. 

(2) The proposed adaptive imagination, where imagined rollouts are generated with probability proportional to the prediction reliability of the local world model in the traversed latent-space regions, outperforms fixed-depth imagination.

(3) Using an intrinsic reward based on the model's learning progress yields training data that improves the future predictions needed for imagination.
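Point (2) can be sketched as follows. This is a simplified illustration under one reading of "probability proportional to prediction reliability": each imagined step continues with probability equal to the local reliability estimate, so rollout depth adapts per region rather than being fixed.

```python
import random

def imagine_rollouts(start_states, reliability, model_step, policy,
                     max_depth=5, rng=random):
    # Adaptive imagination sketch: from each start state, each imagined
    # step continues with probability equal to the local model
    # reliability, so rollouts reach deeper where the world model's
    # predictions are trusted and stay short where they are not.
    rollouts = []
    for state in start_states:
        trajectory = []
        for _ in range(max_depth):
            if rng.random() >= reliability(state):
                break  # stop imagining in unreliable regions
            action = policy(state)
            next_state, reward = model_step(state, action)
            trajectory.append((state, action, reward, next_state))
            state = next_state
        rollouts.append(trajectory)
    return rollouts
```

The imagined transitions can then augment the real experience used to train the grasping policy, which is where the sample-efficiency gain of point (1) comes from.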

Link to the paper: