Getting machines to spot objects that are overlapping or tucked away in messy, unorganized places isn’t something they’re born with. The latest dive into this, from the University of Washington, Seattle, aims high: mimicking how humans process what they see, this time in service robots.

Getting robots to ace visual perception – recognizing their surroundings and what’s in them – is a big deal in robotics research, especially when it comes to service robots. These machines, as defined in the ISO 8373:2021 standard, are designed to be genuinely helpful, whether for personal tasks like moving stuff around, cooking, and cleaning, or for professional tasks like inspections, surveillance, and giving directions.

The big question always hanging around is how these robots get what’s happening around them, how they see things, and how good they are at making out objects, no matter where they are meant to work.

Humans have this knack for spotting things that aren’t fully in view, piecing together bits like shape, color, and position, and linking them to what they know from past experiences. Robots, not so much.
The tech whipped up by researchers in the US marries computational topology with machine learning to figure out the shape of hidden objects, make 3D models of them, sort them out, and then match them with a database of known objects.
Looking ahead, the advancements in this tech could shake things up, especially in the job market, with a growing need for smarter, more capable service robots in industries already feeling the pinch from not having enough hands on deck.

Visual perception in service robots: they’re missing out on “object unity” and “object constancy”

Let’s kick off by saying that to figure out what’s going on in their surroundings, machines need sensors or cameras to help them find their way around and handle the stuff they’re meant to work with.

A big topic that’s been getting a lot of attention lately in the world of machine vision is the challenge – for service robots whether in warehouses or our homes – of spotting objects that are too close to each other, overlapping, or crammed into a messy, unstructured space. This issue is highlighted by a research team from the Department of Mechanical Engineering at the University of Washington, Seattle, in their work “Persistent Homology Meets Object Unity: Object Recognition in Clutter“, featured in the latest issue (vol. 40, 2024) of the scientific journal IEEE Transactions on Robotics. They point out that the visual struggle comes down to this:

 «… machines are missing what’s known in psychology as ‘object unity’, basically the human eye’s knack for identifying parts of a scene even when not everything can be seen clearly or in order»

The thinking process here involves linking what you can see of the hidden object with the full picture of the original object stored in your memory. Even though, as the research team adds, «the memory representations of objects are only from selected viewpoints. So, humans ‘normalize’ the view of the occluded object, ‘rotating’ it to a standard orientation».

This human visual system skill, giving us the confidence that objects stay the same regardless of how a cluttered, crowded setting might make us see them (hidden, barely visible), is known in psychology as “object constancy.”

Let’s dive deeper to see what tricks and methods robotics needs to pick up to make up for these visual perception gaps and help machines efficiently and safely do what they’re supposed to.

Past research efforts

In the last ten years, the focus has been mainly on deep learning models for spotting objects in a scene, which did pretty well in specific tasks. However, their effectiveness tends to drop when it comes to more general applications in messy and unstructured environments [source: “A survey of modern deep learning based object detection models” – ScienceDirect].

In July 2022, a study from the Massachusetts Institute of Technology (MIT) threw a new idea into the mix, using a camera and two RF antennas on a robotic arm, aiming to find and pull out objects tagged and buried under piles of other items.

But even then, as the University of Washington team points out, «the challenge of coming up with an object recognition method for everyday robots – like those service bots – that keeps up its performance no matter the lighting, background, or how chaotic the environment is, hasn’t been fully cracked yet».

The “persistent homology” framework

On this note, they mention their previous study (October 2022) on robot visual perception (“Visual object recognition in indoor environments using topologically persistent features“, published in IEEE Robotics and Automation Letters), where they tried a different tack to hit the goal of “performance invariance” in any situation, whatever goes down in the space being examined. They used something called “persistent homology“, a framework for tracking the topological features of data across scales, including the topological traits of a place [source: “Persistent Homology” – ScienceDirect].

Specifically, they applied it to «track the evolution of topological quirks, to pick up shape-based 2D features from object segmentation maps for recognition». And in another study – they share – persistent homology helped gather topological info «from point clouds made from depth images for recognition, tapping into 3D shape info».
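To make the “persistence” idea concrete, here’s a minimal, self-contained sketch – our own toy example, not the authors’ code – of 0-dimensional persistent homology on a point cloud. Connected components are “born” at threshold zero and “die” when a growing distance threshold merges them; long-lived components reveal the cloud’s robust structure, which is exactly the kind of shape signal the team extracts (their actual features are far richer than this H0 sketch).

```python
# Toy 0-dimensional persistent homology via single-linkage merging.
# Components "die" when merged; long death times = persistent structure.
from itertools import combinations
import math

def h0_persistence(points):
    """Return the death times of connected components as the
    distance threshold grows (Vietoris-Rips filtration, H0 only)."""
    parent = list(range(len(points)))

    def find(i):  # union-find with path halving
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise distances, sorted: the edges of the filtration.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    deaths = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:          # two components merge: one dies at distance d
            parent[ri] = rj
            deaths.append(d)
    return deaths             # the last surviving component never dies

# Two well-separated clusters: four short-lived merges inside the
# clusters, then one very late merge - the signature of two clusters.
cloud = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(h0_persistence(cloud))
```

The key property for recognition is that these birth/death profiles are stable under small perturbations of the points, which is what makes them attractive under changing lighting, viewpoint, and clutter.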

So, what then? When it came to perceiving hidden (occluded) objects, both methods didn’t quite hit the mark because:

«… when objects are hidden, the 2D and 3D shapes in the corresponding segmentation maps don’t match up with those of visible objects, making recognition tricky»

Visual perception in service robots: mixing computational topology with machine learning

In the IEEE Transactions on Robotics study on robot visual perception, the team went further to nail object recognition by machines in messy, unknown places: they called on persistent homology again, but this time mixed computational topology with machine learning techniques that mimic the distinctive cognitive mechanisms humans use to process visual stimuli.

Specifically, they used the persistent homology framework to come up with a topological tool called TOPS, short for Topological features Of Point cloud Slices, acting as a “scene descriptor”.
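As a loose illustration of the slicing idea behind TOPS – the function name and the per-slice feature below are ours, not the paper’s; the real descriptor computes topological (persistence-based) features per slice – one can cut a 3D point cloud into layers along one axis and summarise each layer:

```python
# Hypothetical sketch of a slice-based shape descriptor: cut a 3D
# point cloud into horizontal slices along z, then summarise each
# slice's cross-section in the x-y plane.
def slice_descriptor(points, n_slices=4):
    zs = [p[2] for p in points]
    z_min, z_max = min(zs), max(zs)
    height = (z_max - z_min) / n_slices or 1.0
    slices = [[] for _ in range(n_slices)]
    for x, y, z in points:
        idx = min(int((z - z_min) / height), n_slices - 1)
        slices[idx].append((x, y))

    # Toy per-slice feature: radial extent of the slice's cross-section.
    def extent(sl):
        if not sl:
            return 0.0
        cx = sum(x for x, _ in sl) / len(sl)
        cy = sum(y for _, y in sl) / len(sl)
        return max(((x - cx) ** 2 + (y - cy) ** 2) ** 0.5 for x, y in sl)

    return [round(extent(sl), 3) for sl in slices]

# A cone-like cloud: wide at the bottom, narrow at the top, so the
# descriptor shrinks slice by slice.
cone = [(r, 0, z) for z in range(4) for r in (0, 3 - z)]
print(slice_descriptor(cone))  # → [1.5, 1.0, 0.5, 0.0]
```

The slice-by-slice layout is what lets the descriptor stay comparable between a fully visible object and a partially occluded one: the visible slices still produce the same features.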

And TOPS was key in developing a machine learning model for spotting hidden objects in unstructured settings: dubbed THOR, short for TOPS for Human-inspired Object Recognition, it echoes the cognitive skill of object unity. How does it work, exactly?

THOR lets the robot mimic that human behavior of knowing that partially seen objects aren’t broken or incomplete. And it does this, the researchers specify, «by using the shape of objects in the scene to create a 3D representation of each. Then, it uses TOPS to sort each object into a ‘highly probable’ item class, comparing its 3D shape with a library of previously stored representations».
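That matching step can be sketched in a few lines – hedged heavily: the descriptors, class labels, and library values below are invented for illustration, and THOR’s real descriptors are topological, not hand-made vectors – as a nearest-descriptor lookup against a library of stored representations:

```python
# Hypothetical sketch of library matching: assign the scene object's
# descriptor to the class whose stored descriptor is nearest.
def classify(descriptor, library):
    """Return the label of the nearest library descriptor (Euclidean)."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    return min(library, key=lambda label: dist(descriptor, library[label]))

# Made-up library of per-class shape descriptors.
library = {
    "mug":    [1.0, 1.0, 1.0, 0.4],   # roughly cylindrical, handle bulge
    "bottle": [0.8, 0.8, 0.3, 0.3],   # wide body, narrow neck
    "box":    [1.2, 1.2, 1.2, 1.2],   # uniform cross-section
}

# An occluded mug still yields a descriptor closest to "mug".
occluded_view = [1.0, 0.9, 1.0, 0.5]
print(classify(occluded_view, library))  # → mug
```

The “highly probable” class assignment in the quote is exactly this kind of similarity ranking: because occluded and visible views of the same object produce similar descriptors, a partial view still lands nearest the right library entry.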

Check out some cool findings from the team’s tests, where they managed to get a service robot to recognize hidden objects using the THOR system in a warehouse setting [source: “Persistent Homology Meets Object Unity: Object Recognition in Clutter” – IEEE Transactions on Robotics].

Training THOR wasn’t about feeding it random images of messy rooms with scattered and overlapping objects but using 3D images of individual objects. The University of Washington researchers point out:

«TOPS’s ‘descriptor’ function grabs the detailed shape of scene objects, ensuring similarity in descriptors between occluded objects and their visible counterparts. THOR then uses this similarity, cutting down the need for extensive training data that fully represent every possible occlusion scenario»

Glimpses of Futures

The visual perception model we’re chatting about, crafted to boost machines with the knack for spotting objects hidden away in busy, unstructured spaces – the creators tell us – fits any indoor service robot, whether it’s for home or work. This includes robots buzzing around in our homes, offices, shops, warehouses, or factories.

In all these places, where there’s a constant stream of people and loads of stuff everywhere, the chaos could get so intense that even we humans might overlook some items. In such jam-packed situations, having a robot that can catch sight of even the sneakily hidden objects to pick up, move, transport, or handle them becomes super handy.

To better meet this challenge, future research – the team points out – needs to look beyond just the shape of the hidden objects. It should also consider things like colour, texture, and text labels that tell us more about what an object looks like in any given space.

Aiming to foresee what’s coming and explore alternative futures, let’s dive into – with the help of the STEPS matrix – the possible impacts that improving robots’ visual perception (like the one we’re talking about) to be more like our human eyesight could have socially, technologically, economically, politically, and in terms of sustainability.

S – SOCIAL: in the future, professional service robots present in workplaces like warehouses, capable of automatically and precisely identifying overlapping and disordered objects on shelves even amidst chaos, could perform even the most complex tasks fully autonomously and with more efficiency and speed, including those related to the movement and storage of goods. As for personal service robots, such as those intended for domestic use, increasingly accurate detection of occluded objects will equate to greater machine safety: once even the hidden items are located, the robot can move more deftly to reach them.

T – TECHNOLOGICAL: the workgroup has announced plans to expand the functions of the service robot visual perception model described here, aiming to detect hidden objects from attributes other than shape, including the type of material they are made of. If that happens, we can foresee the future use of other methodologies and techniques alongside computational topology and machine learning, such as ultra-sensitive sensors on a robotic hand capable of reproducing – with the help of purpose-built artificial neural networks – even tactile sensitivity in the machine.

E – ECONOMIC: economically, the evolution of service robots’ visual abilities will predominantly affect the world of work – and for a problem diametrically opposed to the one we usually focus on, i.e., the risk of unemployment from more intense automation. Today, multiple sectors worldwide suffer from a general shortage of personnel. This is the case – just to give a few examples – of catering and small construction jobs, where in the future – as noted by the International Federation of Robotics – enhanced, high-performing service robots will be increasingly in demand, acting as waiters serving dishes at tables or as painting robots. This is already happening around the world, with the USA and China leading the way.

P – POLITICAL: the use of robotic systems has always raised doubts and questions about their degree of safety, especially when – as in the case of service robots – they operate in contact with people, including children (think of a home or a restaurant), and not just as simple cleaners but also as handlers and transporters of objects. In the specific case of machines that, thanks to onboard artificial intelligence techniques, can accurately detect what and who is present in the environment, manufacturers must pay maximum attention to behaving ethically and transparently towards users, providing them with exhaustive information on the correct use of service robots, as well as on privacy and data confidentiality. Recall that, in January 2024, the EU Council and Parliament approved the final text of the AI Act – effective from 1st January 2026 – the European regulation on artificial intelligence, whose fundamental principles revolve precisely around safety and transparency guarantees from AI system designers and developers.

S – SUSTAINABILITY: in a possible future scenario, the impact, from an environmental and social sustainability perspective, of service robots with increasingly human-like visual perception, could centre around the objectives set by the European Green Deal, which aims, by 2050, to steer the EU towards climate neutrality, also using digital and the most innovative technologies. Hence, in the next twenty-five years, the number of service robots employed in agriculture as “agricultural robots” could increase exponentially, supporting Europe’s “green policy”.
