Even smartest AI models don't match human visual processing: Study
Deep convolutional neural networks (DCNNs) do not see objects the way humans do -- using configural shape perception -- and that could be dangerous in real-world artificial intelligence (AI) applications, say researchers.
DCNNs are the type of neural network most commonly used to identify patterns in images and video.
"Our results explain why deep AI models fail under certain conditions and point to the need to consider tasks beyond object recognition in order to understand visual processing in the brain," said researcher James Elder from York University in Toronto.
"These deep models tend to take 'shortcuts' when solving complex recognition tasks. While these shortcuts may work in many cases, they can be dangerous in some of the real-world AI applications we are currently working on with our industry and government partners," Elder added.
For the study, published in the journal iScience, the team employed novel visual stimuli called "Frankensteins" to explore how the human brain and DCNNs process holistic, configural object properties.
"Frankensteins are simply objects that have been taken apart and put back together the wrong way around. As a result, they have all the right local features, but in the wrong places," Elder said.
The investigators found that while the human visual system is confused by Frankensteins, DCNNs are not -- revealing an insensitivity to configural object properties.
According to the researchers, modifications to training and architecture aimed at making networks more brain-like did not lead to configural processing, and none of the networks were able to predict trial-by-trial human object judgements accurately.
"We speculate that to match human configural sensitivity, networks must be trained to solve a broader range of object tasks beyond category recognition," Elder noted.