How do we know AI is ready to be in the wild? Maybe a critic is needed
Mischief can happen when AI is let loose in the world, just like with any technology. The examples of AI gone wrong are numerous, the most vivid in recent memory being the disastrously bad performance of Amazon’s facial recognition technology, Rekognition, which disproportionately mismatched members of some ethnic groups with criminal mugshots.
Given the risk, how can society know if a technology has been adequately refined to a level where it is safe to deploy?
“This is a really good question, and one we are actively working on,” Sergey Levine, assistant professor in the University of California at Berkeley’s department of electrical engineering and computer science, told ZDNet by email this week.
Levine and colleagues have been working on an approach to machine learning in which the decisions of a software program are subjected to critique by another algorithm within the same program that acts adversarially. The approach is known as conservative Q-learning, and it was described in a paper posted on the arXiv preprint server last month.
ZDNet reached out to Levine this week after he posted an essay on Medium describing the problem of how to safely train AI systems to make real-world decisions.
Levine has spent years at Berkeley’s robotic artificial intelligence and learning lab developing AI software that directs how a robotic arm moves within carefully designed experiments — carefully designed because you don’t want something to get out of control when a robotic arm can do actual, physical damage.
Robotics often relies on a form of machine learning called reinforcement learning. Reinforcement learning algorithms are trained by testing the effects of decisions and continually revising a policy of action based on how each action changes the state of the environment.
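To make that loop concrete, here is a minimal, self-contained sketch of online reinforcement learning in Python: tabular Q-learning on a toy five-state corridor. The environment, rewards, and hyperparameters are illustrative inventions for this article, not anything from Levine’s work; the point is only that the agent learns by acting and observing the outcomes.

    import random

    N_STATES = 5          # states 0..4; reaching state 4 ends the episode
    ACTIONS = [-1, +1]    # step left or step right
    ALPHA, GAMMA, EPS = 0.1, 0.9, 0.2

    Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

    for episode in range(500):
        s = 0
        while s != N_STATES - 1:
            # epsilon-greedy: mostly follow the current policy, sometimes explore
            if random.random() < EPS:
                a = random.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda x: Q[(s, x)])
            s_next = min(max(s + a, 0), N_STATES - 1)
            reward = 1.0 if s_next == N_STATES - 1 else 0.0
            # the core update: revise the value estimate from the observed outcome
            best_next = max(Q[(s_next, x)] for x in ACTIONS)
            Q[(s, a)] += ALPHA * (reward + GAMMA * best_next - Q[(s, a)])
            s = s_next

    # after training, the greedy policy should step right, toward the goal
    print({s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES - 1)})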
But there’s the danger: Do you want a self-driving car to be learning on the road, in real traffic?
In his Medium post, Levine proposes developing “offline” versions of RL. In the offline world, RL could be trained using vast amounts of data, like any conventional supervised learning AI system, to refine the system before it is ever sent out into the world to make decisions.
“An autonomous vehicle could be trained on millions of videos depicting real-world driving,” he writes. “An HVAC controller could be trained using logged data from every single building in which that HVAC system was ever deployed.”

To boost the value of reinforcement learning, Levine proposes moving from the strictly “online” scenario, exemplified by the diagram on the right, to an “offline” period of training, whereby algorithms are input with masses of labeled data more like traditional supervised machine learning.
(Image: Sergey Levine)
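In the offline setting Levine describes, the same kind of value update is applied to a fixed log of past transitions rather than to fresh interaction with the world. Here is a minimal sketch under the same toy assumptions as above, with an invented dataset standing in for the millions of logged driving videos or HVAC records:

    import random

    ACTIONS = [-1, +1]
    ALPHA, GAMMA = 0.1, 0.9

    # hypothetical logged experience: (state, action, reward, next_state) tuples,
    # a stand-in for data collected earlier by people or deployed systems
    dataset = [
        (0, +1, 0.0, 1), (1, +1, 0.0, 2), (2, +1, 0.0, 3),
        (3, +1, 1.0, 4), (1, -1, 0.0, 0), (2, -1, 0.0, 1),
    ]

    Q = {(s, a): 0.0 for s in range(5) for a in ACTIONS}

    for _ in range(5000):
        # sample from the fixed dataset; no new environment steps are ever taken
        s, a, r, s_next = random.choice(dataset)
        best_next = max(Q[(s_next, x)] for x in ACTIONS)
        Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])

    print({s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(4)})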
Levine uses the analogy of childhood development. Children receive many more signals from the environment than just the immediate results of actions.
“In the first few years of your life, your brain processed a broad array of sights, sounds, smells, and motor commands that rival the size and diversity of the largest datasets used in machine learning,” Levine writes.
That brings us back to the original question: after all that offline development, how does one know when an RL program is sufficiently refined to go “online,” to be used in the real world?
That’s where conservative Q-learning comes in. Conservative Q-learning builds on the widely studied Q-learning, which is itself a form of reinforcement learning. The idea is to “provide theoretical guarantees on the performance of policies learned via offline RL,” Levine explained to ZDNet. Those guarantees will block the RL system from carrying out bad decisions.
Imagine you had a long, long history, kept in persistent memory, of which actions are good actions that prevent chaos. And imagine your AI algorithm had to make decisions that didn’t violate that long collective memory.

“This seems like a promising path for us toward methods with safety and reliability guarantees in offline RL,” says UC Berkeley assistant professor Sergey Levine, of the work he and colleagues are doing with “conservative Q-learning.”
Sergey Levine
In a typical RL system, a value function is computed based on how much a certain choice of action will contribute to reaching a goal. That informs a policy of actions.
In the conservative version, the value function places a higher value on that past data in persistent memory about what should be done. In technical terms, every action the policy proposes is down-weighted, so that there’s an extra burden of proof to say that the policy has achieved its optimal state.
A struggle ensues, Levine told ZDNet, making an analogy to generative adversarial networks, or GANs, a type of machine learning.
“The value function (critic) ‘fights’ the policy (actor), trying to assign the actor low values, but assign the data high values.” The interplay of the two functions makes the critic better and better at vetoing bad choices. “The actor tries to maximize the critic,” is how Levine puts it.
Through the struggle, a consensus emerges within the program. “The result is that the actor only does those things for which the critic ‘can’t deny’ that they are good (because there is too much data that supports the goodness of those actions).”
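To make that “fight” concrete, here is a heavily simplified tabular sketch in Python. It is an illustration of the idea, not the algorithm from the paper: alongside an ordinary Bellman update, each step lowers the value of the action the actor currently prefers and raises the value of the action actually found in the logged data, so only well-supported actions keep high values.

    import numpy as np

    rng = np.random.default_rng(0)
    n_states, n_actions = 3, 3
    Q = rng.normal(size=(n_states, n_actions))   # tabular critic

    # toy logged transitions: (state, action, reward, next_state)
    data = [(0, 1, 0.0, 1), (1, 2, 0.0, 2), (2, 0, 1.0, 0)]

    alpha, lr, gamma = 1.0, 0.1, 0.9
    for _ in range(200):
        for s, a, r, s_next in data:
            actor_action = int(np.argmax(Q[s]))   # the actor maximizes the critic
            # ordinary Bellman step toward the target value
            target = r + gamma * Q[s_next].max()
            Q[s, a] -= lr * (Q[s, a] - target)
            # conservative step: push down the actor's preferred action and push
            # up the action seen in the data; the two cancel when the actor
            # agrees with the data, so data-supported actions win out
            Q[s, actor_action] -= lr * alpha
            Q[s, a] += lr * alpha

    print(Q.argmax(axis=1))   # the greedy policy sticks to data-supported actions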
There are still some major areas that need refinement, Levine told ZDNet. The program at the moment has some hyperparameters that have to be designed by hand rather than being arrived at from the data, he noted.
“But so far this seems like a promising path for us toward methods with safety and reliability guarantees in offline RL,” said Levine.
In fact, conservative Q-learning suggests there are ways to incorporate such safety considerations into the design of AI from the start, rather than waiting until after such systems are built and deployed.
The fact that it is Levine carrying out this inquiry gives the approach of conservative Q-learning added significance. With a firm grounding in real-world robotics applications, Levine and his team are in a position to validate the actor-critic approach in direct experiments.
Indeed, the conservative Q-learning paper, lead-authored by Aviral Kumar of Berkeley in collaboration with Google Brain, contains numerous robotics tests in which the approach showed improvements over other kinds of offline RL.
Google has also published a blog post about the effort, if you want to learn more.
Of course, any system that relies on amassed data offline for its development will be relying on the integrity of that data. A successful critique of the kind Levine envisions will necessarily involve broader questions about where that data comes from, and what parts of it represent good decisions.
Some aspects of what is good and bad may be a discussion society has to have that cannot be automated.
UW Researchers Hone Computer Models to Identify Animals in Photos
September 18, 2020
This camera-trap image of a mountain lion was among those used to train computer models to identify animals with a high degree of accuracy and efficiency. (Jim Beasley Photo)
University of Wyoming researchers have once again advanced artificial intelligence technology for identifying wild animals in camera-trap photographs from North America.
The researchers used 3 million camera-trap images from 18 studies in 10 U.S. states to develop two computer models that demonstrated remarkable accuracy and efficiency in accomplishing a task that is important in wildlife research.
The project, detailed in an article in the journal Ecology and Evolution, makes the technology more accessible to biologists who do not have advanced computational skills. It builds on previous research that shows the artificial intelligence technique called deep learning can take the place of slow, tedious analysis of individual photos by people.
“Training a model that can be used to classify species in multiple environments is tricky,” says Mikey Tabak, an adjunct faculty member in UW’s Department of Zoology and Physiology and a recent UW Ph.D. graduate. “But we found that training models with many species from multiple locations improved our ability to classify species in new environments.”
Tabak is the lead author of the paper, which also received contributions from recent UW computer science Ph.D. graduate Mohammad Sadegh (Arash) Norouzzadeh and former UW Department of Computer Science faculty member Jeff Clune.
The models were developed using camera-trap images from California, Colorado, Florida, Idaho, Minnesota, Montana, South Carolina, Texas, Washington and Wisconsin. The first one, called the “species model,” recognizes 58 species, ranging from snowshoe hares to grizzly bears. The second, called the “empty-animal model,” effectively filters out images that do not contain animals — an important function in analyzing large numbers of camera-trap photos.
Both models were 97 percent accurate when tested with images from the areas where they were developed. When tested with images from other parts of the world, the empty-animal model’s accuracy ranged from 90 percent to 94 percent. The species model was less accurate, ranging from 65 percent to 93 percent.
“The poor performance of the species model in some areas indicates that some users will need to train new models on images from their field sites,” Tabak says. “But the empty-animal model appears to be broadly applicable for sorting out empty images in datasets globally. By first removing the empty images, users can focus on those with animals and more easily label a large enough dataset to train models.”
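In code, the two-stage workflow Tabak describes might look like the following Python sketch. Both predict functions are hypothetical stand-ins invented for illustration; the actual models ship in the team’s R package, not through this API:

    from pathlib import Path

    def predict_contains_animal(image_path):
        """Hypothetical stand-in for the empty-animal model:
        returns the probability that an animal is present."""
        return 0.9  # placeholder; a real model would inspect the image pixels

    def predict_species(image_path):
        """Hypothetical stand-in for the 58-species classification model."""
        return "snowshoe_hare"  # placeholder label

    THRESHOLD = 0.5
    labels = {}
    for img in Path("camera_trap_photos").glob("*.jpg"):
        if predict_contains_animal(img) >= THRESHOLD:  # stage 1: drop empty images
            labels[img.name] = predict_species(img)    # stage 2: classify the rest
    print(labels)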
The researchers have made the new models freely available in a software package for the R programming language. The package allows other users to classify images containing the 58 species in the dataset, and it also allows users to train their own machine-learning models on new datasets through a point-and-click interface built with a package called Shiny.
“Our R Shiny apps allow ecologists with minimal programming experience to train deep learning models specific to their own field sites, which will hopefully save time and money in many wildlife camera-trap projects,” Tabak says.
