Chatbot Teamwork Makes the AI Dream Work

Turning to a friend or coworker can make tricky problems easier to tackle. Now it looks like having AI chatbots team up with each other can make them more effective.

I’ve been playing this week with AutoGen, an open source software framework for AI agent collaboration developed by researchers at Microsoft and academics at Pennsylvania State University, the University of Washington, and Xidian University in China. The software taps OpenAI’s large language model GPT-4 to let you create multiple AI agents with different personas, roles, and objectives that can be prompted to solve specific problems.

To put the idea of AI collaboration to the test, I had two AI agents work together on a plan for how to write about AI collaboration.

By modifying AutoGen’s code, I created a “reporter” and an “editor” that discussed writing about AI agent collaboration. After talking about the importance of “showcasing how industries such as health care, transportation, retail, and more are using multi-agent AI,” the pair agreed that the proposed piece should dive into the “ethical dilemmas” posed by the technology.
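The basic pattern behind that experiment is simple: two personas take turns replying to each other, with the transcript threaded through each turn. Here is a rough, library-free sketch of that turn-taking loop. The agent names echo my experiment, but the helper functions and the canned replies are illustrative stand-ins of my own, not AutoGen’s API; in a real AutoGen setup each turn would be routed through an LLM such as GPT-4.

```python
# Minimal sketch of two-agent turn-taking, in the spirit of AutoGen's
# conversational agents. The canned reply functions stand in for LLM
# calls so the sketch runs offline; they are NOT AutoGen's API.

def make_agent(name, system_message, reply_fn):
    """An 'agent' here is just a persona plus a reply policy."""
    return {"name": name, "system": system_message, "reply": reply_fn}

def run_chat(agent_a, agent_b, opening_message, max_turns=4):
    """Alternate turns between two agents, threading the transcript."""
    transcript = [(agent_a["name"], opening_message)]
    speaker, other = agent_b, agent_a
    for _ in range(max_turns - 1):
        last_message = transcript[-1][1]
        reply = speaker["reply"](speaker["system"], last_message)
        transcript.append((speaker["name"], reply))
        speaker, other = other, speaker  # swap roles for the next turn
    return transcript

# Hypothetical personas loosely modeled on the experiment in the text.
reporter = make_agent(
    "reporter",
    "You pitch and draft stories about AI agent collaboration.",
    lambda system, msg: "Draft: multi-agent AI in health care and transport.",
)
editor = make_agent(
    "editor",
    "You critique pitches and push for deeper angles.",
    lambda system, msg: "Revise: dive into the ethical dilemmas too.",
)

chat = run_chat(reporter, editor, "Pitch: how AI agents team up.", max_turns=4)
for name, message in chat:
    print(f"{name}: {message}")
```

AutoGen wraps this loop in its agent classes and handles the model calls, termination conditions, and optional human input; the point of the sketch is only that “collaboration” here is a structured exchange of messages between role-conditioned models.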

It’s too early to write much about any of those suggested topics—the concept of multi-agent AI collaboration is still mostly in the research phase. But the experiment demonstrated a strategy that can amplify the power of AI chatbots.

Large language models like those behind ChatGPT often stumble over math problems because they generate statistically plausible text rather than applying rigorous logical reasoning. In a paper presented at an academic workshop in May, the researchers behind AutoGen show that having AI agents collaborate can mitigate that weakness.

They found that two to four agents working together could solve fifth-grade math problems more reliably than one agent on its own. In their tests, teams were also able to reason out chess problems by talking them through, and they were able to analyze and refine computer code by talking to one another.

Others have shown similar benefits when several different AI models—even those offered by corporate rivals—team up. In a project presented at the same workshop, held at a major AI conference called ICLR, a group from MIT and Google got OpenAI’s ChatGPT and Google’s Bard to work together by discussing and debating problems. They found that the duo was more likely to converge on a correct solution than either bot working solo. Another recent paper from researchers at UC Berkeley and the University of Michigan showed that having one AI agent review and critique the work of another could allow the supervising bot to upgrade the other agent’s code, improving its ability to use a computer’s web browser.

Teams of LLMs can also be prompted to behave in surprisingly humanlike ways. A group from Google, Zhejiang University in China, and the National University of Singapore found that assigning AI agents distinct personality traits, such as “easy-going” or “overconfident,” can fine-tune their collaborative performance, either positively or negatively.

And a recent article in The Economist rounds up several multi-agent projects, including one commissioned by the Pentagon’s Defense Advanced Research Projects Agency. In that experiment, a team of AI agents was tasked with searching for bombs hidden within a labyrinth of virtual rooms. While the multi-AI team was better at finding the imaginary bombs than a lone agent, the researchers also found that the group spontaneously developed an internal hierarchy. One agent ended up bossing the others around as they went about their mission.

Graham Neubig, an associate professor at Carnegie Mellon University who organized the ICLR workshop, is experimenting with multi-agent collaboration for coding. He says the collaborative approach can be powerful but can also lead to new kinds of errors, because it adds more complexity. “It’s possible that multi-agent systems are the way to go, but it’s not a foregone conclusion,” Neubig says.

People are already adapting the open source AutoGen framework in interesting ways, for instance by creating simulated writers’ rooms that generate fiction ideas and a virtual “business-in-a-box” with agents that take on different corporate roles. Perhaps it won’t be too long until the assignment my AI agents came up with needs to be written.