AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation
- Qingyun Wu
- Gagan Bansal
- Jieyu Zhang
- Yiran Wu
- Beibin Li
- Erkang (Eric) Zhu
- Li Jiang
- Xiaoyun Zhang
- Shaokun Zhang
- Ahmed Awadallah
- Ryen W. White
- Doug Burger
- Chi Wang
Best Paper, LLM Agents Workshop ICLR'24
We present AutoGen, an open-source framework that allows developers to build LLM applications by composing multiple agents to converse with each other to accomplish tasks. AutoGen agents are customizable, conversable, and can operate in various modes that employ combinations of LLMs, human inputs, and tools. It also enables developers to create flexible agent behaviors and conversation patterns for different applications using both natural language and code. AutoGen serves as a generic infrastructure and is widely used by AI practitioners and researchers to build diverse applications of various complexities and LLM capacities. We demonstrate the framework’s effectiveness with several pilot applications, on domains ranging from mathematics and coding to question-answering, supply-chain optimization, online decision-making, and entertainment.
What’s new in AutoGen?
Presented by Chi Wang at Microsoft Research Forum, Episode 2
Chi Wang discussed the latest updates on AutoGen – the multi-agent framework for next-generation AI applications. This includes milestones achieved, community feedback, exciting new features, and ongoing research and challenges.
Transcript
What’s new in AutoGen?
CHI WANG: Hi, everyone. My name is Chi. I’m from Microsoft Research AI Frontiers. I’m excited to share with you the latest news about AutoGen. AutoGen was motivated by two big questions: what are the future AI applications like, and how do we empower every developer to build them? Last year, I worked with my colleagues and collaborators from Penn State University and University of Washington on a new multi-agent framework.
We have been building AutoGen as a programming framework for agentic AI, like PyTorch for deep learning. We developed AutoGen inside an open-source project, FLAML, and last October, we moved it to a standalone repo on GitHub. Since then, we’ve gotten new feedback from users every day, everywhere. Users have shown really high recognition of the power of AutoGen, and they have a deep understanding of its value along different dimensions like flexibility, modularity, and simplicity.
Let’s check one example use case.
[Beginning of pre-recorded testimonial.]
Sam Khalil, VP, Data Insights & FounData, Novo Nordisk: In our data science department, AutoGen is helping us develop a production-ready multi-agent framework.
Rasmus Sten Andersen, AI engineer lead, Novo Nordisk: Our first target is to reduce the barriers to technical data analytics and to enable our broader community to derive insights.
Georgios Ilias Kavousanos, data engineer, AI Labs, Novo Nordisk: We are also extending AutoGen with the strict requirements from our industry given the sensitive nature of our data.
[End of pre-recorded testimonial.]
WANG: That is one example use case from the pharmaceutical vertical. We have seen big enterprise customers’ interest like this from pretty much every industry vertical. AutoGen is used or contributed [to] by companies, organizations, and universities from A to Z, all over the world. We have seen hundreds of example applications; some organizations use AutoGen as a backbone to build their own agent platform, while others use AutoGen for diverse scenarios, ranging from research and investment to novel and creative applications of multiple agents. AutoGen has a large, very active community of developers, researchers, and AI practitioners. They are so active and passionate. I’m so amazed by that, and I appreciate all the recognition AutoGen has received in such a short amount of time. For example, our paper was selected by TheSequence as one of the top favorite AI papers in 2023. To quickly share our latest news: last Friday, our initial multi-agent experiment on the challenging GAIA benchmark turned out to achieve the No. 1 accuracy on the leaderboard across all three levels. That shows the power of AutoGen in solving complex tasks and its bigger potential.
This is one example of our effort in answering a few open hard questions, such as how to design an optimal multi-agent workflow. AutoGen is under active research and development and is evolving at a very fast pace. Here are examples of our exciting new features and ongoing research. First, for evaluation, we are making agent-based evaluation tools and benchmarking tools. Second, we are making rapid progress in further improving the interface to make it even easier to build agent applications. Third, the learning capability allows agents to remember teachings from users or other agents long term and improve over time. And fourth, AutoGen is integrated with new technologies like the OpenAI Assistants API and multimodality. Please check our blog posts on the website for more details.
I appreciate the huge amount of support from everyone in the community, and we need more help in solving all the challenging problems. You’re all welcome to join the community and define the future of AI agents together.
Thank you very much.
AutoGen Update: Complex Tasks and Agents
Presented by Adam Fourney at Microsoft Research Forum, Episode 3
Adam Fourney discusses the effectiveness of using multiple agents, working together, to complete complex multi-step tasks. He showcases their capability to outperform previous single-agent solutions on benchmarks like GAIA, using customizable arrangements of agents that collaborate, reason, and use tools to achieve complex outcomes.
Transcript
AutoGen Update: Complex Tasks and Agents
ADAM FOURNEY: Hello, my name is Adam Fourney, and today, I’ll be presenting our work on completing complex tasks with agents. And though I’m presenting, I’m sharing the contributions of many individuals as listed below. All right, so let’s just dive in.
So in this presentation, I’ll share our goal, which is to reliably accomplish long-running complex tasks using large foundational models. I’ll explain the bet that we’re taking on using multi-agent workflows as the platform or the vehicle to get us there, and I’ll share a little bit about our progress in using a four-agent workflow to achieve state-of-the-art performance on a recent benchmark.
So what exactly is a complex task? Well, if we take a look at the following example from the GAIA benchmark for General AI Assistants, it reads, “How many nonindigenous crocodiles were found in Florida from the years 2000 through 2020?” Well, to solve this task, we might begin by performing a search and discovering that the U.S. Geological Survey maintains an online database for nonindigenous aquatic species. If we access that resource, we can form an appropriate query, and we’ll get back results for two separate species. If we open the collection reports for each of those species, we’ll find that in one instance, five crocodiles were encountered, and in the other, just a single crocodile was encountered, giving a total of six separate encounters during those years. So this is an example of a complex task, and it shares the characteristics of tasks of this nature, which is that it benefits strongly from planning, acting, observing, and reflecting over multiple steps, where those steps are doing more than just generating tokens. Maybe they’re executing code. Maybe they’re using tools or interacting with the environment. And the observations they make add information that was previously unavailable. So these are the types of tasks that we’re interested in here. And as I mentioned before, we’re betting on using multi-agent workflows as the vehicle to get us there.
So why multi-agents? Well, first of all, the whole setup feels very agentic from, sort of, a first-principles point of view. The agents are reasoning, they’re acting, and then they’re observing the outcomes of their actions. So this is very natural. But more generally, agents are a very, very powerful abstraction over things like task decomposition, specialization, tool use, etc. Really, you think about which roles you need on your team, and you put together your team of agents, and you get them to talk to one another, and then you start making progress on your task. So to do all this, to build all this, we are producing a platform called AutoGen, which is open source and available on GitHub. And I encourage you to check this out at the link below.
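The "pick the roles, then let them talk" idea can be sketched in a few lines. This is a toy illustration, not the AutoGen API: the role names and the simple round-robin hand-off rule are made up for the example, and the lambda handlers stand in for LLM-backed agents.

```python
def round_robin_chat(agents, task, max_turns=6):
    """Each agent sees the latest message and replies in turn."""
    transcript = [("user", task)]
    for turn in range(max_turns):
        name, handler = agents[turn % len(agents)]
        reply = handler(transcript[-1][1])  # respond to the latest message
        transcript.append((name, reply))
    return transcript

# Assemble a "team" by choosing the roles the task needs.
team = [
    ("planner", lambda m: f"plan for: {m}"),
    ("coder",   lambda m: f"code for: {m}"),
    ("critic",  lambda m: f"review of: {m}"),
]
chat = round_robin_chat(team, "analyze the data", max_turns=3)
```

A real framework replaces the fixed round-robin rule with richer conversation patterns (group chats, nested chats, dynamic speaker selection), but the core abstraction, agents exchanging messages in a transcript, is the same.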
All right, so now let’s talk about the progress we’ve been making using this approach. So if you recall that question about crocodiles from the beginning, that’s from the GAIA benchmark for General AI Assistants. And we put together four agents to work on these types of problems. It consists of a general assistant, a computer terminal that can run code or execute programs, a web surfer that can browse the internet, and an orchestrator to, sort of, organize and oversee their work. Now with that team of four agents, we were actually able to, in March, take the top spot on the GAIA leaderboard for that benchmark, leading by about 8 points. But what’s perhaps more exciting to us is that we were able to more than double the performance on the hardest set of questions, the Level 3 questions, which the authors of that work describe as questions for a perfect general assistant, requiring arbitrarily long sequences of actions, the use of any number of tools, and access to the world in general. So this is all very exciting, and I want to share a little bit more about what those agents are actually doing.
So this is the loop or the plan that they are following. So it begins with the question or the prompt, and then we produce a ledger, which is like a working memory that consists of given or verified facts; facts that we need to look up, for example, on the internet; facts that we need to derive, perhaps through computation; and educated guesses. Now these educated guesses turn out to be really important because they give the language models space to speculate in a constrained environment without some of the downstream negative effects of hallucination. So once we have that ledger, we assign the tasks to the independent agents, and then we go into this inner loop, where we ask first, are we done? If not, well, are we still making progress? As long as we’re making progress, we’ll go ahead and we’ll delegate the next step to the next agent. But if we’re not making progress, we’ll note that down. We might still delegate one other step, but if that stall occurs for three rounds, then we will actually go back, update the ledger, come up with a new set of assignments for the agents, and then start over.
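The control flow just described can be sketched as a nested loop. Everything here is a hypothetical rendering of the talk's description, none of these names come from AutoGen, and the `max_steps` guard is added only so the sketch always terminates. The callbacks (`make_ledger`, `step`, `is_done`, `made_progress`) stand in for LLM-driven decisions.

```python
from dataclasses import dataclass, field

@dataclass
class Ledger:
    """Working memory, per the talk: facts we have, need, derive, or guess."""
    verified_facts: list = field(default_factory=list)
    facts_to_look_up: list = field(default_factory=list)
    facts_to_derive: list = field(default_factory=list)
    educated_guesses: list = field(default_factory=list)

def run_task(task, agents, make_ledger, step, is_done, made_progress,
             max_restarts=3, max_stalls=3, max_steps=50):
    """Outer loop: (re)build the ledger. Inner loop: delegate steps until
    done, or until three stalls force a re-plan."""
    for _ in range(max_restarts):
        ledger = make_ledger(task)      # fresh plan and assignments
        stalls, steps = 0, 0
        while stalls < max_stalls and steps < max_steps:
            if is_done(ledger):
                return ledger
            if made_progress(ledger):
                stalls = 0              # reset the stall counter on progress
            else:
                stalls += 1             # note the stall; may still try a step
            step(ledger, agents)        # delegate the next step to an agent
            steps += 1
        # three stalls in a row: fall through, update the ledger, start over
    return ledger

# Toy run: "progress" means appending facts until three are verified.
led = run_task(
    "count crocodiles",
    agents=["assistant"],
    make_ledger=lambda t: Ledger(),
    step=lambda l, a: l.verified_facts.append("fact"),
    is_done=lambda l: len(l.verified_facts) >= 3,
    made_progress=lambda l: True,
)
```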
All right, so this is the configuration that’s been working well for us, and it’s all I have time to share with you today. But I mentioned our goal, our bet, and our progress, and I want to conclude by sharing our plans for the future. So already we’re starting to tackle increasingly more complex benchmarks and real-world scenarios with this configuration. And we’re really excited about opportunities to introduce new agents that, for example, learn and self-improve with experience; that understand images and screenshots a little better for maybe more effective web surfing or use of interfaces; and that are maybe a bit more systematic about exploring that solution space. So rather than just updating that ledger and then restarting when they get stuck, they can be a bit more pragmatic about the strategies that they’re employing.
All right, well, thank you for your attention, and thank you for attending the Microsoft Research Forum, and we look forward to you joining us next time.

AutoGen v0.4: Reimagining the foundation of agentic AI for scale and more
Presented by Gagan Bansal at Microsoft Research Forum, Episode 5
Gagan Bansal, Senior Researcher, Microsoft Research AI Frontiers introduces a transformative update to the AutoGen framework that builds on user feedback and redefines modularity, stability, and flexibility to empower the next generation of agentic AI research and applications.
Explore more
AutoGen v0.4: Reimagining the foundation of agentic AI for scale, extensibility, and robustness
Microsoft Research Blog | January 2025
AutoGen
Code on GitHub
Migration Guide for v0.2 to v0.4
AutoGen project
Transcript
AutoGen v0.4: Reimagining the foundation of agentic AI for scale, extensibility, and robustness
FRIEDERIKE NIEDTNER, Principal Technical Research Program Manager, Microsoft Research AI Frontiers: The following talk invites us to follow the journey of AutoGen from a leading open-source framework for multi-agent applications to a complete redesign that lays the foundation for the future of agentic AI research and applications with the release of AutoGen 0.4. The framework’s new layered architecture provides flexibility and scalability and includes an ecosystem of extensions and applications, some created by the same team, such as Magentic-One, a team of generalist agents, and Studio, a low-code developer tool. AutoGen 0.4 is also a story about collaboration between MSR, partners within Microsoft, and a vibrant open-source community.
GAGAN BANSAL: Hi, I am Gagan Bansal and I am a researcher at Microsoft Research AI Frontiers. And today I’ll talk about some exciting technical updates to AutoGen, a leading open-source framework for agentic AI. And although I am presenting, this is joint work with many incredible colleagues and interns at Microsoft over the last year.
AutoGen is a leading open-source framework for multi-agent applications that we released in fall 2023. It enables developers and researchers to create intelligent applications using large language models, tool use, and multi-agent collaboration patterns. With AutoGen, our goal has been to lead the innovation in agentic AI research. Since its launch, it has continued to empower developers and researchers in many, many domains, including business process automation, marketing, finance, security, and others.
Since AutoGen’s launch, we’ve not just been maintaining it. We’ve been listening closely to feedback from developers and researchers, and in this rapidly evolving landscape of AI progress, their expectations were high. Users told us that they needed greater modularity and the ability to reuse agents seamlessly. They also asked for better support for debugging and scaling their agentic solutions. And finally, there were many asks to enhance the code quality and maturity of the platform.
Pursuing these needs required us to question our assumptions and even possibly reimagine the platform. So, in early 2024, we used these learnings to experiment with alternate architectures, and we ended up adopting an actor model for multi-agent orchestration. The actor model is a well-known programming model for concurrent programming and distributed systems. Here, actors are the computational building blocks that can exchange messages and also perform work. In Fall 2024, we announced a preview of this version, and this new year, we’re thrilled to announce a full release. In summary, AutoGen v0.4 is our response to our users’ feedback in this evolving landscape of AI research. AutoGen is now not just a framework, but a whole ecosystem for agentic AI. It provides you with a framework that lets you build sophisticated agents and multi-agent applications, and it also provides you with developer tools and many well-defined applications.
Let me first tell you about the AutoGen framework. At the heart of this release is a layered architecture that is designed for flexibility and scalability. At the base is AutoGen Core. This layer implements the actor model for agents. Building on Core is AutoGen AgentChat. This layer provides a simple and easy-to-use API that is perfect for rapid prototyping. And building on Core and AgentChat is Extensions.
This layer provides advanced clients, agents and teams, and integrations with third-party software. This layered architecture is nice because whether you are an advanced developer or a researcher prototyping new ideas, AutoGen provides you with the tools you need for your project’s stage of development. The Core implements an actor model for agentic AI. At the highest level, this implementation provides two key features.
The first is asynchronous message exchange between agents. It does so by providing a runtime, and then it also provides event-driven agents that perform computations in response to these messages. There are several implications of this design, and one of them is that it decouples how the messages are delivered between the agents from how the agents handle them. This naturally improves the modularity and scalability of agentic workflows built with AutoGen, especially for deployment.
The Core’s event-driven architecture provides several other benefits. For example, it provides affordances to observe and control agent behavior, which is crucial for responsible development of agentic technology. It also enables running multiple agents on different processes and even implementing them using different languages. Finally, it enables developers to implement a large class of multi-agent patterns, including static and dynamic workflows.
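The two key features just described, a runtime that delivers messages asynchronously, and event-driven agents that compute in response, can be illustrated with plain `asyncio`. This is a minimal sketch of the actor pattern itself; the class and method names are invented for the example and are not the AutoGen Core API.

```python
import asyncio

class Agent:
    """An actor: owns an inbox and reacts to messages, one at a time."""
    def __init__(self, name, runtime):
        self.name = name
        self.runtime = runtime          # agents reply by asking the runtime
        self.inbox = asyncio.Queue()

    async def on_message(self, msg):    # override: the event-driven handler
        raise NotImplementedError

    async def run(self):
        while True:
            msg = await self.inbox.get()  # delivery decoupled from handling
            if msg is None:               # sentinel: shut down
                break
            await self.on_message(msg)

class Runtime:
    """Routes messages between agents; senders never touch inboxes directly."""
    def __init__(self):
        self.agents = {}

    def register(self, agent):
        self.agents[agent.name] = agent

    async def send(self, to, msg):
        await self.agents[to].inbox.put(msg)

class Echo(Agent):
    def __init__(self, name, runtime, log):
        super().__init__(name, runtime)
        self.log = log

    async def on_message(self, msg):
        self.log.append(f"{self.name} got {msg}")

async def main():
    rt, log = Runtime(), []
    agent = Echo("a", rt, log)
    rt.register(agent)
    worker = asyncio.create_task(agent.run())
    await rt.send("a", "hello")
    await rt.send("a", None)   # sentinel to stop the agent
    await worker
    return log

log = asyncio.run(main())
```

Because the runtime owns delivery, the same agent code could sit behind an in-process queue, a separate process, or a network transport, which is what makes the design scale.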
When we released AutoGen, one of the first things that developers absolutely loved about it was its simplicity and the many pre-built agents and teams that it provided, such as the user proxy agent and the assistant agent, and the group chat between multiple agents. With the AutoGen AgentChat layer, we are maintaining these features and adding many more essential features, such as streaming support, serialization, state management and memory for agents, and finally full type support for a better development experience.
Please check out the link below for the migration guide. Finally, the Extensions layer provides advanced runtimes, tools, clients, and ecosystem integrations that continuously expand the framework’s capabilities. In addition to the framework, this new release also provides upgrades to essential developer tools and applications built using AutoGen, and here I’ll briefly mention two of them. In late 2023, we also released AutoGen Studio, which is a low-code tool for authoring multi-agent applications.
And we are excited to announce that with version 0.4, Studio has received massive upgrades. It now supports a drag-and-drop multi-agent builder; real-time updates as agents solve tasks; flow visualizations and execution controls, so that users remain in control; and component galleries, so that the community can discover and build on each other’s work. We’ve always believed that the framework should enable state-of-the-art applications for solving complex tasks with agents, which is why we’ve been building applications with the framework ourselves and using that to guide the framework’s development.
Last year, we released Magentic-One, a state-of-the-art multi-agent team for solving file- and web-related tasks built using AutoGen. And its developer API and general capabilities, such as the sophisticated orchestrator and specialized agents like the web surfer and the file surfer, are now available in the AutoGen ecosystem. For us, this new ecosystem is only the beginning and sets the stage for future innovation in agentic AI.
Over the past two years, our team has made early progress on AI agents, and we continue to think deeply about the changing landscape of AI research and to invest in steps that help lead the innovation on agents. And by the way, we’re also working closely with our colleagues at Semantic Kernel to provide an enterprise-ready multi-agent runtime for AutoGen.
Thank you for attending Microsoft Research Forum. Please check out these links to learn more about AutoGen.
