Emergent Mind

Automating the Enterprise with Foundation Models

(2405.03710)
Published May 3, 2024 in cs.SE, cs.AI, and cs.LG

Abstract

Automating enterprise workflows could unlock $4 trillion/year in productivity gains. Despite being of interest to the data management community for decades, the ultimate vision of end-to-end workflow automation has remained elusive. Current solutions rely on process mining and robotic process automation (RPA), in which a bot is hard-coded to follow a set of predefined rules for completing a workflow. Through case studies of a hospital and large B2B enterprise, we find that the adoption of RPA has been inhibited by high set-up costs (12-18 months), unreliable execution (60% initial accuracy), and burdensome maintenance (requiring multiple FTEs). Multimodal foundation models (FMs) such as GPT-4 offer a promising new approach for end-to-end workflow automation given their generalized reasoning and planning abilities. To study these capabilities we propose ECLAIR, a system to automate enterprise workflows with minimal human supervision. We conduct initial experiments showing that multimodal FMs can address the limitations of traditional RPA with (1) near-human-level understanding of workflows (93% accuracy on a workflow understanding task) and (2) instant set-up with minimal technical barrier (based solely on a natural language description of a workflow, ECLAIR achieves end-to-end completion rates of 40%). We identify human-AI collaboration, validation, and self-improvement as open challenges, and suggest ways they can be solved with data management techniques. Code is available at: https://github.com/HazyResearch/eclair-agents

ECLAIR uses multimodal foundation models (FMs) to learn from video demonstrations, navigate GUIs, and audit workflows.

Overview

  • Multimodal foundation models (FMs) like GPT-4 present a new approach to automating enterprise workflows, aiming to overcome the traditional challenges of high setup costs, brittle execution, and labor-intensive maintenance experienced with Robotic Process Automation (RPA).

  • The research introduces ECLAIR, a system that uses multimodal FMs to automate workflows with minimal human supervision. Initial experiments cover three capabilities: learning from demonstrations, executing workflows from natural language descriptions, and self-monitoring.

  • Despite the promising advances of ECLAIR, it still faces challenges in complex decision-making and independence from human oversight, with future improvements aimed at refining its decision algorithms and training methodologies.

Understanding the Automation of Enterprise Workflows Through Multimodal Foundation Models

Introduction to Multimodal Foundation Models in Automation

Traditional approaches to automating business workflows have been held back by three challenges: high setup costs, brittle execution, and labor-intensive maintenance. Multimodal foundation models (FMs) like GPT-4 offer a promising alternative, because their generalized reasoning and planning abilities may lower these barriers while improving the accuracy and efficiency of workflow automation.

The Core Challenges of Traditional RPA

Robotic Process Automation (RPA) has been the go-to technology for enterprise workflow automation, yet it faces significant limitations:

  • High setup costs: RPA requires detailed mapping and scripting by skilled specialists, leading to prolonged and costly setup phases (12-18 months in the paper's case studies of a hospital and a large B2B enterprise).
  • Brittle execution: Limited by rigid rule-based programming, RPA bots struggle with minor variations in inputs or interfaces, which contributes to low initial accuracy (60% in the case studies) and demands continuous adjustment.
  • Burdensome maintenance: Continuous human supervision is needed to manage and correct RPA bots (multiple FTEs in the case studies), negating some of the intended efficiency gains.
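The brittleness described above can be illustrated with a toy sketch. This is not code from the paper or from any RPA product; the element names, the `run_script` helper, and the two "screens" are all hypothetical, chosen only to show how a hard-coded rule stops working after a small interface change.

```python
# Hypothetical illustration: a rule-based "bot" that acts on GUI elements by
# hard-coded identifiers. None of these names come from the paper.

class ElementNotFound(Exception):
    """Raised when a hard-coded selector no longer matches the screen."""

# A scripted workflow: each step names the exact element ID to act on.
SCRIPT = ["btn_new_invoice", "field_amount", "btn_submit"]

def run_script(script, screen_elements):
    """Execute each step in order; fail as soon as a selector is missing."""
    completed = []
    for selector in script:
        if selector not in screen_elements:
            raise ElementNotFound(selector)
        completed.append(selector)
    return completed

# The interface the bot was built against.
screen_v1 = {"btn_new_invoice", "field_amount", "btn_submit"}
# The same interface after a minor redesign renamed one button.
screen_v2 = {"btn_new_invoice", "field_amount", "btn_send"}

print(run_script(SCRIPT, screen_v1))  # all three steps complete

try:
    run_script(SCRIPT, screen_v2)
except ElementNotFound as missing:
    print(f"bot halted: selector {missing} not found")
```

A person would recognize the renamed button immediately, while the script has to be edited by hand. That gap is the maintenance burden the summary refers to.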

Advantages of Multimodal Foundation Models

Multimodal FMs could address these limitations. They can interpret and navigate graphical user interfaces (GUIs), plan sequences of actions, and adapt to new workflows with minimal human intervention. The paper proposes a system called ECLAIR that builds on such models. The key capabilities highlighted are:

  • Learning from demonstrations: ECLAIR achieves impressive results in understanding workflows by observing demonstrations, showing 93% accuracy in recognizing workflow steps from video key frames.
  • Efficient execution: Starting from only a natural language description of a workflow, ECLAIR plans and proposes the necessary actions, reaching end-to-end completion rates of 40%.
  • Self-monitoring and validation: The capability to self-validate its actions enables ECLAIR to operate with reduced human oversight, achieving high precision and recall in identifying correctly completed tasks.
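For readers unfamiliar with the metrics in the last bullet, the sketch below shows how precision and recall would be computed for a self-validator that judges whether a workflow was completed. The function name and the labels are invented for illustration; they are not results or code from the paper.

```python
def precision_recall(predicted, actual):
    """Compute precision and recall for boolean 'workflow completed' labels."""
    pairs = list(zip(predicted, actual))
    true_pos = sum(p and a for p, a in pairs)
    false_pos = sum(p and not a for p, a in pairs)
    false_neg = sum(not p and a for p, a in pairs)
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    return precision, recall

# Made-up example: what a validator claimed vs. what actually happened.
validator_says_done = [True, True, False, True, False, True]
actually_done = [True, False, False, True, True, True]

precision, recall = precision_recall(validator_says_done, actually_done)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

High precision means that a workflow the validator marks as complete is usually complete. High recall means that few genuinely completed workflows are flagged as failures and sent back for human review.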

Practical Implications and Future Prospects

The development of ECLAIR hints at a future where enterprise workflows are not only automated more comprehensively but also with greater adaptability and lower overheads. This could translate to substantial productivity boosts and cost reductions in industries reliant on complex digital workflows.

Shortcomings and Development Path

Despite the promising advancements, ECLAIR and similar systems need further refinement to completely replace traditional RPA:

  • Complex decision-making: The system currently struggles with tasks requiring intricate decision-making or those with very dynamic GUI elements.
  • Full independence from human oversight: While ECLAIR reduces the need for human intervention, certain tasks still necessitate manual handling, especially in sensitive areas needing precise verification.

Future Enhancements

Future work could improve the system's decision-making on more nuanced tasks and adopt training techniques that let the models learn from a broader range of demonstrations while requiring less detailed training data.

Conclusion

As multimodal FMs continue to evolve, they may make it possible to automate a broader spectrum of workflows at lower cost and with greater reliability. This could change how companies approach process automation and reshape enterprise operations technology.
