Automating the Enterprise with Foundation Models (2405.03710v1)
Abstract: Automating enterprise workflows could unlock $4 trillion/year in productivity gains. Although the data management community has pursued it for decades, the ultimate vision of end-to-end workflow automation has remained elusive. Current solutions rely on process mining and robotic process automation (RPA), in which a bot is hard-coded to follow a set of predefined rules for completing a workflow. Through case studies of a hospital and a large B2B enterprise, we find that the adoption of RPA has been inhibited by high set-up costs (12-18 months), unreliable execution (60% initial accuracy), and burdensome maintenance (requiring multiple FTEs). Multimodal foundation models (FMs) such as GPT-4 offer a promising new approach for end-to-end workflow automation given their generalized reasoning and planning abilities. To study these capabilities, we propose ECLAIR, a system to automate enterprise workflows with minimal human supervision. We conduct initial experiments showing that multimodal FMs can address the limitations of traditional RPA with (1) near-human-level understanding of workflows (93% accuracy on a workflow understanding task) and (2) instant set-up with minimal technical barrier (based solely on a natural language description of a workflow, ECLAIR achieves end-to-end completion rates of 40%). We identify human-AI collaboration, validation, and self-improvement as open challenges, and suggest ways they can be solved with data management techniques. Code is available at: https://github.com/HazyResearch/eclair-agents