
Towards Optimizing the Costs of LLM Usage (2402.01742v1)

Published 29 Jan 2024 in cs.CL, cs.AI, and cs.LG

Abstract: Generative AI, and LLMs in particular, are now heavily used for document processing tasks such as question answering and summarization. However, different LLMs come with different capabilities for different tasks, as well as different costs, tokenization, and latency. In fact, enterprises are already incurring huge costs from operating or using LLMs for their respective use cases. In this work, we propose optimizing the usage costs of LLMs by estimating their output quality (without actually invoking the LLMs), and then solving an optimization routine for the LLM selection to either keep costs under a budget or minimize the costs, in a quality- and latency-aware manner. We propose a model to predict the output quality of LLMs on document processing tasks like summarization, followed by an LP rounding algorithm to optimize the selection of LLMs. We study optimization problems trading off quality and cost, both theoretically and empirically. We further propose a sentence simplification model for reducing the number of tokens in a controlled manner. Additionally, we propose several deterministic heuristics for reducing tokens in a quality-aware manner, and study the related optimization problem of applying these heuristics to optimize the quality and cost trade-off. We perform extensive empirical validation of our methods on both enterprise datasets and open-source datasets annotated by us, and show that we perform much better than the closest baselines. Our methods reduce costs by 40%-90% while improving quality by 4%-7%. We will release the annotated open-source datasets to the community for further research and exploration.
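
The budgeted selection problem described in the abstract (choose one LLM per document so that total cost stays under a budget while predicted quality is as high as possible) can be illustrated with a small sketch. This is not the paper's LP rounding algorithm: it swaps in a simple greedy heuristic that starts every document on its cheapest model and then repeatedly buys the upgrade with the best quality gain per unit of extra cost. The names `Option` and `select_models`, the cost figures, and the quality scores are all hypothetical; in the paper's setting the quality scores would come from the quality prediction model rather than being hard-coded.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Option:
    model: str      # hypothetical model name
    cost: float     # assumed cost of running this model on the document
    quality: float  # assumed predicted quality score (higher is better)


def select_models(options_per_doc, budget):
    """Greedy stand-in for budgeted LLM selection (not the paper's LP rounding).

    options_per_doc: one list of Option per document.
    Returns (chosen option per document, total cost spent).
    """
    # Start every document on its cheapest option.
    choice = [min(opts, key=lambda o: o.cost) for opts in options_per_doc]
    spent = sum(o.cost for o in choice)
    if spent > budget:
        raise ValueError("budget too small even for the cheapest assignment")

    while True:
        best = None  # (gain per extra cost, document index, option)
        for i, opts in enumerate(options_per_doc):
            cur = choice[i]
            for o in opts:
                gain = o.quality - cur.quality
                extra = o.cost - cur.cost
                # Skip options that do not improve quality or do not fit.
                if gain <= 0 or spent + extra > budget:
                    continue
                ratio = gain / extra if extra > 0 else float("inf")
                if best is None or ratio > best[0]:
                    best = (ratio, i, o)
        if best is None:
            return choice, spent
        _, i, o = best
        spent += o.cost - choice[i].cost
        choice[i] = o


# Two documents, two candidate models each (illustrative numbers only).
docs = [
    [Option("small", 1.0, 0.60), Option("large", 4.0, 0.90)],
    [Option("small", 1.0, 0.80), Option("large", 4.0, 0.85)],
]
choice, spent = select_models(docs, budget=5.0)
print([o.model for o in choice], spent)
```

With these illustrative numbers the heuristic spends the budget on the first document, where the larger model is predicted to help most, and leaves the second document on the cheaper model. A greedy rule like this carries no optimality guarantee; it is only meant to make the shape of the trade-off concrete.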

