
UniICL: An Efficient Unified Framework Unifying Compression, Selection, and Generation (2405.17062v3)

Published 27 May 2024 in cs.CL

Abstract: In-context learning (ICL) enhances the reasoning abilities of LLMs by prepending a few demonstrations to the input, which motivates researchers to supply more examples as additional context for generation. However, existing methods suffer from excessive growth in context length, which imposes a heavy hardware burden. In addition, shallowly relevant examples selected by off-the-shelf retrieval tools hinder LLMs from capturing contextual information that is actually useful for generation. In this paper, we propose UniICL, a novel Unified ICL framework that unifies demonstration compression, demonstration selection, and final response generation. Furthermore, to boost inference efficiency, we design a tailored compression strategy that allows UniICL to cache compression results in a Demonstration Bank (DB), avoiding repeated compression of the same demonstration. Extensive out-of-domain evaluations demonstrate the advantages of UniICL in both effectiveness and efficiency.
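The sketch below illustrates only the caching idea behind the Demonstration Bank described in the abstract: compress each demonstration once, store the result keyed by the demonstration, and reuse it on later queries. The class, function names, and the toy compressor are illustrative assumptions, not the paper's implementation.

```python
# Hypothetical sketch of the Demonstration Bank (DB) caching idea: compress each
# demonstration once and serve repeated demonstrations from the cache.
# All names and representations here are assumptions for illustration only.
from typing import Callable, Dict, List, Sequence


class DemonstrationBank:
    """Caches compressed demonstrations so identical demonstrations are never recompressed."""

    def __init__(self, compress_fn: Callable[[str], List[float]]):
        self._compress_fn = compress_fn            # maps a demonstration to a compact representation
        self._cache: Dict[str, List[float]] = {}   # demonstration text -> cached compression

    def get(self, demonstration: str) -> List[float]:
        # Compress only on a cache miss; otherwise return the stored result.
        if demonstration not in self._cache:
            self._cache[demonstration] = self._compress_fn(demonstration)
        return self._cache[demonstration]

    def get_many(self, demonstrations: Sequence[str]) -> List[List[float]]:
        return [self.get(d) for d in demonstrations]

    def __len__(self) -> int:
        return len(self._cache)


if __name__ == "__main__":
    # Toy stand-in for a learned compressor: a fixed-size bag-of-characters vector.
    def toy_compress(text: str) -> List[float]:
        vec = [0.0] * 8
        for ch in text:
            vec[ord(ch) % 8] += 1.0
        return vec

    bank = DemonstrationBank(toy_compress)
    demos = [
        "Review: great film => positive",
        "Review: dull plot => negative",
        "Review: great film => positive",  # duplicate is served from the cache
    ]
    compressed = bank.get_many(demos)
    print(len(compressed), "compressed demonstrations;", len(bank), "unique entries cached")
```

In a full system the toy compressor would be replaced by the framework's learned compression module; the caching pattern itself is what avoids repeated compression of the same demonstration across queries.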

