UniICL: An Efficient Unified Framework Unifying Compression, Selection, and Generation (2405.17062v3)
Abstract: In-context learning (ICL) enhances the reasoning abilities of LLMs by prepending a few demonstrations to the input, which motivates researchers to introduce more examples so that additional contextual information is available for generation. However, existing methods face a significant limitation: the context length grows excessively with the number of demonstrations, imposing a heavy hardware burden. In addition, the shallowly relevant examples selected by off-the-shelf retrieval tools hinder LLMs from capturing contextual information that is actually useful for generation. In this paper, we propose \textbf{UniICL}, a novel \textbf{Uni}fied \textbf{ICL} framework that unifies demonstration compression, demonstration selection, and final response generation. Furthermore, to boost inference efficiency, we design a tailored compression strategy that allows UniICL to cache compression results in a \textbf{Demonstration Bank} (\textbf{DB}), avoiding repeated compression of the same demonstration. Extensive out-of-domain evaluations demonstrate the advantages of UniICL in both effectiveness and efficiency.
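The Demonstration Bank described above is, at its core, a cache keyed by demonstration content: a demonstration is compressed once, and later prompts that reuse it fetch the stored result instead of re-running compression. Below is a minimal sketch of that caching idea, assuming a hypothetical `compress_demonstration` function that stands in for UniICL's compression module (the real module and its output format are not specified here).

```python
import hashlib
from typing import Dict, List

def compress_demonstration(text: str) -> List[float]:
    # Hypothetical placeholder: a real compressor would run UniICL's
    # LLM-based compression and return a short sequence of virtual-token
    # embeddings; here we just return a dummy vector for illustration.
    return [float(len(text))]

class DemonstrationBank:
    """Caches compressed demonstrations so each one is compressed at most once."""

    def __init__(self) -> None:
        self._cache: Dict[str, List[float]] = {}

    def get(self, demonstration: str) -> List[float]:
        # Key the cache by a hash of the demonstration text.
        key = hashlib.sha256(demonstration.encode("utf-8")).hexdigest()
        if key not in self._cache:
            # Cache miss: compress once and store the result.
            self._cache[key] = compress_demonstration(demonstration)
        return self._cache[key]

# Usage: repeated lookups of the same demonstration hit the cache.
db = DemonstrationBank()
demo = "Review: 'great movie' -> Sentiment: positive"
first = db.get(demo)
second = db.get(demo)  # served from the DB, no re-compression
```

This is only a sketch of the caching pattern; in the paper, what is cached are the compressed (virtual-token) representations produced by the unified model, not raw feature vectors.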