
Instructing Large Language Models to Identify and Ignore Irrelevant Conditions (2403.12744v1)

Published 19 Mar 2024 in cs.CL

Abstract: Math word problem (MWP) solving requires generating a reasoning path based on a given problem description that often contains irrelevant conditions. Existing chain-of-thought (CoT) prompting methods elicited multi-step reasoning abilities of LLMs to solve MWPs. However, they were seriously confused by the irrelevant conditions, resulting in low accuracy. In this paper, we propose a novel approach named I$^3$C that instructs LLMs to identify and ignore irrelevant conditions. It identifies a set of irrelevant condition candidates that have a weak semantic relevance with the question. Then it prompts LLMs to verify the irrelevant conditions. Lastly it instructs the LLMs with the verification on relevant and irrelevant conditions to avoid confusion and improve reasoning paths. Moreover, we propose to select (problem, reasoning paths) pairs as demonstrations to enhance I$^3$C with few-shot reasoning. We develop I$^3$C-Select that selects the most confusing problems based on the semantic relevance measurement. We conduct extensive experiments on eight MWP datasets. I$^3$C can be combined with any CoT prompting methods to improve the performance of solving MWPs. Notably, with GPT-3.5-Turbo and I$^3$C-Select, we achieve an accuracy of 96.0 and 94.1 on GSM-IC2-1K and GSM-ICM-1K, respectively, significantly outperforming the state-of-the-art few-shot prompting method Complex-CoT by +11.7 and +11.1. Our implementation is made publicly available at https://wzy6642.github.io/I3C.github.io/.


Summary

  • The paper introduces I3C, a method instructing LLMs to detect and ignore irrelevant conditions, leading to more accurate reasoning in math word problems.
  • It describes a three-step process of candidate identification, verification, and instruction integration, validated across eight MWP datasets.
  • I3C-Select optimizes demonstration selection by choosing high-confusion problems, reducing computational costs while maintaining high accuracy.

Instructing LLMs to Ignore Irrelevant Conditions

The paper "Instructing LLMs to Identify and Ignore Irrelevant Conditions" (2403.12744) introduces I3C, a novel approach designed to enhance LLM performance in solving MWPs. The key innovation involves instructing LLMs to explicitly identify and ignore irrelevant conditions, which often confuse existing CoT prompting methods. The paper demonstrates that by incorporating I3C, LLMs can generate more accurate reasoning paths and achieve state-of-the-art results across a range of MWP datasets.

I3C Methodology

The I3C approach comprises three main steps: identifying irrelevant condition candidates, verifying their irrelevance, and leveraging these verifications to guide the LLM's reasoning process (Figure 1).

Figure 1: Existing CoT prompting methods were confused by irrelevant conditions in math word problems and gave wrong answers.

Initially, the method splits an MWP into individual conditions $\{c_i\}$ and a question sentence $q$. A pre-trained sentence encoder, such as SimCSE, encodes the conditions and the question into vector representations $\{\mathbf{c}_i\}$ and $\mathbf{q}$, respectively. The semantic relevance of each condition $c_i$ is then quantified using cosine similarity, yielding two scores, $s_i^{(\text{c})}$ and $s_i^{(\text{q})}$.

Conditions with low semantic relevance (i.e., $s_i^{(\text{c})} < \theta$ or $s_i^{(\text{q})} < \theta$) are flagged as irrelevant condition candidates, forming the set $\mathcal{I}=\{c_k^{(\mathrm{irr})}\}$. The threshold $\theta$ is a hyperparameter that controls the sensitivity of the irrelevance detection.
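The candidate-identification step can be sketched as follows. This is a simplified illustration that uses only the condition–question similarity; the paper's full scoring uses both $s_i^{(\text{c})}$ and $s_i^{(\text{q})}$, and the SimCSE encoding step is replaced here by precomputed vectors, so all names are illustrative.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def identify_candidates(cond_vecs, q_vec, theta=0.5):
    """Return indices of conditions whose similarity to the question
    falls below theta, i.e. the irrelevant-condition candidates."""
    return [i for i, c in enumerate(cond_vecs) if cosine(c, q_vec) < theta]
```

In practice `cond_vecs` and `q_vec` would come from encoding the split sentences with SimCSE; the flagged indices are then passed on to the verification step rather than discarded outright.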

Next, an LLM is prompted to verify whether each candidate condition $c_k^{(\mathrm{irr})}$ is indeed irrelevant. The verification prompt takes the form: "$Q$. Is condition $c_k^{(\mathrm{irr})}$ relevant to the process of solving problem $q$?" The LLM's response, $v_k^{(\mathrm{irr})}$, provides a justification for the relevance or irrelevance of the condition.

Finally, the verification outputs $\{v_k^{(\mathrm{irr})}\}$ are combined to create the I3C instruction, denoted by $I$. This instruction is then prepended to any CoT prompting method, guiding the LLM to focus on relevant information and ignore irrelevant details.
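A minimal sketch of how the verification prompts and the final instruction might be assembled is below. The template strings paraphrase the paper's prompts rather than reproduce them verbatim, and the LLM call itself is left out; function names are assumptions for illustration.

```python
def verification_prompt(problem: str, condition: str) -> str:
    # Ask the LLM whether a flagged candidate is actually relevant
    # (paraphrased template, not the paper's exact wording).
    return (f'{problem} Is the condition "{condition}" relevant '
            "to the process of solving the problem?")

def build_i3c_instruction(verifications: list) -> str:
    # Combine the verification outputs v_k into the instruction I.
    return ("Condition analysis: " + " ".join(verifications) +
            " Solve the problem using only the relevant conditions "
            "and ignore the irrelevant ones.")

def i3c_prompt(instruction: str, cot_prompt: str) -> str:
    # I3C is plug-and-play: prepend the instruction to an existing
    # CoT prompt (Zero-Shot-CoT, Manual-CoT, Complex-CoT, ...).
    return instruction + "\n\n" + cot_prompt
```

The last function reflects the plug-and-play property: the instruction is simply concatenated in front of whichever CoT prompt is already in use.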

Enhancements with I3C-Select

To further enhance the performance of I3C, the authors introduce I3C-Select, a few-shot prompting method that automatically selects the most confusing problems as demonstrations. The confusion score of a problem $Q$ is defined as the inverse of the average similarity between its conditions and the question:

$$\text{conf}(Q) = \left[\frac{1}{n}\sum_{i=1}^{n}\cos(\mathbf{c}_i, \mathbf{q})\right]^{-1}$$

The $K$ problems with the highest confusion scores are selected, and their reasoning paths are generated using the Zero-Shot-CoT prompting method. These confusing problems and their reasoning paths serve as demonstrations for the LLM, enabling it to better handle complex scenarios with irrelevant conditions.
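The confusion score and the top-$K$ selection can be sketched as below, assuming condition and question embeddings are already available; the data layout and function names are assumptions for illustration.

```python
import numpy as np

def confusion_score(cond_vecs, q_vec):
    """conf(Q): inverse of the mean cosine similarity between a
    problem's conditions and its question (assumes the mean is > 0)."""
    q = np.asarray(q_vec, dtype=float)
    sims = [float(np.dot(c, q) /
                  (np.linalg.norm(c) * np.linalg.norm(q)))
            for c in np.asarray(cond_vecs, dtype=float)]
    return 1.0 / (sum(sims) / len(sims))

def select_demonstrations(problems, k):
    """problems: list of (text, cond_vecs, q_vec) tuples. Returns the
    texts of the k problems with the highest confusion scores."""
    ranked = sorted(problems,
                    key=lambda p: confusion_score(p[1], p[2]),
                    reverse=True)
    return [p[0] for p in ranked[:k]]
```

The selected problem texts would then be solved once with Zero-Shot-CoT, and the resulting (problem, reasoning path) pairs used as the few-shot demonstrations.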

Experimental Evaluation and Results

The effectiveness of I3C and I3C-Select was evaluated on eight MWP datasets: AddSub, SVAMP, GSM8K, SingleEq, GSM-IC2-1K, GSM-ICM-1K, AQuA, and MATH. The experiments demonstrate that adding the I3C instruction to CoT prompting methods significantly improves their performance. For example, adding the I3C instruction to Manual-CoT improves accuracy by +8.1 on AddSub, +8.1 on SVAMP, +6.0 on GSM8K, +5.1 on SingleEq, +5.1 on GSM-IC2-1K, +2.8 on AQuA, +9.2 on MATH, and +7.8 on GSM-ICM-1K. The most striking results were observed on datasets with a high proportion of irrelevant conditions, such as GSM-IC2-1K and GSM-ICM-1K, where I3C-Select achieved accuracy gains of +11.7 and +11.1, respectively, over the Complex-CoT method.

Figure 2: Performance comparison of Complex-CoT, Complex-CoT with the I3C instruction (i.e., Complex-CoT+I3C), and Complex-CoT with self-consistency (i.e., Complex-CoT-Self-Consistency). The accuracy of Complex-CoT+I3C and Complex-CoT-Self-Consistency is nearly identical, while Complex-CoT+I3C consumes far fewer tokens and much less time than Complex-CoT-Self-Consistency.

The authors also compared the performance of Complex-CoT with I3C (Complex-CoT+I3C) against Complex-CoT with self-consistency (Complex-CoT-Self-Consistency). The results showed that Complex-CoT+I3C achieved nearly identical accuracy to Complex-CoT-Self-Consistency while consuming significantly fewer tokens and less time (Figure 2), highlighting the efficiency and effectiveness of the I3C approach.

Figure 3: Comparison of demonstration construction methods. "Low" indicates selecting the eight problems with the lowest confusion scores, "Medium" indicates randomly selecting eight problems, and "High" indicates selecting the eight problems with the highest confusion scores.

Ablation studies were conducted to evaluate the impact of different demonstration construction methods on the performance of I3C-Select. The results demonstrated that selecting the most confusing problems as demonstrations ("High") consistently outperformed selecting problems with the lowest confusion scores ("Low") or randomly selecting problems ("Medium") (Figure 3). This finding supports the hypothesis that focusing on the most challenging examples can effectively improve the LLM's ability to handle irrelevant conditions.

Figure 4: Hyperparameter analysis. (a) As the threshold increases, the recall scores of identified irrelevant condition candidates first increase and then remain unchanged for all datasets except SingleEq. (b) As the threshold increases, the percentage of conditions to be verified first increases and then remains unchanged for all datasets.

Hyperparameter analysis was performed to determine the optimal threshold $\theta$ for identifying irrelevant condition candidates. The results indicated that a threshold of $\theta = 0.5$ provided a good balance between the recall of irrelevant conditions and the percentage of conditions requiring verification (Figure 4).

Implications and Future Directions

The I3C approach has significant implications for the development of more robust and reliable LLMs. By explicitly addressing the issue of irrelevant conditions, I3C enables LLMs to generate more accurate reasoning paths and improve their performance on complex problem-solving tasks. The plug-and-play nature of the I3C module makes it easy to integrate into existing CoT prompting methods, providing a versatile tool for enhancing LLM capabilities.

Future research directions could explore the application of I3C to other NLP tasks that are susceptible to irrelevant information, such as question answering and text summarization. Additionally, investigating more sophisticated methods for identifying irrelevant conditions, such as employing more advanced semantic similarity measures or training dedicated irrelevance detection models, could further improve the performance of I3C.

Conclusion

The paper "Instructing LLMs to Identify and Ignore Irrelevant Conditions" (2403.12744) presents a valuable contribution to the field of LLMs. The I3C approach offers a practical and effective solution for mitigating the negative impact of irrelevant conditions on MWP solving performance. The experimental results demonstrate the superiority of I3C and I3C-Select over existing prompting methods, highlighting the potential of explicit instruction for enhancing LLM reasoning abilities.
