Large Language Model Augmented Exercise Retrieval for Personalized Language Learning (2402.16877v1)

Published 8 Feb 2024 in cs.IR, cs.AI, cs.CL, and cs.LG

Abstract: We study the problem of zero-shot exercise retrieval in the context of online language learning, to give learners the ability to explicitly request personalized exercises via natural language. Using real-world data collected from language learners, we observe that vector similarity approaches poorly capture the relationship between exercise content and the language that learners use to express what they want to learn. This semantic gap between queries and content dramatically reduces the effectiveness of general-purpose retrieval models pretrained on large-scale information retrieval datasets like MS MARCO. We leverage the generative capabilities of LLMs to bridge the gap by synthesizing hypothetical exercises based on the learner's input, which are then used to search for relevant exercises. Our approach, which we call mHyER, overcomes three challenges: (1) lack of relevance labels for training, (2) unrestricted learner input content, and (3) low semantic similarity between input and retrieval candidates. mHyER outperforms several strong baselines on two novel benchmarks created from crowdsourced data and publicly available data.
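
To make the abstract's generate-then-retrieve idea concrete, here is a minimal Python sketch: an LLM first rewrites the learner's free-form request into hypothetical exercises, and retrieval then matches those against the exercise bank by embedding similarity. This is an illustration under stated assumptions, not the authors' implementation: the prompt, the gpt-4o-mini model, the all-MiniLM-L6-v2 encoder, and the mean-pooling of hypothetical exercises are all choices made here for the example, and mHyER's actual training and scoring details are in the paper.

```python
# Sketch of the hypothetical-exercise retrieval loop described in the abstract.
# Assumes an OpenAI chat model for generation and a SentenceTransformer encoder;
# the paper's actual mHyER prompt, models, and scoring may differ.
import numpy as np
from openai import OpenAI                               # pip install openai
from sentence_transformers import SentenceTransformer   # pip install sentence-transformers

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
encoder = SentenceTransformer("all-MiniLM-L6-v2")

def generate_hypothetical_exercises(request: str, n: int = 3) -> list[str]:
    """LLM step: turn a learner's free-form request into synthetic exercises.

    The prompt and model name are illustrative assumptions, not from the paper.
    """
    prompt = (
        f"A language learner asks: '{request}'. "
        f"Write {n} short practice sentences (one per line) that a matching "
        f"exercise would contain."
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # any instruction-following LLM works here
        messages=[{"role": "user", "content": prompt}],
    )
    return [ln for ln in resp.choices[0].message.content.splitlines() if ln.strip()]

def retrieve(request: str, exercise_bank: list[str], k: int = 5) -> list[str]:
    # Embed the hypothetical exercises rather than the raw request, so the
    # query vector lives in the same "exercise-like" region as the candidates.
    hypos = generate_hypothetical_exercises(request)
    q = encoder.encode(hypos, normalize_embeddings=True).mean(axis=0)
    q /= np.linalg.norm(q)
    bank = encoder.encode(exercise_bank, normalize_embeddings=True)
    scores = bank @ q  # cosine similarity, since all vectors are unit-norm
    return [exercise_bank[i] for i in np.argsort(-scores)[:k]]

# Example: retrieve("I want to practice ordering food at a restaurant", bank)
```

The point of the intermediate generation step is that a learner request such as "I want to practice ordering food" looks nothing like an exercise sentence, so comparing it directly to candidates fails; comparing exercise-shaped text to exercise-shaped text is what closes the semantic gap the abstract identifies.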
