Taking the Next Step with Generative Artificial Intelligence: The Transformative Role of Multimodal Large Language Models in Science Education (2401.00832v3)
Abstract: The integration of AI, particularly LLM-based systems, in education has shown promise in enhancing teaching and learning experiences. However, the advent of Multimodal LLMs (MLLMs) like GPT-4 with vision (GPT-4V), capable of processing multimodal data including text, sound, and visual inputs, opens a new era of enriched, personalized, and interactive learning landscapes in education. Grounded in the theory of multimedia learning, this paper explores the transformative role of MLLMs in central aspects of science education by presenting exemplary innovative learning scenarios. Possible applications for MLLMs range from content creation to tailored support for learning, fostering competencies in scientific practices, and providing assessment and feedback. These scenarios are not limited to text-based and uni-modal formats but can be multimodal, thus increasing personalization, accessibility, and potential learning effectiveness. Alongside these opportunities, challenges such as data protection and ethical considerations become more salient, calling for robust frameworks to ensure responsible integration. This paper underscores the necessity of a balanced approach to implementing MLLMs, in which the technology complements rather than supplants the educator's role, thus ensuring an effective and ethical use of AI in science education. It calls for further research to explore the nuanced implications of MLLMs for the evolving role of educators and to extend the discourse beyond science education to other disciplines. Through the exploration of potentials, challenges, and future implications, we aim to contribute to a preliminary understanding of the transformative trajectory of MLLMs in science education and beyond.
- Arne Bewersdorff
- Christian Hartmann
- Marie Hornberger
- Kathrin Seßler
- Maria Bannert
- Enkelejda Kasneci
- Gjergji Kasneci
- Xiaoming Zhai
- Claudia Nerdel