Emergent Mind

Using ChatGPT for Thematic Analysis

Published May 13, 2024 in cs.HC


The use of AI-driven tools, notably ChatGPT, in academic research is increasingly debated: ease of implementation and potential gains in research efficiency are weighed against ethical concerns and risks such as bias and opaque AI operations. This paper explores the use of the GPT model for initial coding in qualitative thematic analysis, using a sample of UN policy documents. Its primary aim is to contribute to the methodological discussion on integrating AI tools by offering a practical guide to validation when GPT is used as a collaborative research assistant. The paper outlines the advantages and limitations of this methodology and suggests strategies to mitigate risks. Emphasising the importance of transparency and reliability when employing GPT within research methodologies, it argues for a balanced use of AI in supported thematic analysis, highlighting its potential to improve research efficacy and outcomes.


  • The paper discusses the integration of ChatGPT into thematic analysis, a method that identifies and reports patterns in textual data, highlighting the differences between codes and themes.

  • It presents a pilot test where a GPT model coded 63 UN policy documents, generating over 700 distinct codes, revealing insights that might be missed by manual methods.

  • The paper compares GPT-driven thematic analysis with traditional topic modeling, underscores GPT's ability to capture nuanced details, and discusses challenges such as accuracy, methodological rigor, and ethical concerns.

Using GPT for Thematic Analysis: Practical Insights and Challenges

What is Thematic Analysis?

Thematic analysis is a widely used qualitative research method for identifying, analyzing, and reporting patterns, or themes, within textual data. It links segments of the data to specific ideas or concepts. Codes and themes operate at different levels: a code captures a single topic, while a theme spans dimensions or meanings shared across multiple codes.

Manual Versus GPT-Driven Thematic Analysis

Traditionally, thematic analysis involves labor-intensive tasks: immersion in the data, coding, clustering, and extracting themes that are then synthesized into a coherent report. ChatGPT offers a more efficient alternative by automating the initial coding step. This AI-driven technique is designed to augment, not replace, human researchers, allowing them to focus on theme development and deeper interpretation.

ChatGPT excels at generating human-like text, which simplifies the coding process and may reveal insights that manual methods overlook. Its effectiveness, however, depends heavily on prompt quality and careful instruction. This is where techniques such as few-shot learning (supplying a handful of worked examples in the prompt) and chain-of-thought prompting (asking the model to reason step by step before answering) come into play.
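
To make the two prompting techniques concrete, here is a minimal sketch of how a few-shot, chain-of-thought prompt for initial coding might be assembled. It is not the prompt used in the paper: the `CodingExample` class, the `build_coding_prompt` function, the instruction wording, and the sample excerpts are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CodingExample:
    """One hand-coded excerpt used as a few-shot demonstration (hypothetical)."""
    excerpt: str
    reasoning: str
    code: str


def build_coding_prompt(examples, new_excerpt):
    """Assemble a few-shot, chain-of-thought prompt for initial coding."""
    parts = [
        "You are assisting with initial coding for a thematic analysis.",
        "For each excerpt, reason step by step, then give one short code.",
        "",
    ]
    # Few-shot part: each worked example shows excerpt -> reasoning -> code.
    for i, ex in enumerate(examples, start=1):
        parts += [
            f"Example {i}",
            f"Excerpt: {ex.excerpt}",
            f"Reasoning: {ex.reasoning}",
            f"Code: {ex.code}",
            "",
        ]
    # Chain-of-thought part: the prompt ends on "Reasoning:" so the model
    # is nudged to explain itself before naming a code.
    parts += [
        "Now code this excerpt",
        f"Excerpt: {new_excerpt}",
        "Reasoning:",
    ]
    return "\n".join(parts)


# Made-up demonstration data, not taken from the UN documents.
examples = [
    CodingExample(
        excerpt="Member States should ensure AI systems respect human rights.",
        reasoning="The sentence assigns a duty to states and ties AI to rights.",
        code="human rights safeguards",
    )
]
prompt = build_coding_prompt(
    examples, "AI governance requires international cooperation."
)
print(prompt)
```

The resulting string would be sent to whichever model interface the researcher uses; that call is deliberately left out because the summary does not describe it.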

Pilot Test: Coding UN Policy Documents

The paper provides practical insights through a pilot test of a custom GPT model designed for initial coding of UN policy documents. Here’s a glimpse of the approach:

  1. Familiarization with Data: Researchers immerse themselves in the text, or supply it to the GPT model, to identify significant details.
  2. Initial Coding: The GPT model generates initial codes by analyzing the text and linking specific quotations to corresponding codes.
  3. Clustering: Codes are grouped into broader themes, which assists the researcher in theme development.
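
The output of these steps can be pictured as a small data structure: each coded segment links a quotation to a code, and a researcher-supplied mapping groups codes into themes. The sketch below is illustrative only; `CodedSegment`, `cluster_by_theme`, the document IDs, and the sample codes are assumptions, not the paper's tooling or data.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CodedSegment:
    """A quotation from one document linked to an initial code (hypothetical)."""
    document_id: str
    quotation: str
    code: str


def cluster_by_theme(segments, code_to_theme):
    """Group coded segments under researcher-assigned themes.

    Codes without an assigned theme are collected under 'unassigned'
    so they stay visible for manual review.
    """
    themes = defaultdict(list)
    for seg in segments:
        themes[code_to_theme.get(seg.code, "unassigned")].append(seg)
    return dict(themes)


# Made-up segments standing in for model output.
segments = [
    CodedSegment("doc-01", "AI systems must respect privacy.", "data privacy"),
    CodedSegment("doc-02", "States should cooperate on AI oversight.",
                 "international cooperation"),
    CodedSegment("doc-02", "Bias in algorithms can deepen inequality.",
                 "algorithmic bias"),
]

# The code-to-theme mapping is the researcher's interpretive decision.
code_to_theme = {
    "data privacy": "ethics",
    "algorithmic bias": "ethics",
    "international cooperation": "global governance",
}

themes = cluster_by_theme(segments, code_to_theme)
distinct_codes = {seg.code for seg in segments}
print(len(distinct_codes), "distinct codes in", len(themes), "themes")
```

Keeping the theme mapping outside the model output reflects the division of labor described above: the model proposes codes, and the researcher decides how they cluster.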

In this pilot, the GPT model analyzed 63 UN policy documents and generated over 700 distinct codes. These codes highlighted various aspects of AI, from ethical considerations to global governance, reflecting a multifaceted discussion about AI's implications.

Comparing GPT-Driven Analysis with Topic Modeling

The paper also compares GPT-driven thematic analysis with traditional topic modeling, specifically Latent Dirichlet Allocation (LDA). Topic modeling provides a broad overview by identifying thematic clusters from the distribution of words across the text corpus. The comparison showed that while LDA offers a general thematic landscape, GPT-driven analysis goes deeper and captures nuanced details.
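
For readers unfamiliar with LDA, its standard generative model shows why its output is a set of word distributions rather than labeled codes. The notation below is the conventional textbook form, not taken from the paper: $K$ topics, $D$ documents, $\alpha$ and $\beta$ as Dirichlet hyperparameters, $\theta_d$ as the topic mixture of document $d$, $\phi_k$ as the word distribution of topic $k$, and $w_{d,n}$ as the $n$-th word of document $d$ with topic assignment $z_{d,n}$.

```latex
\begin{align*}
\phi_k &\sim \mathrm{Dirichlet}(\beta), & k &= 1,\dots,K \\
\theta_d &\sim \mathrm{Dirichlet}(\alpha), & d &= 1,\dots,D \\
z_{d,n} \mid \theta_d &\sim \mathrm{Categorical}(\theta_d) \\
w_{d,n} \mid z_{d,n}, \phi &\sim \mathrm{Categorical}(\phi_{z_{d,n}})
\end{align*}
```

Each topic $\phi_k$ is only a probability distribution over the vocabulary, so any human-readable label has to be supplied by the researcher after inspecting its most probable words.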

Although LDA is reliable, its topics do not always correspond to the semantic codes produced in thematic analysis. LDA can therefore help validate broad themes, while GPT-driven coding enriches the analysis by addressing specific details and nuances.

Challenges and Limitations

Though promising, the GPT model's integration in thematic analysis isn’t without challenges. The paper notes:

  • Descriptive Over Interpretive Outputs: GPT often provides descriptive rather than deeply interpretive responses, lacking the inferential reasoning human researchers apply.
  • Accuracy Issues: Occasional errors in quotations or code naming highlight the need for manual review to ensure accuracy.
  • Methodological Rigor: Incorporating grounded frameworks and controls is essential to maintaining validity and reliability.
  • Ethical Concerns: Issues related to data privacy, potential informational cocoons, and AI misinterpretations must be addressed.
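
Part of the manual review for quotation errors can be supported by a simple automated check. The sketch below flags any quotation that does not appear verbatim in the source document, ignoring case and whitespace differences. The function names and sample text are hypothetical, and a check like this only narrows down what a human still needs to inspect.

```python
import re


def normalize(text):
    """Lowercase and collapse whitespace so line breaks do not cause mismatches."""
    return re.sub(r"\s+", " ", text).strip().lower()


def find_unverified_quotes(source_text, quotations):
    """Return the quotations that do not appear verbatim in the source text.

    Empty quotations are flagged as well, since they cannot be verified.
    """
    haystack = normalize(source_text)
    flagged = []
    for quote in quotations:
        needle = normalize(quote)
        if not needle or needle not in haystack:
            flagged.append(quote)
    return flagged


# Made-up source passage and model-returned quotations.
source = (
    "Member States should ensure that AI systems\n"
    "respect human rights and fundamental freedoms."
)
quotes = [
    "AI systems respect human rights",       # present in the source
    "AI systems must protect human rights",  # altered wording
]
flagged = find_unverified_quotes(source, quotes)
print(flagged)
```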

OpenAI Policy Changes: Implications

OpenAI's recent policy changes restricting citation generation for privacy reasons have introduced significant challenges. The inability to generate direct quotations impacts research methodology, requiring researchers to manually verify AI-paraphrased outputs against original texts to maintain analytical depth.
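
When the model returns a paraphrase instead of a direct quotation, locating the passage it was based on becomes the first step of verification. The following sketch ranks source sentences by character-level similarity to a paraphrase using Python's standard `difflib` module. The helper names, the naive sentence splitter, and the sample text are assumptions for illustration, and surface similarity is only a rough guide, so the top candidates still need to be read by the researcher.

```python
import difflib
import re


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def closest_passages(paraphrase, source_text, top_n=3):
    """Rank source sentences by character-level similarity to the paraphrase.

    Returns up to top_n (score, sentence) pairs, highest score first.
    """
    scored = [
        (difflib.SequenceMatcher(None, paraphrase.lower(), s.lower()).ratio(), s)
        for s in split_sentences(source_text)
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_n]


# Made-up source text and paraphrase.
source = (
    "AI raises new ethical questions. "
    "Member States should cooperate on AI governance. "
    "Data privacy must be protected."
)
paraphrase = "Member States should cooperate on governance of AI."

for score, sentence in closest_passages(paraphrase, source):
    print(f"{score:.2f}  {sentence}")
```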

Conclusion and Future Directions

Integrating GPT into thematic analysis offers a blend of improved efficiency and deeper insights, but it requires a balanced approach with manual oversight. Future research should focus on providing comprehensive guidance on using GPT in qualitative research, leveraging its strengths while mitigating its limitations. As current methodologies evolve, AI tools promise to transform qualitative research by enabling faster, more detailed analyses and supporting a broader understanding across fields.

As AI continues to develop, its integration into research methodologies will likely expand, fostering innovative analytical approaches and potentially groundbreaking discoveries. This balanced, hybrid method, combining AI efficiency with human oversight, may well define the future of qualitative research.
