Diverse, but Divisive: LLMs Can Exaggerate Gender Differences in Opinion Related to Harms of Misinformation (2401.16558v1)
Abstract: The pervasive spread of misinformation and disinformation poses a significant threat to society. Professional fact-checkers play a key role in addressing this threat, but the vast scale of the problem forces them to prioritize their limited resources. This prioritization may consider a range of factors, such as the varying risks of harm posed to specific groups of people. In this work, we investigate the potential implications of using an LLM to facilitate such prioritization. Because fact-checking impacts a wide range of diverse segments of society, it is important that diverse views are represented in the claim prioritization process. This paper examines whether an LLM can reflect the views of various groups when assessing the harms of misinformation, focusing on gender as a primary variable. We pose two central questions: (1) To what extent do prompts with explicit gender references reflect gender differences in opinion in the United States on topics of social relevance? and (2) To what extent do gender-neutral prompts align with gendered viewpoints on those topics? To analyze these questions, we present the TopicMisinfo dataset, containing 160 fact-checked claims spanning diverse topics, supplemented by nearly 1,600 human annotations capturing subjective perceptions and annotator demographics. Analyzing responses to gender-specific and gender-neutral prompts, we find that GPT-3.5-Turbo reflects empirically observed gender differences in opinion but amplifies the extent of these differences. These findings illuminate AI's complex role in moderating online communication, with implications for fact-checkers, algorithm designers, and the use of crowd workers as annotators. We also release the TopicMisinfo dataset to support continuing research in the community.
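To make the prompting setup concrete, the sketch below shows one way gender-specific and gender-neutral prompts could be posed to GPT-3.5-Turbo to elicit harm ratings for a fact-checked claim. This is a minimal illustration under stated assumptions: the persona wording, the 1–5 harm scale, and the sampling parameters are our own choices for exposition, not the paper's verbatim protocol.

```python
# Illustrative sketch only: the prompt wording, 1-5 harm scale, and model
# parameters below are assumptions, not the exact protocol from the paper.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

CLAIM = "Example fact-checked claim text goes here."

PERSONAS = {
    "gender_neutral": "You are a person living in the United States.",
    "woman": "You are a woman living in the United States.",
    "man": "You are a man living in the United States.",
}


def rate_harm(persona_instruction: str, claim: str) -> str:
    """Ask the model to rate the perceived harm of a claim on a 1-5 scale."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        temperature=1.0,  # sampling noise lets us average over repeated calls
        messages=[
            {"role": "system", "content": persona_instruction},
            {
                "role": "user",
                "content": (
                    f'Claim: "{claim}"\n'
                    "On a scale from 1 (not harmful) to 5 (extremely harmful), "
                    "how harmful would it be if this claim spread widely? "
                    "Answer with a single number."
                ),
            },
        ],
    )
    return response.choices[0].message.content.strip()


for label, persona in PERSONAS.items():
    print(label, rate_harm(persona, CLAIM))
```

Comparing the gap between the "woman" and "man" ratings, averaged over repeated calls and claims, with the corresponding gap in human annotations is one natural way to test whether the model merely reflects or actually amplifies empirically observed gender differences.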
Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Naeemul Hassan, Fatma Arslan, Chengkai Li, and Mark Tremayne. 2017. Toward automated fact-checking: Detecting check-worthy factual claims by claimbuster. In Proceedings of the 23rd ACM SIGKDD international conference on knowledge discovery and data mining. 1803–1812. Herian et al. (2012) Mitchel N Herian, Joseph A Hamm, Alan J Tomkins, and Lisa M Pytlik Zillig. 2012. Public participation, procedural fairness, and evaluations of local governance: The moderating role of uncertainty. Journal of Public Administration Research and Theory 22, 4 (2012), 815–840. Horton (2023) John J Horton. 2023. Large language models as simulated economic agents: What can we learn from homo silicus? Technical Report. National Bureau of Economic Research. Horvitz (1999) Eric Horvitz. 1999. Principles of mixed-initiative user interfaces. In Proceedings of the SIGCHI conference on Human Factors in Computing Systems. 159–166. Huddy et al. (2008) Leonie Huddy, Erin Cassese, and Mary-Kate Lizotte. 2008. Gender, public opinion, and political reasoning. Political women and American democracy (2008), 31–49. Intemann (2010) Kristen Intemann. 2010. 25 years of feminist empiricism and standpoint theory: Where are we now? Hypatia 25, 4 (2010), 778–796. Jaradat et al. (2018) Israa Jaradat, Pepa Gencheva, Alberto Barrón-Cedeño, Lluís Màrquez, and Preslav Nakov. 2018. ClaimRank: Detecting check-worthy claims in Arabic and English. arXiv preprint arXiv:1804.07587 (2018). Katharopoulos et al. (2020) Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, and François Fleuret. 2020. Transformers are rnns: Fast autoregressive transformers with linear attention. In International conference on machine learning. PMLR, 5156–5165. Korinek (2023) Anton Korinek. 2023. Language models and cognitive automation for economic research. Technical Report. National Bureau of Economic Research. Liu et al. (2023) Houjiang Liu, Anubrata Das, Alexander Boltz, Didi Zhou, Daisy Pinaroc, Matthew Lease, and Min Kyung Lee. 2023. Human-centered NLP Fact-checking: Co-Designing with Fact-checkers using Matchmaking for AI. arXiv preprint arXiv:2308.07213 (2023). Lizotte (2020) Mary-Kate Lizotte. 2020. Gender differences in public opinion: Values and political consequences. Temple University Press. MacKinnon (2009) James G MacKinnon. 2009. Bootstrap hypothesis testing. Handbook of computational econometrics (2009), 183–213. Meta (2020) Meta. 2020. Here’s how we’re using AI to help detect misinformation. https://ai.meta.com/blog/heres-how-were-using-ai-to-help-detect-misinformation/. Accessed on 2024-01-09. Moorish (2023) Lydia Moorish. 2023. Fact-Checkers Are Using AI Like ChatGPT to Battle Misinformation. Wired (1 Feb 2023). https://www.wired.com/story/fact-checkers-ai-chatgpt-misinformation/ Nadeem et al. (2019) Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. 
https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. (2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). 
Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Mitchel N Herian, Joseph A Hamm, Alan J Tomkins, and Lisa M Pytlik Zillig. 2012. Public participation, procedural fairness, and evaluations of local governance: The moderating role of uncertainty. Journal of Public Administration Research and Theory 22, 4 (2012), 815–840. Horton (2023) John J Horton. 2023. Large language models as simulated economic agents: What can we learn from homo silicus? Technical Report. National Bureau of Economic Research. Horvitz (1999) Eric Horvitz. 1999. Principles of mixed-initiative user interfaces. In Proceedings of the SIGCHI conference on Human Factors in Computing Systems. 159–166. Huddy et al. (2008) Leonie Huddy, Erin Cassese, and Mary-Kate Lizotte. 2008. Gender, public opinion, and political reasoning. Political women and American democracy (2008), 31–49. Intemann (2010) Kristen Intemann. 2010. 25 years of feminist empiricism and standpoint theory: Where are we now? Hypatia 25, 4 (2010), 778–796. Jaradat et al. (2018) Israa Jaradat, Pepa Gencheva, Alberto Barrón-Cedeño, Lluís Màrquez, and Preslav Nakov. 2018. ClaimRank: Detecting check-worthy claims in Arabic and English. arXiv preprint arXiv:1804.07587 (2018). Katharopoulos et al. (2020) Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, and François Fleuret. 2020. Transformers are rnns: Fast autoregressive transformers with linear attention. In International conference on machine learning. PMLR, 5156–5165. Korinek (2023) Anton Korinek. 2023. Language models and cognitive automation for economic research. Technical Report. National Bureau of Economic Research. Liu et al. (2023) Houjiang Liu, Anubrata Das, Alexander Boltz, Didi Zhou, Daisy Pinaroc, Matthew Lease, and Min Kyung Lee. 2023. Human-centered NLP Fact-checking: Co-Designing with Fact-checkers using Matchmaking for AI. arXiv preprint arXiv:2308.07213 (2023). Lizotte (2020) Mary-Kate Lizotte. 2020. Gender differences in public opinion: Values and political consequences. Temple University Press. MacKinnon (2009) James G MacKinnon. 2009. Bootstrap hypothesis testing. 
Handbook of computational econometrics (2009), 183–213. Meta (2020) Meta. 2020. Here’s how we’re using AI to help detect misinformation. https://ai.meta.com/blog/heres-how-were-using-ai-to-help-detect-misinformation/. Accessed on 2024-01-09. Moorish (2023) Lydia Moorish. 2023. Fact-Checkers Are Using AI Like ChatGPT to Battle Misinformation. Wired (1 Feb 2023). https://www.wired.com/story/fact-checkers-ai-chatgpt-misinformation/ Nadeem et al. (2019) Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. (2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. 
(2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). John J Horton. 2023. Large language models as simulated economic agents: What can we learn from homo silicus? Technical Report. National Bureau of Economic Research. Horvitz (1999) Eric Horvitz. 1999. Principles of mixed-initiative user interfaces. In Proceedings of the SIGCHI conference on Human Factors in Computing Systems. 159–166. Huddy et al. (2008) Leonie Huddy, Erin Cassese, and Mary-Kate Lizotte. 2008. Gender, public opinion, and political reasoning. Political women and American democracy (2008), 31–49. Intemann (2010) Kristen Intemann. 2010. 25 years of feminist empiricism and standpoint theory: Where are we now? Hypatia 25, 4 (2010), 778–796. Jaradat et al. (2018) Israa Jaradat, Pepa Gencheva, Alberto Barrón-Cedeño, Lluís Màrquez, and Preslav Nakov. 2018. ClaimRank: Detecting check-worthy claims in Arabic and English. arXiv preprint arXiv:1804.07587 (2018). Katharopoulos et al. (2020) Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, and François Fleuret. 2020. Transformers are rnns: Fast autoregressive transformers with linear attention. In International conference on machine learning. PMLR, 5156–5165. Korinek (2023) Anton Korinek. 2023. 
Language models and cognitive automation for economic research. Technical Report. National Bureau of Economic Research. Liu et al. (2023) Houjiang Liu, Anubrata Das, Alexander Boltz, Didi Zhou, Daisy Pinaroc, Matthew Lease, and Min Kyung Lee. 2023. Human-centered NLP Fact-checking: Co-Designing with Fact-checkers using Matchmaking for AI. arXiv preprint arXiv:2308.07213 (2023). Lizotte (2020) Mary-Kate Lizotte. 2020. Gender differences in public opinion: Values and political consequences. Temple University Press. MacKinnon (2009) James G MacKinnon. 2009. Bootstrap hypothesis testing. Handbook of computational econometrics (2009), 183–213. Meta (2020) Meta. 2020. Here’s how we’re using AI to help detect misinformation. https://ai.meta.com/blog/heres-how-were-using-ai-to-help-detect-misinformation/. Accessed on 2024-01-09. Moorish (2023) Lydia Moorish. 2023. Fact-Checkers Are Using AI Like ChatGPT to Battle Misinformation. Wired (1 Feb 2023). https://www.wired.com/story/fact-checkers-ai-chatgpt-misinformation/ Nadeem et al. (2019) Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. (2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. 
(2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Eric Horvitz. 1999. Principles of mixed-initiative user interfaces. In Proceedings of the SIGCHI conference on Human Factors in Computing Systems. 159–166. Huddy et al. (2008) Leonie Huddy, Erin Cassese, and Mary-Kate Lizotte. 2008. Gender, public opinion, and political reasoning. Political women and American democracy (2008), 31–49. Intemann (2010) Kristen Intemann. 2010. 25 years of feminist empiricism and standpoint theory: Where are we now? Hypatia 25, 4 (2010), 778–796. Jaradat et al. (2018) Israa Jaradat, Pepa Gencheva, Alberto Barrón-Cedeño, Lluís Màrquez, and Preslav Nakov. 2018. 
ClaimRank: Detecting check-worthy claims in Arabic and English. arXiv preprint arXiv:1804.07587 (2018). Katharopoulos et al. (2020) Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, and François Fleuret. 2020. Transformers are rnns: Fast autoregressive transformers with linear attention. In International conference on machine learning. PMLR, 5156–5165. Korinek (2023) Anton Korinek. 2023. Language models and cognitive automation for economic research. Technical Report. National Bureau of Economic Research. Liu et al. (2023) Houjiang Liu, Anubrata Das, Alexander Boltz, Didi Zhou, Daisy Pinaroc, Matthew Lease, and Min Kyung Lee. 2023. Human-centered NLP Fact-checking: Co-Designing with Fact-checkers using Matchmaking for AI. arXiv preprint arXiv:2308.07213 (2023). Lizotte (2020) Mary-Kate Lizotte. 2020. Gender differences in public opinion: Values and political consequences. Temple University Press. MacKinnon (2009) James G MacKinnon. 2009. Bootstrap hypothesis testing. Handbook of computational econometrics (2009), 183–213. Meta (2020) Meta. 2020. Here’s how we’re using AI to help detect misinformation. https://ai.meta.com/blog/heres-how-were-using-ai-to-help-detect-misinformation/. Accessed on 2024-01-09. Moorish (2023) Lydia Moorish. 2023. Fact-Checkers Are Using AI Like ChatGPT to Battle Misinformation. Wired (1 Feb 2023). https://www.wired.com/story/fact-checkers-ai-chatgpt-misinformation/ Nadeem et al. (2019) Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. (2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. 
Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Leonie Huddy, Erin Cassese, and Mary-Kate Lizotte. 2008. Gender, public opinion, and political reasoning. Political women and American democracy (2008), 31–49. Intemann (2010) Kristen Intemann. 2010. 
25 years of feminist empiricism and standpoint theory: Where are we now? Hypatia 25, 4 (2010), 778–796. Jaradat et al. (2018) Israa Jaradat, Pepa Gencheva, Alberto Barrón-Cedeño, Lluís Màrquez, and Preslav Nakov. 2018. ClaimRank: Detecting check-worthy claims in Arabic and English. arXiv preprint arXiv:1804.07587 (2018). Katharopoulos et al. (2020) Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, and François Fleuret. 2020. Transformers are rnns: Fast autoregressive transformers with linear attention. In International conference on machine learning. PMLR, 5156–5165. Korinek (2023) Anton Korinek. 2023. Language models and cognitive automation for economic research. Technical Report. National Bureau of Economic Research. Liu et al. (2023) Houjiang Liu, Anubrata Das, Alexander Boltz, Didi Zhou, Daisy Pinaroc, Matthew Lease, and Min Kyung Lee. 2023. Human-centered NLP Fact-checking: Co-Designing with Fact-checkers using Matchmaking for AI. arXiv preprint arXiv:2308.07213 (2023). Lizotte (2020) Mary-Kate Lizotte. 2020. Gender differences in public opinion: Values and political consequences. Temple University Press. MacKinnon (2009) James G MacKinnon. 2009. Bootstrap hypothesis testing. Handbook of computational econometrics (2009), 183–213. Meta (2020) Meta. 2020. Here’s how we’re using AI to help detect misinformation. https://ai.meta.com/blog/heres-how-were-using-ai-to-help-detect-misinformation/. Accessed on 2024-01-09. Moorish (2023) Lydia Moorish. 2023. Fact-Checkers Are Using AI Like ChatGPT to Battle Misinformation. Wired (1 Feb 2023). https://www.wired.com/story/fact-checkers-ai-chatgpt-misinformation/ Nadeem et al. (2019) Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. (2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. 
Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. 
arXiv preprint arXiv:2304.10145 (2023). Kristen Intemann. 2010. 25 years of feminist empiricism and standpoint theory: Where are we now? Hypatia 25, 4 (2010), 778–796. Jaradat et al. (2018) Israa Jaradat, Pepa Gencheva, Alberto Barrón-Cedeño, Lluís Màrquez, and Preslav Nakov. 2018. ClaimRank: Detecting check-worthy claims in Arabic and English. arXiv preprint arXiv:1804.07587 (2018). Katharopoulos et al. (2020) Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, and François Fleuret. 2020. Transformers are rnns: Fast autoregressive transformers with linear attention. In International conference on machine learning. PMLR, 5156–5165. Korinek (2023) Anton Korinek. 2023. Language models and cognitive automation for economic research. Technical Report. National Bureau of Economic Research. Liu et al. (2023) Houjiang Liu, Anubrata Das, Alexander Boltz, Didi Zhou, Daisy Pinaroc, Matthew Lease, and Min Kyung Lee. 2023. Human-centered NLP Fact-checking: Co-Designing with Fact-checkers using Matchmaking for AI. arXiv preprint arXiv:2308.07213 (2023). Lizotte (2020) Mary-Kate Lizotte. 2020. Gender differences in public opinion: Values and political consequences. Temple University Press. MacKinnon (2009) James G MacKinnon. 2009. Bootstrap hypothesis testing. Handbook of computational econometrics (2009), 183–213. Meta (2020) Meta. 2020. Here’s how we’re using AI to help detect misinformation. https://ai.meta.com/blog/heres-how-were-using-ai-to-help-detect-misinformation/. Accessed on 2024-01-09. Moorish (2023) Lydia Moorish. 2023. Fact-Checkers Are Using AI Like ChatGPT to Battle Misinformation. Wired (1 Feb 2023). https://www.wired.com/story/fact-checkers-ai-chatgpt-misinformation/ Nadeem et al. (2019) Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. (2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. 
(2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? 
a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Israa Jaradat, Pepa Gencheva, Alberto Barrón-Cedeño, Lluís Màrquez, and Preslav Nakov. 2018. ClaimRank: Detecting check-worthy claims in Arabic and English. arXiv preprint arXiv:1804.07587 (2018). Katharopoulos et al. (2020) Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, and François Fleuret. 2020. Transformers are rnns: Fast autoregressive transformers with linear attention. In International conference on machine learning. PMLR, 5156–5165. Korinek (2023) Anton Korinek. 2023. Language models and cognitive automation for economic research. Technical Report. National Bureau of Economic Research. Liu et al. (2023) Houjiang Liu, Anubrata Das, Alexander Boltz, Didi Zhou, Daisy Pinaroc, Matthew Lease, and Min Kyung Lee. 2023. Human-centered NLP Fact-checking: Co-Designing with Fact-checkers using Matchmaking for AI. arXiv preprint arXiv:2308.07213 (2023). Lizotte (2020) Mary-Kate Lizotte. 2020. Gender differences in public opinion: Values and political consequences. Temple University Press. MacKinnon (2009) James G MacKinnon. 2009. Bootstrap hypothesis testing. Handbook of computational econometrics (2009), 183–213. Meta (2020) Meta. 2020. Here’s how we’re using AI to help detect misinformation. https://ai.meta.com/blog/heres-how-were-using-ai-to-help-detect-misinformation/. Accessed on 2024-01-09. Moorish (2023) Lydia Moorish. 2023. Fact-Checkers Are Using AI Like ChatGPT to Battle Misinformation. Wired (1 Feb 2023). https://www.wired.com/story/fact-checkers-ai-chatgpt-misinformation/ Nadeem et al. (2019) Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. (2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. 
Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. 
arXiv preprint arXiv:2304.10145 (2023). Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, and François Fleuret. 2020. Transformers are rnns: Fast autoregressive transformers with linear attention. In International conference on machine learning. PMLR, 5156–5165. Korinek (2023) Anton Korinek. 2023. Language models and cognitive automation for economic research. Technical Report. National Bureau of Economic Research. Liu et al. (2023) Houjiang Liu, Anubrata Das, Alexander Boltz, Didi Zhou, Daisy Pinaroc, Matthew Lease, and Min Kyung Lee. 2023. Human-centered NLP Fact-checking: Co-Designing with Fact-checkers using Matchmaking for AI. arXiv preprint arXiv:2308.07213 (2023). Lizotte (2020) Mary-Kate Lizotte. 2020. Gender differences in public opinion: Values and political consequences. Temple University Press. MacKinnon (2009) James G MacKinnon. 2009. Bootstrap hypothesis testing. Handbook of computational econometrics (2009), 183–213. Meta (2020) Meta. 2020. Here’s how we’re using AI to help detect misinformation. https://ai.meta.com/blog/heres-how-were-using-ai-to-help-detect-misinformation/. Accessed on 2024-01-09. Moorish (2023) Lydia Moorish. 2023. Fact-Checkers Are Using AI Like ChatGPT to Battle Misinformation. Wired (1 Feb 2023). https://www.wired.com/story/fact-checkers-ai-chatgpt-misinformation/ Nadeem et al. (2019) Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. (2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. 
Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Anton Korinek. 2023. Language models and cognitive automation for economic research. Technical Report. National Bureau of Economic Research. Liu et al. (2023) Houjiang Liu, Anubrata Das, Alexander Boltz, Didi Zhou, Daisy Pinaroc, Matthew Lease, and Min Kyung Lee. 2023. 
Human-centered NLP Fact-checking: Co-Designing with Fact-checkers using Matchmaking for AI. arXiv preprint arXiv:2308.07213 (2023). Lizotte (2020) Mary-Kate Lizotte. 2020. Gender differences in public opinion: Values and political consequences. Temple University Press. MacKinnon (2009) James G MacKinnon. 2009. Bootstrap hypothesis testing. Handbook of computational econometrics (2009), 183–213. Meta (2020) Meta. 2020. Here’s how we’re using AI to help detect misinformation. https://ai.meta.com/blog/heres-how-were-using-ai-to-help-detect-misinformation/. Accessed on 2024-01-09. Moorish (2023) Lydia Moorish. 2023. Fact-Checkers Are Using AI Like ChatGPT to Battle Misinformation. Wired (1 Feb 2023). https://www.wired.com/story/fact-checkers-ai-chatgpt-misinformation/ Nadeem et al. (2019) Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. (2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. 
(2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Houjiang Liu, Anubrata Das, Alexander Boltz, Didi Zhou, Daisy Pinaroc, Matthew Lease, and Min Kyung Lee. 2023. Human-centered NLP Fact-checking: Co-Designing with Fact-checkers using Matchmaking for AI. arXiv preprint arXiv:2308.07213 (2023). Lizotte (2020) Mary-Kate Lizotte. 2020. Gender differences in public opinion: Values and political consequences. Temple University Press. MacKinnon (2009) James G MacKinnon. 2009. Bootstrap hypothesis testing. Handbook of computational econometrics (2009), 183–213. Meta (2020) Meta. 2020. Here’s how we’re using AI to help detect misinformation. https://ai.meta.com/blog/heres-how-were-using-ai-to-help-detect-misinformation/. Accessed on 2024-01-09. Moorish (2023) Lydia Moorish. 2023. Fact-Checkers Are Using AI Like ChatGPT to Battle Misinformation. Wired (1 Feb 2023). https://www.wired.com/story/fact-checkers-ai-chatgpt-misinformation/ Nadeem et al. 
(2019) Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. (2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. 
https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Mary-Kate Lizotte. 2020. Gender differences in public opinion: Values and political consequences. Temple University Press. MacKinnon (2009) James G MacKinnon. 2009. Bootstrap hypothesis testing. Handbook of computational econometrics (2009), 183–213. Meta (2020) Meta. 2020. Here’s how we’re using AI to help detect misinformation. https://ai.meta.com/blog/heres-how-were-using-ai-to-help-detect-misinformation/. Accessed on 2024-01-09. Moorish (2023) Lydia Moorish. 2023. Fact-Checkers Are Using AI Like ChatGPT to Battle Misinformation. Wired (1 Feb 2023). https://www.wired.com/story/fact-checkers-ai-chatgpt-misinformation/ Nadeem et al. (2019) Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. 
(2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. 
arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). James G MacKinnon. 2009. Bootstrap hypothesis testing. Handbook of computational econometrics (2009), 183–213. Meta (2020) Meta. 2020. Here’s how we’re using AI to help detect misinformation. https://ai.meta.com/blog/heres-how-were-using-ai-to-help-detect-misinformation/. Accessed on 2024-01-09. Moorish (2023) Lydia Moorish. 2023. Fact-Checkers Are Using AI Like ChatGPT to Battle Misinformation. Wired (1 Feb 2023). https://www.wired.com/story/fact-checkers-ai-chatgpt-misinformation/ Nadeem et al. (2019) Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. (2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. 
In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Meta. 2020. Here’s how we’re using AI to help detect misinformation. 
https://ai.meta.com/blog/heres-how-were-using-ai-to-help-detect-misinformation/. Accessed on 2024-01-09. Moorish (2023) Lydia Moorish. 2023. Fact-Checkers Are Using AI Like ChatGPT to Battle Misinformation. Wired (1 Feb 2023). https://www.wired.com/story/fact-checkers-ai-chatgpt-misinformation/ Nadeem et al. (2019) Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. (2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? 
arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Lydia Moorish. 2023. Fact-Checkers Are Using AI Like ChatGPT to Battle Misinformation. Wired (1 Feb 2023). https://www.wired.com/story/fact-checkers-ai-chatgpt-misinformation/ Nadeem et al. (2019) Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. 
(2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. 
arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. (2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. 
(2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. 
(2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. 
arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. 
ClaimRank: Detecting check-worthy claims in Arabic and English. arXiv preprint arXiv:1804.07587 (2018). Katharopoulos et al. (2020) Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, and François Fleuret. 2020. Transformers are rnns: Fast autoregressive transformers with linear attention. In International conference on machine learning. PMLR, 5156–5165. Korinek (2023) Anton Korinek. 2023. Language models and cognitive automation for economic research. Technical Report. National Bureau of Economic Research. Liu et al. (2023) Houjiang Liu, Anubrata Das, Alexander Boltz, Didi Zhou, Daisy Pinaroc, Matthew Lease, and Min Kyung Lee. 2023. Human-centered NLP Fact-checking: Co-Designing with Fact-checkers using Matchmaking for AI. arXiv preprint arXiv:2308.07213 (2023). Lizotte (2020) Mary-Kate Lizotte. 2020. Gender differences in public opinion: Values and political consequences. Temple University Press. MacKinnon (2009) James G MacKinnon. 2009. Bootstrap hypothesis testing. Handbook of computational econometrics (2009), 183–213. Meta (2020) Meta. 2020. Here’s how we’re using AI to help detect misinformation. https://ai.meta.com/blog/heres-how-were-using-ai-to-help-detect-misinformation/. Accessed on 2024-01-09. Moorish (2023) Lydia Moorish. 2023. Fact-Checkers Are Using AI Like ChatGPT to Battle Misinformation. Wired (1 Feb 2023). https://www.wired.com/story/fact-checkers-ai-chatgpt-misinformation/ Nadeem et al. (2019) Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. (2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. 
Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Peter Hall and Susan R Wilson. 1991. Two guidelines for bootstrap hypothesis testing. Biometrics (1991), 757–762. Hassan et al. (2017) Naeemul Hassan, Fatma Arslan, Chengkai Li, and Mark Tremayne. 2017. 
Toward automated fact-checking: Detecting check-worthy factual claims by claimbuster. In Proceedings of the 23rd ACM SIGKDD international conference on knowledge discovery and data mining. 1803–1812. Herian et al. (2012) Mitchel N Herian, Joseph A Hamm, Alan J Tomkins, and Lisa M Pytlik Zillig. 2012. Public participation, procedural fairness, and evaluations of local governance: The moderating role of uncertainty. Journal of Public Administration Research and Theory 22, 4 (2012), 815–840. Horton (2023) John J Horton. 2023. Large language models as simulated economic agents: What can we learn from homo silicus? Technical Report. National Bureau of Economic Research. Horvitz (1999) Eric Horvitz. 1999. Principles of mixed-initiative user interfaces. In Proceedings of the SIGCHI conference on Human Factors in Computing Systems. 159–166. Huddy et al. (2008) Leonie Huddy, Erin Cassese, and Mary-Kate Lizotte. 2008. Gender, public opinion, and political reasoning. Political women and American democracy (2008), 31–49. Intemann (2010) Kristen Intemann. 2010. 25 years of feminist empiricism and standpoint theory: Where are we now? Hypatia 25, 4 (2010), 778–796. Jaradat et al. (2018) Israa Jaradat, Pepa Gencheva, Alberto Barrón-Cedeño, Lluís Màrquez, and Preslav Nakov. 2018. ClaimRank: Detecting check-worthy claims in Arabic and English. arXiv preprint arXiv:1804.07587 (2018). Katharopoulos et al. (2020) Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, and François Fleuret. 2020. Transformers are rnns: Fast autoregressive transformers with linear attention. In International conference on machine learning. PMLR, 5156–5165. Korinek (2023) Anton Korinek. 2023. Language models and cognitive automation for economic research. Technical Report. National Bureau of Economic Research. Liu et al. (2023) Houjiang Liu, Anubrata Das, Alexander Boltz, Didi Zhou, Daisy Pinaroc, Matthew Lease, and Min Kyung Lee. 2023. Human-centered NLP Fact-checking: Co-Designing with Fact-checkers using Matchmaking for AI. arXiv preprint arXiv:2308.07213 (2023). Lizotte (2020) Mary-Kate Lizotte. 2020. Gender differences in public opinion: Values and political consequences. Temple University Press. MacKinnon (2009) James G MacKinnon. 2009. Bootstrap hypothesis testing. Handbook of computational econometrics (2009), 183–213. Meta (2020) Meta. 2020. Here’s how we’re using AI to help detect misinformation. https://ai.meta.com/blog/heres-how-were-using-ai-to-help-detect-misinformation/. Accessed on 2024-01-09. Moorish (2023) Lydia Moorish. 2023. Fact-Checkers Are Using AI Like ChatGPT to Battle Misinformation. Wired (1 Feb 2023). https://www.wired.com/story/fact-checkers-ai-chatgpt-misinformation/ Nadeem et al. (2019) Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. 
Springer, 495–520. Nakov et al. (2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. 
Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Naeemul Hassan, Fatma Arslan, Chengkai Li, and Mark Tremayne. 2017. Toward automated fact-checking: Detecting check-worthy factual claims by claimbuster. In Proceedings of the 23rd ACM SIGKDD international conference on knowledge discovery and data mining. 1803–1812. Herian et al. (2012) Mitchel N Herian, Joseph A Hamm, Alan J Tomkins, and Lisa M Pytlik Zillig. 2012. Public participation, procedural fairness, and evaluations of local governance: The moderating role of uncertainty. Journal of Public Administration Research and Theory 22, 4 (2012), 815–840. Horton (2023) John J Horton. 2023. Large language models as simulated economic agents: What can we learn from homo silicus? Technical Report. National Bureau of Economic Research. Horvitz (1999) Eric Horvitz. 1999. Principles of mixed-initiative user interfaces. In Proceedings of the SIGCHI conference on Human Factors in Computing Systems. 159–166. Huddy et al. (2008) Leonie Huddy, Erin Cassese, and Mary-Kate Lizotte. 2008. Gender, public opinion, and political reasoning. Political women and American democracy (2008), 31–49. Intemann (2010) Kristen Intemann. 2010. 25 years of feminist empiricism and standpoint theory: Where are we now? Hypatia 25, 4 (2010), 778–796. Jaradat et al. (2018) Israa Jaradat, Pepa Gencheva, Alberto Barrón-Cedeño, Lluís Màrquez, and Preslav Nakov. 2018. ClaimRank: Detecting check-worthy claims in Arabic and English. arXiv preprint arXiv:1804.07587 (2018). Katharopoulos et al. (2020) Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, and François Fleuret. 2020. Transformers are rnns: Fast autoregressive transformers with linear attention. In International conference on machine learning. PMLR, 5156–5165. Korinek (2023) Anton Korinek. 2023. Language models and cognitive automation for economic research. Technical Report. National Bureau of Economic Research. Liu et al. (2023) Houjiang Liu, Anubrata Das, Alexander Boltz, Didi Zhou, Daisy Pinaroc, Matthew Lease, and Min Kyung Lee. 2023. Human-centered NLP Fact-checking: Co-Designing with Fact-checkers using Matchmaking for AI. arXiv preprint arXiv:2308.07213 (2023). Lizotte (2020) Mary-Kate Lizotte. 2020. Gender differences in public opinion: Values and political consequences. Temple University Press. MacKinnon (2009) James G MacKinnon. 2009. Bootstrap hypothesis testing. Handbook of computational econometrics (2009), 183–213. Meta (2020) Meta. 2020. 
Here’s how we’re using AI to help detect misinformation. https://ai.meta.com/blog/heres-how-were-using-ai-to-help-detect-misinformation/. Accessed on 2024-01-09. Moorish (2023) Lydia Moorish. 2023. Fact-Checkers Are Using AI Like ChatGPT to Battle Misinformation. Wired (1 Feb 2023). https://www.wired.com/story/fact-checkers-ai-chatgpt-misinformation/ Nadeem et al. (2019) Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. (2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. 
Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Mitchel N Herian, Joseph A Hamm, Alan J Tomkins, and Lisa M Pytlik Zillig. 2012. Public participation, procedural fairness, and evaluations of local governance: The moderating role of uncertainty. Journal of Public Administration Research and Theory 22, 4 (2012), 815–840. Horton (2023) John J Horton. 2023. Large language models as simulated economic agents: What can we learn from homo silicus? Technical Report. National Bureau of Economic Research. Horvitz (1999) Eric Horvitz. 1999. Principles of mixed-initiative user interfaces. In Proceedings of the SIGCHI conference on Human Factors in Computing Systems. 159–166. Huddy et al. (2008) Leonie Huddy, Erin Cassese, and Mary-Kate Lizotte. 2008. Gender, public opinion, and political reasoning. Political women and American democracy (2008), 31–49. Intemann (2010) Kristen Intemann. 2010. 25 years of feminist empiricism and standpoint theory: Where are we now? Hypatia 25, 4 (2010), 778–796. Jaradat et al. (2018) Israa Jaradat, Pepa Gencheva, Alberto Barrón-Cedeño, Lluís Màrquez, and Preslav Nakov. 2018. ClaimRank: Detecting check-worthy claims in Arabic and English. arXiv preprint arXiv:1804.07587 (2018). Katharopoulos et al. (2020) Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, and François Fleuret. 2020. 
Transformers are rnns: Fast autoregressive transformers with linear attention. In International conference on machine learning. PMLR, 5156–5165. Korinek (2023) Anton Korinek. 2023. Language models and cognitive automation for economic research. Technical Report. National Bureau of Economic Research. Liu et al. (2023) Houjiang Liu, Anubrata Das, Alexander Boltz, Didi Zhou, Daisy Pinaroc, Matthew Lease, and Min Kyung Lee. 2023. Human-centered NLP Fact-checking: Co-Designing with Fact-checkers using Matchmaking for AI. arXiv preprint arXiv:2308.07213 (2023). Lizotte (2020) Mary-Kate Lizotte. 2020. Gender differences in public opinion: Values and political consequences. Temple University Press. MacKinnon (2009) James G MacKinnon. 2009. Bootstrap hypothesis testing. Handbook of computational econometrics (2009), 183–213. Meta (2020) Meta. 2020. Here’s how we’re using AI to help detect misinformation. https://ai.meta.com/blog/heres-how-were-using-ai-to-help-detect-misinformation/. Accessed on 2024-01-09. Moorish (2023) Lydia Moorish. 2023. Fact-Checkers Are Using AI Like ChatGPT to Battle Misinformation. Wired (1 Feb 2023). https://www.wired.com/story/fact-checkers-ai-chatgpt-misinformation/ Nadeem et al. (2019) Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. (2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. 
In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). John J Horton. 2023. Large language models as simulated economic agents: What can we learn from homo silicus? Technical Report. National Bureau of Economic Research. Horvitz (1999) Eric Horvitz. 1999. Principles of mixed-initiative user interfaces. In Proceedings of the SIGCHI conference on Human Factors in Computing Systems. 159–166. Huddy et al. 
(2008) Leonie Huddy, Erin Cassese, and Mary-Kate Lizotte. 2008. Gender, public opinion, and political reasoning. Political women and American democracy (2008), 31–49. Intemann (2010) Kristen Intemann. 2010. 25 years of feminist empiricism and standpoint theory: Where are we now? Hypatia 25, 4 (2010), 778–796. Jaradat et al. (2018) Israa Jaradat, Pepa Gencheva, Alberto Barrón-Cedeño, Lluís Màrquez, and Preslav Nakov. 2018. ClaimRank: Detecting check-worthy claims in Arabic and English. arXiv preprint arXiv:1804.07587 (2018). Katharopoulos et al. (2020) Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, and François Fleuret. 2020. Transformers are rnns: Fast autoregressive transformers with linear attention. In International conference on machine learning. PMLR, 5156–5165. Korinek (2023) Anton Korinek. 2023. Language models and cognitive automation for economic research. Technical Report. National Bureau of Economic Research. Liu et al. (2023) Houjiang Liu, Anubrata Das, Alexander Boltz, Didi Zhou, Daisy Pinaroc, Matthew Lease, and Min Kyung Lee. 2023. Human-centered NLP Fact-checking: Co-Designing with Fact-checkers using Matchmaking for AI. arXiv preprint arXiv:2308.07213 (2023). Lizotte (2020) Mary-Kate Lizotte. 2020. Gender differences in public opinion: Values and political consequences. Temple University Press. MacKinnon (2009) James G MacKinnon. 2009. Bootstrap hypothesis testing. Handbook of computational econometrics (2009), 183–213. Meta (2020) Meta. 2020. Here’s how we’re using AI to help detect misinformation. https://ai.meta.com/blog/heres-how-were-using-ai-to-help-detect-misinformation/. Accessed on 2024-01-09. Moorish (2023) Lydia Moorish. 2023. Fact-Checkers Are Using AI Like ChatGPT to Battle Misinformation. Wired (1 Feb 2023). https://www.wired.com/story/fact-checkers-ai-chatgpt-misinformation/ Nadeem et al. (2019) Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. (2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. 
Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. 
(2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Eric Horvitz. 1999. Principles of mixed-initiative user interfaces. In Proceedings of the SIGCHI conference on Human Factors in Computing Systems. 159–166. Huddy et al. (2008) Leonie Huddy, Erin Cassese, and Mary-Kate Lizotte. 2008. Gender, public opinion, and political reasoning. Political women and American democracy (2008), 31–49. Intemann (2010) Kristen Intemann. 2010. 25 years of feminist empiricism and standpoint theory: Where are we now? Hypatia 25, 4 (2010), 778–796. Jaradat et al. (2018) Israa Jaradat, Pepa Gencheva, Alberto Barrón-Cedeño, Lluís Màrquez, and Preslav Nakov. 2018. ClaimRank: Detecting check-worthy claims in Arabic and English. arXiv preprint arXiv:1804.07587 (2018). Katharopoulos et al. (2020) Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, and François Fleuret. 2020. Transformers are rnns: Fast autoregressive transformers with linear attention. In International conference on machine learning. PMLR, 5156–5165. Korinek (2023) Anton Korinek. 2023. Language models and cognitive automation for economic research. Technical Report. National Bureau of Economic Research. Liu et al. (2023) Houjiang Liu, Anubrata Das, Alexander Boltz, Didi Zhou, Daisy Pinaroc, Matthew Lease, and Min Kyung Lee. 2023. Human-centered NLP Fact-checking: Co-Designing with Fact-checkers using Matchmaking for AI. arXiv preprint arXiv:2308.07213 (2023). Lizotte (2020) Mary-Kate Lizotte. 2020. Gender differences in public opinion: Values and political consequences. Temple University Press. MacKinnon (2009) James G MacKinnon. 2009. Bootstrap hypothesis testing. Handbook of computational econometrics (2009), 183–213. Meta (2020) Meta. 2020. Here’s how we’re using AI to help detect misinformation. https://ai.meta.com/blog/heres-how-were-using-ai-to-help-detect-misinformation/. Accessed on 2024-01-09. Moorish (2023) Lydia Moorish. 2023. Fact-Checkers Are Using AI Like ChatGPT to Battle Misinformation. Wired (1 Feb 2023). https://www.wired.com/story/fact-checkers-ai-chatgpt-misinformation/ Nadeem et al. (2019) Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. (2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. 
In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. 
A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Leonie Huddy, Erin Cassese, and Mary-Kate Lizotte. 2008. Gender, public opinion, and political reasoning. Political women and American democracy (2008), 31–49. Intemann (2010) Kristen Intemann. 2010. 25 years of feminist empiricism and standpoint theory: Where are we now? Hypatia 25, 4 (2010), 778–796. Jaradat et al. (2018) Israa Jaradat, Pepa Gencheva, Alberto Barrón-Cedeño, Lluís Màrquez, and Preslav Nakov. 2018. ClaimRank: Detecting check-worthy claims in Arabic and English. arXiv preprint arXiv:1804.07587 (2018). Katharopoulos et al. (2020) Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, and François Fleuret. 2020. Transformers are rnns: Fast autoregressive transformers with linear attention. In International conference on machine learning. PMLR, 5156–5165. Korinek (2023) Anton Korinek. 2023. Language models and cognitive automation for economic research. Technical Report. National Bureau of Economic Research. Liu et al. (2023) Houjiang Liu, Anubrata Das, Alexander Boltz, Didi Zhou, Daisy Pinaroc, Matthew Lease, and Min Kyung Lee. 2023. Human-centered NLP Fact-checking: Co-Designing with Fact-checkers using Matchmaking for AI. arXiv preprint arXiv:2308.07213 (2023). Lizotte (2020) Mary-Kate Lizotte. 2020. Gender differences in public opinion: Values and political consequences. Temple University Press. MacKinnon (2009) James G MacKinnon. 2009. Bootstrap hypothesis testing. Handbook of computational econometrics (2009), 183–213. Meta (2020) Meta. 2020. Here’s how we’re using AI to help detect misinformation. https://ai.meta.com/blog/heres-how-were-using-ai-to-help-detect-misinformation/. Accessed on 2024-01-09. Moorish (2023) Lydia Moorish. 2023. Fact-Checkers Are Using AI Like ChatGPT to Battle Misinformation. Wired (1 Feb 2023). https://www.wired.com/story/fact-checkers-ai-chatgpt-misinformation/ Nadeem et al. (2019) Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. 
(2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. 
arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Kristen Intemann. 2010. 25 years of feminist empiricism and standpoint theory: Where are we now? Hypatia 25, 4 (2010), 778–796. Jaradat et al. (2018) Israa Jaradat, Pepa Gencheva, Alberto Barrón-Cedeño, Lluís Màrquez, and Preslav Nakov. 2018. ClaimRank: Detecting check-worthy claims in Arabic and English. arXiv preprint arXiv:1804.07587 (2018). Katharopoulos et al. (2020) Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, and François Fleuret. 2020. Transformers are rnns: Fast autoregressive transformers with linear attention. In International conference on machine learning. PMLR, 5156–5165. Korinek (2023) Anton Korinek. 2023. Language models and cognitive automation for economic research. Technical Report. National Bureau of Economic Research. Liu et al. (2023) Houjiang Liu, Anubrata Das, Alexander Boltz, Didi Zhou, Daisy Pinaroc, Matthew Lease, and Min Kyung Lee. 2023. Human-centered NLP Fact-checking: Co-Designing with Fact-checkers using Matchmaking for AI. arXiv preprint arXiv:2308.07213 (2023). Lizotte (2020) Mary-Kate Lizotte. 2020. Gender differences in public opinion: Values and political consequences. Temple University Press. MacKinnon (2009) James G MacKinnon. 2009. Bootstrap hypothesis testing. Handbook of computational econometrics (2009), 183–213. Meta (2020) Meta. 2020. Here’s how we’re using AI to help detect misinformation. https://ai.meta.com/blog/heres-how-were-using-ai-to-help-detect-misinformation/. Accessed on 2024-01-09. Moorish (2023) Lydia Moorish. 2023. Fact-Checkers Are Using AI Like ChatGPT to Battle Misinformation. Wired (1 Feb 2023). https://www.wired.com/story/fact-checkers-ai-chatgpt-misinformation/ Nadeem et al. (2019) Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. 
(2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. 
arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Israa Jaradat, Pepa Gencheva, Alberto Barrón-Cedeño, Lluís Màrquez, and Preslav Nakov. 2018. ClaimRank: Detecting check-worthy claims in Arabic and English. arXiv preprint arXiv:1804.07587 (2018). Katharopoulos et al. (2020) Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, and François Fleuret. 2020. Transformers are rnns: Fast autoregressive transformers with linear attention. In International conference on machine learning. PMLR, 5156–5165. Korinek (2023) Anton Korinek. 2023. Language models and cognitive automation for economic research. Technical Report. National Bureau of Economic Research. Liu et al. (2023) Houjiang Liu, Anubrata Das, Alexander Boltz, Didi Zhou, Daisy Pinaroc, Matthew Lease, and Min Kyung Lee. 2023. Human-centered NLP Fact-checking: Co-Designing with Fact-checkers using Matchmaking for AI. arXiv preprint arXiv:2308.07213 (2023). Lizotte (2020) Mary-Kate Lizotte. 2020. Gender differences in public opinion: Values and political consequences. Temple University Press. MacKinnon (2009) James G MacKinnon. 2009. Bootstrap hypothesis testing. Handbook of computational econometrics (2009), 183–213. Meta (2020) Meta. 2020. Here’s how we’re using AI to help detect misinformation. https://ai.meta.com/blog/heres-how-were-using-ai-to-help-detect-misinformation/. Accessed on 2024-01-09. Moorish (2023) Lydia Moorish. 2023. Fact-Checkers Are Using AI Like ChatGPT to Battle Misinformation. Wired (1 Feb 2023). https://www.wired.com/story/fact-checkers-ai-chatgpt-misinformation/ Nadeem et al. (2019) Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. 
(2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. 
arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, and François Fleuret. 2020. Transformers are rnns: Fast autoregressive transformers with linear attention. In International conference on machine learning. PMLR, 5156–5165. Korinek (2023) Anton Korinek. 2023. Language models and cognitive automation for economic research. Technical Report. National Bureau of Economic Research. Liu et al. (2023) Houjiang Liu, Anubrata Das, Alexander Boltz, Didi Zhou, Daisy Pinaroc, Matthew Lease, and Min Kyung Lee. 2023. Human-centered NLP Fact-checking: Co-Designing with Fact-checkers using Matchmaking for AI. arXiv preprint arXiv:2308.07213 (2023). Lizotte (2020) Mary-Kate Lizotte. 2020. Gender differences in public opinion: Values and political consequences. Temple University Press. MacKinnon (2009) James G MacKinnon. 2009. Bootstrap hypothesis testing. Handbook of computational econometrics (2009), 183–213. Meta (2020) Meta. 2020. Here’s how we’re using AI to help detect misinformation. https://ai.meta.com/blog/heres-how-were-using-ai-to-help-detect-misinformation/. Accessed on 2024-01-09. Moorish (2023) Lydia Moorish. 2023. Fact-Checkers Are Using AI Like ChatGPT to Battle Misinformation. Wired (1 Feb 2023). https://www.wired.com/story/fact-checkers-ai-chatgpt-misinformation/ Nadeem et al. (2019) Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. (2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! 
Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. 
A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Anton Korinek. 2023. Language models and cognitive automation for economic research. Technical Report. National Bureau of Economic Research. Liu et al. (2023) Houjiang Liu, Anubrata Das, Alexander Boltz, Didi Zhou, Daisy Pinaroc, Matthew Lease, and Min Kyung Lee. 2023. Human-centered NLP Fact-checking: Co-Designing with Fact-checkers using Matchmaking for AI. arXiv preprint arXiv:2308.07213 (2023). Lizotte (2020) Mary-Kate Lizotte. 2020. Gender differences in public opinion: Values and political consequences. Temple University Press. MacKinnon (2009) James G MacKinnon. 2009. Bootstrap hypothesis testing. Handbook of computational econometrics (2009), 183–213. Meta (2020) Meta. 2020. Here’s how we’re using AI to help detect misinformation. https://ai.meta.com/blog/heres-how-were-using-ai-to-help-detect-misinformation/. Accessed on 2024-01-09. Moorish (2023) Lydia Moorish. 2023. Fact-Checkers Are Using AI Like ChatGPT to Battle Misinformation. Wired (1 Feb 2023). https://www.wired.com/story/fact-checkers-ai-chatgpt-misinformation/ Nadeem et al. (2019) Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. (2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. 
Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. 
arXiv preprint arXiv:2304.10145 (2023). Houjiang Liu, Anubrata Das, Alexander Boltz, Didi Zhou, Daisy Pinaroc, Matthew Lease, and Min Kyung Lee. 2023. Human-centered NLP Fact-checking: Co-Designing with Fact-checkers using Matchmaking for AI. arXiv preprint arXiv:2308.07213 (2023). Lizotte (2020) Mary-Kate Lizotte. 2020. Gender differences in public opinion: Values and political consequences. Temple University Press. MacKinnon (2009) James G MacKinnon. 2009. Bootstrap hypothesis testing. Handbook of computational econometrics (2009), 183–213. Meta (2020) Meta. 2020. Here’s how we’re using AI to help detect misinformation. https://ai.meta.com/blog/heres-how-were-using-ai-to-help-detect-misinformation/. Accessed on 2024-01-09. Moorish (2023) Lydia Moorish. 2023. Fact-Checkers Are Using AI Like ChatGPT to Battle Misinformation. Wired (1 Feb 2023). https://www.wired.com/story/fact-checkers-ai-chatgpt-misinformation/ Nadeem et al. (2019) Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. (2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. 
(2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Mary-Kate Lizotte. 2020. Gender differences in public opinion: Values and political consequences. Temple University Press. MacKinnon (2009) James G MacKinnon. 2009. Bootstrap hypothesis testing. Handbook of computational econometrics (2009), 183–213. Meta (2020) Meta. 2020. Here’s how we’re using AI to help detect misinformation. https://ai.meta.com/blog/heres-how-were-using-ai-to-help-detect-misinformation/. Accessed on 2024-01-09. Moorish (2023) Lydia Moorish. 2023. Fact-Checkers Are Using AI Like ChatGPT to Battle Misinformation. Wired (1 Feb 2023). 
https://www.wired.com/story/fact-checkers-ai-chatgpt-misinformation/ Nadeem et al. (2019) Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. (2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. 
arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). James G MacKinnon. 2009. Bootstrap hypothesis testing. Handbook of computational econometrics (2009), 183–213. Meta (2020) Meta. 2020. Here’s how we’re using AI to help detect misinformation. https://ai.meta.com/blog/heres-how-were-using-ai-to-help-detect-misinformation/. Accessed on 2024-01-09. Moorish (2023) Lydia Moorish. 2023. Fact-Checkers Are Using AI Like ChatGPT to Battle Misinformation. Wired (1 Feb 2023). https://www.wired.com/story/fact-checkers-ai-chatgpt-misinformation/ Nadeem et al. (2019) Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. 
(2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. 
arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Meta. 2020. Here’s how we’re using AI to help detect misinformation. https://ai.meta.com/blog/heres-how-were-using-ai-to-help-detect-misinformation/. Accessed on 2024-01-09. Moorish (2023) Lydia Moorish. 2023. Fact-Checkers Are Using AI Like ChatGPT to Battle Misinformation. Wired (1 Feb 2023). https://www.wired.com/story/fact-checkers-ai-chatgpt-misinformation/ Nadeem et al. (2019) Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. (2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. 
Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Lydia Moorish. 2023. Fact-Checkers Are Using AI Like ChatGPT to Battle Misinformation. Wired (1 Feb 2023). https://www.wired.com/story/fact-checkers-ai-chatgpt-misinformation/ Nadeem et al. (2019) Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. 
FAKTA: An Automatic End-to-End Fact Checking System. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. (2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. 
See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. (2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. 
Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Zhijiang Guo, Michael Schlichtkrull, and Andreas Vlachos. 2022. A survey on automated fact-checking. Transactions of the Association for Computational Linguistics 10 (2022), 178–206. Hall and Wilson (1991) Peter Hall and Susan R Wilson. 1991. Two guidelines for bootstrap hypothesis testing. Biometrics (1991), 757–762. Hassan et al. (2017) Naeemul Hassan, Fatma Arslan, Chengkai Li, and Mark Tremayne. 2017. Toward automated fact-checking: Detecting check-worthy factual claims by claimbuster. In Proceedings of the 23rd ACM SIGKDD international conference on knowledge discovery and data mining. 1803–1812. Herian et al. (2012) Mitchel N Herian, Joseph A Hamm, Alan J Tomkins, and Lisa M Pytlik Zillig. 2012. 
Public participation, procedural fairness, and evaluations of local governance: The moderating role of uncertainty. Journal of Public Administration Research and Theory 22, 4 (2012), 815–840. Horton (2023) John J Horton. 2023. Large language models as simulated economic agents: What can we learn from homo silicus? Technical Report. National Bureau of Economic Research. Horvitz (1999) Eric Horvitz. 1999. Principles of mixed-initiative user interfaces. In Proceedings of the SIGCHI conference on Human Factors in Computing Systems. 159–166. Huddy et al. (2008) Leonie Huddy, Erin Cassese, and Mary-Kate Lizotte. 2008. Gender, public opinion, and political reasoning. Political women and American democracy (2008), 31–49. Intemann (2010) Kristen Intemann. 2010. 25 years of feminist empiricism and standpoint theory: Where are we now? Hypatia 25, 4 (2010), 778–796. Jaradat et al. (2018) Israa Jaradat, Pepa Gencheva, Alberto Barrón-Cedeño, Lluís Màrquez, and Preslav Nakov. 2018. ClaimRank: Detecting check-worthy claims in Arabic and English. arXiv preprint arXiv:1804.07587 (2018). Katharopoulos et al. (2020) Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, and François Fleuret. 2020. Transformers are rnns: Fast autoregressive transformers with linear attention. In International conference on machine learning. PMLR, 5156–5165. Korinek (2023) Anton Korinek. 2023. Language models and cognitive automation for economic research. Technical Report. National Bureau of Economic Research. Liu et al. (2023) Houjiang Liu, Anubrata Das, Alexander Boltz, Didi Zhou, Daisy Pinaroc, Matthew Lease, and Min Kyung Lee. 2023. Human-centered NLP Fact-checking: Co-Designing with Fact-checkers using Matchmaking for AI. arXiv preprint arXiv:2308.07213 (2023). Lizotte (2020) Mary-Kate Lizotte. 2020. Gender differences in public opinion: Values and political consequences. Temple University Press. MacKinnon (2009) James G MacKinnon. 2009. Bootstrap hypothesis testing. Handbook of computational econometrics (2009), 183–213. Meta (2020) Meta. 2020. Here’s how we’re using AI to help detect misinformation. https://ai.meta.com/blog/heres-how-were-using-ai-to-help-detect-misinformation/. Accessed on 2024-01-09. Moorish (2023) Lydia Moorish. 2023. Fact-Checkers Are Using AI Like ChatGPT to Battle Misinformation. Wired (1 Feb 2023). https://www.wired.com/story/fact-checkers-ai-chatgpt-misinformation/ Nadeem et al. (2019) Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. 
(2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. 
arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Peter Hall and Susan R Wilson. 1991. Two guidelines for bootstrap hypothesis testing. Biometrics (1991), 757–762. Hassan et al. (2017) Naeemul Hassan, Fatma Arslan, Chengkai Li, and Mark Tremayne. 2017. Toward automated fact-checking: Detecting check-worthy factual claims by claimbuster. In Proceedings of the 23rd ACM SIGKDD international conference on knowledge discovery and data mining. 1803–1812. Herian et al. (2012) Mitchel N Herian, Joseph A Hamm, Alan J Tomkins, and Lisa M Pytlik Zillig. 2012. Public participation, procedural fairness, and evaluations of local governance: The moderating role of uncertainty. Journal of Public Administration Research and Theory 22, 4 (2012), 815–840. Horton (2023) John J Horton. 2023. Large language models as simulated economic agents: What can we learn from homo silicus? Technical Report. National Bureau of Economic Research. Horvitz (1999) Eric Horvitz. 1999. Principles of mixed-initiative user interfaces. In Proceedings of the SIGCHI conference on Human Factors in Computing Systems. 159–166. Huddy et al. (2008) Leonie Huddy, Erin Cassese, and Mary-Kate Lizotte. 2008. Gender, public opinion, and political reasoning. Political women and American democracy (2008), 31–49. Intemann (2010) Kristen Intemann. 2010. 25 years of feminist empiricism and standpoint theory: Where are we now? Hypatia 25, 4 (2010), 778–796. Jaradat et al. (2018) Israa Jaradat, Pepa Gencheva, Alberto Barrón-Cedeño, Lluís Màrquez, and Preslav Nakov. 2018. ClaimRank: Detecting check-worthy claims in Arabic and English. arXiv preprint arXiv:1804.07587 (2018). Katharopoulos et al. (2020) Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, and François Fleuret. 2020. Transformers are rnns: Fast autoregressive transformers with linear attention. In International conference on machine learning. PMLR, 5156–5165. Korinek (2023) Anton Korinek. 2023. Language models and cognitive automation for economic research. Technical Report. National Bureau of Economic Research. Liu et al. (2023) Houjiang Liu, Anubrata Das, Alexander Boltz, Didi Zhou, Daisy Pinaroc, Matthew Lease, and Min Kyung Lee. 2023. Human-centered NLP Fact-checking: Co-Designing with Fact-checkers using Matchmaking for AI. arXiv preprint arXiv:2308.07213 (2023). Lizotte (2020) Mary-Kate Lizotte. 2020. Gender differences in public opinion: Values and political consequences. Temple University Press. MacKinnon (2009) James G MacKinnon. 2009. Bootstrap hypothesis testing. Handbook of computational econometrics (2009), 183–213. Meta (2020) Meta. 
2020. Here’s how we’re using AI to help detect misinformation. https://ai.meta.com/blog/heres-how-were-using-ai-to-help-detect-misinformation/. Accessed on 2024-01-09. Moorish (2023) Lydia Moorish. 2023. Fact-Checkers Are Using AI Like ChatGPT to Battle Misinformation. Wired (1 Feb 2023). https://www.wired.com/story/fact-checkers-ai-chatgpt-misinformation/ Nadeem et al. (2019) Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. (2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. 
Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Naeemul Hassan, Fatma Arslan, Chengkai Li, and Mark Tremayne. 2017. Toward automated fact-checking: Detecting check-worthy factual claims by claimbuster. In Proceedings of the 23rd ACM SIGKDD international conference on knowledge discovery and data mining. 1803–1812. Herian et al. (2012) Mitchel N Herian, Joseph A Hamm, Alan J Tomkins, and Lisa M Pytlik Zillig. 2012. Public participation, procedural fairness, and evaluations of local governance: The moderating role of uncertainty. Journal of Public Administration Research and Theory 22, 4 (2012), 815–840. Horton (2023) John J Horton. 2023. Large language models as simulated economic agents: What can we learn from homo silicus? Technical Report. National Bureau of Economic Research. Horvitz (1999) Eric Horvitz. 1999. Principles of mixed-initiative user interfaces. In Proceedings of the SIGCHI conference on Human Factors in Computing Systems. 159–166. Huddy et al. (2008) Leonie Huddy, Erin Cassese, and Mary-Kate Lizotte. 2008. Gender, public opinion, and political reasoning. Political women and American democracy (2008), 31–49. Intemann (2010) Kristen Intemann. 2010. 25 years of feminist empiricism and standpoint theory: Where are we now? Hypatia 25, 4 (2010), 778–796. Jaradat et al. 
(2018) Israa Jaradat, Pepa Gencheva, Alberto Barrón-Cedeño, Lluís Màrquez, and Preslav Nakov. 2018. ClaimRank: Detecting check-worthy claims in Arabic and English. arXiv preprint arXiv:1804.07587 (2018). Katharopoulos et al. (2020) Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, and François Fleuret. 2020. Transformers are rnns: Fast autoregressive transformers with linear attention. In International conference on machine learning. PMLR, 5156–5165. Korinek (2023) Anton Korinek. 2023. Language models and cognitive automation for economic research. Technical Report. National Bureau of Economic Research. Liu et al. (2023) Houjiang Liu, Anubrata Das, Alexander Boltz, Didi Zhou, Daisy Pinaroc, Matthew Lease, and Min Kyung Lee. 2023. Human-centered NLP Fact-checking: Co-Designing with Fact-checkers using Matchmaking for AI. arXiv preprint arXiv:2308.07213 (2023). Lizotte (2020) Mary-Kate Lizotte. 2020. Gender differences in public opinion: Values and political consequences. Temple University Press. MacKinnon (2009) James G MacKinnon. 2009. Bootstrap hypothesis testing. Handbook of computational econometrics (2009), 183–213. Meta (2020) Meta. 2020. Here’s how we’re using AI to help detect misinformation. https://ai.meta.com/blog/heres-how-were-using-ai-to-help-detect-misinformation/. Accessed on 2024-01-09. Moorish (2023) Lydia Moorish. 2023. Fact-Checkers Are Using AI Like ChatGPT to Battle Misinformation. Wired (1 Feb 2023). https://www.wired.com/story/fact-checkers-ai-chatgpt-misinformation/ Nadeem et al. (2019) Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. (2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. 
In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Mitchel N Herian, Joseph A Hamm, Alan J Tomkins, and Lisa M Pytlik Zillig. 2012. 
Public participation, procedural fairness, and evaluations of local governance: The moderating role of uncertainty. Journal of Public Administration Research and Theory 22, 4 (2012), 815–840. Horton (2023) John J Horton. 2023. Large language models as simulated economic agents: What can we learn from homo silicus? Technical Report. National Bureau of Economic Research. Horvitz (1999) Eric Horvitz. 1999. Principles of mixed-initiative user interfaces. In Proceedings of the SIGCHI conference on Human Factors in Computing Systems. 159–166. Huddy et al. (2008) Leonie Huddy, Erin Cassese, and Mary-Kate Lizotte. 2008. Gender, public opinion, and political reasoning. Political women and American democracy (2008), 31–49. Intemann (2010) Kristen Intemann. 2010. 25 years of feminist empiricism and standpoint theory: Where are we now? Hypatia 25, 4 (2010), 778–796. Jaradat et al. (2018) Israa Jaradat, Pepa Gencheva, Alberto Barrón-Cedeño, Lluís Màrquez, and Preslav Nakov. 2018. ClaimRank: Detecting check-worthy claims in Arabic and English. arXiv preprint arXiv:1804.07587 (2018). Katharopoulos et al. (2020) Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, and François Fleuret. 2020. Transformers are rnns: Fast autoregressive transformers with linear attention. In International conference on machine learning. PMLR, 5156–5165. Korinek (2023) Anton Korinek. 2023. Language models and cognitive automation for economic research. Technical Report. National Bureau of Economic Research. Liu et al. (2023) Houjiang Liu, Anubrata Das, Alexander Boltz, Didi Zhou, Daisy Pinaroc, Matthew Lease, and Min Kyung Lee. 2023. Human-centered NLP Fact-checking: Co-Designing with Fact-checkers using Matchmaking for AI. arXiv preprint arXiv:2308.07213 (2023). Lizotte (2020) Mary-Kate Lizotte. 2020. Gender differences in public opinion: Values and political consequences. Temple University Press. MacKinnon (2009) James G MacKinnon. 2009. Bootstrap hypothesis testing. Handbook of computational econometrics (2009), 183–213. Meta (2020) Meta. 2020. Here’s how we’re using AI to help detect misinformation. https://ai.meta.com/blog/heres-how-were-using-ai-to-help-detect-misinformation/. Accessed on 2024-01-09. Moorish (2023) Lydia Moorish. 2023. Fact-Checkers Are Using AI Like ChatGPT to Battle Misinformation. Wired (1 Feb 2023). https://www.wired.com/story/fact-checkers-ai-chatgpt-misinformation/ Nadeem et al. (2019) Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. 
(2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. 
arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). John J Horton. 2023. Large language models as simulated economic agents: What can we learn from homo silicus? Technical Report. National Bureau of Economic Research. Horvitz (1999) Eric Horvitz. 1999. Principles of mixed-initiative user interfaces. In Proceedings of the SIGCHI conference on Human Factors in Computing Systems. 159–166. Huddy et al. (2008) Leonie Huddy, Erin Cassese, and Mary-Kate Lizotte. 2008. Gender, public opinion, and political reasoning. Political women and American democracy (2008), 31–49. Intemann (2010) Kristen Intemann. 2010. 25 years of feminist empiricism and standpoint theory: Where are we now? Hypatia 25, 4 (2010), 778–796. Jaradat et al. (2018) Israa Jaradat, Pepa Gencheva, Alberto Barrón-Cedeño, Lluís Màrquez, and Preslav Nakov. 2018. ClaimRank: Detecting check-worthy claims in Arabic and English. arXiv preprint arXiv:1804.07587 (2018). Katharopoulos et al. (2020) Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, and François Fleuret. 2020. Transformers are rnns: Fast autoregressive transformers with linear attention. In International conference on machine learning. PMLR, 5156–5165. Korinek (2023) Anton Korinek. 2023. Language models and cognitive automation for economic research. Technical Report. National Bureau of Economic Research. Liu et al. (2023) Houjiang Liu, Anubrata Das, Alexander Boltz, Didi Zhou, Daisy Pinaroc, Matthew Lease, and Min Kyung Lee. 2023. Human-centered NLP Fact-checking: Co-Designing with Fact-checkers using Matchmaking for AI. arXiv preprint arXiv:2308.07213 (2023). Lizotte (2020) Mary-Kate Lizotte. 2020. Gender differences in public opinion: Values and political consequences. Temple University Press. MacKinnon (2009) James G MacKinnon. 2009. Bootstrap hypothesis testing. Handbook of computational econometrics (2009), 183–213. Meta (2020) Meta. 2020. Here’s how we’re using AI to help detect misinformation. https://ai.meta.com/blog/heres-how-were-using-ai-to-help-detect-misinformation/. Accessed on 2024-01-09. Moorish (2023) Lydia Moorish. 2023. Fact-Checkers Are Using AI Like ChatGPT to Battle Misinformation. Wired (1 Feb 2023). https://www.wired.com/story/fact-checkers-ai-chatgpt-misinformation/ Nadeem et al. (2019) Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). 
Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. (2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. 
arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Eric Horvitz. 1999. Principles of mixed-initiative user interfaces. In Proceedings of the SIGCHI conference on Human Factors in Computing Systems. 159–166. Huddy et al. (2008) Leonie Huddy, Erin Cassese, and Mary-Kate Lizotte. 2008. Gender, public opinion, and political reasoning. Political women and American democracy (2008), 31–49. Intemann (2010) Kristen Intemann. 2010. 25 years of feminist empiricism and standpoint theory: Where are we now? Hypatia 25, 4 (2010), 778–796. Jaradat et al. (2018) Israa Jaradat, Pepa Gencheva, Alberto Barrón-Cedeño, Lluís Màrquez, and Preslav Nakov. 2018. ClaimRank: Detecting check-worthy claims in Arabic and English. arXiv preprint arXiv:1804.07587 (2018). Katharopoulos et al. (2020) Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, and François Fleuret. 2020. Transformers are rnns: Fast autoregressive transformers with linear attention. In International conference on machine learning. PMLR, 5156–5165. Korinek (2023) Anton Korinek. 2023. Language models and cognitive automation for economic research. Technical Report. National Bureau of Economic Research. Liu et al. (2023) Houjiang Liu, Anubrata Das, Alexander Boltz, Didi Zhou, Daisy Pinaroc, Matthew Lease, and Min Kyung Lee. 2023. Human-centered NLP Fact-checking: Co-Designing with Fact-checkers using Matchmaking for AI. arXiv preprint arXiv:2308.07213 (2023). Lizotte (2020) Mary-Kate Lizotte. 2020. Gender differences in public opinion: Values and political consequences. Temple University Press. MacKinnon (2009) James G MacKinnon. 2009. Bootstrap hypothesis testing. Handbook of computational econometrics (2009), 183–213. Meta (2020) Meta. 2020. Here’s how we’re using AI to help detect misinformation. https://ai.meta.com/blog/heres-how-were-using-ai-to-help-detect-misinformation/. Accessed on 2024-01-09. Moorish (2023) Lydia Moorish. 2023. Fact-Checkers Are Using AI Like ChatGPT to Battle Misinformation. Wired (1 Feb 2023). 
https://www.wired.com/story/fact-checkers-ai-chatgpt-misinformation/ Nadeem et al. (2019) Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. (2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. 
arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Leonie Huddy, Erin Cassese, and Mary-Kate Lizotte. 2008. Gender, public opinion, and political reasoning. Political women and American democracy (2008), 31–49. Intemann (2010) Kristen Intemann. 2010. 25 years of feminist empiricism and standpoint theory: Where are we now? Hypatia 25, 4 (2010), 778–796. Jaradat et al. (2018) Israa Jaradat, Pepa Gencheva, Alberto Barrón-Cedeño, Lluís Màrquez, and Preslav Nakov. 2018. ClaimRank: Detecting check-worthy claims in Arabic and English. arXiv preprint arXiv:1804.07587 (2018). Katharopoulos et al. (2020) Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, and François Fleuret. 2020. Transformers are rnns: Fast autoregressive transformers with linear attention. In International conference on machine learning. PMLR, 5156–5165. Korinek (2023) Anton Korinek. 2023. Language models and cognitive automation for economic research. Technical Report. National Bureau of Economic Research. Liu et al. (2023) Houjiang Liu, Anubrata Das, Alexander Boltz, Didi Zhou, Daisy Pinaroc, Matthew Lease, and Min Kyung Lee. 2023. Human-centered NLP Fact-checking: Co-Designing with Fact-checkers using Matchmaking for AI. arXiv preprint arXiv:2308.07213 (2023). Lizotte (2020) Mary-Kate Lizotte. 2020. Gender differences in public opinion: Values and political consequences. Temple University Press. MacKinnon (2009) James G MacKinnon. 2009. Bootstrap hypothesis testing. Handbook of computational econometrics (2009), 183–213. Meta (2020) Meta. 2020. Here’s how we’re using AI to help detect misinformation. 
https://ai.meta.com/blog/heres-how-were-using-ai-to-help-detect-misinformation/. Accessed on 2024-01-09. Moorish (2023) Lydia Moorish. 2023. Fact-Checkers Are Using AI Like ChatGPT to Battle Misinformation. Wired (1 Feb 2023). https://www.wired.com/story/fact-checkers-ai-chatgpt-misinformation/ Nadeem et al. (2019) Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. (2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? 
arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Kristen Intemann. 2010. 25 years of feminist empiricism and standpoint theory: Where are we now? Hypatia 25, 4 (2010), 778–796. Jaradat et al. (2018) Israa Jaradat, Pepa Gencheva, Alberto Barrón-Cedeño, Lluís Màrquez, and Preslav Nakov. 2018. ClaimRank: Detecting check-worthy claims in Arabic and English. arXiv preprint arXiv:1804.07587 (2018). Katharopoulos et al. (2020) Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, and François Fleuret. 2020. Transformers are rnns: Fast autoregressive transformers with linear attention. In International conference on machine learning. PMLR, 5156–5165. Korinek (2023) Anton Korinek. 2023. Language models and cognitive automation for economic research. Technical Report. National Bureau of Economic Research. Liu et al. (2023) Houjiang Liu, Anubrata Das, Alexander Boltz, Didi Zhou, Daisy Pinaroc, Matthew Lease, and Min Kyung Lee. 2023. Human-centered NLP Fact-checking: Co-Designing with Fact-checkers using Matchmaking for AI. arXiv preprint arXiv:2308.07213 (2023). Lizotte (2020) Mary-Kate Lizotte. 2020. Gender differences in public opinion: Values and political consequences. Temple University Press. MacKinnon (2009) James G MacKinnon. 2009. Bootstrap hypothesis testing. Handbook of computational econometrics (2009), 183–213. Meta (2020) Meta. 2020. Here’s how we’re using AI to help detect misinformation. 
https://ai.meta.com/blog/heres-how-were-using-ai-to-help-detect-misinformation/. Accessed on 2024-01-09. Moorish (2023) Lydia Moorish. 2023. Fact-Checkers Are Using AI Like ChatGPT to Battle Misinformation. Wired (1 Feb 2023). https://www.wired.com/story/fact-checkers-ai-chatgpt-misinformation/ Nadeem et al. (2019) Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. (2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? 
arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Israa Jaradat, Pepa Gencheva, Alberto Barrón-Cedeño, Lluís Màrquez, and Preslav Nakov. 2018. ClaimRank: Detecting check-worthy claims in Arabic and English. arXiv preprint arXiv:1804.07587 (2018). Katharopoulos et al. (2020) Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, and François Fleuret. 2020. Transformers are rnns: Fast autoregressive transformers with linear attention. In International conference on machine learning. PMLR, 5156–5165. Korinek (2023) Anton Korinek. 2023. Language models and cognitive automation for economic research. Technical Report. National Bureau of Economic Research. Liu et al. (2023) Houjiang Liu, Anubrata Das, Alexander Boltz, Didi Zhou, Daisy Pinaroc, Matthew Lease, and Min Kyung Lee. 2023. Human-centered NLP Fact-checking: Co-Designing with Fact-checkers using Matchmaking for AI. arXiv preprint arXiv:2308.07213 (2023). Lizotte (2020) Mary-Kate Lizotte. 2020. Gender differences in public opinion: Values and political consequences. Temple University Press. MacKinnon (2009) James G MacKinnon. 2009. Bootstrap hypothesis testing. Handbook of computational econometrics (2009), 183–213. Meta (2020) Meta. 2020. Here’s how we’re using AI to help detect misinformation. https://ai.meta.com/blog/heres-how-were-using-ai-to-help-detect-misinformation/. Accessed on 2024-01-09. Moorish (2023) Lydia Moorish. 2023. 
Fact-Checkers Are Using AI Like ChatGPT to Battle Misinformation. Wired (1 Feb 2023). https://www.wired.com/story/fact-checkers-ai-chatgpt-misinformation/ Nadeem et al. (2019) Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. (2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. 
Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, and François Fleuret. 2020. Transformers are rnns: Fast autoregressive transformers with linear attention. In International conference on machine learning. PMLR, 5156–5165. Korinek (2023) Anton Korinek. 2023. Language models and cognitive automation for economic research. Technical Report. National Bureau of Economic Research. Liu et al. (2023) Houjiang Liu, Anubrata Das, Alexander Boltz, Didi Zhou, Daisy Pinaroc, Matthew Lease, and Min Kyung Lee. 2023. Human-centered NLP Fact-checking: Co-Designing with Fact-checkers using Matchmaking for AI. arXiv preprint arXiv:2308.07213 (2023). Lizotte (2020) Mary-Kate Lizotte. 2020. Gender differences in public opinion: Values and political consequences. Temple University Press. MacKinnon (2009) James G MacKinnon. 2009. Bootstrap hypothesis testing. Handbook of computational econometrics (2009), 183–213. Meta (2020) Meta. 2020. Here’s how we’re using AI to help detect misinformation. https://ai.meta.com/blog/heres-how-were-using-ai-to-help-detect-misinformation/. Accessed on 2024-01-09. Moorish (2023) Lydia Moorish. 2023. Fact-Checkers Are Using AI Like ChatGPT to Battle Misinformation. Wired (1 Feb 2023). https://www.wired.com/story/fact-checkers-ai-chatgpt-misinformation/ Nadeem et al. (2019) Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. 
In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. (2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. 
See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Anton Korinek. 2023. Language models and cognitive automation for economic research. Technical Report. National Bureau of Economic Research. Liu et al. (2023) Houjiang Liu, Anubrata Das, Alexander Boltz, Didi Zhou, Daisy Pinaroc, Matthew Lease, and Min Kyung Lee. 2023. Human-centered NLP Fact-checking: Co-Designing with Fact-checkers using Matchmaking for AI. arXiv preprint arXiv:2308.07213 (2023). Lizotte (2020) Mary-Kate Lizotte. 2020. Gender differences in public opinion: Values and political consequences. Temple University Press. MacKinnon (2009) James G MacKinnon. 2009. Bootstrap hypothesis testing. Handbook of computational econometrics (2009), 183–213. Meta (2020) Meta. 2020. Here’s how we’re using AI to help detect misinformation. https://ai.meta.com/blog/heres-how-were-using-ai-to-help-detect-misinformation/. Accessed on 2024-01-09. Moorish (2023) Lydia Moorish. 2023. Fact-Checkers Are Using AI Like ChatGPT to Battle Misinformation. Wired (1 Feb 2023). https://www.wired.com/story/fact-checkers-ai-chatgpt-misinformation/ Nadeem et al. (2019) Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. 
Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. (2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. 
Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Houjiang Liu, Anubrata Das, Alexander Boltz, Didi Zhou, Daisy Pinaroc, Matthew Lease, and Min Kyung Lee. 2023. Human-centered NLP Fact-checking: Co-Designing with Fact-checkers using Matchmaking for AI. arXiv preprint arXiv:2308.07213 (2023). Lizotte (2020) Mary-Kate Lizotte. 2020. Gender differences in public opinion: Values and political consequences. Temple University Press. MacKinnon (2009) James G MacKinnon. 2009. Bootstrap hypothesis testing. Handbook of computational econometrics (2009), 183–213. Meta (2020) Meta. 2020. Here’s how we’re using AI to help detect misinformation. https://ai.meta.com/blog/heres-how-were-using-ai-to-help-detect-misinformation/. Accessed on 2024-01-09. Moorish (2023) Lydia Moorish. 2023. Fact-Checkers Are Using AI Like ChatGPT to Battle Misinformation. Wired (1 Feb 2023). https://www.wired.com/story/fact-checkers-ai-chatgpt-misinformation/ Nadeem et al. (2019) Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. (2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. 
In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. 
A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Mary-Kate Lizotte. 2020. Gender differences in public opinion: Values and political consequences. Temple University Press. MacKinnon (2009) James G MacKinnon. 2009. Bootstrap hypothesis testing. Handbook of computational econometrics (2009), 183–213. Meta (2020) Meta. 2020. Here’s how we’re using AI to help detect misinformation. https://ai.meta.com/blog/heres-how-were-using-ai-to-help-detect-misinformation/. Accessed on 2024-01-09. Moorish (2023) Lydia Moorish. 2023. Fact-Checkers Are Using AI Like ChatGPT to Battle Misinformation. Wired (1 Feb 2023). https://www.wired.com/story/fact-checkers-ai-chatgpt-misinformation/ Nadeem et al. (2019) Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. (2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. 
In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). James G MacKinnon. 2009. Bootstrap hypothesis testing. Handbook of computational econometrics (2009), 183–213. Meta (2020) Meta. 2020. Here’s how we’re using AI to help detect misinformation. https://ai.meta.com/blog/heres-how-were-using-ai-to-help-detect-misinformation/. Accessed on 2024-01-09. Moorish (2023) Lydia Moorish. 2023. 
Fact-Checkers Are Using AI Like ChatGPT to Battle Misinformation. Wired (1 Feb 2023). https://www.wired.com/story/fact-checkers-ai-chatgpt-misinformation/ Nadeem et al. (2019) Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. (2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. 
Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Meta. 2020. Here’s how we’re using AI to help detect misinformation. https://ai.meta.com/blog/heres-how-were-using-ai-to-help-detect-misinformation/. Accessed on 2024-01-09. Moorish (2023) Lydia Moorish. 2023. Fact-Checkers Are Using AI Like ChatGPT to Battle Misinformation. Wired (1 Feb 2023). https://www.wired.com/story/fact-checkers-ai-chatgpt-misinformation/ Nadeem et al. (2019) Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. 
(2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. 
arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Lydia Moorish. 2023. Fact-Checkers Are Using AI Like ChatGPT to Battle Misinformation. Wired (1 Feb 2023). https://www.wired.com/story/fact-checkers-ai-chatgpt-misinformation/ Nadeem et al. (2019) Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. (2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). 
(2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Zhijiang Guo, Michael Schlichtkrull, and Andreas Vlachos. 2022. A survey on automated fact-checking. Transactions of the Association for Computational Linguistics 10 (2022), 178–206. Hall and Wilson (1991) Peter Hall and Susan R Wilson. 1991. Two guidelines for bootstrap hypothesis testing. Biometrics (1991), 757–762. Hassan et al. (2017) Naeemul Hassan, Fatma Arslan, Chengkai Li, and Mark Tremayne. 2017. Toward automated fact-checking: Detecting check-worthy factual claims by claimbuster. In Proceedings of the 23rd ACM SIGKDD international conference on knowledge discovery and data mining. 1803–1812. Herian et al. (2012) Mitchel N Herian, Joseph A Hamm, Alan J Tomkins, and Lisa M Pytlik Zillig. 2012. Public participation, procedural fairness, and evaluations of local governance: The moderating role of uncertainty. Journal of Public Administration Research and Theory 22, 4 (2012), 815–840. Horton (2023) John J Horton. 2023. Large language models as simulated economic agents: What can we learn from homo silicus? Technical Report. National Bureau of Economic Research. Horvitz (1999) Eric Horvitz. 1999. Principles of mixed-initiative user interfaces. In Proceedings of the SIGCHI conference on Human Factors in Computing Systems. 159–166. Huddy et al. (2008) Leonie Huddy, Erin Cassese, and Mary-Kate Lizotte. 2008. Gender, public opinion, and political reasoning. Political women and American democracy (2008), 31–49. Intemann (2010) Kristen Intemann. 2010. 25 years of feminist empiricism and standpoint theory: Where are we now? Hypatia 25, 4 (2010), 778–796. Jaradat et al. (2018) Israa Jaradat, Pepa Gencheva, Alberto Barrón-Cedeño, Lluís Màrquez, and Preslav Nakov. 2018. ClaimRank: Detecting check-worthy claims in Arabic and English. arXiv preprint arXiv:1804.07587 (2018). Katharopoulos et al. (2020) Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, and François Fleuret. 2020. Transformers are rnns: Fast autoregressive transformers with linear attention. In International conference on machine learning. PMLR, 5156–5165. Korinek (2023) Anton Korinek. 2023. Language models and cognitive automation for economic research. Technical Report. National Bureau of Economic Research. Liu et al. (2023) Houjiang Liu, Anubrata Das, Alexander Boltz, Didi Zhou, Daisy Pinaroc, Matthew Lease, and Min Kyung Lee. 2023. Human-centered NLP Fact-checking: Co-Designing with Fact-checkers using Matchmaking for AI. arXiv preprint arXiv:2308.07213 (2023). Lizotte (2020) Mary-Kate Lizotte. 2020. Gender differences in public opinion: Values and political consequences. Temple University Press. MacKinnon (2009) James G MacKinnon. 2009. Bootstrap hypothesis testing. Handbook of computational econometrics (2009), 183–213. Meta (2020) Meta. 2020. Here’s how we’re using AI to help detect misinformation. https://ai.meta.com/blog/heres-how-were-using-ai-to-help-detect-misinformation/. Accessed on 2024-01-09. Moorish (2023) Lydia Moorish. 2023. Fact-Checkers Are Using AI Like ChatGPT to Battle Misinformation. Wired (1 Feb 2023). https://www.wired.com/story/fact-checkers-ai-chatgpt-misinformation/ Nadeem et al. (2019) Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. 
In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. (2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. 
See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Peter Hall and Susan R Wilson. 1991. Two guidelines for bootstrap hypothesis testing. Biometrics (1991), 757–762. Hassan et al. (2017) Naeemul Hassan, Fatma Arslan, Chengkai Li, and Mark Tremayne. 2017. Toward automated fact-checking: Detecting check-worthy factual claims by claimbuster. In Proceedings of the 23rd ACM SIGKDD international conference on knowledge discovery and data mining. 1803–1812. Herian et al. (2012) Mitchel N Herian, Joseph A Hamm, Alan J Tomkins, and Lisa M Pytlik Zillig. 2012. Public participation, procedural fairness, and evaluations of local governance: The moderating role of uncertainty. Journal of Public Administration Research and Theory 22, 4 (2012), 815–840. Horton (2023) John J Horton. 2023. Large language models as simulated economic agents: What can we learn from homo silicus? Technical Report. National Bureau of Economic Research. Horvitz (1999) Eric Horvitz. 1999. Principles of mixed-initiative user interfaces. In Proceedings of the SIGCHI conference on Human Factors in Computing Systems. 159–166. Huddy et al. (2008) Leonie Huddy, Erin Cassese, and Mary-Kate Lizotte. 2008. Gender, public opinion, and political reasoning. Political women and American democracy (2008), 31–49. Intemann (2010) Kristen Intemann. 2010. 25 years of feminist empiricism and standpoint theory: Where are we now? Hypatia 25, 4 (2010), 778–796. Jaradat et al. (2018) Israa Jaradat, Pepa Gencheva, Alberto Barrón-Cedeño, Lluís Màrquez, and Preslav Nakov. 2018. ClaimRank: Detecting check-worthy claims in Arabic and English. arXiv preprint arXiv:1804.07587 (2018). Katharopoulos et al. (2020) Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, and François Fleuret. 2020. 
Transformers are rnns: Fast autoregressive transformers with linear attention. In International conference on machine learning. PMLR, 5156–5165. Korinek (2023) Anton Korinek. 2023. Language models and cognitive automation for economic research. Technical Report. National Bureau of Economic Research. Liu et al. (2023) Houjiang Liu, Anubrata Das, Alexander Boltz, Didi Zhou, Daisy Pinaroc, Matthew Lease, and Min Kyung Lee. 2023. Human-centered NLP Fact-checking: Co-Designing with Fact-checkers using Matchmaking for AI. arXiv preprint arXiv:2308.07213 (2023). Lizotte (2020) Mary-Kate Lizotte. 2020. Gender differences in public opinion: Values and political consequences. Temple University Press. MacKinnon (2009) James G MacKinnon. 2009. Bootstrap hypothesis testing. Handbook of computational econometrics (2009), 183–213. Meta (2020) Meta. 2020. Here’s how we’re using AI to help detect misinformation. https://ai.meta.com/blog/heres-how-were-using-ai-to-help-detect-misinformation/. Accessed on 2024-01-09. Moorish (2023) Lydia Moorish. 2023. Fact-Checkers Are Using AI Like ChatGPT to Battle Misinformation. Wired (1 Feb 2023). https://www.wired.com/story/fact-checkers-ai-chatgpt-misinformation/ Nadeem et al. (2019) Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. (2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. 
In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Naeemul Hassan, Fatma Arslan, Chengkai Li, and Mark Tremayne. 2017. Toward automated fact-checking: Detecting check-worthy factual claims by claimbuster. In Proceedings of the 23rd ACM SIGKDD international conference on knowledge discovery and data mining. 1803–1812. Herian et al. (2012) Mitchel N Herian, Joseph A Hamm, Alan J Tomkins, and Lisa M Pytlik Zillig. 
2012. Public participation, procedural fairness, and evaluations of local governance: The moderating role of uncertainty. Journal of Public Administration Research and Theory 22, 4 (2012), 815–840. Horton (2023) John J Horton. 2023. Large language models as simulated economic agents: What can we learn from homo silicus? Technical Report. National Bureau of Economic Research. Horvitz (1999) Eric Horvitz. 1999. Principles of mixed-initiative user interfaces. In Proceedings of the SIGCHI conference on Human Factors in Computing Systems. 159–166. Huddy et al. (2008) Leonie Huddy, Erin Cassese, and Mary-Kate Lizotte. 2008. Gender, public opinion, and political reasoning. Political women and American democracy (2008), 31–49. Intemann (2010) Kristen Intemann. 2010. 25 years of feminist empiricism and standpoint theory: Where are we now? Hypatia 25, 4 (2010), 778–796. Jaradat et al. (2018) Israa Jaradat, Pepa Gencheva, Alberto Barrón-Cedeño, Lluís Màrquez, and Preslav Nakov. 2018. ClaimRank: Detecting check-worthy claims in Arabic and English. arXiv preprint arXiv:1804.07587 (2018). Katharopoulos et al. (2020) Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, and François Fleuret. 2020. Transformers are rnns: Fast autoregressive transformers with linear attention. In International conference on machine learning. PMLR, 5156–5165. Korinek (2023) Anton Korinek. 2023. Language models and cognitive automation for economic research. Technical Report. National Bureau of Economic Research. Liu et al. (2023) Houjiang Liu, Anubrata Das, Alexander Boltz, Didi Zhou, Daisy Pinaroc, Matthew Lease, and Min Kyung Lee. 2023. Human-centered NLP Fact-checking: Co-Designing with Fact-checkers using Matchmaking for AI. arXiv preprint arXiv:2308.07213 (2023). Lizotte (2020) Mary-Kate Lizotte. 2020. Gender differences in public opinion: Values and political consequences. Temple University Press. MacKinnon (2009) James G MacKinnon. 2009. Bootstrap hypothesis testing. Handbook of computational econometrics (2009), 183–213. Meta (2020) Meta. 2020. Here’s how we’re using AI to help detect misinformation. https://ai.meta.com/blog/heres-how-were-using-ai-to-help-detect-misinformation/. Accessed on 2024-01-09. Moorish (2023) Lydia Moorish. 2023. Fact-Checkers Are Using AI Like ChatGPT to Battle Misinformation. Wired (1 Feb 2023). https://www.wired.com/story/fact-checkers-ai-chatgpt-misinformation/ Nadeem et al. (2019) Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. 
(2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. 
arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Mitchel N Herian, Joseph A Hamm, Alan J Tomkins, and Lisa M Pytlik Zillig. 2012. Public participation, procedural fairness, and evaluations of local governance: The moderating role of uncertainty. Journal of Public Administration Research and Theory 22, 4 (2012), 815–840. Horton (2023) John J Horton. 2023. Large language models as simulated economic agents: What can we learn from homo silicus? Technical Report. National Bureau of Economic Research. Horvitz (1999) Eric Horvitz. 1999. Principles of mixed-initiative user interfaces. In Proceedings of the SIGCHI conference on Human Factors in Computing Systems. 159–166. Huddy et al. (2008) Leonie Huddy, Erin Cassese, and Mary-Kate Lizotte. 2008. Gender, public opinion, and political reasoning. Political women and American democracy (2008), 31–49. Intemann (2010) Kristen Intemann. 2010. 25 years of feminist empiricism and standpoint theory: Where are we now? Hypatia 25, 4 (2010), 778–796. Jaradat et al. (2018) Israa Jaradat, Pepa Gencheva, Alberto Barrón-Cedeño, Lluís Màrquez, and Preslav Nakov. 2018. ClaimRank: Detecting check-worthy claims in Arabic and English. arXiv preprint arXiv:1804.07587 (2018). Katharopoulos et al. (2020) Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, and François Fleuret. 2020. Transformers are rnns: Fast autoregressive transformers with linear attention. In International conference on machine learning. PMLR, 5156–5165. Korinek (2023) Anton Korinek. 2023. Language models and cognitive automation for economic research. Technical Report. National Bureau of Economic Research. Liu et al. (2023) Houjiang Liu, Anubrata Das, Alexander Boltz, Didi Zhou, Daisy Pinaroc, Matthew Lease, and Min Kyung Lee. 2023. Human-centered NLP Fact-checking: Co-Designing with Fact-checkers using Matchmaking for AI. arXiv preprint arXiv:2308.07213 (2023). Lizotte (2020) Mary-Kate Lizotte. 2020. Gender differences in public opinion: Values and political consequences. Temple University Press. MacKinnon (2009) James G MacKinnon. 2009. Bootstrap hypothesis testing. Handbook of computational econometrics (2009), 183–213. Meta (2020) Meta. 2020. Here’s how we’re using AI to help detect misinformation. https://ai.meta.com/blog/heres-how-were-using-ai-to-help-detect-misinformation/. Accessed on 2024-01-09. Moorish (2023) Lydia Moorish. 2023. Fact-Checkers Are Using AI Like ChatGPT to Battle Misinformation. Wired (1 Feb 2023). https://www.wired.com/story/fact-checkers-ai-chatgpt-misinformation/ Nadeem et al. 
(2019) Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. (2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. 
https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). John J Horton. 2023. Large language models as simulated economic agents: What can we learn from homo silicus? Technical Report. National Bureau of Economic Research. Horvitz (1999) Eric Horvitz. 1999. Principles of mixed-initiative user interfaces. In Proceedings of the SIGCHI conference on Human Factors in Computing Systems. 159–166. Huddy et al. (2008) Leonie Huddy, Erin Cassese, and Mary-Kate Lizotte. 2008. Gender, public opinion, and political reasoning. Political women and American democracy (2008), 31–49. Intemann (2010) Kristen Intemann. 2010. 25 years of feminist empiricism and standpoint theory: Where are we now? Hypatia 25, 4 (2010), 778–796. Jaradat et al. (2018) Israa Jaradat, Pepa Gencheva, Alberto Barrón-Cedeño, Lluís Màrquez, and Preslav Nakov. 2018. ClaimRank: Detecting check-worthy claims in Arabic and English. arXiv preprint arXiv:1804.07587 (2018). Katharopoulos et al. (2020) Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, and François Fleuret. 2020. Transformers are rnns: Fast autoregressive transformers with linear attention. In International conference on machine learning. PMLR, 5156–5165. Korinek (2023) Anton Korinek. 2023. Language models and cognitive automation for economic research. Technical Report. National Bureau of Economic Research. Liu et al. (2023) Houjiang Liu, Anubrata Das, Alexander Boltz, Didi Zhou, Daisy Pinaroc, Matthew Lease, and Min Kyung Lee. 2023. Human-centered NLP Fact-checking: Co-Designing with Fact-checkers using Matchmaking for AI. arXiv preprint arXiv:2308.07213 (2023). Lizotte (2020) Mary-Kate Lizotte. 2020. Gender differences in public opinion: Values and political consequences. Temple University Press. 
MacKinnon (2009) James G MacKinnon. 2009. Bootstrap hypothesis testing. Handbook of computational econometrics (2009), 183–213. Meta (2020) Meta. 2020. Here’s how we’re using AI to help detect misinformation. https://ai.meta.com/blog/heres-how-were-using-ai-to-help-detect-misinformation/. Accessed on 2024-01-09. Moorish (2023) Lydia Moorish. 2023. Fact-Checkers Are Using AI Like ChatGPT to Battle Misinformation. Wired (1 Feb 2023). https://www.wired.com/story/fact-checkers-ai-chatgpt-misinformation/ Nadeem et al. (2019) Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. (2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. 
In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Eric Horvitz. 1999. Principles of mixed-initiative user interfaces. In Proceedings of the SIGCHI conference on Human Factors in Computing Systems. 159–166. Huddy et al. (2008) Leonie Huddy, Erin Cassese, and Mary-Kate Lizotte. 2008. Gender, public opinion, and political reasoning. Political women and American democracy (2008), 31–49. Intemann (2010) Kristen Intemann. 2010. 25 years of feminist empiricism and standpoint theory: Where are we now? Hypatia 25, 4 (2010), 778–796. Jaradat et al. (2018) Israa Jaradat, Pepa Gencheva, Alberto Barrón-Cedeño, Lluís Màrquez, and Preslav Nakov. 2018. ClaimRank: Detecting check-worthy claims in Arabic and English. arXiv preprint arXiv:1804.07587 (2018). Katharopoulos et al. (2020) Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, and François Fleuret. 2020. Transformers are rnns: Fast autoregressive transformers with linear attention. In International conference on machine learning. PMLR, 5156–5165. Korinek (2023) Anton Korinek. 2023. 
Language models and cognitive automation for economic research. Technical Report. National Bureau of Economic Research. Liu et al. (2023) Houjiang Liu, Anubrata Das, Alexander Boltz, Didi Zhou, Daisy Pinaroc, Matthew Lease, and Min Kyung Lee. 2023. Human-centered NLP Fact-checking: Co-Designing with Fact-checkers using Matchmaking for AI. arXiv preprint arXiv:2308.07213 (2023). Lizotte (2020) Mary-Kate Lizotte. 2020. Gender differences in public opinion: Values and political consequences. Temple University Press. MacKinnon (2009) James G MacKinnon. 2009. Bootstrap hypothesis testing. Handbook of computational econometrics (2009), 183–213. Meta (2020) Meta. 2020. Here’s how we’re using AI to help detect misinformation. https://ai.meta.com/blog/heres-how-were-using-ai-to-help-detect-misinformation/. Accessed on 2024-01-09. Moorish (2023) Lydia Moorish. 2023. Fact-Checkers Are Using AI Like ChatGPT to Battle Misinformation. Wired (1 Feb 2023). https://www.wired.com/story/fact-checkers-ai-chatgpt-misinformation/ Nadeem et al. (2019) Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. (2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. 
(2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Leonie Huddy, Erin Cassese, and Mary-Kate Lizotte. 2008. Gender, public opinion, and political reasoning. Political women and American democracy (2008), 31–49. Intemann (2010) Kristen Intemann. 2010. 25 years of feminist empiricism and standpoint theory: Where are we now? Hypatia 25, 4 (2010), 778–796. Jaradat et al. (2018) Israa Jaradat, Pepa Gencheva, Alberto Barrón-Cedeño, Lluís Màrquez, and Preslav Nakov. 2018. ClaimRank: Detecting check-worthy claims in Arabic and English. arXiv preprint arXiv:1804.07587 (2018). Katharopoulos et al. 
(2020) Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, and François Fleuret. 2020. Transformers are rnns: Fast autoregressive transformers with linear attention. In International conference on machine learning. PMLR, 5156–5165. Korinek (2023) Anton Korinek. 2023. Language models and cognitive automation for economic research. Technical Report. National Bureau of Economic Research. Liu et al. (2023) Houjiang Liu, Anubrata Das, Alexander Boltz, Didi Zhou, Daisy Pinaroc, Matthew Lease, and Min Kyung Lee. 2023. Human-centered NLP Fact-checking: Co-Designing with Fact-checkers using Matchmaking for AI. arXiv preprint arXiv:2308.07213 (2023). Lizotte (2020) Mary-Kate Lizotte. 2020. Gender differences in public opinion: Values and political consequences. Temple University Press. MacKinnon (2009) James G MacKinnon. 2009. Bootstrap hypothesis testing. Handbook of computational econometrics (2009), 183–213. Meta (2020) Meta. 2020. Here’s how we’re using AI to help detect misinformation. https://ai.meta.com/blog/heres-how-were-using-ai-to-help-detect-misinformation/. Accessed on 2024-01-09. Moorish (2023) Lydia Moorish. 2023. Fact-Checkers Are Using AI Like ChatGPT to Battle Misinformation. Wired (1 Feb 2023). https://www.wired.com/story/fact-checkers-ai-chatgpt-misinformation/ Nadeem et al. (2019) Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. (2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. 
In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Kristen Intemann. 2010. 25 years of feminist empiricism and standpoint theory: Where are we now? Hypatia 25, 4 (2010), 778–796. Jaradat et al. (2018) Israa Jaradat, Pepa Gencheva, Alberto Barrón-Cedeño, Lluís Màrquez, and Preslav Nakov. 2018. ClaimRank: Detecting check-worthy claims in Arabic and English. arXiv preprint arXiv:1804.07587 (2018). 
Katharopoulos et al. (2020) Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, and François Fleuret. 2020. Transformers are rnns: Fast autoregressive transformers with linear attention. In International conference on machine learning. PMLR, 5156–5165. Korinek (2023) Anton Korinek. 2023. Language models and cognitive automation for economic research. Technical Report. National Bureau of Economic Research. Liu et al. (2023) Houjiang Liu, Anubrata Das, Alexander Boltz, Didi Zhou, Daisy Pinaroc, Matthew Lease, and Min Kyung Lee. 2023. Human-centered NLP Fact-checking: Co-Designing with Fact-checkers using Matchmaking for AI. arXiv preprint arXiv:2308.07213 (2023). Lizotte (2020) Mary-Kate Lizotte. 2020. Gender differences in public opinion: Values and political consequences. Temple University Press. MacKinnon (2009) James G MacKinnon. 2009. Bootstrap hypothesis testing. Handbook of computational econometrics (2009), 183–213. Meta (2020) Meta. 2020. Here’s how we’re using AI to help detect misinformation. https://ai.meta.com/blog/heres-how-were-using-ai-to-help-detect-misinformation/. Accessed on 2024-01-09. Moorish (2023) Lydia Moorish. 2023. Fact-Checkers Are Using AI Like ChatGPT to Battle Misinformation. Wired (1 Feb 2023). https://www.wired.com/story/fact-checkers-ai-chatgpt-misinformation/ Nadeem et al. (2019) Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. (2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. 
In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Israa Jaradat, Pepa Gencheva, Alberto Barrón-Cedeño, Lluís Màrquez, and Preslav Nakov. 2018. ClaimRank: Detecting check-worthy claims in Arabic and English. arXiv preprint arXiv:1804.07587 (2018). Katharopoulos et al. (2020) Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, and François Fleuret. 2020. 
Transformers are rnns: Fast autoregressive transformers with linear attention. In International conference on machine learning. PMLR, 5156–5165. Korinek (2023) Anton Korinek. 2023. Language models and cognitive automation for economic research. Technical Report. National Bureau of Economic Research. Liu et al. (2023) Houjiang Liu, Anubrata Das, Alexander Boltz, Didi Zhou, Daisy Pinaroc, Matthew Lease, and Min Kyung Lee. 2023. Human-centered NLP Fact-checking: Co-Designing with Fact-checkers using Matchmaking for AI. arXiv preprint arXiv:2308.07213 (2023). Lizotte (2020) Mary-Kate Lizotte. 2020. Gender differences in public opinion: Values and political consequences. Temple University Press. MacKinnon (2009) James G MacKinnon. 2009. Bootstrap hypothesis testing. Handbook of computational econometrics (2009), 183–213. Meta (2020) Meta. 2020. Here’s how we’re using AI to help detect misinformation. https://ai.meta.com/blog/heres-how-were-using-ai-to-help-detect-misinformation/. Accessed on 2024-01-09. Moorish (2023) Lydia Moorish. 2023. Fact-Checkers Are Using AI Like ChatGPT to Battle Misinformation. Wired (1 Feb 2023). https://www.wired.com/story/fact-checkers-ai-chatgpt-misinformation/ Nadeem et al. (2019) Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. (2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. 
In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, and François Fleuret. 2020. Transformers are rnns: Fast autoregressive transformers with linear attention. In International conference on machine learning. PMLR, 5156–5165. Korinek (2023) Anton Korinek. 2023. Language models and cognitive automation for economic research. Technical Report. 
National Bureau of Economic Research. Liu et al. (2023) Houjiang Liu, Anubrata Das, Alexander Boltz, Didi Zhou, Daisy Pinaroc, Matthew Lease, and Min Kyung Lee. 2023. Human-centered NLP Fact-checking: Co-Designing with Fact-checkers using Matchmaking for AI. arXiv preprint arXiv:2308.07213 (2023). Lizotte (2020) Mary-Kate Lizotte. 2020. Gender differences in public opinion: Values and political consequences. Temple University Press. MacKinnon (2009) James G MacKinnon. 2009. Bootstrap hypothesis testing. Handbook of computational econometrics (2009), 183–213. Meta (2020) Meta. 2020. Here’s how we’re using AI to help detect misinformation. https://ai.meta.com/blog/heres-how-were-using-ai-to-help-detect-misinformation/. Accessed on 2024-01-09. Moorish (2023) Lydia Moorish. 2023. Fact-Checkers Are Using AI Like ChatGPT to Battle Misinformation. Wired (1 Feb 2023). https://www.wired.com/story/fact-checkers-ai-chatgpt-misinformation/ Nadeem et al. (2019) Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. (2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. 
(2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Anton Korinek. 2023. Language models and cognitive automation for economic research. Technical Report. National Bureau of Economic Research. Liu et al. (2023) Houjiang Liu, Anubrata Das, Alexander Boltz, Didi Zhou, Daisy Pinaroc, Matthew Lease, and Min Kyung Lee. 2023. Human-centered NLP Fact-checking: Co-Designing with Fact-checkers using Matchmaking for AI. arXiv preprint arXiv:2308.07213 (2023). Lizotte (2020) Mary-Kate Lizotte. 2020. Gender differences in public opinion: Values and political consequences. Temple University Press. MacKinnon (2009) James G MacKinnon. 2009. 
Bootstrap hypothesis testing. Handbook of computational econometrics (2009), 183–213. Meta (2020) Meta. 2020. Here’s how we’re using AI to help detect misinformation. https://ai.meta.com/blog/heres-how-were-using-ai-to-help-detect-misinformation/. Accessed on 2024-01-09. Moorish (2023) Lydia Moorish. 2023. Fact-Checkers Are Using AI Like ChatGPT to Battle Misinformation. Wired (1 Feb 2023). https://www.wired.com/story/fact-checkers-ai-chatgpt-misinformation/ Nadeem et al. (2019) Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. (2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. 
(2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Houjiang Liu, Anubrata Das, Alexander Boltz, Didi Zhou, Daisy Pinaroc, Matthew Lease, and Min Kyung Lee. 2023. Human-centered NLP Fact-checking: Co-Designing with Fact-checkers using Matchmaking for AI. arXiv preprint arXiv:2308.07213 (2023). Lizotte (2020) Mary-Kate Lizotte. 2020. Gender differences in public opinion: Values and political consequences. Temple University Press. MacKinnon (2009) James G MacKinnon. 2009. Bootstrap hypothesis testing. Handbook of computational econometrics (2009), 183–213. Meta (2020) Meta. 2020. Here’s how we’re using AI to help detect misinformation. https://ai.meta.com/blog/heres-how-were-using-ai-to-help-detect-misinformation/. Accessed on 2024-01-09. Moorish (2023) Lydia Moorish. 2023. Fact-Checkers Are Using AI Like ChatGPT to Battle Misinformation. Wired (1 Feb 2023). https://www.wired.com/story/fact-checkers-ai-chatgpt-misinformation/ Nadeem et al. (2019) Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. 
In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. (2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. 
See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Mary-Kate Lizotte. 2020. Gender differences in public opinion: Values and political consequences. Temple University Press. MacKinnon (2009) James G MacKinnon. 2009. Bootstrap hypothesis testing. Handbook of computational econometrics (2009), 183–213. Meta (2020) Meta. 2020. Here’s how we’re using AI to help detect misinformation. https://ai.meta.com/blog/heres-how-were-using-ai-to-help-detect-misinformation/. Accessed on 2024-01-09. Moorish (2023) Lydia Moorish. 2023. Fact-Checkers Are Using AI Like ChatGPT to Battle Misinformation. Wired (1 Feb 2023). https://www.wired.com/story/fact-checkers-ai-chatgpt-misinformation/ Nadeem et al. (2019) Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. 
(2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. 
arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). James G MacKinnon. 2009. Bootstrap hypothesis testing. Handbook of computational econometrics (2009), 183–213. Meta (2020) Meta. 2020. Here’s how we’re using AI to help detect misinformation. https://ai.meta.com/blog/heres-how-were-using-ai-to-help-detect-misinformation/. Accessed on 2024-01-09. Moorish (2023) Lydia Moorish. 2023. Fact-Checkers Are Using AI Like ChatGPT to Battle Misinformation. Wired (1 Feb 2023). https://www.wired.com/story/fact-checkers-ai-chatgpt-misinformation/ Nadeem et al. (2019) Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. (2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. 
In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Meta. 2020. Here’s how we’re using AI to help detect misinformation. 
https://ai.meta.com/blog/heres-how-were-using-ai-to-help-detect-misinformation/. Accessed on 2024-01-09. Moorish (2023) Lydia Moorish. 2023. Fact-Checkers Are Using AI Like ChatGPT to Battle Misinformation. Wired (1 Feb 2023). https://www.wired.com/story/fact-checkers-ai-chatgpt-misinformation/ Nadeem et al. (2019) Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. (2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? 
arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Lydia Moorish. 2023. Fact-Checkers Are Using AI Like ChatGPT to Battle Misinformation. Wired (1 Feb 2023). https://www.wired.com/story/fact-checkers-ai-chatgpt-misinformation/ Nadeem et al. (2019) Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. 
Springer, 495–520. Nakov et al. (2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. 
Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Naeemul Hassan, Fatma Arslan, Chengkai Li, and Mark Tremayne. 2017. Toward automated fact-checking: Detecting check-worthy factual claims by claimbuster. In Proceedings of the 23rd ACM SIGKDD international conference on knowledge discovery and data mining. 1803–1812. Herian et al. (2012) Mitchel N Herian, Joseph A Hamm, Alan J Tomkins, and Lisa M Pytlik Zillig. 2012. Public participation, procedural fairness, and evaluations of local governance: The moderating role of uncertainty. Journal of Public Administration Research and Theory 22, 4 (2012), 815–840. Horton (2023) John J Horton. 2023. Large language models as simulated economic agents: What can we learn from homo silicus? Technical Report. National Bureau of Economic Research. Horvitz (1999) Eric Horvitz. 1999. Principles of mixed-initiative user interfaces. In Proceedings of the SIGCHI conference on Human Factors in Computing Systems. 159–166. Huddy et al. (2008) Leonie Huddy, Erin Cassese, and Mary-Kate Lizotte. 2008. Gender, public opinion, and political reasoning. Political women and American democracy (2008), 31–49. Intemann (2010) Kristen Intemann. 2010. 25 years of feminist empiricism and standpoint theory: Where are we now? Hypatia 25, 4 (2010), 778–796. Jaradat et al. (2018) Israa Jaradat, Pepa Gencheva, Alberto Barrón-Cedeño, Lluís Màrquez, and Preslav Nakov. 2018. ClaimRank: Detecting check-worthy claims in Arabic and English. arXiv preprint arXiv:1804.07587 (2018). Katharopoulos et al. (2020) Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, and François Fleuret. 2020. Transformers are rnns: Fast autoregressive transformers with linear attention. In International conference on machine learning. PMLR, 5156–5165. Korinek (2023) Anton Korinek. 2023. Language models and cognitive automation for economic research. Technical Report. National Bureau of Economic Research. Liu et al. (2023) Houjiang Liu, Anubrata Das, Alexander Boltz, Didi Zhou, Daisy Pinaroc, Matthew Lease, and Min Kyung Lee. 2023. Human-centered NLP Fact-checking: Co-Designing with Fact-checkers using Matchmaking for AI. arXiv preprint arXiv:2308.07213 (2023). Lizotte (2020) Mary-Kate Lizotte. 2020. Gender differences in public opinion: Values and political consequences. Temple University Press. MacKinnon (2009) James G MacKinnon. 2009. Bootstrap hypothesis testing. Handbook of computational econometrics (2009), 183–213. Meta (2020) Meta. 2020. 
Here’s how we’re using AI to help detect misinformation. https://ai.meta.com/blog/heres-how-were-using-ai-to-help-detect-misinformation/. Accessed on 2024-01-09. Moorish (2023) Lydia Moorish. 2023. Fact-Checkers Are Using AI Like ChatGPT to Battle Misinformation. Wired (1 Feb 2023). https://www.wired.com/story/fact-checkers-ai-chatgpt-misinformation/ Nadeem et al. (2019) Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. (2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. 
Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Mitchel N Herian, Joseph A Hamm, Alan J Tomkins, and Lisa M Pytlik Zillig. 2012. Public participation, procedural fairness, and evaluations of local governance: The moderating role of uncertainty. Journal of Public Administration Research and Theory 22, 4 (2012), 815–840. Horton (2023) John J Horton. 2023. Large language models as simulated economic agents: What can we learn from homo silicus? Technical Report. National Bureau of Economic Research. Horvitz (1999) Eric Horvitz. 1999. Principles of mixed-initiative user interfaces. In Proceedings of the SIGCHI conference on Human Factors in Computing Systems. 159–166. Huddy et al. (2008) Leonie Huddy, Erin Cassese, and Mary-Kate Lizotte. 2008. Gender, public opinion, and political reasoning. Political women and American democracy (2008), 31–49. Intemann (2010) Kristen Intemann. 2010. 25 years of feminist empiricism and standpoint theory: Where are we now? Hypatia 25, 4 (2010), 778–796. Jaradat et al. (2018) Israa Jaradat, Pepa Gencheva, Alberto Barrón-Cedeño, Lluís Màrquez, and Preslav Nakov. 2018. ClaimRank: Detecting check-worthy claims in Arabic and English. arXiv preprint arXiv:1804.07587 (2018). Katharopoulos et al. (2020) Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, and François Fleuret. 2020. 
Transformers are rnns: Fast autoregressive transformers with linear attention. In International conference on machine learning. PMLR, 5156–5165. Korinek (2023) Anton Korinek. 2023. Language models and cognitive automation for economic research. Technical Report. National Bureau of Economic Research. Liu et al. (2023) Houjiang Liu, Anubrata Das, Alexander Boltz, Didi Zhou, Daisy Pinaroc, Matthew Lease, and Min Kyung Lee. 2023. Human-centered NLP Fact-checking: Co-Designing with Fact-checkers using Matchmaking for AI. arXiv preprint arXiv:2308.07213 (2023). Lizotte (2020) Mary-Kate Lizotte. 2020. Gender differences in public opinion: Values and political consequences. Temple University Press. MacKinnon (2009) James G MacKinnon. 2009. Bootstrap hypothesis testing. Handbook of computational econometrics (2009), 183–213. Meta (2020) Meta. 2020. Here’s how we’re using AI to help detect misinformation. https://ai.meta.com/blog/heres-how-were-using-ai-to-help-detect-misinformation/. Accessed on 2024-01-09. Moorish (2023) Lydia Moorish. 2023. Fact-Checkers Are Using AI Like ChatGPT to Battle Misinformation. Wired (1 Feb 2023). https://www.wired.com/story/fact-checkers-ai-chatgpt-misinformation/ Nadeem et al. (2019) Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. (2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. 
In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). John J Horton. 2023. Large language models as simulated economic agents: What can we learn from homo silicus? Technical Report. National Bureau of Economic Research. Horvitz (1999) Eric Horvitz. 1999. Principles of mixed-initiative user interfaces. In Proceedings of the SIGCHI conference on Human Factors in Computing Systems. 159–166. Huddy et al. 
(2008) Leonie Huddy, Erin Cassese, and Mary-Kate Lizotte. 2008. Gender, public opinion, and political reasoning. Political women and American democracy (2008), 31–49. Intemann (2010) Kristen Intemann. 2010. 25 years of feminist empiricism and standpoint theory: Where are we now? Hypatia 25, 4 (2010), 778–796. Jaradat et al. (2018) Israa Jaradat, Pepa Gencheva, Alberto Barrón-Cedeño, Lluís Màrquez, and Preslav Nakov. 2018. ClaimRank: Detecting check-worthy claims in Arabic and English. arXiv preprint arXiv:1804.07587 (2018). Katharopoulos et al. (2020) Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, and François Fleuret. 2020. Transformers are rnns: Fast autoregressive transformers with linear attention. In International conference on machine learning. PMLR, 5156–5165. Korinek (2023) Anton Korinek. 2023. Language models and cognitive automation for economic research. Technical Report. National Bureau of Economic Research. Liu et al. (2023) Houjiang Liu, Anubrata Das, Alexander Boltz, Didi Zhou, Daisy Pinaroc, Matthew Lease, and Min Kyung Lee. 2023. Human-centered NLP Fact-checking: Co-Designing with Fact-checkers using Matchmaking for AI. arXiv preprint arXiv:2308.07213 (2023). Lizotte (2020) Mary-Kate Lizotte. 2020. Gender differences in public opinion: Values and political consequences. Temple University Press. MacKinnon (2009) James G MacKinnon. 2009. Bootstrap hypothesis testing. Handbook of computational econometrics (2009), 183–213. Meta (2020) Meta. 2020. Here’s how we’re using AI to help detect misinformation. https://ai.meta.com/blog/heres-how-were-using-ai-to-help-detect-misinformation/. Accessed on 2024-01-09. Moorish (2023) Lydia Moorish. 2023. Fact-Checkers Are Using AI Like ChatGPT to Battle Misinformation. Wired (1 Feb 2023). https://www.wired.com/story/fact-checkers-ai-chatgpt-misinformation/ Nadeem et al. (2019) Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. (2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. 
Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. 
(2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Eric Horvitz. 1999. Principles of mixed-initiative user interfaces. In Proceedings of the SIGCHI conference on Human Factors in Computing Systems. 159–166. Huddy et al. (2008) Leonie Huddy, Erin Cassese, and Mary-Kate Lizotte. 2008. Gender, public opinion, and political reasoning. Political women and American democracy (2008), 31–49. Intemann (2010) Kristen Intemann. 2010. 25 years of feminist empiricism and standpoint theory: Where are we now? Hypatia 25, 4 (2010), 778–796. Jaradat et al. (2018) Israa Jaradat, Pepa Gencheva, Alberto Barrón-Cedeño, Lluís Màrquez, and Preslav Nakov. 2018. ClaimRank: Detecting check-worthy claims in Arabic and English. arXiv preprint arXiv:1804.07587 (2018). Katharopoulos et al. (2020) Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, and François Fleuret. 2020. Transformers are rnns: Fast autoregressive transformers with linear attention. In International conference on machine learning. PMLR, 5156–5165. Korinek (2023) Anton Korinek. 2023. Language models and cognitive automation for economic research. Technical Report. National Bureau of Economic Research. Liu et al. (2023) Houjiang Liu, Anubrata Das, Alexander Boltz, Didi Zhou, Daisy Pinaroc, Matthew Lease, and Min Kyung Lee. 2023. Human-centered NLP Fact-checking: Co-Designing with Fact-checkers using Matchmaking for AI. arXiv preprint arXiv:2308.07213 (2023). Lizotte (2020) Mary-Kate Lizotte. 2020. Gender differences in public opinion: Values and political consequences. Temple University Press. MacKinnon (2009) James G MacKinnon. 2009. Bootstrap hypothesis testing. Handbook of computational econometrics (2009), 183–213. Meta (2020) Meta. 2020. Here’s how we’re using AI to help detect misinformation. https://ai.meta.com/blog/heres-how-were-using-ai-to-help-detect-misinformation/. Accessed on 2024-01-09. Moorish (2023) Lydia Moorish. 2023. Fact-Checkers Are Using AI Like ChatGPT to Battle Misinformation. Wired (1 Feb 2023). https://www.wired.com/story/fact-checkers-ai-chatgpt-misinformation/ Nadeem et al. (2019) Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. (2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. 
In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. 
A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Leonie Huddy, Erin Cassese, and Mary-Kate Lizotte. 2008. Gender, public opinion, and political reasoning. Political women and American democracy (2008), 31–49. Intemann (2010) Kristen Intemann. 2010. 25 years of feminist empiricism and standpoint theory: Where are we now? Hypatia 25, 4 (2010), 778–796. Jaradat et al. (2018) Israa Jaradat, Pepa Gencheva, Alberto Barrón-Cedeño, Lluís Màrquez, and Preslav Nakov. 2018. ClaimRank: Detecting check-worthy claims in Arabic and English. arXiv preprint arXiv:1804.07587 (2018). Katharopoulos et al. (2020) Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, and François Fleuret. 2020. Transformers are rnns: Fast autoregressive transformers with linear attention. In International conference on machine learning. PMLR, 5156–5165. Korinek (2023) Anton Korinek. 2023. Language models and cognitive automation for economic research. Technical Report. National Bureau of Economic Research. Liu et al. (2023) Houjiang Liu, Anubrata Das, Alexander Boltz, Didi Zhou, Daisy Pinaroc, Matthew Lease, and Min Kyung Lee. 2023. Human-centered NLP Fact-checking: Co-Designing with Fact-checkers using Matchmaking for AI. arXiv preprint arXiv:2308.07213 (2023). Lizotte (2020) Mary-Kate Lizotte. 2020. Gender differences in public opinion: Values and political consequences. Temple University Press. MacKinnon (2009) James G MacKinnon. 2009. Bootstrap hypothesis testing. Handbook of computational econometrics (2009), 183–213. Meta (2020) Meta. 2020. Here’s how we’re using AI to help detect misinformation. https://ai.meta.com/blog/heres-how-were-using-ai-to-help-detect-misinformation/. Accessed on 2024-01-09. Moorish (2023) Lydia Moorish. 2023. Fact-Checkers Are Using AI Like ChatGPT to Battle Misinformation. Wired (1 Feb 2023). https://www.wired.com/story/fact-checkers-ai-chatgpt-misinformation/ Nadeem et al. (2019) Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. 
(2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. 
arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Kristen Intemann. 2010. 25 years of feminist empiricism and standpoint theory: Where are we now? Hypatia 25, 4 (2010), 778–796. Jaradat et al. (2018) Israa Jaradat, Pepa Gencheva, Alberto Barrón-Cedeño, Lluís Màrquez, and Preslav Nakov. 2018. ClaimRank: Detecting check-worthy claims in Arabic and English. arXiv preprint arXiv:1804.07587 (2018). Katharopoulos et al. (2020) Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, and François Fleuret. 2020. Transformers are rnns: Fast autoregressive transformers with linear attention. In International conference on machine learning. PMLR, 5156–5165. Korinek (2023) Anton Korinek. 2023. Language models and cognitive automation for economic research. Technical Report. National Bureau of Economic Research. Liu et al. (2023) Houjiang Liu, Anubrata Das, Alexander Boltz, Didi Zhou, Daisy Pinaroc, Matthew Lease, and Min Kyung Lee. 2023. Human-centered NLP Fact-checking: Co-Designing with Fact-checkers using Matchmaking for AI. arXiv preprint arXiv:2308.07213 (2023). Lizotte (2020) Mary-Kate Lizotte. 2020. Gender differences in public opinion: Values and political consequences. Temple University Press. MacKinnon (2009) James G MacKinnon. 2009. Bootstrap hypothesis testing. Handbook of computational econometrics (2009), 183–213. Meta (2020) Meta. 2020. Here’s how we’re using AI to help detect misinformation. https://ai.meta.com/blog/heres-how-were-using-ai-to-help-detect-misinformation/. Accessed on 2024-01-09. Moorish (2023) Lydia Moorish. 2023. Fact-Checkers Are Using AI Like ChatGPT to Battle Misinformation. Wired (1 Feb 2023). https://www.wired.com/story/fact-checkers-ai-chatgpt-misinformation/ Nadeem et al. (2019) Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. 
(2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. 
arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Israa Jaradat, Pepa Gencheva, Alberto Barrón-Cedeño, Lluís Màrquez, and Preslav Nakov. 2018. ClaimRank: Detecting check-worthy claims in Arabic and English. arXiv preprint arXiv:1804.07587 (2018). Katharopoulos et al. (2020) Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, and François Fleuret. 2020. Transformers are rnns: Fast autoregressive transformers with linear attention. In International conference on machine learning. PMLR, 5156–5165. Korinek (2023) Anton Korinek. 2023. Language models and cognitive automation for economic research. Technical Report. National Bureau of Economic Research. Liu et al. (2023) Houjiang Liu, Anubrata Das, Alexander Boltz, Didi Zhou, Daisy Pinaroc, Matthew Lease, and Min Kyung Lee. 2023. Human-centered NLP Fact-checking: Co-Designing with Fact-checkers using Matchmaking for AI. arXiv preprint arXiv:2308.07213 (2023). Lizotte (2020) Mary-Kate Lizotte. 2020. Gender differences in public opinion: Values and political consequences. Temple University Press. MacKinnon (2009) James G MacKinnon. 2009. Bootstrap hypothesis testing. Handbook of computational econometrics (2009), 183–213. Meta (2020) Meta. 2020. Here’s how we’re using AI to help detect misinformation. https://ai.meta.com/blog/heres-how-were-using-ai-to-help-detect-misinformation/. Accessed on 2024-01-09. Moorish (2023) Lydia Moorish. 2023. Fact-Checkers Are Using AI Like ChatGPT to Battle Misinformation. Wired (1 Feb 2023). https://www.wired.com/story/fact-checkers-ai-chatgpt-misinformation/ Nadeem et al. (2019) Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. 
Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. 
(2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. 
(2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. 
Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. 
Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. 
(2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. 
Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. 
Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023).
- A survey on automated fact-checking. Transactions of the Association for Computational Linguistics 10 (2022), 178–206. Hall and Wilson (1991) Peter Hall and Susan R Wilson. 1991. Two guidelines for bootstrap hypothesis testing. Biometrics (1991), 757–762. Hassan et al. (2017) Naeemul Hassan, Fatma Arslan, Chengkai Li, and Mark Tremayne. 2017. Toward automated fact-checking: Detecting check-worthy factual claims by claimbuster. In Proceedings of the 23rd ACM SIGKDD international conference on knowledge discovery and data mining. 1803–1812. Herian et al. (2012) Mitchel N Herian, Joseph A Hamm, Alan J Tomkins, and Lisa M Pytlik Zillig. 2012. Public participation, procedural fairness, and evaluations of local governance: The moderating role of uncertainty. Journal of Public Administration Research and Theory 22, 4 (2012), 815–840. Horton (2023) John J Horton. 2023. Large language models as simulated economic agents: What can we learn from homo silicus? Technical Report. National Bureau of Economic Research. Horvitz (1999) Eric Horvitz. 1999. Principles of mixed-initiative user interfaces. In Proceedings of the SIGCHI conference on Human Factors in Computing Systems. 159–166. Huddy et al. (2008) Leonie Huddy, Erin Cassese, and Mary-Kate Lizotte. 2008. Gender, public opinion, and political reasoning. Political women and American democracy (2008), 31–49. Intemann (2010) Kristen Intemann. 2010. 25 years of feminist empiricism and standpoint theory: Where are we now? Hypatia 25, 4 (2010), 778–796. Jaradat et al. (2018) Israa Jaradat, Pepa Gencheva, Alberto Barrón-Cedeño, Lluís Màrquez, and Preslav Nakov. 2018. ClaimRank: Detecting check-worthy claims in Arabic and English. arXiv preprint arXiv:1804.07587 (2018). Katharopoulos et al. (2020) Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, and François Fleuret. 2020. Transformers are rnns: Fast autoregressive transformers with linear attention. In International conference on machine learning. PMLR, 5156–5165. Korinek (2023) Anton Korinek. 2023. Language models and cognitive automation for economic research. Technical Report. National Bureau of Economic Research. Liu et al. (2023) Houjiang Liu, Anubrata Das, Alexander Boltz, Didi Zhou, Daisy Pinaroc, Matthew Lease, and Min Kyung Lee. 2023. Human-centered NLP Fact-checking: Co-Designing with Fact-checkers using Matchmaking for AI. arXiv preprint arXiv:2308.07213 (2023). Lizotte (2020) Mary-Kate Lizotte. 2020. Gender differences in public opinion: Values and political consequences. Temple University Press. MacKinnon (2009) James G MacKinnon. 2009. Bootstrap hypothesis testing. Handbook of computational econometrics (2009), 183–213. Meta (2020) Meta. 2020. Here’s how we’re using AI to help detect misinformation. https://ai.meta.com/blog/heres-how-were-using-ai-to-help-detect-misinformation/. Accessed on 2024-01-09. Moorish (2023) Lydia Moorish. 2023. Fact-Checkers Are Using AI Like ChatGPT to Battle Misinformation. Wired (1 Feb 2023). https://www.wired.com/story/fact-checkers-ai-chatgpt-misinformation/ Nadeem et al. (2019) Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. 
(2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. (2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. 
Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Peter Hall and Susan R Wilson. 1991. Two guidelines for bootstrap hypothesis testing. Biometrics (1991), 757–762. Hassan et al. (2017) Naeemul Hassan, Fatma Arslan, Chengkai Li, and Mark Tremayne. 2017. Toward automated fact-checking: Detecting check-worthy factual claims by claimbuster. In Proceedings of the 23rd ACM SIGKDD international conference on knowledge discovery and data mining. 1803–1812. Herian et al. (2012) Mitchel N Herian, Joseph A Hamm, Alan J Tomkins, and Lisa M Pytlik Zillig. 2012. Public participation, procedural fairness, and evaluations of local governance: The moderating role of uncertainty. Journal of Public Administration Research and Theory 22, 4 (2012), 815–840. Horton (2023) John J Horton. 2023. Large language models as simulated economic agents: What can we learn from homo silicus? Technical Report. National Bureau of Economic Research. Horvitz (1999) Eric Horvitz. 1999. Principles of mixed-initiative user interfaces. In Proceedings of the SIGCHI conference on Human Factors in Computing Systems. 159–166. Huddy et al. (2008) Leonie Huddy, Erin Cassese, and Mary-Kate Lizotte. 2008. Gender, public opinion, and political reasoning. Political women and American democracy (2008), 31–49. Intemann (2010) Kristen Intemann. 2010. 25 years of feminist empiricism and standpoint theory: Where are we now? Hypatia 25, 4 (2010), 778–796. Jaradat et al. (2018) Israa Jaradat, Pepa Gencheva, Alberto Barrón-Cedeño, Lluís Màrquez, and Preslav Nakov. 2018. ClaimRank: Detecting check-worthy claims in Arabic and English. arXiv preprint arXiv:1804.07587 (2018). Katharopoulos et al. (2020) Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, and François Fleuret. 2020. Transformers are rnns: Fast autoregressive transformers with linear attention. In International conference on machine learning. PMLR, 5156–5165. Korinek (2023) Anton Korinek. 2023. Language models and cognitive automation for economic research. Technical Report. National Bureau of Economic Research. Liu et al. (2023) Houjiang Liu, Anubrata Das, Alexander Boltz, Didi Zhou, Daisy Pinaroc, Matthew Lease, and Min Kyung Lee. 2023. 
Human-centered NLP Fact-checking: Co-Designing with Fact-checkers using Matchmaking for AI. arXiv preprint arXiv:2308.07213 (2023). Lizotte (2020) Mary-Kate Lizotte. 2020. Gender differences in public opinion: Values and political consequences. Temple University Press. MacKinnon (2009) James G MacKinnon. 2009. Bootstrap hypothesis testing. Handbook of computational econometrics (2009), 183–213. Meta (2020) Meta. 2020. Here’s how we’re using AI to help detect misinformation. https://ai.meta.com/blog/heres-how-were-using-ai-to-help-detect-misinformation/. Accessed on 2024-01-09. Moorish (2023) Lydia Moorish. 2023. Fact-Checkers Are Using AI Like ChatGPT to Battle Misinformation. Wired (1 Feb 2023). https://www.wired.com/story/fact-checkers-ai-chatgpt-misinformation/ Nadeem et al. (2019) Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. (2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. 
(2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Naeemul Hassan, Fatma Arslan, Chengkai Li, and Mark Tremayne. 2017. Toward automated fact-checking: Detecting check-worthy factual claims by claimbuster. In Proceedings of the 23rd ACM SIGKDD international conference on knowledge discovery and data mining. 1803–1812. Herian et al. (2012) Mitchel N Herian, Joseph A Hamm, Alan J Tomkins, and Lisa M Pytlik Zillig. 2012. Public participation, procedural fairness, and evaluations of local governance: The moderating role of uncertainty. Journal of Public Administration Research and Theory 22, 4 (2012), 815–840. Horton (2023) John J Horton. 2023. Large language models as simulated economic agents: What can we learn from homo silicus? Technical Report. National Bureau of Economic Research. Horvitz (1999) Eric Horvitz. 1999. Principles of mixed-initiative user interfaces. 
In Proceedings of the SIGCHI conference on Human Factors in Computing Systems. 159–166. Huddy et al. (2008) Leonie Huddy, Erin Cassese, and Mary-Kate Lizotte. 2008. Gender, public opinion, and political reasoning. Political women and American democracy (2008), 31–49. Intemann (2010) Kristen Intemann. 2010. 25 years of feminist empiricism and standpoint theory: Where are we now? Hypatia 25, 4 (2010), 778–796. Jaradat et al. (2018) Israa Jaradat, Pepa Gencheva, Alberto Barrón-Cedeño, Lluís Màrquez, and Preslav Nakov. 2018. ClaimRank: Detecting check-worthy claims in Arabic and English. arXiv preprint arXiv:1804.07587 (2018). Katharopoulos et al. (2020) Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, and François Fleuret. 2020. Transformers are rnns: Fast autoregressive transformers with linear attention. In International conference on machine learning. PMLR, 5156–5165. Korinek (2023) Anton Korinek. 2023. Language models and cognitive automation for economic research. Technical Report. National Bureau of Economic Research. Liu et al. (2023) Houjiang Liu, Anubrata Das, Alexander Boltz, Didi Zhou, Daisy Pinaroc, Matthew Lease, and Min Kyung Lee. 2023. Human-centered NLP Fact-checking: Co-Designing with Fact-checkers using Matchmaking for AI. arXiv preprint arXiv:2308.07213 (2023). Lizotte (2020) Mary-Kate Lizotte. 2020. Gender differences in public opinion: Values and political consequences. Temple University Press. MacKinnon (2009) James G MacKinnon. 2009. Bootstrap hypothesis testing. Handbook of computational econometrics (2009), 183–213. Meta (2020) Meta. 2020. Here’s how we’re using AI to help detect misinformation. https://ai.meta.com/blog/heres-how-were-using-ai-to-help-detect-misinformation/. Accessed on 2024-01-09. Moorish (2023) Lydia Moorish. 2023. Fact-Checkers Are Using AI Like ChatGPT to Battle Misinformation. Wired (1 Feb 2023). https://www.wired.com/story/fact-checkers-ai-chatgpt-misinformation/ Nadeem et al. (2019) Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. (2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. 
(2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. 
- Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364.
- Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can ChatGPT reproduce human-generated labels? A study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023).
lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. (2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. 
(2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). John J Horton. 2023. Large language models as simulated economic agents: What can we learn from homo silicus? Technical Report. National Bureau of Economic Research. Horvitz (1999) Eric Horvitz. 1999. Principles of mixed-initiative user interfaces. In Proceedings of the SIGCHI conference on Human Factors in Computing Systems. 159–166. Huddy et al. (2008) Leonie Huddy, Erin Cassese, and Mary-Kate Lizotte. 2008. Gender, public opinion, and political reasoning. Political women and American democracy (2008), 31–49. Intemann (2010) Kristen Intemann. 2010. 25 years of feminist empiricism and standpoint theory: Where are we now? Hypatia 25, 4 (2010), 778–796. Jaradat et al. (2018) Israa Jaradat, Pepa Gencheva, Alberto Barrón-Cedeño, Lluís Màrquez, and Preslav Nakov. 2018. ClaimRank: Detecting check-worthy claims in Arabic and English. arXiv preprint arXiv:1804.07587 (2018). Katharopoulos et al. (2020) Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, and François Fleuret. 2020. Transformers are rnns: Fast autoregressive transformers with linear attention. In International conference on machine learning. PMLR, 5156–5165. Korinek (2023) Anton Korinek. 2023. Language models and cognitive automation for economic research. Technical Report. National Bureau of Economic Research. Liu et al. (2023) Houjiang Liu, Anubrata Das, Alexander Boltz, Didi Zhou, Daisy Pinaroc, Matthew Lease, and Min Kyung Lee. 2023. Human-centered NLP Fact-checking: Co-Designing with Fact-checkers using Matchmaking for AI. arXiv preprint arXiv:2308.07213 (2023). Lizotte (2020) Mary-Kate Lizotte. 2020. Gender differences in public opinion: Values and political consequences. Temple University Press. MacKinnon (2009) James G MacKinnon. 2009. Bootstrap hypothesis testing. Handbook of computational econometrics (2009), 183–213. Meta (2020) Meta. 2020. Here’s how we’re using AI to help detect misinformation. https://ai.meta.com/blog/heres-how-were-using-ai-to-help-detect-misinformation/. Accessed on 2024-01-09. Moorish (2023) Lydia Moorish. 2023. Fact-Checkers Are Using AI Like ChatGPT to Battle Misinformation. Wired (1 Feb 2023). https://www.wired.com/story/fact-checkers-ai-chatgpt-misinformation/ Nadeem et al. (2019) Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. 
In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. (2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. 
See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Eric Horvitz. 1999. Principles of mixed-initiative user interfaces. In Proceedings of the SIGCHI conference on Human Factors in Computing Systems. 159–166. Huddy et al. (2008) Leonie Huddy, Erin Cassese, and Mary-Kate Lizotte. 2008. Gender, public opinion, and political reasoning. Political women and American democracy (2008), 31–49. Intemann (2010) Kristen Intemann. 2010. 25 years of feminist empiricism and standpoint theory: Where are we now? Hypatia 25, 4 (2010), 778–796. Jaradat et al. (2018) Israa Jaradat, Pepa Gencheva, Alberto Barrón-Cedeño, Lluís Màrquez, and Preslav Nakov. 2018. ClaimRank: Detecting check-worthy claims in Arabic and English. arXiv preprint arXiv:1804.07587 (2018). Katharopoulos et al. (2020) Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, and François Fleuret. 2020. Transformers are rnns: Fast autoregressive transformers with linear attention. In International conference on machine learning. PMLR, 5156–5165. Korinek (2023) Anton Korinek. 2023. Language models and cognitive automation for economic research. Technical Report. National Bureau of Economic Research. Liu et al. (2023) Houjiang Liu, Anubrata Das, Alexander Boltz, Didi Zhou, Daisy Pinaroc, Matthew Lease, and Min Kyung Lee. 2023. Human-centered NLP Fact-checking: Co-Designing with Fact-checkers using Matchmaking for AI. arXiv preprint arXiv:2308.07213 (2023). Lizotte (2020) Mary-Kate Lizotte. 2020. Gender differences in public opinion: Values and political consequences. Temple University Press. MacKinnon (2009) James G MacKinnon. 2009. Bootstrap hypothesis testing. Handbook of computational econometrics (2009), 183–213. Meta (2020) Meta. 2020. Here’s how we’re using AI to help detect misinformation. 
https://ai.meta.com/blog/heres-how-were-using-ai-to-help-detect-misinformation/. Accessed on 2024-01-09. Moorish (2023) Lydia Moorish. 2023. Fact-Checkers Are Using AI Like ChatGPT to Battle Misinformation. Wired (1 Feb 2023). https://www.wired.com/story/fact-checkers-ai-chatgpt-misinformation/ Nadeem et al. (2019) Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. (2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? 
arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Leonie Huddy, Erin Cassese, and Mary-Kate Lizotte. 2008. Gender, public opinion, and political reasoning. Political women and American democracy (2008), 31–49. Intemann (2010) Kristen Intemann. 2010. 25 years of feminist empiricism and standpoint theory: Where are we now? Hypatia 25, 4 (2010), 778–796. Jaradat et al. (2018) Israa Jaradat, Pepa Gencheva, Alberto Barrón-Cedeño, Lluís Màrquez, and Preslav Nakov. 2018. ClaimRank: Detecting check-worthy claims in Arabic and English. arXiv preprint arXiv:1804.07587 (2018). Katharopoulos et al. (2020) Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, and François Fleuret. 2020. Transformers are rnns: Fast autoregressive transformers with linear attention. In International conference on machine learning. PMLR, 5156–5165. Korinek (2023) Anton Korinek. 2023. Language models and cognitive automation for economic research. Technical Report. National Bureau of Economic Research. Liu et al. (2023) Houjiang Liu, Anubrata Das, Alexander Boltz, Didi Zhou, Daisy Pinaroc, Matthew Lease, and Min Kyung Lee. 2023. Human-centered NLP Fact-checking: Co-Designing with Fact-checkers using Matchmaking for AI. arXiv preprint arXiv:2308.07213 (2023). Lizotte (2020) Mary-Kate Lizotte. 2020. Gender differences in public opinion: Values and political consequences. Temple University Press. 
MacKinnon (2009) James G MacKinnon. 2009. Bootstrap hypothesis testing. Handbook of computational econometrics (2009), 183–213. Meta (2020) Meta. 2020. Here’s how we’re using AI to help detect misinformation. https://ai.meta.com/blog/heres-how-were-using-ai-to-help-detect-misinformation/. Accessed on 2024-01-09. Moorish (2023) Lydia Moorish. 2023. Fact-Checkers Are Using AI Like ChatGPT to Battle Misinformation. Wired (1 Feb 2023). https://www.wired.com/story/fact-checkers-ai-chatgpt-misinformation/ Nadeem et al. (2019) Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. (2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. 
In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Kristen Intemann. 2010. 25 years of feminist empiricism and standpoint theory: Where are we now? Hypatia 25, 4 (2010), 778–796. Jaradat et al. (2018) Israa Jaradat, Pepa Gencheva, Alberto Barrón-Cedeño, Lluís Màrquez, and Preslav Nakov. 2018. ClaimRank: Detecting check-worthy claims in Arabic and English. arXiv preprint arXiv:1804.07587 (2018). Katharopoulos et al. (2020) Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, and François Fleuret. 2020. Transformers are rnns: Fast autoregressive transformers with linear attention. In International conference on machine learning. PMLR, 5156–5165. Korinek (2023) Anton Korinek. 2023. Language models and cognitive automation for economic research. Technical Report. National Bureau of Economic Research. Liu et al. (2023) Houjiang Liu, Anubrata Das, Alexander Boltz, Didi Zhou, Daisy Pinaroc, Matthew Lease, and Min Kyung Lee. 2023. Human-centered NLP Fact-checking: Co-Designing with Fact-checkers using Matchmaking for AI. arXiv preprint arXiv:2308.07213 (2023). 
Lizotte (2020) Mary-Kate Lizotte. 2020. Gender differences in public opinion: Values and political consequences. Temple University Press. MacKinnon (2009) James G MacKinnon. 2009. Bootstrap hypothesis testing. Handbook of computational econometrics (2009), 183–213. Meta (2020) Meta. 2020. Here’s how we’re using AI to help detect misinformation. https://ai.meta.com/blog/heres-how-were-using-ai-to-help-detect-misinformation/. Accessed on 2024-01-09. Moorish (2023) Lydia Moorish. 2023. Fact-Checkers Are Using AI Like ChatGPT to Battle Misinformation. Wired (1 Feb 2023). https://www.wired.com/story/fact-checkers-ai-chatgpt-misinformation/ Nadeem et al. (2019) Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. (2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. 
In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Israa Jaradat, Pepa Gencheva, Alberto Barrón-Cedeño, Lluís Màrquez, and Preslav Nakov. 2018. ClaimRank: Detecting check-worthy claims in Arabic and English. arXiv preprint arXiv:1804.07587 (2018). Katharopoulos et al. (2020) Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, and François Fleuret. 2020. Transformers are rnns: Fast autoregressive transformers with linear attention. In International conference on machine learning. PMLR, 5156–5165. Korinek (2023) Anton Korinek. 2023. Language models and cognitive automation for economic research. Technical Report. National Bureau of Economic Research. Liu et al. (2023) Houjiang Liu, Anubrata Das, Alexander Boltz, Didi Zhou, Daisy Pinaroc, Matthew Lease, and Min Kyung Lee. 2023. Human-centered NLP Fact-checking: Co-Designing with Fact-checkers using Matchmaking for AI. arXiv preprint arXiv:2308.07213 (2023). Lizotte (2020) Mary-Kate Lizotte. 2020. Gender differences in public opinion: Values and political consequences. Temple University Press. 
MacKinnon (2009) James G MacKinnon. 2009. Bootstrap hypothesis testing. Handbook of computational econometrics (2009), 183–213. Meta (2020) Meta. 2020. Here’s how we’re using AI to help detect misinformation. https://ai.meta.com/blog/heres-how-were-using-ai-to-help-detect-misinformation/. Accessed on 2024-01-09. Moorish (2023) Lydia Moorish. 2023. Fact-Checkers Are Using AI Like ChatGPT to Battle Misinformation. Wired (1 Feb 2023). https://www.wired.com/story/fact-checkers-ai-chatgpt-misinformation/ Nadeem et al. (2019) Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. (2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. 
In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, and François Fleuret. 2020. Transformers are rnns: Fast autoregressive transformers with linear attention. In International conference on machine learning. PMLR, 5156–5165. Korinek (2023) Anton Korinek. 2023. Language models and cognitive automation for economic research. Technical Report. National Bureau of Economic Research. Liu et al. (2023) Houjiang Liu, Anubrata Das, Alexander Boltz, Didi Zhou, Daisy Pinaroc, Matthew Lease, and Min Kyung Lee. 2023. Human-centered NLP Fact-checking: Co-Designing with Fact-checkers using Matchmaking for AI. arXiv preprint arXiv:2308.07213 (2023). Lizotte (2020) Mary-Kate Lizotte. 2020. Gender differences in public opinion: Values and political consequences. Temple University Press. MacKinnon (2009) James G MacKinnon. 2009. Bootstrap hypothesis testing. Handbook of computational econometrics (2009), 183–213. Meta (2020) Meta. 2020. Here’s how we’re using AI to help detect misinformation. 
https://ai.meta.com/blog/heres-how-were-using-ai-to-help-detect-misinformation/. Accessed on 2024-01-09. Moorish (2023) Lydia Moorish. 2023. Fact-Checkers Are Using AI Like ChatGPT to Battle Misinformation. Wired (1 Feb 2023). https://www.wired.com/story/fact-checkers-ai-chatgpt-misinformation/ Nadeem et al. (2019) Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. (2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? 
arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Anton Korinek. 2023. Language models and cognitive automation for economic research. Technical Report. National Bureau of Economic Research. Liu et al. (2023) Houjiang Liu, Anubrata Das, Alexander Boltz, Didi Zhou, Daisy Pinaroc, Matthew Lease, and Min Kyung Lee. 2023. Human-centered NLP Fact-checking: Co-Designing with Fact-checkers using Matchmaking for AI. arXiv preprint arXiv:2308.07213 (2023). Lizotte (2020) Mary-Kate Lizotte. 2020. Gender differences in public opinion: Values and political consequences. Temple University Press. MacKinnon (2009) James G MacKinnon. 2009. Bootstrap hypothesis testing. Handbook of computational econometrics (2009), 183–213. Meta (2020) Meta. 2020. Here’s how we’re using AI to help detect misinformation. https://ai.meta.com/blog/heres-how-were-using-ai-to-help-detect-misinformation/. Accessed on 2024-01-09. Moorish (2023) Lydia Moorish. 2023. Fact-Checkers Are Using AI Like ChatGPT to Battle Misinformation. Wired (1 Feb 2023). https://www.wired.com/story/fact-checkers-ai-chatgpt-misinformation/ Nadeem et al. (2019) Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. 
In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. (2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. 
See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Houjiang Liu, Anubrata Das, Alexander Boltz, Didi Zhou, Daisy Pinaroc, Matthew Lease, and Min Kyung Lee. 2023. Human-centered NLP Fact-checking: Co-Designing with Fact-checkers using Matchmaking for AI. arXiv preprint arXiv:2308.07213 (2023). Lizotte (2020) Mary-Kate Lizotte. 2020. Gender differences in public opinion: Values and political consequences. Temple University Press. MacKinnon (2009) James G MacKinnon. 2009. Bootstrap hypothesis testing. Handbook of computational econometrics (2009), 183–213. Meta (2020) Meta. 2020. Here’s how we’re using AI to help detect misinformation. https://ai.meta.com/blog/heres-how-were-using-ai-to-help-detect-misinformation/. Accessed on 2024-01-09. Moorish (2023) Lydia Moorish. 2023. Fact-Checkers Are Using AI Like ChatGPT to Battle Misinformation. Wired (1 Feb 2023). https://www.wired.com/story/fact-checkers-ai-chatgpt-misinformation/ Nadeem et al. (2019) Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. 
Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Mary-Kate Lizotte. 2020. Gender differences in public opinion: Values and political consequences. Temple University Press. MacKinnon (2009) James G MacKinnon. 2009. Bootstrap hypothesis testing. Handbook of computational econometrics (2009), 183–213. Meta (2020) Meta. 2020. Here’s how we’re using AI to help detect misinformation. https://ai.meta.com/blog/heres-how-were-using-ai-to-help-detect-misinformation/. Accessed on 2024-01-09. Moorish (2023) Lydia Moorish. 2023. Fact-Checkers Are Using AI Like ChatGPT to Battle Misinformation. Wired (1 Feb 2023). https://www.wired.com/story/fact-checkers-ai-chatgpt-misinformation/ Nadeem et al. (2019) Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. (2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. 
(2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? 
a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). James G MacKinnon. 2009. Bootstrap hypothesis testing. Handbook of computational econometrics (2009), 183–213. Meta (2020) Meta. 2020. Here’s how we’re using AI to help detect misinformation. https://ai.meta.com/blog/heres-how-were-using-ai-to-help-detect-misinformation/. Accessed on 2024-01-09. Moorish (2023) Lydia Moorish. 2023. Fact-Checkers Are Using AI Like ChatGPT to Battle Misinformation. Wired (1 Feb 2023). https://www.wired.com/story/fact-checkers-ai-chatgpt-misinformation/ Nadeem et al. (2019) Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. (2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. 
In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Meta. 2020. Here’s how we’re using AI to help detect misinformation. https://ai.meta.com/blog/heres-how-were-using-ai-to-help-detect-misinformation/. Accessed on 2024-01-09. Moorish (2023) Lydia Moorish. 2023. Fact-Checkers Are Using AI Like ChatGPT to Battle Misinformation. Wired (1 Feb 2023). https://www.wired.com/story/fact-checkers-ai-chatgpt-misinformation/ Nadeem et al. (2019) Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. 
Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. (2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. 
Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Lydia Moorish. 2023. Fact-Checkers Are Using AI Like ChatGPT to Battle Misinformation. Wired (1 Feb 2023). https://www.wired.com/story/fact-checkers-ai-chatgpt-misinformation/ Nadeem et al. (2019) Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. (2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. 
Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. 
In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. (2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. 
See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. (2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. 
Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). 
Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. 
arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. 
Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). 
Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. 
file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. 
Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Israa Jaradat, Pepa Gencheva, Alberto Barrón-Cedeño, Lluís Màrquez, and Preslav Nakov. 2018. ClaimRank: Detecting check-worthy claims in Arabic and English. arXiv preprint arXiv:1804.07587 (2018). Katharopoulos et al. (2020) Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, and François Fleuret. 2020. Transformers are rnns: Fast autoregressive transformers with linear attention. In International conference on machine learning. PMLR, 5156–5165. Korinek (2023) Anton Korinek. 2023. Language models and cognitive automation for economic research. Technical Report. National Bureau of Economic Research. Liu et al. (2023) Houjiang Liu, Anubrata Das, Alexander Boltz, Didi Zhou, Daisy Pinaroc, Matthew Lease, and Min Kyung Lee. 2023. Human-centered NLP Fact-checking: Co-Designing with Fact-checkers using Matchmaking for AI. arXiv preprint arXiv:2308.07213 (2023). Lizotte (2020) Mary-Kate Lizotte. 2020. Gender differences in public opinion: Values and political consequences. Temple University Press. MacKinnon (2009) James G MacKinnon. 2009. Bootstrap hypothesis testing. Handbook of computational econometrics (2009), 183–213. Meta (2020) Meta. 2020. Here’s how we’re using AI to help detect misinformation. https://ai.meta.com/blog/heres-how-were-using-ai-to-help-detect-misinformation/. Accessed on 2024-01-09. Moorish (2023) Lydia Moorish. 2023. Fact-Checkers Are Using AI Like ChatGPT to Battle Misinformation. Wired (1 Feb 2023). https://www.wired.com/story/fact-checkers-ai-chatgpt-misinformation/ Nadeem et al. (2019) Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. 
In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. (2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. 
Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, and François Fleuret. 2020. Transformers are rnns: Fast autoregressive transformers with linear attention. In International conference on machine learning. PMLR, 5156–5165. Korinek (2023) Anton Korinek. 2023. Language models and cognitive automation for economic research. Technical Report. National Bureau of Economic Research. Liu et al. (2023) Houjiang Liu, Anubrata Das, Alexander Boltz, Didi Zhou, Daisy Pinaroc, Matthew Lease, and Min Kyung Lee. 2023. Human-centered NLP Fact-checking: Co-Designing with Fact-checkers using Matchmaking for AI. arXiv preprint arXiv:2308.07213 (2023). Lizotte (2020) Mary-Kate Lizotte. 2020. Gender differences in public opinion: Values and political consequences. Temple University Press. MacKinnon (2009) James G MacKinnon. 2009. Bootstrap hypothesis testing. Handbook of computational econometrics (2009), 183–213. Meta (2020) Meta. 2020. Here’s how we’re using AI to help detect misinformation. https://ai.meta.com/blog/heres-how-were-using-ai-to-help-detect-misinformation/. Accessed on 2024-01-09. Moorish (2023) Lydia Moorish. 2023. Fact-Checkers Are Using AI Like ChatGPT to Battle Misinformation. Wired (1 Feb 2023). https://www.wired.com/story/fact-checkers-ai-chatgpt-misinformation/ Nadeem et al. (2019) Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. 
(2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. 
arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Anton Korinek. 2023. Language models and cognitive automation for economic research. Technical Report. National Bureau of Economic Research. Liu et al. (2023) Houjiang Liu, Anubrata Das, Alexander Boltz, Didi Zhou, Daisy Pinaroc, Matthew Lease, and Min Kyung Lee. 2023. Human-centered NLP Fact-checking: Co-Designing with Fact-checkers using Matchmaking for AI. arXiv preprint arXiv:2308.07213 (2023). Lizotte (2020) Mary-Kate Lizotte. 2020. Gender differences in public opinion: Values and political consequences. Temple University Press. MacKinnon (2009) James G MacKinnon. 2009. Bootstrap hypothesis testing. Handbook of computational econometrics (2009), 183–213. Meta (2020) Meta. 2020. Here’s how we’re using AI to help detect misinformation. https://ai.meta.com/blog/heres-how-were-using-ai-to-help-detect-misinformation/. Accessed on 2024-01-09. Moorish (2023) Lydia Moorish. 2023. Fact-Checkers Are Using AI Like ChatGPT to Battle Misinformation. Wired (1 Feb 2023). https://www.wired.com/story/fact-checkers-ai-chatgpt-misinformation/ Nadeem et al. (2019) Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. (2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). 
Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. 
Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Houjiang Liu, Anubrata Das, Alexander Boltz, Didi Zhou, Daisy Pinaroc, Matthew Lease, and Min Kyung Lee. 2023. Human-centered NLP Fact-checking: Co-Designing with Fact-checkers using Matchmaking for AI. arXiv preprint arXiv:2308.07213 (2023). Lizotte (2020) Mary-Kate Lizotte. 2020. Gender differences in public opinion: Values and political consequences. Temple University Press. MacKinnon (2009) James G MacKinnon. 2009. Bootstrap hypothesis testing. Handbook of computational econometrics (2009), 183–213. Meta (2020) Meta. 2020. Here’s how we’re using AI to help detect misinformation. https://ai.meta.com/blog/heres-how-were-using-ai-to-help-detect-misinformation/. Accessed on 2024-01-09. Moorish (2023) Lydia Moorish. 2023. Fact-Checkers Are Using AI Like ChatGPT to Battle Misinformation. Wired (1 Feb 2023). https://www.wired.com/story/fact-checkers-ai-chatgpt-misinformation/ Nadeem et al. (2019) Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. (2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. 
In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Mary-Kate Lizotte. 2020. Gender differences in public opinion: Values and political consequences. Temple University Press. MacKinnon (2009) James G MacKinnon. 2009. Bootstrap hypothesis testing. Handbook of computational econometrics (2009), 183–213. Meta (2020) Meta. 2020. Here’s how we’re using AI to help detect misinformation. 
https://ai.meta.com/blog/heres-how-were-using-ai-to-help-detect-misinformation/. Accessed on 2024-01-09. Moorish (2023) Lydia Moorish. 2023. Fact-Checkers Are Using AI Like ChatGPT to Battle Misinformation. Wired (1 Feb 2023). https://www.wired.com/story/fact-checkers-ai-chatgpt-misinformation/ Nadeem et al. (2019) Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. (2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? 
arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). James G MacKinnon. 2009. Bootstrap hypothesis testing. Handbook of computational econometrics (2009), 183–213. Meta (2020) Meta. 2020. Here’s how we’re using AI to help detect misinformation. https://ai.meta.com/blog/heres-how-were-using-ai-to-help-detect-misinformation/. Accessed on 2024-01-09. Moorish (2023) Lydia Moorish. 2023. Fact-Checkers Are Using AI Like ChatGPT to Battle Misinformation. Wired (1 Feb 2023). https://www.wired.com/story/fact-checkers-ai-chatgpt-misinformation/ Nadeem et al. (2019) Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. 
Nakov et al. (2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. 
arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Meta. 2020. Here’s how we’re using AI to help detect misinformation. https://ai.meta.com/blog/heres-how-were-using-ai-to-help-detect-misinformation/. Accessed on 2024-01-09. Moorish (2023) Lydia Moorish. 2023. Fact-Checkers Are Using AI Like ChatGPT to Battle Misinformation. Wired (1 Feb 2023). https://www.wired.com/story/fact-checkers-ai-chatgpt-misinformation/ Nadeem et al. (2019) Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. (2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. 
Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Lydia Moorish. 2023. Fact-Checkers Are Using AI Like ChatGPT to Battle Misinformation. Wired (1 Feb 2023). https://www.wired.com/story/fact-checkers-ai-chatgpt-misinformation/ Nadeem et al. (2019) Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. 
FAKTA: An Automatic End-to-End Fact Checking System. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. (2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. 
See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. (2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. 
Gender differences in public opinion: Values and political consequences. Temple University Press. MacKinnon (2009) James G MacKinnon. 2009. Bootstrap hypothesis testing. Handbook of computational econometrics (2009), 183–213. Meta (2020) Meta. 2020. Here’s how we’re using AI to help detect misinformation. https://ai.meta.com/blog/heres-how-were-using-ai-to-help-detect-misinformation/. Accessed on 2024-01-09. Moorish (2023) Lydia Moorish. 2023. Fact-Checkers Are Using AI Like ChatGPT to Battle Misinformation. Wired (1 Feb 2023). https://www.wired.com/story/fact-checkers-ai-chatgpt-misinformation/ Nadeem et al. (2019) Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. (2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. 
In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Eric Horvitz. 1999. Principles of mixed-initiative user interfaces. In Proceedings of the SIGCHI conference on Human Factors in Computing Systems. 159–166. Huddy et al. (2008) Leonie Huddy, Erin Cassese, and Mary-Kate Lizotte. 2008. Gender, public opinion, and political reasoning. Political women and American democracy (2008), 31–49. Intemann (2010) Kristen Intemann. 2010. 25 years of feminist empiricism and standpoint theory: Where are we now? Hypatia 25, 4 (2010), 778–796. Jaradat et al. (2018) Israa Jaradat, Pepa Gencheva, Alberto Barrón-Cedeño, Lluís Màrquez, and Preslav Nakov. 2018. ClaimRank: Detecting check-worthy claims in Arabic and English. arXiv preprint arXiv:1804.07587 (2018). Katharopoulos et al. (2020) Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, and François Fleuret. 2020. Transformers are rnns: Fast autoregressive transformers with linear attention. In International conference on machine learning. PMLR, 5156–5165. Korinek (2023) Anton Korinek. 2023. 
Language models and cognitive automation for economic research. Technical Report. National Bureau of Economic Research. Liu et al. (2023) Houjiang Liu, Anubrata Das, Alexander Boltz, Didi Zhou, Daisy Pinaroc, Matthew Lease, and Min Kyung Lee. 2023. Human-centered NLP Fact-checking: Co-Designing with Fact-checkers using Matchmaking for AI. arXiv preprint arXiv:2308.07213 (2023). Lizotte (2020) Mary-Kate Lizotte. 2020. Gender differences in public opinion: Values and political consequences. Temple University Press. MacKinnon (2009) James G MacKinnon. 2009. Bootstrap hypothesis testing. Handbook of computational econometrics (2009), 183–213. Meta (2020) Meta. 2020. Here’s how we’re using AI to help detect misinformation. https://ai.meta.com/blog/heres-how-were-using-ai-to-help-detect-misinformation/. Accessed on 2024-01-09. Moorish (2023) Lydia Moorish. 2023. Fact-Checkers Are Using AI Like ChatGPT to Battle Misinformation. Wired (1 Feb 2023). https://www.wired.com/story/fact-checkers-ai-chatgpt-misinformation/ Nadeem et al. (2019) Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. (2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. 
(2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Leonie Huddy, Erin Cassese, and Mary-Kate Lizotte. 2008. Gender, public opinion, and political reasoning. Political women and American democracy (2008), 31–49. Intemann (2010) Kristen Intemann. 2010. 25 years of feminist empiricism and standpoint theory: Where are we now? Hypatia 25, 4 (2010), 778–796. Jaradat et al. (2018) Israa Jaradat, Pepa Gencheva, Alberto Barrón-Cedeño, Lluís Màrquez, and Preslav Nakov. 2018. ClaimRank: Detecting check-worthy claims in Arabic and English. arXiv preprint arXiv:1804.07587 (2018). Katharopoulos et al. 
(2020) Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, and François Fleuret. 2020. Transformers are rnns: Fast autoregressive transformers with linear attention. In International conference on machine learning. PMLR, 5156–5165. Korinek (2023) Anton Korinek. 2023. Language models and cognitive automation for economic research. Technical Report. National Bureau of Economic Research. Liu et al. (2023) Houjiang Liu, Anubrata Das, Alexander Boltz, Didi Zhou, Daisy Pinaroc, Matthew Lease, and Min Kyung Lee. 2023. Human-centered NLP Fact-checking: Co-Designing with Fact-checkers using Matchmaking for AI. arXiv preprint arXiv:2308.07213 (2023). Lizotte (2020) Mary-Kate Lizotte. 2020. Gender differences in public opinion: Values and political consequences. Temple University Press. MacKinnon (2009) James G MacKinnon. 2009. Bootstrap hypothesis testing. Handbook of computational econometrics (2009), 183–213. Meta (2020) Meta. 2020. Here’s how we’re using AI to help detect misinformation. https://ai.meta.com/blog/heres-how-were-using-ai-to-help-detect-misinformation/. Accessed on 2024-01-09. Moorish (2023) Lydia Moorish. 2023. Fact-Checkers Are Using AI Like ChatGPT to Battle Misinformation. Wired (1 Feb 2023). https://www.wired.com/story/fact-checkers-ai-chatgpt-misinformation/ Nadeem et al. (2019) Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. (2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. 
In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Kristen Intemann. 2010. 25 years of feminist empiricism and standpoint theory: Where are we now? Hypatia 25, 4 (2010), 778–796. Jaradat et al. (2018) Israa Jaradat, Pepa Gencheva, Alberto Barrón-Cedeño, Lluís Màrquez, and Preslav Nakov. 2018. ClaimRank: Detecting check-worthy claims in Arabic and English. arXiv preprint arXiv:1804.07587 (2018). 
Katharopoulos et al. (2020) Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, and François Fleuret. 2020. Transformers are rnns: Fast autoregressive transformers with linear attention. In International conference on machine learning. PMLR, 5156–5165. Korinek (2023) Anton Korinek. 2023. Language models and cognitive automation for economic research. Technical Report. National Bureau of Economic Research. Liu et al. (2023) Houjiang Liu, Anubrata Das, Alexander Boltz, Didi Zhou, Daisy Pinaroc, Matthew Lease, and Min Kyung Lee. 2023. Human-centered NLP Fact-checking: Co-Designing with Fact-checkers using Matchmaking for AI. arXiv preprint arXiv:2308.07213 (2023). Lizotte (2020) Mary-Kate Lizotte. 2020. Gender differences in public opinion: Values and political consequences. Temple University Press. MacKinnon (2009) James G MacKinnon. 2009. Bootstrap hypothesis testing. Handbook of computational econometrics (2009), 183–213. Meta (2020) Meta. 2020. Here’s how we’re using AI to help detect misinformation. https://ai.meta.com/blog/heres-how-were-using-ai-to-help-detect-misinformation/. Accessed on 2024-01-09. Moorish (2023) Lydia Moorish. 2023. Fact-Checkers Are Using AI Like ChatGPT to Battle Misinformation. Wired (1 Feb 2023). https://www.wired.com/story/fact-checkers-ai-chatgpt-misinformation/ Nadeem et al. (2019) Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. (2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. 
In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Israa Jaradat, Pepa Gencheva, Alberto Barrón-Cedeño, Lluís Màrquez, and Preslav Nakov. 2018. ClaimRank: Detecting check-worthy claims in Arabic and English. arXiv preprint arXiv:1804.07587 (2018). Katharopoulos et al. (2020) Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, and François Fleuret. 2020. 
Transformers are rnns: Fast autoregressive transformers with linear attention. In International conference on machine learning. PMLR, 5156–5165. Korinek (2023) Anton Korinek. 2023. Language models and cognitive automation for economic research. Technical Report. National Bureau of Economic Research. Liu et al. (2023) Houjiang Liu, Anubrata Das, Alexander Boltz, Didi Zhou, Daisy Pinaroc, Matthew Lease, and Min Kyung Lee. 2023. Human-centered NLP Fact-checking: Co-Designing with Fact-checkers using Matchmaking for AI. arXiv preprint arXiv:2308.07213 (2023). Lizotte (2020) Mary-Kate Lizotte. 2020. Gender differences in public opinion: Values and political consequences. Temple University Press. MacKinnon (2009) James G MacKinnon. 2009. Bootstrap hypothesis testing. Handbook of computational econometrics (2009), 183–213. Meta (2020) Meta. 2020. Here’s how we’re using AI to help detect misinformation. https://ai.meta.com/blog/heres-how-were-using-ai-to-help-detect-misinformation/. Accessed on 2024-01-09. Moorish (2023) Lydia Moorish. 2023. Fact-Checkers Are Using AI Like ChatGPT to Battle Misinformation. Wired (1 Feb 2023). https://www.wired.com/story/fact-checkers-ai-chatgpt-misinformation/ Nadeem et al. (2019) Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. (2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. 
In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, and François Fleuret. 2020. Transformers are rnns: Fast autoregressive transformers with linear attention. In International conference on machine learning. PMLR, 5156–5165. Korinek (2023) Anton Korinek. 2023. Language models and cognitive automation for economic research. Technical Report. 
National Bureau of Economic Research. Liu et al. (2023) Houjiang Liu, Anubrata Das, Alexander Boltz, Didi Zhou, Daisy Pinaroc, Matthew Lease, and Min Kyung Lee. 2023. Human-centered NLP Fact-checking: Co-Designing with Fact-checkers using Matchmaking for AI. arXiv preprint arXiv:2308.07213 (2023). Lizotte (2020) Mary-Kate Lizotte. 2020. Gender differences in public opinion: Values and political consequences. Temple University Press. MacKinnon (2009) James G MacKinnon. 2009. Bootstrap hypothesis testing. Handbook of computational econometrics (2009), 183–213. Meta (2020) Meta. 2020. Here’s how we’re using AI to help detect misinformation. https://ai.meta.com/blog/heres-how-were-using-ai-to-help-detect-misinformation/. Accessed on 2024-01-09. Moorish (2023) Lydia Moorish. 2023. Fact-Checkers Are Using AI Like ChatGPT to Battle Misinformation. Wired (1 Feb 2023). https://www.wired.com/story/fact-checkers-ai-chatgpt-misinformation/ Nadeem et al. (2019) Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. (2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. 
(2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Anton Korinek. 2023. Language models and cognitive automation for economic research. Technical Report. National Bureau of Economic Research. Liu et al. (2023) Houjiang Liu, Anubrata Das, Alexander Boltz, Didi Zhou, Daisy Pinaroc, Matthew Lease, and Min Kyung Lee. 2023. Human-centered NLP Fact-checking: Co-Designing with Fact-checkers using Matchmaking for AI. arXiv preprint arXiv:2308.07213 (2023). Lizotte (2020) Mary-Kate Lizotte. 2020. Gender differences in public opinion: Values and political consequences. Temple University Press. MacKinnon (2009) James G MacKinnon. 2009. 
Bootstrap hypothesis testing. Handbook of computational econometrics (2009), 183–213. Meta (2020) Meta. 2020. Here’s how we’re using AI to help detect misinformation. https://ai.meta.com/blog/heres-how-were-using-ai-to-help-detect-misinformation/. Accessed on 2024-01-09. Moorish (2023) Lydia Moorish. 2023. Fact-Checkers Are Using AI Like ChatGPT to Battle Misinformation. Wired (1 Feb 2023). https://www.wired.com/story/fact-checkers-ai-chatgpt-misinformation/ Nadeem et al. (2019) Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. (2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. 
(2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Houjiang Liu, Anubrata Das, Alexander Boltz, Didi Zhou, Daisy Pinaroc, Matthew Lease, and Min Kyung Lee. 2023. Human-centered NLP Fact-checking: Co-Designing with Fact-checkers using Matchmaking for AI. arXiv preprint arXiv:2308.07213 (2023). Lizotte (2020) Mary-Kate Lizotte. 2020. Gender differences in public opinion: Values and political consequences. Temple University Press. MacKinnon (2009) James G MacKinnon. 2009. Bootstrap hypothesis testing. Handbook of computational econometrics (2009), 183–213. Meta (2020) Meta. 2020. Here’s how we’re using AI to help detect misinformation. https://ai.meta.com/blog/heres-how-were-using-ai-to-help-detect-misinformation/. Accessed on 2024-01-09. Moorish (2023) Lydia Moorish. 2023. Fact-Checkers Are Using AI Like ChatGPT to Battle Misinformation. Wired (1 Feb 2023). https://www.wired.com/story/fact-checkers-ai-chatgpt-misinformation/ Nadeem et al. (2019) Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. 
In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. (2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. 
See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Mary-Kate Lizotte. 2020. Gender differences in public opinion: Values and political consequences. Temple University Press. MacKinnon (2009) James G MacKinnon. 2009. Bootstrap hypothesis testing. Handbook of computational econometrics (2009), 183–213. Meta (2020) Meta. 2020. Here’s how we’re using AI to help detect misinformation. https://ai.meta.com/blog/heres-how-were-using-ai-to-help-detect-misinformation/. Accessed on 2024-01-09. Moorish (2023) Lydia Moorish. 2023. Fact-Checkers Are Using AI Like ChatGPT to Battle Misinformation. Wired (1 Feb 2023). https://www.wired.com/story/fact-checkers-ai-chatgpt-misinformation/ Nadeem et al. (2019) Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. 
(2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023).
(2023) Houjiang Liu, Anubrata Das, Alexander Boltz, Didi Zhou, Daisy Pinaroc, Matthew Lease, and Min Kyung Lee. 2023. Human-centered NLP Fact-checking: Co-Designing with Fact-checkers using Matchmaking for AI. arXiv preprint arXiv:2308.07213 (2023).
Lizotte (2020) Mary-Kate Lizotte. 2020. Gender differences in public opinion: Values and political consequences. Temple University Press.
MacKinnon (2009) James G MacKinnon. 2009. Bootstrap hypothesis testing. Handbook of Computational Econometrics (2009), 183–213.
Meta (2020) Meta. 2020. Here’s how we’re using AI to help detect misinformation. https://ai.meta.com/blog/heres-how-were-using-ai-to-help-detect-misinformation/. Accessed 2024-01-09.
Moorish (2023) Lydia Moorish. 2023. Fact-Checkers Are Using AI Like ChatGPT to Battle Misinformation. Wired (1 Feb 2023). https://www.wired.com/story/fact-checkers-ai-chatgpt-misinformation/
Nadeem et al. (2019) Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014
Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the CLEF-2022 CheckThat! lab on fighting the COVID-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520.
Nakov et al. (2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428.
Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021).
Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515.
Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online? In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013
Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022).
Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91.
Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023).
Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: Structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023).
Snopes (2023) Snopes. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See Wayback Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.
Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? Large language models have gender and racial biases in subjective NLP tasks. arXiv preprint arXiv:2311.09730 (2023).
Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. License: Creative Commons Attribution 4.0 International.
Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023).
Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does LLM safety training fail? arXiv preprint arXiv:2307.02483 (2023).
White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with ChatGPT. arXiv preprint arXiv:2302.11382 (2023).
Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438.
Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364.
Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can ChatGPT reproduce human-generated labels? A study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023).
In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. 
A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Mary-Kate Lizotte. 2020. Gender differences in public opinion: Values and political consequences. Temple University Press. MacKinnon (2009) James G MacKinnon. 2009. Bootstrap hypothesis testing. Handbook of computational econometrics (2009), 183–213. Meta (2020) Meta. 2020. Here’s how we’re using AI to help detect misinformation. https://ai.meta.com/blog/heres-how-were-using-ai-to-help-detect-misinformation/. Accessed on 2024-01-09. Moorish (2023) Lydia Moorish. 2023. Fact-Checkers Are Using AI Like ChatGPT to Battle Misinformation. Wired (1 Feb 2023). https://www.wired.com/story/fact-checkers-ai-chatgpt-misinformation/ Nadeem et al. (2019) Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. (2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. 
In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). James G MacKinnon. 2009. Bootstrap hypothesis testing. Handbook of computational econometrics (2009), 183–213. Meta (2020) Meta. 2020. Here’s how we’re using AI to help detect misinformation. https://ai.meta.com/blog/heres-how-were-using-ai-to-help-detect-misinformation/. Accessed on 2024-01-09. Moorish (2023) Lydia Moorish. 2023. 
Fact-Checkers Are Using AI Like ChatGPT to Battle Misinformation. Wired (1 Feb 2023). https://www.wired.com/story/fact-checkers-ai-chatgpt-misinformation/ Nadeem et al. (2019) Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. (2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. 
Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Meta. 2020. Here’s how we’re using AI to help detect misinformation. https://ai.meta.com/blog/heres-how-were-using-ai-to-help-detect-misinformation/. Accessed on 2024-01-09. Moorish (2023) Lydia Moorish. 2023. Fact-Checkers Are Using AI Like ChatGPT to Battle Misinformation. Wired (1 Feb 2023). https://www.wired.com/story/fact-checkers-ai-chatgpt-misinformation/ Nadeem et al. (2019) Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. 
(2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. 
arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Lydia Moorish. 2023. Fact-Checkers Are Using AI Like ChatGPT to Battle Misinformation. Wired (1 Feb 2023). https://www.wired.com/story/fact-checkers-ai-chatgpt-misinformation/ Nadeem et al. (2019) Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. (2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). 
Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. 
(2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. (2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. 
Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. (2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. 
(2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). 
Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. 
Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. 
(2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. 
Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. 
(2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. 
(2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. 
Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. 
Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. 
Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. 
Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023).
- John J Horton. 2023. Large language models as simulated economic agents: What can we learn from homo silicus? Technical Report. National Bureau of Economic Research. Horvitz (1999) Eric Horvitz. 1999. Principles of mixed-initiative user interfaces. In Proceedings of the SIGCHI conference on Human Factors in Computing Systems. 159–166. Huddy et al. (2008) Leonie Huddy, Erin Cassese, and Mary-Kate Lizotte. 2008. Gender, public opinion, and political reasoning. Political women and American democracy (2008), 31–49. Intemann (2010) Kristen Intemann. 2010. 25 years of feminist empiricism and standpoint theory: Where are we now? Hypatia 25, 4 (2010), 778–796. Jaradat et al. (2018) Israa Jaradat, Pepa Gencheva, Alberto Barrón-Cedeño, Lluís Màrquez, and Preslav Nakov. 2018. ClaimRank: Detecting check-worthy claims in Arabic and English. arXiv preprint arXiv:1804.07587 (2018). Katharopoulos et al. (2020) Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, and François Fleuret. 2020. Transformers are rnns: Fast autoregressive transformers with linear attention. In International conference on machine learning. PMLR, 5156–5165. Korinek (2023) Anton Korinek. 2023. Language models and cognitive automation for economic research. Technical Report. National Bureau of Economic Research. Liu et al. (2023) Houjiang Liu, Anubrata Das, Alexander Boltz, Didi Zhou, Daisy Pinaroc, Matthew Lease, and Min Kyung Lee. 2023. Human-centered NLP Fact-checking: Co-Designing with Fact-checkers using Matchmaking for AI. arXiv preprint arXiv:2308.07213 (2023). Lizotte (2020) Mary-Kate Lizotte. 2020. Gender differences in public opinion: Values and political consequences. Temple University Press. MacKinnon (2009) James G MacKinnon. 2009. Bootstrap hypothesis testing. Handbook of computational econometrics (2009), 183–213. Meta (2020) Meta. 2020. Here’s how we’re using AI to help detect misinformation. https://ai.meta.com/blog/heres-how-were-using-ai-to-help-detect-misinformation/. Accessed on 2024-01-09. Moorish (2023) Lydia Moorish. 2023. Fact-Checkers Are Using AI Like ChatGPT to Battle Misinformation. Wired (1 Feb 2023). https://www.wired.com/story/fact-checkers-ai-chatgpt-misinformation/ Nadeem et al. (2019) Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. (2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. 
In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. 
A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Eric Horvitz. 1999. Principles of mixed-initiative user interfaces. In Proceedings of the SIGCHI conference on Human Factors in Computing Systems. 159–166. Huddy et al. (2008) Leonie Huddy, Erin Cassese, and Mary-Kate Lizotte. 2008. Gender, public opinion, and political reasoning. Political women and American democracy (2008), 31–49. Intemann (2010) Kristen Intemann. 2010. 25 years of feminist empiricism and standpoint theory: Where are we now? Hypatia 25, 4 (2010), 778–796. Jaradat et al. (2018) Israa Jaradat, Pepa Gencheva, Alberto Barrón-Cedeño, Lluís Màrquez, and Preslav Nakov. 2018. ClaimRank: Detecting check-worthy claims in Arabic and English. arXiv preprint arXiv:1804.07587 (2018). Katharopoulos et al. (2020) Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, and François Fleuret. 2020. Transformers are rnns: Fast autoregressive transformers with linear attention. In International conference on machine learning. PMLR, 5156–5165. Korinek (2023) Anton Korinek. 2023. Language models and cognitive automation for economic research. Technical Report. National Bureau of Economic Research. Liu et al. (2023) Houjiang Liu, Anubrata Das, Alexander Boltz, Didi Zhou, Daisy Pinaroc, Matthew Lease, and Min Kyung Lee. 2023. Human-centered NLP Fact-checking: Co-Designing with Fact-checkers using Matchmaking for AI. arXiv preprint arXiv:2308.07213 (2023). Lizotte (2020) Mary-Kate Lizotte. 2020. Gender differences in public opinion: Values and political consequences. Temple University Press. MacKinnon (2009) James G MacKinnon. 2009. Bootstrap hypothesis testing. Handbook of computational econometrics (2009), 183–213. Meta (2020) Meta. 2020. Here’s how we’re using AI to help detect misinformation. https://ai.meta.com/blog/heres-how-were-using-ai-to-help-detect-misinformation/. Accessed on 2024-01-09. Moorish (2023) Lydia Moorish. 2023. Fact-Checkers Are Using AI Like ChatGPT to Battle Misinformation. Wired (1 Feb 2023). https://www.wired.com/story/fact-checkers-ai-chatgpt-misinformation/ Nadeem et al. (2019) Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. 
(2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. 
arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Leonie Huddy, Erin Cassese, and Mary-Kate Lizotte. 2008. Gender, public opinion, and political reasoning. Political women and American democracy (2008), 31–49. Intemann (2010) Kristen Intemann. 2010. 25 years of feminist empiricism and standpoint theory: Where are we now? Hypatia 25, 4 (2010), 778–796. Jaradat et al. (2018) Israa Jaradat, Pepa Gencheva, Alberto Barrón-Cedeño, Lluís Màrquez, and Preslav Nakov. 2018. ClaimRank: Detecting check-worthy claims in Arabic and English. arXiv preprint arXiv:1804.07587 (2018). Katharopoulos et al. (2020) Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, and François Fleuret. 2020. Transformers are rnns: Fast autoregressive transformers with linear attention. In International conference on machine learning. PMLR, 5156–5165. Korinek (2023) Anton Korinek. 2023. Language models and cognitive automation for economic research. Technical Report. National Bureau of Economic Research. Liu et al. (2023) Houjiang Liu, Anubrata Das, Alexander Boltz, Didi Zhou, Daisy Pinaroc, Matthew Lease, and Min Kyung Lee. 2023. Human-centered NLP Fact-checking: Co-Designing with Fact-checkers using Matchmaking for AI. arXiv preprint arXiv:2308.07213 (2023). Lizotte (2020) Mary-Kate Lizotte. 2020. Gender differences in public opinion: Values and political consequences. Temple University Press. MacKinnon (2009) James G MacKinnon. 2009. Bootstrap hypothesis testing. Handbook of computational econometrics (2009), 183–213. Meta (2020) Meta. 2020. Here’s how we’re using AI to help detect misinformation. https://ai.meta.com/blog/heres-how-were-using-ai-to-help-detect-misinformation/. Accessed on 2024-01-09. Moorish (2023) Lydia Moorish. 2023. Fact-Checkers Are Using AI Like ChatGPT to Battle Misinformation. Wired (1 Feb 2023). https://www.wired.com/story/fact-checkers-ai-chatgpt-misinformation/ Nadeem et al. (2019) Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! 
lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. (2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. 
(2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Kristen Intemann. 2010. 25 years of feminist empiricism and standpoint theory: Where are we now? Hypatia 25, 4 (2010), 778–796. Jaradat et al. (2018) Israa Jaradat, Pepa Gencheva, Alberto Barrón-Cedeño, Lluís Màrquez, and Preslav Nakov. 2018. ClaimRank: Detecting check-worthy claims in Arabic and English. arXiv preprint arXiv:1804.07587 (2018). Katharopoulos et al. (2020) Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, and François Fleuret. 2020. Transformers are rnns: Fast autoregressive transformers with linear attention. In International conference on machine learning. PMLR, 5156–5165. Korinek (2023) Anton Korinek. 2023. Language models and cognitive automation for economic research. Technical Report. National Bureau of Economic Research. Liu et al. (2023) Houjiang Liu, Anubrata Das, Alexander Boltz, Didi Zhou, Daisy Pinaroc, Matthew Lease, and Min Kyung Lee. 2023. Human-centered NLP Fact-checking: Co-Designing with Fact-checkers using Matchmaking for AI. arXiv preprint arXiv:2308.07213 (2023). Lizotte (2020) Mary-Kate Lizotte. 2020. Gender differences in public opinion: Values and political consequences. Temple University Press. MacKinnon (2009) James G MacKinnon. 2009. Bootstrap hypothesis testing. Handbook of computational econometrics (2009), 183–213. Meta (2020) Meta. 2020. Here’s how we’re using AI to help detect misinformation. https://ai.meta.com/blog/heres-how-were-using-ai-to-help-detect-misinformation/. Accessed on 2024-01-09. Moorish (2023) Lydia Moorish. 2023. Fact-Checkers Are Using AI Like ChatGPT to Battle Misinformation. Wired (1 Feb 2023). https://www.wired.com/story/fact-checkers-ai-chatgpt-misinformation/ Nadeem et al. (2019) Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. 
Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. (2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. 
Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Israa Jaradat, Pepa Gencheva, Alberto Barrón-Cedeño, Lluís Màrquez, and Preslav Nakov. 2018. ClaimRank: Detecting check-worthy claims in Arabic and English. arXiv preprint arXiv:1804.07587 (2018). Katharopoulos et al. (2020) Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, and François Fleuret. 2020. Transformers are rnns: Fast autoregressive transformers with linear attention. In International conference on machine learning. PMLR, 5156–5165. Korinek (2023) Anton Korinek. 2023. Language models and cognitive automation for economic research. Technical Report. National Bureau of Economic Research. Liu et al. (2023) Houjiang Liu, Anubrata Das, Alexander Boltz, Didi Zhou, Daisy Pinaroc, Matthew Lease, and Min Kyung Lee. 2023. Human-centered NLP Fact-checking: Co-Designing with Fact-checkers using Matchmaking for AI. arXiv preprint arXiv:2308.07213 (2023). Lizotte (2020) Mary-Kate Lizotte. 2020. Gender differences in public opinion: Values and political consequences. Temple University Press. MacKinnon (2009) James G MacKinnon. 2009. Bootstrap hypothesis testing. Handbook of computational econometrics (2009), 183–213. Meta (2020) Meta. 2020. Here’s how we’re using AI to help detect misinformation. https://ai.meta.com/blog/heres-how-were-using-ai-to-help-detect-misinformation/. Accessed on 2024-01-09. Moorish (2023) Lydia Moorish. 2023. Fact-Checkers Are Using AI Like ChatGPT to Battle Misinformation. Wired (1 Feb 2023). https://www.wired.com/story/fact-checkers-ai-chatgpt-misinformation/ Nadeem et al. (2019) Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. 
In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. (2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. 
Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, and François Fleuret. 2020. Transformers are rnns: Fast autoregressive transformers with linear attention. In International conference on machine learning. PMLR, 5156–5165. Korinek (2023) Anton Korinek. 2023. Language models and cognitive automation for economic research. Technical Report. National Bureau of Economic Research. Liu et al. (2023) Houjiang Liu, Anubrata Das, Alexander Boltz, Didi Zhou, Daisy Pinaroc, Matthew Lease, and Min Kyung Lee. 2023. Human-centered NLP Fact-checking: Co-Designing with Fact-checkers using Matchmaking for AI. arXiv preprint arXiv:2308.07213 (2023). Lizotte (2020) Mary-Kate Lizotte. 2020. Gender differences in public opinion: Values and political consequences. Temple University Press. MacKinnon (2009) James G MacKinnon. 2009. Bootstrap hypothesis testing. Handbook of computational econometrics (2009), 183–213. Meta (2020) Meta. 2020. Here’s how we’re using AI to help detect misinformation. https://ai.meta.com/blog/heres-how-were-using-ai-to-help-detect-misinformation/. Accessed on 2024-01-09. Moorish (2023) Lydia Moorish. 2023. Fact-Checkers Are Using AI Like ChatGPT to Battle Misinformation. Wired (1 Feb 2023). https://www.wired.com/story/fact-checkers-ai-chatgpt-misinformation/ Nadeem et al. (2019) Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. 
(2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. 
Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. 
Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. 
(2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. 
Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. 
Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023).
- Eric Horvitz. 1999. Principles of mixed-initiative user interfaces. In Proceedings of the SIGCHI conference on Human Factors in Computing Systems. 159–166. Huddy et al. (2008) Leonie Huddy, Erin Cassese, and Mary-Kate Lizotte. 2008. Gender, public opinion, and political reasoning. Political women and American democracy (2008), 31–49. Intemann (2010) Kristen Intemann. 2010. 25 years of feminist empiricism and standpoint theory: Where are we now? Hypatia 25, 4 (2010), 778–796. Jaradat et al. (2018) Israa Jaradat, Pepa Gencheva, Alberto Barrón-Cedeño, Lluís Màrquez, and Preslav Nakov. 2018. ClaimRank: Detecting check-worthy claims in Arabic and English. arXiv preprint arXiv:1804.07587 (2018). Katharopoulos et al. (2020) Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, and François Fleuret. 2020. Transformers are rnns: Fast autoregressive transformers with linear attention. In International conference on machine learning. PMLR, 5156–5165. Korinek (2023) Anton Korinek. 2023. Language models and cognitive automation for economic research. Technical Report. National Bureau of Economic Research. Liu et al. (2023) Houjiang Liu, Anubrata Das, Alexander Boltz, Didi Zhou, Daisy Pinaroc, Matthew Lease, and Min Kyung Lee. 2023. Human-centered NLP Fact-checking: Co-Designing with Fact-checkers using Matchmaking for AI. arXiv preprint arXiv:2308.07213 (2023). Lizotte (2020) Mary-Kate Lizotte. 2020. Gender differences in public opinion: Values and political consequences. Temple University Press. MacKinnon (2009) James G MacKinnon. 2009. Bootstrap hypothesis testing. Handbook of computational econometrics (2009), 183–213. Meta (2020) Meta. 2020. Here’s how we’re using AI to help detect misinformation. https://ai.meta.com/blog/heres-how-were-using-ai-to-help-detect-misinformation/. Accessed on 2024-01-09. Moorish (2023) Lydia Moorish. 2023. Fact-Checkers Are Using AI Like ChatGPT to Battle Misinformation. Wired (1 Feb 2023). https://www.wired.com/story/fact-checkers-ai-chatgpt-misinformation/ Nadeem et al. (2019) Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. (2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. 
(2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. 
Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Leonie Huddy, Erin Cassese, and Mary-Kate Lizotte. 2008. Gender, public opinion, and political reasoning. Political women and American democracy (2008), 31–49. Intemann (2010) Kristen Intemann. 2010. 25 years of feminist empiricism and standpoint theory: Where are we now? Hypatia 25, 4 (2010), 778–796. Jaradat et al. (2018) Israa Jaradat, Pepa Gencheva, Alberto Barrón-Cedeño, Lluís Màrquez, and Preslav Nakov. 2018. ClaimRank: Detecting check-worthy claims in Arabic and English. arXiv preprint arXiv:1804.07587 (2018). Katharopoulos et al. (2020) Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, and François Fleuret. 2020. Transformers are rnns: Fast autoregressive transformers with linear attention. In International conference on machine learning. PMLR, 5156–5165. Korinek (2023) Anton Korinek. 2023. Language models and cognitive automation for economic research. Technical Report. National Bureau of Economic Research. Liu et al. (2023) Houjiang Liu, Anubrata Das, Alexander Boltz, Didi Zhou, Daisy Pinaroc, Matthew Lease, and Min Kyung Lee. 2023. Human-centered NLP Fact-checking: Co-Designing with Fact-checkers using Matchmaking for AI. arXiv preprint arXiv:2308.07213 (2023). Lizotte (2020) Mary-Kate Lizotte. 2020. Gender differences in public opinion: Values and political consequences. Temple University Press. MacKinnon (2009) James G MacKinnon. 2009. Bootstrap hypothesis testing. Handbook of computational econometrics (2009), 183–213. Meta (2020) Meta. 2020. Here’s how we’re using AI to help detect misinformation. https://ai.meta.com/blog/heres-how-were-using-ai-to-help-detect-misinformation/. Accessed on 2024-01-09. Moorish (2023) Lydia Moorish. 2023. Fact-Checkers Are Using AI Like ChatGPT to Battle Misinformation. Wired (1 Feb 2023). https://www.wired.com/story/fact-checkers-ai-chatgpt-misinformation/ Nadeem et al. (2019) Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. (2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. 
In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. 
A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Kristen Intemann. 2010. 25 years of feminist empiricism and standpoint theory: Where are we now? Hypatia 25, 4 (2010), 778–796. Jaradat et al. (2018) Israa Jaradat, Pepa Gencheva, Alberto Barrón-Cedeño, Lluís Màrquez, and Preslav Nakov. 2018. ClaimRank: Detecting check-worthy claims in Arabic and English. arXiv preprint arXiv:1804.07587 (2018). Katharopoulos et al. (2020) Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, and François Fleuret. 2020. Transformers are rnns: Fast autoregressive transformers with linear attention. In International conference on machine learning. PMLR, 5156–5165. Korinek (2023) Anton Korinek. 2023. Language models and cognitive automation for economic research. Technical Report. National Bureau of Economic Research. Liu et al. (2023) Houjiang Liu, Anubrata Das, Alexander Boltz, Didi Zhou, Daisy Pinaroc, Matthew Lease, and Min Kyung Lee. 2023. Human-centered NLP Fact-checking: Co-Designing with Fact-checkers using Matchmaking for AI. arXiv preprint arXiv:2308.07213 (2023). Lizotte (2020) Mary-Kate Lizotte. 2020. Gender differences in public opinion: Values and political consequences. Temple University Press. MacKinnon (2009) James G MacKinnon. 2009. Bootstrap hypothesis testing. Handbook of computational econometrics (2009), 183–213. Meta (2020) Meta. 2020. Here’s how we’re using AI to help detect misinformation. https://ai.meta.com/blog/heres-how-were-using-ai-to-help-detect-misinformation/. Accessed on 2024-01-09. Moorish (2023) Lydia Moorish. 2023. Fact-Checkers Are Using AI Like ChatGPT to Battle Misinformation. Wired (1 Feb 2023). https://www.wired.com/story/fact-checkers-ai-chatgpt-misinformation/ Nadeem et al. (2019) Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. (2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! 
Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. 
A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Israa Jaradat, Pepa Gencheva, Alberto Barrón-Cedeño, Lluís Màrquez, and Preslav Nakov. 2018. ClaimRank: Detecting check-worthy claims in Arabic and English. arXiv preprint arXiv:1804.07587 (2018). Katharopoulos et al. (2020) Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, and François Fleuret. 2020. Transformers are rnns: Fast autoregressive transformers with linear attention. In International conference on machine learning. PMLR, 5156–5165. Korinek (2023) Anton Korinek. 2023. Language models and cognitive automation for economic research. Technical Report. National Bureau of Economic Research. Liu et al. (2023) Houjiang Liu, Anubrata Das, Alexander Boltz, Didi Zhou, Daisy Pinaroc, Matthew Lease, and Min Kyung Lee. 2023. Human-centered NLP Fact-checking: Co-Designing with Fact-checkers using Matchmaking for AI. arXiv preprint arXiv:2308.07213 (2023). Lizotte (2020) Mary-Kate Lizotte. 2020. Gender differences in public opinion: Values and political consequences. Temple University Press. MacKinnon (2009) James G MacKinnon. 2009. Bootstrap hypothesis testing. Handbook of computational econometrics (2009), 183–213. Meta (2020) Meta. 2020. Here’s how we’re using AI to help detect misinformation. https://ai.meta.com/blog/heres-how-were-using-ai-to-help-detect-misinformation/. Accessed on 2024-01-09. Moorish (2023) Lydia Moorish. 2023. Fact-Checkers Are Using AI Like ChatGPT to Battle Misinformation. Wired (1 Feb 2023). https://www.wired.com/story/fact-checkers-ai-chatgpt-misinformation/ Nadeem et al. (2019) Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. (2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. 
In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. 
A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, and François Fleuret. 2020. Transformers are rnns: Fast autoregressive transformers with linear attention. In International conference on machine learning. PMLR, 5156–5165. Korinek (2023) Anton Korinek. 2023. Language models and cognitive automation for economic research. Technical Report. National Bureau of Economic Research. Liu et al. (2023) Houjiang Liu, Anubrata Das, Alexander Boltz, Didi Zhou, Daisy Pinaroc, Matthew Lease, and Min Kyung Lee. 2023. Human-centered NLP Fact-checking: Co-Designing with Fact-checkers using Matchmaking for AI. arXiv preprint arXiv:2308.07213 (2023). Lizotte (2020) Mary-Kate Lizotte. 2020. Gender differences in public opinion: Values and political consequences. Temple University Press. MacKinnon (2009) James G MacKinnon. 2009. Bootstrap hypothesis testing. Handbook of computational econometrics (2009), 183–213. Meta (2020) Meta. 2020. Here’s how we’re using AI to help detect misinformation. https://ai.meta.com/blog/heres-how-were-using-ai-to-help-detect-misinformation/. Accessed on 2024-01-09. Moorish (2023) Lydia Moorish. 2023. Fact-Checkers Are Using AI Like ChatGPT to Battle Misinformation. Wired (1 Feb 2023). https://www.wired.com/story/fact-checkers-ai-chatgpt-misinformation/ Nadeem et al. (2019) Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. (2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. 
See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. 
(2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. 
Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. 
Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. 
Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. 
(2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023).
- Gender, public opinion, and political reasoning. Political women and American democracy (2008), 31–49. Intemann (2010) Kristen Intemann. 2010. 25 years of feminist empiricism and standpoint theory: Where are we now? Hypatia 25, 4 (2010), 778–796. Jaradat et al. (2018) Israa Jaradat, Pepa Gencheva, Alberto Barrón-Cedeño, Lluís Màrquez, and Preslav Nakov. 2018. ClaimRank: Detecting check-worthy claims in Arabic and English. arXiv preprint arXiv:1804.07587 (2018). Katharopoulos et al. (2020) Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, and François Fleuret. 2020. Transformers are rnns: Fast autoregressive transformers with linear attention. In International conference on machine learning. PMLR, 5156–5165. Korinek (2023) Anton Korinek. 2023. Language models and cognitive automation for economic research. Technical Report. National Bureau of Economic Research. Liu et al. (2023) Houjiang Liu, Anubrata Das, Alexander Boltz, Didi Zhou, Daisy Pinaroc, Matthew Lease, and Min Kyung Lee. 2023. Human-centered NLP Fact-checking: Co-Designing with Fact-checkers using Matchmaking for AI. arXiv preprint arXiv:2308.07213 (2023). Lizotte (2020) Mary-Kate Lizotte. 2020. Gender differences in public opinion: Values and political consequences. Temple University Press. MacKinnon (2009) James G MacKinnon. 2009. Bootstrap hypothesis testing. Handbook of computational econometrics (2009), 183–213. Meta (2020) Meta. 2020. Here’s how we’re using AI to help detect misinformation. https://ai.meta.com/blog/heres-how-were-using-ai-to-help-detect-misinformation/. Accessed on 2024-01-09. Moorish (2023) Lydia Moorish. 2023. Fact-Checkers Are Using AI Like ChatGPT to Battle Misinformation. Wired (1 Feb 2023). https://www.wired.com/story/fact-checkers-ai-chatgpt-misinformation/ Nadeem et al. (2019) Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. (2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. 
arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 
2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Kristen Intemann. 2010. 25 years of feminist empiricism and standpoint theory: Where are we now? Hypatia 25, 4 (2010), 778–796. Jaradat et al. (2018) Israa Jaradat, Pepa Gencheva, Alberto Barrón-Cedeño, Lluís Màrquez, and Preslav Nakov. 2018. ClaimRank: Detecting check-worthy claims in Arabic and English. arXiv preprint arXiv:1804.07587 (2018). Katharopoulos et al. (2020) Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, and François Fleuret. 2020. Transformers are rnns: Fast autoregressive transformers with linear attention. In International conference on machine learning. PMLR, 5156–5165. Korinek (2023) Anton Korinek. 2023. Language models and cognitive automation for economic research. Technical Report. National Bureau of Economic Research. Liu et al. (2023) Houjiang Liu, Anubrata Das, Alexander Boltz, Didi Zhou, Daisy Pinaroc, Matthew Lease, and Min Kyung Lee. 2023. Human-centered NLP Fact-checking: Co-Designing with Fact-checkers using Matchmaking for AI. arXiv preprint arXiv:2308.07213 (2023). Lizotte (2020) Mary-Kate Lizotte. 2020. Gender differences in public opinion: Values and political consequences. Temple University Press. MacKinnon (2009) James G MacKinnon. 2009. Bootstrap hypothesis testing. Handbook of computational econometrics (2009), 183–213. Meta (2020) Meta. 2020. Here’s how we’re using AI to help detect misinformation. https://ai.meta.com/blog/heres-how-were-using-ai-to-help-detect-misinformation/. Accessed on 2024-01-09. Moorish (2023) Lydia Moorish. 2023. Fact-Checkers Are Using AI Like ChatGPT to Battle Misinformation. Wired (1 Feb 2023). https://www.wired.com/story/fact-checkers-ai-chatgpt-misinformation/ Nadeem et al. (2019) Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. (2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. 
arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 
2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Israa Jaradat, Pepa Gencheva, Alberto Barrón-Cedeño, Lluís Màrquez, and Preslav Nakov. 2018. ClaimRank: Detecting check-worthy claims in Arabic and English. arXiv preprint arXiv:1804.07587 (2018). Katharopoulos et al. (2020) Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, and François Fleuret. 2020. Transformers are rnns: Fast autoregressive transformers with linear attention. In International conference on machine learning. PMLR, 5156–5165. Korinek (2023) Anton Korinek. 2023. Language models and cognitive automation for economic research. Technical Report. National Bureau of Economic Research. Liu et al. (2023) Houjiang Liu, Anubrata Das, Alexander Boltz, Didi Zhou, Daisy Pinaroc, Matthew Lease, and Min Kyung Lee. 2023. Human-centered NLP Fact-checking: Co-Designing with Fact-checkers using Matchmaking for AI. arXiv preprint arXiv:2308.07213 (2023). Lizotte (2020) Mary-Kate Lizotte. 2020. Gender differences in public opinion: Values and political consequences. Temple University Press. MacKinnon (2009) James G MacKinnon. 2009. Bootstrap hypothesis testing. Handbook of computational econometrics (2009), 183–213. Meta (2020) Meta. 2020. Here’s how we’re using AI to help detect misinformation. https://ai.meta.com/blog/heres-how-were-using-ai-to-help-detect-misinformation/. Accessed on 2024-01-09. Moorish (2023) Lydia Moorish. 2023. Fact-Checkers Are Using AI Like ChatGPT to Battle Misinformation. Wired (1 Feb 2023). https://www.wired.com/story/fact-checkers-ai-chatgpt-misinformation/ Nadeem et al. (2019) Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. (2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. 
Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. 
arXiv preprint arXiv:2304.10145 (2023). Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, and François Fleuret. 2020. Transformers are rnns: Fast autoregressive transformers with linear attention. In International conference on machine learning. PMLR, 5156–5165. Korinek (2023) Anton Korinek. 2023. Language models and cognitive automation for economic research. Technical Report. National Bureau of Economic Research. Liu et al. (2023) Houjiang Liu, Anubrata Das, Alexander Boltz, Didi Zhou, Daisy Pinaroc, Matthew Lease, and Min Kyung Lee. 2023. Human-centered NLP Fact-checking: Co-Designing with Fact-checkers using Matchmaking for AI. arXiv preprint arXiv:2308.07213 (2023). Lizotte (2020) Mary-Kate Lizotte. 2020. Gender differences in public opinion: Values and political consequences. Temple University Press. MacKinnon (2009) James G MacKinnon. 2009. Bootstrap hypothesis testing. Handbook of computational econometrics (2009), 183–213. Meta (2020) Meta. 2020. Here’s how we’re using AI to help detect misinformation. https://ai.meta.com/blog/heres-how-were-using-ai-to-help-detect-misinformation/. Accessed on 2024-01-09. Moorish (2023) Lydia Moorish. 2023. Fact-Checkers Are Using AI Like ChatGPT to Battle Misinformation. Wired (1 Feb 2023). https://www.wired.com/story/fact-checkers-ai-chatgpt-misinformation/ Nadeem et al. (2019) Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. (2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. 
Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Anton Korinek. 2023. Language models and cognitive automation for economic research. Technical Report. National Bureau of Economic Research. Liu et al. (2023) Houjiang Liu, Anubrata Das, Alexander Boltz, Didi Zhou, Daisy Pinaroc, Matthew Lease, and Min Kyung Lee. 2023. 
Human-centered NLP Fact-checking: Co-Designing with Fact-checkers using Matchmaking for AI. arXiv preprint arXiv:2308.07213 (2023). Lizotte (2020) Mary-Kate Lizotte. 2020. Gender differences in public opinion: Values and political consequences. Temple University Press. MacKinnon (2009) James G MacKinnon. 2009. Bootstrap hypothesis testing. Handbook of computational econometrics (2009), 183–213. Meta (2020) Meta. 2020. Here’s how we’re using AI to help detect misinformation. https://ai.meta.com/blog/heres-how-were-using-ai-to-help-detect-misinformation/. Accessed on 2024-01-09. Moorish (2023) Lydia Moorish. 2023. Fact-Checkers Are Using AI Like ChatGPT to Battle Misinformation. Wired (1 Feb 2023). https://www.wired.com/story/fact-checkers-ai-chatgpt-misinformation/ Nadeem et al. (2019) Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. (2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. 
(2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Houjiang Liu, Anubrata Das, Alexander Boltz, Didi Zhou, Daisy Pinaroc, Matthew Lease, and Min Kyung Lee. 2023. Human-centered NLP Fact-checking: Co-Designing with Fact-checkers using Matchmaking for AI. arXiv preprint arXiv:2308.07213 (2023). Lizotte (2020) Mary-Kate Lizotte. 2020. Gender differences in public opinion: Values and political consequences. Temple University Press. MacKinnon (2009) James G MacKinnon. 2009. Bootstrap hypothesis testing. Handbook of computational econometrics (2009), 183–213. Meta (2020) Meta. 2020. Here’s how we’re using AI to help detect misinformation. https://ai.meta.com/blog/heres-how-were-using-ai-to-help-detect-misinformation/. Accessed on 2024-01-09. Moorish (2023) Lydia Moorish. 2023. Fact-Checkers Are Using AI Like ChatGPT to Battle Misinformation. Wired (1 Feb 2023). https://www.wired.com/story/fact-checkers-ai-chatgpt-misinformation/ Nadeem et al. 
Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. 
Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. 
Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. 
(2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023).
- Kristen Intemann. 2010. 25 years of feminist empiricism and standpoint theory: Where are we now? Hypatia 25, 4 (2010), 778–796. Jaradat et al. (2018) Israa Jaradat, Pepa Gencheva, Alberto Barrón-Cedeño, Lluís Màrquez, and Preslav Nakov. 2018. ClaimRank: Detecting check-worthy claims in Arabic and English. arXiv preprint arXiv:1804.07587 (2018). Katharopoulos et al. (2020) Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, and François Fleuret. 2020. Transformers are rnns: Fast autoregressive transformers with linear attention. In International conference on machine learning. PMLR, 5156–5165. Korinek (2023) Anton Korinek. 2023. Language models and cognitive automation for economic research. Technical Report. National Bureau of Economic Research. Liu et al. (2023) Houjiang Liu, Anubrata Das, Alexander Boltz, Didi Zhou, Daisy Pinaroc, Matthew Lease, and Min Kyung Lee. 2023. Human-centered NLP Fact-checking: Co-Designing with Fact-checkers using Matchmaking for AI. arXiv preprint arXiv:2308.07213 (2023). Lizotte (2020) Mary-Kate Lizotte. 2020. Gender differences in public opinion: Values and political consequences. Temple University Press. MacKinnon (2009) James G MacKinnon. 2009. Bootstrap hypothesis testing. Handbook of computational econometrics (2009), 183–213. Meta (2020) Meta. 2020. Here’s how we’re using AI to help detect misinformation. https://ai.meta.com/blog/heres-how-were-using-ai-to-help-detect-misinformation/. Accessed on 2024-01-09. Moorish (2023) Lydia Moorish. 2023. Fact-Checkers Are Using AI Like ChatGPT to Battle Misinformation. Wired (1 Feb 2023). https://www.wired.com/story/fact-checkers-ai-chatgpt-misinformation/ Nadeem et al. (2019) Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. (2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. 
Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. 
arXiv preprint arXiv:2304.10145 (2023). Israa Jaradat, Pepa Gencheva, Alberto Barrón-Cedeño, Lluís Màrquez, and Preslav Nakov. 2018. ClaimRank: Detecting check-worthy claims in Arabic and English. arXiv preprint arXiv:1804.07587 (2018). Katharopoulos et al. (2020) Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, and François Fleuret. 2020. Transformers are rnns: Fast autoregressive transformers with linear attention. In International conference on machine learning. PMLR, 5156–5165. Korinek (2023) Anton Korinek. 2023. Language models and cognitive automation for economic research. Technical Report. National Bureau of Economic Research. Liu et al. (2023) Houjiang Liu, Anubrata Das, Alexander Boltz, Didi Zhou, Daisy Pinaroc, Matthew Lease, and Min Kyung Lee. 2023. Human-centered NLP Fact-checking: Co-Designing with Fact-checkers using Matchmaking for AI. arXiv preprint arXiv:2308.07213 (2023). Lizotte (2020) Mary-Kate Lizotte. 2020. Gender differences in public opinion: Values and political consequences. Temple University Press. MacKinnon (2009) James G MacKinnon. 2009. Bootstrap hypothesis testing. Handbook of computational econometrics (2009), 183–213. Meta (2020) Meta. 2020. Here’s how we’re using AI to help detect misinformation. https://ai.meta.com/blog/heres-how-were-using-ai-to-help-detect-misinformation/. Accessed on 2024-01-09. Moorish (2023) Lydia Moorish. 2023. Fact-Checkers Are Using AI Like ChatGPT to Battle Misinformation. Wired (1 Feb 2023). https://www.wired.com/story/fact-checkers-ai-chatgpt-misinformation/ Nadeem et al. (2019) Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. (2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. 
In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, and François Fleuret. 2020. 
Transformers are rnns: Fast autoregressive transformers with linear attention. In International conference on machine learning. PMLR, 5156–5165. Korinek (2023) Anton Korinek. 2023. Language models and cognitive automation for economic research. Technical Report. National Bureau of Economic Research. Liu et al. (2023) Houjiang Liu, Anubrata Das, Alexander Boltz, Didi Zhou, Daisy Pinaroc, Matthew Lease, and Min Kyung Lee. 2023. Human-centered NLP Fact-checking: Co-Designing with Fact-checkers using Matchmaking for AI. arXiv preprint arXiv:2308.07213 (2023). Lizotte (2020) Mary-Kate Lizotte. 2020. Gender differences in public opinion: Values and political consequences. Temple University Press. MacKinnon (2009) James G MacKinnon. 2009. Bootstrap hypothesis testing. Handbook of computational econometrics (2009), 183–213. Meta (2020) Meta. 2020. Here’s how we’re using AI to help detect misinformation. https://ai.meta.com/blog/heres-how-were-using-ai-to-help-detect-misinformation/. Accessed on 2024-01-09. Moorish (2023) Lydia Moorish. 2023. Fact-Checkers Are Using AI Like ChatGPT to Battle Misinformation. Wired (1 Feb 2023). https://www.wired.com/story/fact-checkers-ai-chatgpt-misinformation/ Nadeem et al. (2019) Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. (2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. 
In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Anton Korinek. 2023. Language models and cognitive automation for economic research. Technical Report. National Bureau of Economic Research. Liu et al. (2023) Houjiang Liu, Anubrata Das, Alexander Boltz, Didi Zhou, Daisy Pinaroc, Matthew Lease, and Min Kyung Lee. 2023. Human-centered NLP Fact-checking: Co-Designing with Fact-checkers using Matchmaking for AI. 
arXiv preprint arXiv:2308.07213 (2023). Lizotte (2020) Mary-Kate Lizotte. 2020. Gender differences in public opinion: Values and political consequences. Temple University Press. MacKinnon (2009) James G MacKinnon. 2009. Bootstrap hypothesis testing. Handbook of computational econometrics (2009), 183–213. Meta (2020) Meta. 2020. Here’s how we’re using AI to help detect misinformation. https://ai.meta.com/blog/heres-how-were-using-ai-to-help-detect-misinformation/. Accessed on 2024-01-09. Moorish (2023) Lydia Moorish. 2023. Fact-Checkers Are Using AI Like ChatGPT to Battle Misinformation. Wired (1 Feb 2023). https://www.wired.com/story/fact-checkers-ai-chatgpt-misinformation/ Nadeem et al. (2019) Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. (2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. 
Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Houjiang Liu, Anubrata Das, Alexander Boltz, Didi Zhou, Daisy Pinaroc, Matthew Lease, and Min Kyung Lee. 2023. Human-centered NLP Fact-checking: Co-Designing with Fact-checkers using Matchmaking for AI. arXiv preprint arXiv:2308.07213 (2023). Lizotte (2020) Mary-Kate Lizotte. 2020. Gender differences in public opinion: Values and political consequences. Temple University Press. MacKinnon (2009) James G MacKinnon. 2009. Bootstrap hypothesis testing. Handbook of computational econometrics (2009), 183–213. Meta (2020) Meta. 2020. Here’s how we’re using AI to help detect misinformation. https://ai.meta.com/blog/heres-how-were-using-ai-to-help-detect-misinformation/. Accessed on 2024-01-09. Moorish (2023) Lydia Moorish. 2023. Fact-Checkers Are Using AI Like ChatGPT to Battle Misinformation. Wired (1 Feb 2023). https://www.wired.com/story/fact-checkers-ai-chatgpt-misinformation/ Nadeem et al. 
(2019) Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. (2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. 
https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Mary-Kate Lizotte. 2020. Gender differences in public opinion: Values and political consequences. Temple University Press. MacKinnon (2009) James G MacKinnon. 2009. Bootstrap hypothesis testing. Handbook of computational econometrics (2009), 183–213. Meta (2020) Meta. 2020. Here’s how we’re using AI to help detect misinformation. https://ai.meta.com/blog/heres-how-were-using-ai-to-help-detect-misinformation/. Accessed on 2024-01-09. Moorish (2023) Lydia Moorish. 2023. Fact-Checkers Are Using AI Like ChatGPT to Battle Misinformation. Wired (1 Feb 2023). https://www.wired.com/story/fact-checkers-ai-chatgpt-misinformation/ Nadeem et al. (2019) Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. 
(2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. 
arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). James G MacKinnon. 2009. Bootstrap hypothesis testing. Handbook of computational econometrics (2009), 183–213. Meta (2020) Meta. 2020. Here’s how we’re using AI to help detect misinformation. https://ai.meta.com/blog/heres-how-were-using-ai-to-help-detect-misinformation/. Accessed on 2024-01-09. Moorish (2023) Lydia Moorish. 2023. Fact-Checkers Are Using AI Like ChatGPT to Battle Misinformation. Wired (1 Feb 2023). https://www.wired.com/story/fact-checkers-ai-chatgpt-misinformation/ Nadeem et al. (2019) Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. (2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. 
In International conference on machine learning. PMLR, 5156–5165. Korinek (2023) Anton Korinek. 2023. Language models and cognitive automation for economic research. Technical Report. National Bureau of Economic Research. Liu et al. (2023) Houjiang Liu, Anubrata Das, Alexander Boltz, Didi Zhou, Daisy Pinaroc, Matthew Lease, and Min Kyung Lee. 2023. Human-centered NLP Fact-checking: Co-Designing with Fact-checkers using Matchmaking for AI. arXiv preprint arXiv:2308.07213 (2023). Lizotte (2020) Mary-Kate Lizotte. 2020. Gender differences in public opinion: Values and political consequences. Temple University Press. MacKinnon (2009) James G MacKinnon. 2009. Bootstrap hypothesis testing. Handbook of computational econometrics (2009), 183–213. Meta (2020) Meta. 2020. Here’s how we’re using AI to help detect misinformation. https://ai.meta.com/blog/heres-how-were-using-ai-to-help-detect-misinformation/. Accessed on 2024-01-09. Moorish (2023) Lydia Moorish. 2023. Fact-Checkers Are Using AI Like ChatGPT to Battle Misinformation. Wired (1 Feb 2023). https://www.wired.com/story/fact-checkers-ai-chatgpt-misinformation/ Nadeem et al. (2019) Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. (2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. 
https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Anton Korinek. 2023. Language models and cognitive automation for economic research. Technical Report. National Bureau of Economic Research. Liu et al. (2023) Houjiang Liu, Anubrata Das, Alexander Boltz, Didi Zhou, Daisy Pinaroc, Matthew Lease, and Min Kyung Lee. 2023. Human-centered NLP Fact-checking: Co-Designing with Fact-checkers using Matchmaking for AI. arXiv preprint arXiv:2308.07213 (2023). Lizotte (2020) Mary-Kate Lizotte. 2020. Gender differences in public opinion: Values and political consequences. Temple University Press. 
MacKinnon (2009) James G MacKinnon. 2009. Bootstrap hypothesis testing. Handbook of computational econometrics (2009), 183–213. Meta (2020) Meta. 2020. Here’s how we’re using AI to help detect misinformation. https://ai.meta.com/blog/heres-how-were-using-ai-to-help-detect-misinformation/. Accessed on 2024-01-09. Moorish (2023) Lydia Moorish. 2023. Fact-Checkers Are Using AI Like ChatGPT to Battle Misinformation. Wired (1 Feb 2023). https://www.wired.com/story/fact-checkers-ai-chatgpt-misinformation/ Nadeem et al. (2019) Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. (2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. 
In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Houjiang Liu, Anubrata Das, Alexander Boltz, Didi Zhou, Daisy Pinaroc, Matthew Lease, and Min Kyung Lee. 2023. Human-centered NLP Fact-checking: Co-Designing with Fact-checkers using Matchmaking for AI. arXiv preprint arXiv:2308.07213 (2023). Lizotte (2020) Mary-Kate Lizotte. 2020. Gender differences in public opinion: Values and political consequences. Temple University Press. MacKinnon (2009) James G MacKinnon. 2009. Bootstrap hypothesis testing. Handbook of computational econometrics (2009), 183–213. Meta (2020) Meta. 2020. Here’s how we’re using AI to help detect misinformation. https://ai.meta.com/blog/heres-how-were-using-ai-to-help-detect-misinformation/. Accessed on 2024-01-09. Moorish (2023) Lydia Moorish. 2023. Fact-Checkers Are Using AI Like ChatGPT to Battle Misinformation. Wired (1 Feb 2023). https://www.wired.com/story/fact-checkers-ai-chatgpt-misinformation/ Nadeem et al. (2019) Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. 
FAKTA: An Automatic End-to-End Fact Checking System. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. (2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. 
See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Mary-Kate Lizotte. 2020. Gender differences in public opinion: Values and political consequences. Temple University Press. MacKinnon (2009) James G MacKinnon. 2009. Bootstrap hypothesis testing. Handbook of computational econometrics (2009), 183–213. Meta (2020) Meta. 2020. Here’s how we’re using AI to help detect misinformation. https://ai.meta.com/blog/heres-how-were-using-ai-to-help-detect-misinformation/. Accessed on 2024-01-09. Moorish (2023) Lydia Moorish. 2023. Fact-Checkers Are Using AI Like ChatGPT to Battle Misinformation. Wired (1 Feb 2023). https://www.wired.com/story/fact-checkers-ai-chatgpt-misinformation/ Nadeem et al. (2019) Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. 
(2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. 
arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). James G MacKinnon. 2009. Bootstrap hypothesis testing. Handbook of computational econometrics (2009), 183–213. Meta (2020) Meta. 2020. Here’s how we’re using AI to help detect misinformation. https://ai.meta.com/blog/heres-how-were-using-ai-to-help-detect-misinformation/. Accessed on 2024-01-09. Moorish (2023) Lydia Moorish. 2023. Fact-Checkers Are Using AI Like ChatGPT to Battle Misinformation. Wired (1 Feb 2023). https://www.wired.com/story/fact-checkers-ai-chatgpt-misinformation/ Nadeem et al. (2019) Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. (2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. 
In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Meta. 2020. Here’s how we’re using AI to help detect misinformation. 
https://ai.meta.com/blog/heres-how-were-using-ai-to-help-detect-misinformation/. Accessed on 2024-01-09. Moorish (2023) Lydia Moorish. 2023. Fact-Checkers Are Using AI Like ChatGPT to Battle Misinformation. Wired (1 Feb 2023). https://www.wired.com/story/fact-checkers-ai-chatgpt-misinformation/ Nadeem et al. (2019) Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. (2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? 
arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Lydia Moorish. 2023. Fact-Checkers Are Using AI Like ChatGPT to Battle Misinformation. Wired (1 Feb 2023). https://www.wired.com/story/fact-checkers-ai-chatgpt-misinformation/ Nadeem et al. (2019) Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. 
(2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. 
arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. (2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. 
(2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. 
(2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. 
arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. 
arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Lydia Moorish. 2023. Fact-Checkers Are Using AI Like ChatGPT to Battle Misinformation. Wired (1 Feb 2023). https://www.wired.com/story/fact-checkers-ai-chatgpt-misinformation/ Nadeem et al. (2019) Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. 
(2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. 
arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. (2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. 
(2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. 
(2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. 
arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. 
Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. 
In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. 
Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 
84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. 
See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. 
(2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. 
Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. 
Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. 
Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. 
(2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023).
- Anton Korinek. 2023. Language models and cognitive automation for economic research. Technical Report. National Bureau of Economic Research. Liu et al. (2023) Houjiang Liu, Anubrata Das, Alexander Boltz, Didi Zhou, Daisy Pinaroc, Matthew Lease, and Min Kyung Lee. 2023. Human-centered NLP Fact-checking: Co-Designing with Fact-checkers using Matchmaking for AI. arXiv preprint arXiv:2308.07213 (2023). Lizotte (2020) Mary-Kate Lizotte. 2020. Gender differences in public opinion: Values and political consequences. Temple University Press. MacKinnon (2009) James G MacKinnon. 2009. Bootstrap hypothesis testing. Handbook of computational econometrics (2009), 183–213. Meta (2020) Meta. 2020. Here’s how we’re using AI to help detect misinformation. https://ai.meta.com/blog/heres-how-were-using-ai-to-help-detect-misinformation/. Accessed on 2024-01-09. Moorish (2023) Lydia Moorish. 2023. Fact-Checkers Are Using AI Like ChatGPT to Battle Misinformation. Wired (1 Feb 2023). https://www.wired.com/story/fact-checkers-ai-chatgpt-misinformation/ Nadeem et al. (2019) Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. (2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. 
(2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Houjiang Liu, Anubrata Das, Alexander Boltz, Didi Zhou, Daisy Pinaroc, Matthew Lease, and Min Kyung Lee. 2023. Human-centered NLP Fact-checking: Co-Designing with Fact-checkers using Matchmaking for AI. arXiv preprint arXiv:2308.07213 (2023). Lizotte (2020) Mary-Kate Lizotte. 2020. Gender differences in public opinion: Values and political consequences. Temple University Press. MacKinnon (2009) James G MacKinnon. 2009. Bootstrap hypothesis testing. Handbook of computational econometrics (2009), 183–213. Meta (2020) Meta. 2020. Here’s how we’re using AI to help detect misinformation. 
https://ai.meta.com/blog/heres-how-were-using-ai-to-help-detect-misinformation/. Accessed on 2024-01-09. Moorish (2023) Lydia Moorish. 2023. Fact-Checkers Are Using AI Like ChatGPT to Battle Misinformation. Wired (1 Feb 2023). https://www.wired.com/story/fact-checkers-ai-chatgpt-misinformation/ Nadeem et al. (2019) Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. (2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? 
arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Mary-Kate Lizotte. 2020. Gender differences in public opinion: Values and political consequences. Temple University Press. MacKinnon (2009) James G MacKinnon. 2009. Bootstrap hypothesis testing. Handbook of computational econometrics (2009), 183–213. Meta (2020) Meta. 2020. Here’s how we’re using AI to help detect misinformation. https://ai.meta.com/blog/heres-how-were-using-ai-to-help-detect-misinformation/. Accessed on 2024-01-09. Moorish (2023) Lydia Moorish. 2023. Fact-Checkers Are Using AI Like ChatGPT to Battle Misinformation. Wired (1 Feb 2023). https://www.wired.com/story/fact-checkers-ai-chatgpt-misinformation/ Nadeem et al. (2019) Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! 
lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. (2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. 
(2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). James G MacKinnon. 2009. Bootstrap hypothesis testing. Handbook of computational econometrics (2009), 183–213. Meta (2020) Meta. 2020. Here’s how we’re using AI to help detect misinformation. https://ai.meta.com/blog/heres-how-were-using-ai-to-help-detect-misinformation/. Accessed on 2024-01-09. Moorish (2023) Lydia Moorish. 2023. Fact-Checkers Are Using AI Like ChatGPT to Battle Misinformation. Wired (1 Feb 2023). https://www.wired.com/story/fact-checkers-ai-chatgpt-misinformation/ Nadeem et al. (2019) Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. (2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 
2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. 
arXiv preprint arXiv:2304.10145 (2023). Meta. 2020. Here’s how we’re using AI to help detect misinformation. https://ai.meta.com/blog/heres-how-were-using-ai-to-help-detect-misinformation/. Accessed on 2024-01-09. Moorish (2023) Lydia Moorish. 2023. Fact-Checkers Are Using AI Like ChatGPT to Battle Misinformation. Wired (1 Feb 2023). https://www.wired.com/story/fact-checkers-ai-chatgpt-misinformation/ Nadeem et al. (2019) Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. (2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. 
(2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Lydia Moorish. 2023. Fact-Checkers Are Using AI Like ChatGPT to Battle Misinformation. Wired (1 Feb 2023). https://www.wired.com/story/fact-checkers-ai-chatgpt-misinformation/ Nadeem et al. (2019) Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. 
(2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. 
arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. (2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. 
(2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. 
(2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. 
arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. 
Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. 
- Human-centered NLP Fact-checking: Co-Designing with Fact-checkers using Matchmaking for AI. arXiv preprint arXiv:2308.07213 (2023).
- Mary-Kate Lizotte. 2020. Gender differences in public opinion: Values and political consequences. Temple University Press.
- James G MacKinnon. 2009. Bootstrap hypothesis testing. Handbook of Computational Econometrics (2009), 183–213.
- Meta. 2020. Here’s how we’re using AI to help detect misinformation. https://ai.meta.com/blog/heres-how-were-using-ai-to-help-detect-misinformation/. Accessed: 2024-01-09.
- Lydia Moorish. 2023. Fact-Checkers Are Using AI Like ChatGPT to Battle Misinformation. Wired (1 Feb 2023). https://www.wired.com/story/fact-checkers-ai-chatgpt-misinformation/
- Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014
- Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520.
- Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428.
- Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021).
- Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515.
- Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online? In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013
- Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022).
- Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91.
- Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023).
- Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: Structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023).
- Snopes. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See Wayback Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.
- Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? Large language models have gender and racial biases in subjective NLP tasks. arXiv preprint arXiv:2311.09730 (2023).
- Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. License: Creative Commons Attribution 4.0 International.
- Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023).
- Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does LLM safety training fail? arXiv preprint arXiv:2307.02483 (2023).
- Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with ChatGPT. arXiv preprint arXiv:2302.11382 (2023).
- Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438.
- Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364.
- Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can ChatGPT reproduce human-generated labels? A study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023).
Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). 
Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. 
In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). 
Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. 
Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. 
Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. 
Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. 
arXiv preprint arXiv:2304.10145 (2023). Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Weiyu Zhang. 2015. 
Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023).
- Mary-Kate Lizotte. 2020. Gender differences in public opinion: Values and political consequences. Temple University Press.
- James G MacKinnon. 2009. Bootstrap hypothesis testing. Handbook of Computational Econometrics (2009), 183–213.
- Meta. 2020. Here’s how we’re using AI to help detect misinformation. https://ai.meta.com/blog/heres-how-were-using-ai-to-help-detect-misinformation/. Accessed: 2024-01-09.
- Lydia Moorish. 2023. Fact-Checkers Are Using AI Like ChatGPT to Battle Misinformation. Wired (1 Feb 2023). https://www.wired.com/story/fact-checkers-ai-chatgpt-misinformation/
- Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014
- Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the CLEF-2022 CheckThat! lab on fighting the COVID-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520.
- Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428.
- Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021).
- Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515.
- Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online? In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013
- Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022).
- Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91.
- Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023).
- Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: Structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023).
- Snopes. 2023. Transparency: Topic Selection. https://www.snopes.com/transparency/. Archived at https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.
- Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? Large language models have gender and racial biases in subjective NLP tasks. arXiv preprint arXiv:2311.09730 (2023).
- Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. License: Creative Commons Attribution 4.0 International.
- Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023).
- Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does LLM safety training fail? arXiv preprint arXiv:2307.02483 (2023).
- Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with ChatGPT. arXiv preprint arXiv:2302.11382 (2023).
- Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438.
- Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364.
- Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can ChatGPT reproduce human-generated labels? A study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023).
Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). 
Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. 
In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. 
Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. 
arXiv preprint arXiv:2304.10145 (2023). Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. 
Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. 
Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. 
(2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. 
file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. 
(2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 
2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023).
- James G MacKinnon. 2009. Bootstrap hypothesis testing. Handbook of computational econometrics (2009), 183–213. Meta (2020) Meta. 2020. Here’s how we’re using AI to help detect misinformation. https://ai.meta.com/blog/heres-how-were-using-ai-to-help-detect-misinformation/. Accessed on 2024-01-09. Moorish (2023) Lydia Moorish. 2023. Fact-Checkers Are Using AI Like ChatGPT to Battle Misinformation. Wired (1 Feb 2023). https://www.wired.com/story/fact-checkers-ai-chatgpt-misinformation/ Nadeem et al. (2019) Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. (2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. 
Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Meta. 2020. Here’s how we’re using AI to help detect misinformation. https://ai.meta.com/blog/heres-how-were-using-ai-to-help-detect-misinformation/. Accessed on 2024-01-09. Moorish (2023) Lydia Moorish. 2023. Fact-Checkers Are Using AI Like ChatGPT to Battle Misinformation. Wired (1 Feb 2023). https://www.wired.com/story/fact-checkers-ai-chatgpt-misinformation/ Nadeem et al. (2019) Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. 
In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. (2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. 
Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Lydia Moorish. 2023. Fact-Checkers Are Using AI Like ChatGPT to Battle Misinformation. Wired (1 Feb 2023). https://www.wired.com/story/fact-checkers-ai-chatgpt-misinformation/ Nadeem et al. (2019) Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. (2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. 
(2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. 
arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. 
Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. 
In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. 
Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 
84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. 
See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. 
(2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. 
Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. 
Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. 
Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. 
(2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023).
- Lydia Moorish. 2023. Fact-Checkers Are Using AI Like ChatGPT to Battle Misinformation. Wired (1 Feb 2023). https://www.wired.com/story/fact-checkers-ai-chatgpt-misinformation/ Nadeem et al. (2019) Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. (2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. 
Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, and James Glass. 2019. FAKTA: An Automatic End-to-End Fact Checking System. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. (2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. 
(2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. 
Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. (2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. 
- FAKTA: An Automatic End-to-End Fact Checking System. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Waleed Ammar, Annie Louis, and Nasrin Mostafazadeh (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 78–83. https://doi.org/10.18653/v1/N19-4014 Nakov et al. (2022a) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. (2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. 
See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Preslav Nakov, Alberto Barrón-Cedeño, Giovanni da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, et al. 2022a. Overview of the clef–2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520. Nakov et al. (2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. 
Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). 
Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428. Nakov et al. (2021) Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. 
arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021). Neumann et al. (2022) Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. 
Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). 
- Overview of the CLEF-2022 CheckThat! lab on fighting the COVID-19 infodemic and fake news detection. In International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 495–520.
Nakov et al. (2022b) Preslav Nakov, Alberto Barrón-Cedeño, Giovanni Da San Martino, Firoj Alam, Julia Maria Struß, Thomas Mandl, Rubén Míguez, Tommaso Caselli, Mucahid Kutlu, Wajdi Zaghouani, Chengkai Li, Shaden Shaar, Gautam Kishore Shahi, Hamdy Mubarak, Alex Nikolov, Nikolay Babulkov, Yavuz Selim Kartal, and Javier Beltrán. 2022b. The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428.
- The CLEF-2022 CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection. In Advances in Information Retrieval, Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (Eds.). Springer International Publishing, Cham, 416–428.
- Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, and Giovanni Da San Martino. 2021. Automated fact-checking for assisting human fact-checkers. arXiv preprint arXiv:2103.07769 (2021).
- Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. 2022. Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515.
- Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online? In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013
- Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022).
- Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91.
- Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023).
(2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023).
- Justice in misinformation detection systems: An analysis of algorithms, stakeholders, and potential harms. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1504–1515. Neumann and Wolczynski (2023) Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. 
arXiv preprint arXiv:2304.10145 (2023). Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. 
Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. 
Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. 
(2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. 
file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. 
(2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 
2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023).
- Terrence Neumann and Nicholas Wolczynski. 2023. Does AI-Assisted Fact-Checking Disproportionately Benefit Majority Groups Online?. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 480–490. https://doi.org/10.1145/3593013.3594013 Perez et al. (2022) Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations. 
arXiv preprint arXiv:2212.09251 (2022). Samarinas et al. (2021) Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91. Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. 
See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023). Sehat et al. (2023) Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. 
A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023). Spones (2023) Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Spones. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See WayBack Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.. Sun et al. (2023) Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. 
Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? large language models have gender and racial biases in subjective nlp tasks. arXiv preprint arXiv:2311.09730 (2023). Thakur and Hankerson (2021) Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. file:///Users/tdn897/Downloads/disinfo-race-gender.pdf License: Creative Commons Attribution 4.0 International. Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. 
(2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023). Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483 (2023). White et al. (2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023). Zeng et al. (2021) Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. 
(2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438. Zhang (2015) Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364. Zhu et al. (2023) Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023). Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023).
- Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022).
- Chris Samarinas, Wynne Hsu, and Mong Li Lee. 2021. Improving evidence retrieval for automated explainable fact-checking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 84–91.
- Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548 (2023).
- Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, and Amy X Zhang. 2023. Misinformation as a harm: structured approaches for fact-checking prioritization. arXiv preprint arXiv:2312.11678 (2023).
- Snopes. 2023. Transparency: Topic Selection. Online. https://www.snopes.com/transparency/. See Wayback Machine: https://web.archive.org/web/20221203184817/https://www.snopes.com/transparency/.
- Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2023. Aligning with whom? Large language models have gender and racial biases in subjective NLP tasks. arXiv preprint arXiv:2311.09730 (2023).
- Dhanaraj Thakur and DeVan L. Hankerson. 2021. Facts and their Discontents: A Research Agenda for Online Disinformation, Race, and Gender. Center for Democracy & Technology. License: Creative Commons Attribution 4.0 International.
- Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. 2023. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. arXiv preprint arXiv:2306.07899 (2023).
- Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does LLM safety training fail? arXiv preprint arXiv:2307.02483 (2023).
- Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with ChatGPT. arXiv preprint arXiv:2302.11382 (2023).
- Xia Zeng, Amani S Abumansour, and Arkaitz Zubiaga. 2021. Automated fact-checking: A survey. Language and Linguistics Compass 15, 10 (2021), e12438.
- Weiyu Zhang. 2015. Perceived procedural fairness in deliberation: Predictors and effects. Communication Research 42, 3 (2015), 345–364.
- Yiming Zhu, Peixian Zhang, Ehsan-Ul Haq, Pan Hui, and Gareth Tyson. 2023. Can ChatGPT reproduce human-generated labels? A study of social computing tasks. arXiv preprint arXiv:2304.10145 (2023).