Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-future capabilities and limitations of language models. To address this challenge, we introduce the Beyond the Imitation Game benchmark (BIG-bench). BIG-bench currently consists of 204 tasks, contributed by 450 authors across 132 institutions. Task topics are diverse, drawing problems from linguistics, childhood development, math, common-sense reasoning, biology, physics, social bias, software development, and beyond. BIG-bench focuses on tasks that are believed to be beyond the capabilities of current language models. We evaluate the behavior of OpenAI's GPT models, Google-internal dense transformer architectures, and Switch-style sparse transformers on BIG-bench, across model sizes spanning millions to hundreds of billions of parameters. In addition, a team of human expert raters performed all tasks in order to provide a strong baseline. Findings include: model performance and calibration both improve with scale, but are poor in absolute terms (and when compared with rater performance); performance is remarkably similar across model classes, though with benefits from sparsity; tasks that improve gradually and predictably commonly involve a large knowledge or memorization component, whereas tasks that exhibit "breakthrough" behavior at a critical scale often involve multiple steps or components, or brittle metrics; social bias typically increases with scale in settings with ambiguous context, but this can be improved with prompting.
BIG-bench provides a comprehensive benchmark for language models, with 204 diverse tasks across multiple domains, aiming to characterize model behavior both quantitatively and qualitatively.
The benchmark evaluates OpenAI's GPT models and Google's dense and sparse transformer models against a human expert baseline, focusing on performance, calibration, bias, and robustness.
Key findings include performance improving with model scale, sensitivity to task framing, increased social bias in larger models when context is ambiguous, and underperformance on tasks involving low-resource languages.
The insights from BIG-bench inform future research in model calibration, bias mitigation, robust model development, architecture exploration, and inclusive data representation.
The capabilities of language models (LMs) are advancing rapidly, and each advance challenges our understanding of what these systems can do. The Beyond the Imitation Game benchmark (BIG-bench) seeks to address critical gaps in existing benchmarks for language models. BIG-bench stands out for its breadth: 204 diverse tasks spanning domains such as linguistics, mathematics, and commonsense reasoning, along with tasks like code debugging and chess move prediction. It aims to characterize model behavior both quantitatively and qualitatively, offering new insight into the capabilities and limitations of modern language models across a wide range of model sizes.
The paper reports evaluations of models ranging from millions to hundreds of billions of parameters, including OpenAI's GPT models, Google-internal dense transformers, and Switch-style sparse transformers. The benchmark also incorporates a human expert baseline, in which a team of expert raters performed all tasks, to put model performance in context. In doing so, BIG-bench contributes to the discourse on LM capabilities by examining not only task performance but also the models' calibration, social bias, and robustness to task presentation.
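Calibration asks whether a model's stated confidence matches how often it is actually right. This summary does not specify which calibration metric the paper uses, so the sketch below illustrates the idea with expected calibration error (ECE) over equal-width confidence bins, a common choice. The function name, the 10-bin default, and the toy inputs are our own assumptions for illustration.

```python
from typing import Sequence


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[bool],
    n_bins: int = 10,
) -> float:
    """Equal-width-bin ECE: sum over bins of (bin share) * |accuracy - mean confidence|."""
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        return 0.0
    counts = [0] * n_bins
    conf_sums = [0.0] * n_bins
    hit_sums = [0] * n_bins
    for conf, ok in zip(confidences, correct):
        if not 0.0 <= conf <= 1.0:
            raise ValueError(f"confidence out of range: {conf}")
        # Map [0, 1] onto bins 0..n_bins-1; a confidence of exactly 1.0 goes in the top bin.
        b = min(int(conf * n_bins), n_bins - 1)
        counts[b] += 1
        conf_sums[b] += conf
        hit_sums[b] += int(ok)
    ece = 0.0
    for count, conf_sum, hits in zip(counts, conf_sums, hit_sums):
        if count:
            # Weight each bin's confidence/accuracy gap by the fraction of examples in it.
            ece += (count / n) * abs(hits / count - conf_sum / count)
    return ece


# Hypothetical model: claims 90% confidence but is right only half the time.
print(expected_calibration_error([0.9, 0.9, 0.9, 0.9], [True, False, True, False]))
```

The example prints a value of about 0.4, the gap between 90% stated confidence and 50% accuracy. A perfectly calibrated model would score 0, so "calibration improves with scale" would show up as this number shrinking for larger models.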
One of the primary observations from the benchmark is that performance improves considerably with model scale. Even so, all models, regardless of size, performed poorly in absolute terms and relative to expert human raters. The analysis also uncovers "breakthrough" behavior, where performance on specific tasks improves sharply once a model exceeds a critical scale. This nonlinear scaling appears most often in tasks that involve multiple steps or components, or that are scored with brittle metrics. By contrast, tasks that improve gradually and predictably commonly involve a large knowledge or memorization component.
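Why multi-step tasks and brittle metrics can produce breakthroughs is easiest to see with a toy model. Assume, purely hypothetically, that a model's per-step accuracy p rises smoothly (a logistic curve in log10 of parameter count) and that a task is scored by exact match over k independent steps, so the task score is p ** k. The logistic form, its midpoint and slope, and the independence of steps are illustrative assumptions, not quantities from the paper.

```python
import math


def per_step_accuracy(log10_params: float, midpoint: float = 10.0, slope: float = 1.5) -> float:
    """Hypothetical smooth skill curve: logistic in log10(parameter count)."""
    return 1.0 / (1.0 + math.exp(-slope * (log10_params - midpoint)))


def exact_match_accuracy(log10_params: float, n_steps: int) -> float:
    """All n_steps independent steps must be right for an exact match, so score = p ** n_steps."""
    return per_step_accuracy(log10_params) ** n_steps


# Sweep from 10^7 (10M) to 10^12 (1T) parameters.
for log_n in range(7, 13):
    p = per_step_accuracy(log_n)
    em = exact_match_accuracy(log_n, n_steps=5)
    print(f"10^{log_n} params: per-step {p:.3f}, 5-step exact match {em:.3f}")
```

Per-step accuracy climbs steadily (about 0.18 at 10^9 parameters and 0.5 at 10^10), while the 5-step exact-match score stays below 0.04 through 10^10 and then jumps to about 0.37 at 10^11. Under this toy model, a smooth underlying improvement looks like a sudden breakthrough when scored all-or-nothing.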
The benchmark also exposes the models' brittleness: performance can fluctuate substantially depending on how a task is framed. These findings call for a reevaluation of model robustness and suggest a need for models that generalize across different framings of essentially the same task.
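Framing sensitivity can be measured by scoring the same examples under several prompt templates and reporting the spread between the best and worst framing. The sketch below assumes a generic text-in, text-out `model` callable and exact-match scoring; `framing_sensitivity`, `toy_model`, and the templates are hypothetical names for illustration, not part of any BIG-bench API.

```python
from typing import Callable, Sequence


def framing_sensitivity(
    model: Callable[[str], str],
    templates: Sequence[str],
    examples: Sequence[tuple[str, str]],
) -> dict[str, float]:
    """Score the same examples under several prompt framings and report the spread."""
    if not templates or not examples:
        raise ValueError("need at least one template and one example")
    scores = []
    for template in templates:
        # Exact-match accuracy of the model under this particular framing.
        hits = sum(model(template.format(question=q)).strip() == answer for q, answer in examples)
        scores.append(hits / len(examples))
    return {"min": min(scores), "max": max(scores), "spread": max(scores) - min(scores)}


# Hypothetical stand-in for an LM: it only answers well when the prompt ends with "Answer:".
def toy_model(prompt: str) -> str:
    return "4" if prompt.endswith("Answer:") else "four?"


templates = ["Q: {question}\nAnswer:", "{question} ="]
print(framing_sensitivity(toy_model, templates, [("What is 2 + 2?", "4")]))
```

Here the toy model is perfect under one framing and wrong under the other, giving a spread of 1.0. A robust model, in the sense described above, would keep this spread close to 0.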
A more troubling finding is that social bias typically increases with scale, especially in settings where the context is ambiguous. The paper also reports that this bias can be reduced through prompting. Nonetheless, the result underscores the need for continued emphasis on ethical AI development practices, focusing on fairness and bias mitigation.
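To make "bias in ambiguous contexts" concrete, consider a question-answering setup in which the context does not say which of two people the answer refers to, so the correct response is "unknown". The score below is a hypothetical illustration, not the metric used in the paper: it subtracts anti-stereotyped answers from stereotype-consistent ones and normalizes by all answers. The label names and the function are our own assumptions.

```python
from collections import Counter
from typing import Iterable

# Hypothetical labels for a model's answer to a question whose context is ambiguous.
STEREOTYPED, ANTI_STEREOTYPED, UNKNOWN = "stereotyped", "anti-stereotyped", "unknown"


def ambiguous_context_bias(answer_labels: Iterable[str]) -> float:
    """Net skew toward stereotyped answers, in [-1, 1].

    In an ambiguous context the text supports neither person as the answer, so
    'unknown' is correct; 0 means no net skew, 1 means always stereotyped.
    """
    counts = Counter(answer_labels)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return (counts[STEREOTYPED] - counts[ANTI_STEREOTYPED]) / total


print(ambiguous_context_bias([STEREOTYPED, STEREOTYPED, UNKNOWN, ANTI_STEREOTYPED]))  # 0.25
```

Under this framing, the reported trend would correspond to the score rising with model size, and an effective prompting intervention would push it back toward 0.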
BIG-bench reveals a pronounced performance disparity across languages, with models underperforming notably on tasks involving low-resource languages. This gap underscores the importance of inclusive data representation when training models intended for global use.
The insights from BIG-bench provide a roadmap for future LM research, emphasizing model calibration, bias mitigation, and the development of more robust models. The emergence of breakthrough behaviors and the sensitivity to task framing also call for further exploration of model architectures and training procedures. Finally, the performance gap on tasks involving low-resource languages and specific domains points to the need for a more inclusive approach to data collection and model training.
BIG-bench marks a significant advance in understanding the capabilities and limitations of LMs. By covering a wide range of tasks and evaluating models across many scales, it delivers comprehensive insights into the current state of LMs. The findings highlight the complexities of model scaling, sensitivity to task framing, and the societal implications of model biases. As LMs continue to evolve, benchmarks like BIG-bench will be pivotal in guiding the development of more capable, equitable, and robust AI systems.
Wikiquote, russian proverbs. https://ru.wikiquote.org/wiki/%D0%A0%D1%83%D1%81%D1%81%D0%BA%D0%B8%D0%B5_%D0%BF%D0%BE%D1%81%D0%BB%D0%BE%D0%B2%D0%B8%D1%86%D1%8B.
Scott Alexander. A very unlikely chess game, 2020. https://slatestarcodex.com/2020/01/06/a-very-unlikely-chess-game/.
A survey of machine learning for big code and naturalness. ACM Comput. Surv., 51(4), July 2018. doi: 10.1145/3212695. https://doi.org/10.1145/3212695.
Structural language models of code. In Hal Daumé III and Aarti Singh (eds.), Proceedings of the 37th International Conference on Machine Learning, volume 119 of Proceedings of Machine Learning Research, pp. 245--256. PMLR, 13--18 July 2020. https://proceedings.mlr.press/v119/alon20a.html.
A survey on approaches to computational humor generation. In Proceedings of the 4th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature, pp. 29--41, Online, December 2020. International Committee on Computational Linguistics. https://aclanthology.org/2020.latechclfl-1.4.
MathQA: Towards interpretable math word problem solving with operation-based formalisms. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pp. 2357--2367, Minneapolis, Minnesota, June 2019. Association for Computational Linguistics. doi: 10.18653/v1/N19-1245. https://aclanthology.org/N19-1245.
Philip W. Anderson. More is different. Science, 177(4047):393--396, 1972. doi: 10.1126/science.177.4047.393. https://www.science.org/doi/abs/10.1126/science.177.4047.393.
On the cross-lingual transferability of monolingual representations. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pp. 4623--4637, Online, July 2020. Association for Computational Linguistics. doi: 10.18653/v1/2020.acl-main.421. https://aclanthology.org/2020.acl-main.421.
Big BiRD: A large, fine-grained, bigram relatedness dataset for examining semantic composition. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pp. 505--516, Minneapolis, Minnesota, June 2019. Association for Computational Linguistics. doi: 10.18653/v1/N19-1050. https://aclanthology.org/N19-1050.
Generating fact checking explanations. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pp. 7352--7364, Online, July 2020. Association for Computational Linguistics. doi: 10.18653/v1/2020.acl-main.656. https://aclanthology.org/2020.acl-main.656.
Salvatore Attardo. Humor in language. In Oxford Research Encyclopedia of Linguistics. Oxford University Press, 2017. doi: 10.1093/acrefore/9780199384655.013.342. https://oxfordre.com/linguistics/view/10.1093/acrefore/9780199384655.001.0001/acrefore-9780199384655-e-342.
Celex2 ldc96l14, 1995. https://doi.org/10.35111/gs6s-gm48.
Big data’s disparate impact. California Law Review, 104(3):671--732, 2016. http://www.jstor.org/stable/24758720.
Teaching classification boundaries to humans. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 27, pp. 109--115, Menlo Park, CA, June 2013. Association for the Advancement of Artificial Intelligence. https://ojs.aaai.org/index.php/AAAI/article/view/8623.
Climbing towards NLU: On meaning, form, and understanding in the age of data. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pp. 5185--5198, Online, July 2020. Association for Computational Linguistics. doi: 10.18653/v1/2020.acl-main.463. https://aclanthology.org/2020.acl-main.463.
On the ability and limitations of transformers to recognize formal languages. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 7096--7116, Online, November 2020a. Association for Computational Linguistics. doi: 10.18653/v1/2020.emnlp-main.576. https://aclanthology.org/2020.emnlp-main.576.
On the practical ability of recurrent neural networks to recognize hierarchical languages. In Proceedings of the 28th International Conference on Computational Linguistics, pp. 1481--1494, Barcelona, Spain (Online), December 2020b. International Committee on Computational Linguistics. doi: 10.18653/v1/2020.coling-main.129. https://aclanthology.org/2020.coling-main.129.
A clustering approach for nearly unsupervised recognition of nonliteral language. In 11th Conference of the European Chapter of the Association for Computational Linguistics, pp. 329--336, Trento, Italy, April 2006. Association for Computational Linguistics. https://aclanthology.org/E06-1042.
PIQA: reasoning about physical commonsense in natural language. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 34, pp. 7423--7439, Menlo Park, CA, 2020. Association for the Advancement of Artificial Intelligence. doi: 10.1609/aaai.v34i05.6239. https://ojs.aaai.org/index.php/AAAI/article/view/6239.
Predicting human metaphor paraphrase judgments with deep neural networks. In Proceedings of the Workshop on Figurative Language Processing, pp. 45--55, New Orleans, Louisiana, June 2018. Association for Computational Linguistics. doi: 10.18653/v1/W18-0906. https://aclanthology.org/W18-0906.
Large dataset and language model fun-tuning for humor recognition. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pp. 4027--4032, Florence, Italy, July 2019. Association for Computational Linguistics. doi: 10.18653/v1/P19-1394. https://aclanthology.org/P19-1394.
Language (technology) is power: A critical survey of ‘‘bias’’ in NLP. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pp. 5454--5476, Online, July 2020. Association for Computational Linguistics. doi: 10.18653/v1/2020.acl-main.485. https://aclanthology.org/2020.acl-main.485.
Nicholas Boillot. Vector forms as a foreign language, 24 June 2019. https://www.fluate.net/en/travaux/vectoglyph.
Man is to computer programmer as woman is to homemaker? Debiasing word embeddings. In D. Lee, M. Sugiyama, U. Luxburg, I. Guyon, and R. Garnett (eds.), Advances in Neural Information Processing Systems, volume 29. Curran Associates, Inc., 2016. https://proceedings.neurips.cc/paper/2016/hash/a486cd07e4ac3d270571622f4f316ec5-Abstract.html.
D33{}{3}start_FLOATSUPERSCRIPT 3 end_FLOATSUPERSCRIPT data-driven documents. IEEE Transactions on Visualization and Computer Graphics, 17(12):2301--2309, 2011. doi: 10.1109/TVCG.2011.185. https://ieeexplore.ieee.org/document/6064996.
What will it take to fix benchmarking in natural language understanding? In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 4843--4855, Online, June 2021. Association for Computational Linguistics. doi: 10.18653/v1/2021.naacl-main.385. https://aclanthology.org/2021.naacl-main.385.
A large annotated corpus for learning natural language inference. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pp. 632--642, Lisbon, Portugal, September 2015. Association for Computational Linguistics. doi: 10.18653/v1/D15-1075. https://aclanthology.org/D15-1075.
Rosetta stone linguistic problems. In Proceedings of the Fourth Workshop on Teaching NLP and CL, pp. 1--8, Sofia, Bulgaria, August 2013. Association for Computational Linguistics. https://aclanthology.org/W13-3401.
Gwern Branwen. GPT-3 creative fiction. Gwern.net, June 2020. https://www.gwern.net/GPT-3.
Finding microaggressions in the wild: A case for locating elusive phenomena in social media posts. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pp. 1664--1674, Hong Kong, China, November 2019. Association for Computational Linguistics. doi: 10.18653/v1/D19-1176. https://aclanthology.org/D19-1176.
Ralf Brown. Non-linear mapping for improved identification of 1300+ languages. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 627--632, Doha, Qatar, October 2014. Association for Computational Linguistics. doi: 10.3115/v1/D14-1069. https://aclanthology.org/D14-1069.
Language models are few-shot learners. In H. Larochelle, M. Ranzato, R. Hadsell, M.F. Balcan, and H. Lin (eds.), Advances in Neural Information Processing Systems, volume 33, pp. 1877--1901. Curran Associates, Inc., 2020. https://proceedings.neurips.cc/paper/2020/hash/1457c0d6bfcb4967418bfb8ac142f64a-Abstract.html.
Ravens attribute visual access to unseen competitors. Nature Communications, 7:article 10506, 2016. https://www.nature.com/articles/ncomms10506.
The WMT’18 morpheval test suites for English-Czech, English-German, English-Finnish and Turkish-English. In Proceedings of the Third Conference on Machine Translation: Shared Task Papers, pp. 546--560, Belgium, Brussels, October 2018. Association for Computational Linguistics. doi: 10.18653/v1/W18-6433. https://aclanthology.org/W18-6433.
Inference making ability and its relation to comprehension failure. Reading and Writing, 11(5–6):489--503, 1999. doi: 10.1023/A:1008084120205. https://link.springer.com/article/10.1023/A:1008084120205.
Eliciting good teaching from humans for machine learners. Artificial Intelligence, 217:198--215, 2014. doi: https://doi.org/10.1016/j.artint.2014.08.005. https://www.sciencedirect.com/science/article/pii/S0004370214001143.
Semantics derived automatically from language corpora contain human-like biases. Science, 356(6334):183--186, 2017. doi: 10.1126/science.aal4230. https://www.science.org/doi/abs/10.1126/science.aal4230.
Does the chimpanzee have a theory of mind? 30 years later. Trends in Cognitive Sciences, 12:187--192, 2008. doi: 10.1016/j.tics.2008.02.010. https://doi.org/10.1016/j.tics.2008.02.010.
Tracy Canfield. Machine translation of Klingon, 2010. http://klingonska.org/academic/canfield-2010-machine_translation_of_klingon.pdf.
Extracting training data from LLMs. In 30th USENIX Security Symposium (USENIX Security 21), pp. 2633--2650. USENIX Association, August 2021. https://www.usenix.org/conference/usenixsecurity21/presentation/carlini-extracting.
Nathanael Chambers. Labeling documents with timestamps: Learning from their time expressions. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 98--106, Jeju Island, Korea, July 2012. Association for Computational Linguistics. https://aclanthology.org/P12-1011.
Studying cultural differences in emoji usage across the East and the West. In Proceedings of the International AAAI Conference on Web and Social Media, volume 13, pp. 226--235, Menlo Park, CA, Jul. 2019. Association for the Advancement of Artificial Intelligence. https://ojs.aaai.org/index.php/ICWSM/article/view/3224.
Simplicity: A unifying principle in cognitive science? Trends in Cognitive Sciences, 7:19--22, 2003. doi: 10.1016/S1364-6613(02)00005-0. https://doi.org/10.1016/S1364-6613(02)00005-0.
Generative pretraining from pixels. In Hal Daumé III and Aarti Singh (eds.), Proceedings of the 37th International Conference on Machine Learning, volume 119 of Proceedings of Machine Learning Research, pp. 1691--1703. PMLR, 13--18 July 2020. https://proceedings.mlr.press/v119/chen20s.html.
Humor recognition using deep learning. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers), pp. 113--117, New Orleans, Louisiana, June 2018. Association for Computational Linguistics. doi: 10.18653/v1/N18-2018. https://aclanthology.org/N18-2018.
Ricson Chen. Transformers play chess, 2020. https://github.com/ricsonc/transformers-play-chess.
Execution-guided neural program synthesis. https://openreview.net/pdf?id=H1gfOiAqYm, 2019b.
On measuring gender bias in translation of gender-neutral pronouns. In Proceedings of the First Workshop on Gender Bias in Natural Language Processing, pp. 173--181, Florence, Italy, August 2019. Association for Computational Linguistics. doi: 10.18653/v1/W19-3824. https://aclanthology.org/W19-3824.
François Chollet. Abstraction and reasoning challenge, 2020. https://www.kaggle.com/c/abstraction-and-reasoning-challenge.
The algebraic theory of context-free languages. In P. Braffort and D. Hirschberg (eds.), Computer Programming and Formal Systems, volume 26 of Studies in Logic and the Foundations of Mathematics, pp. 118--161. Elsevier, 1959. doi: https://doi.org/10.1016/S0049-237X(09)70104-1. https://www.sciencedirect.com/science/article/pii/S0049237X09701041.
Automated data transformation with inductive programming and dynamic background knowledge. In Ulf Brefeld, Elisa Fromont, Andreas Hotho, Arno Knobbe, Marloes Maathuis, and Céline Robardet (eds.), Machine Learning and Knowledge Discovery in Databases, pp. 735--751, Cham, 2020. Springer. doi: 10.1007/978-3-030-46133-144. https://doi.org/10.1007/978-3-030-46133-144.
Introduction to Logic. Taylor & Francis, 2018. https://books.google.co.il/books?id=38bADwAAQBAJ.
Kate Crawford. The trouble with bias. https://www.youtube.com/watch?v=fMym_BKWQzk, 2017. Keynote address, NIPS 2017, Long Beach CA. Dec. 5
Metagol system, 2016. https://github.com/metagol/metagol.
Meta-interpretive learning of data transformation programs. In Katsumi Inoue, Hayato Ohwada, and Akihiro Yamamoto (eds.), Inductive Logic Programming, pp. 46--59, Cham, 2016. Springer. doi: 10.1007/978-3-319-40566-74. https://doi.org/10.1007/978-3-319-40566-74.
Learning higher-order logic programs. Machine Learning, 109:1289--1322, 2020. doi: 10.1007/s10994-019-05862-7. https://doi.org/10.1007/s10994-019-05862-7.
Joe Cruse. Emoji usage in TV conversation. Twitter blog, 18 Nov 2015. https://blog.twitter.com/en_us/a/2015/emoji-usage-in-tv-conversation.
Jim Daley. White Chicago cops use force more often than Black officers. Scientific American, 11 February 2021. https://www.scientificamerican.com/article/white-chicago-cops-use-force-more-often-than-black-officers/.
Finding contradictions in text. In Proceedings of ACL-08: HLT, pp. 1039--1047, Columbus, Ohio, June 2008. Association for Computational Linguistics. https://aclanthology.org/P08-1118.
Did it happen? The pragmatic complexity of veridicality assessment. Computational Linguistics, 38(2):301--333, June 2012. doi: 10.1162/COLIa00097. https://aclanthology.org/J12-2003.
The CommitmentBank: Investigating projection in naturally occurring discourse. Proceedings of Sinn und Bedeutung, 23(2):107--124, July 2019. doi: 10.18148/sub/2019.v23i2.601. https://ojs.ub.uni-konstanz.de/sub/index.php/sub/article/view/601.
Indexing by latent semantic analysis. Journal of the American Society for Information Science, 41(6):391--407, 1990. doi: 10.1002/(SICI)1097-4571(199009)41:6391::AID-ASI13.0.CO;2-9. https://doi.org/10.1002/(SICI)1097-4571(199009)41:6391::AID-ASI13.0.CO;2-9.
When redundancy is useful: A Bayesian approach to “overinformative” referring expressions. Psychological Review, 127:591--621, 2020. doi: 10.1037/rev0000186. https://doi.org/10.1037/rev0000186.
Calibration of pre-trained transformers. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 295--302, Online, November 2020. Association for Computational Linguistics. doi: 10.18653/v1/2020.emnlp-main.21. https://aclanthology.org/2020.emnlp-main.21.
On measuring and mitigating biased inferences of word embeddings. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 34, pp. 7659--7666, Menlo Park, CA, Apr. 2020. Association for the Advancement of Artificial Intelligence. doi: 10.1609/aaai.v34i05.6267. https://ojs.aaai.org/index.php/AAAI/article/view/6267.
Sequence-based prediction of protein--protein interaction sites with l1-logreg classifier. Journal of Theoretical Biology, 348:47--54, 2014. doi: 10.1016/j.jtbi.2014.01.028. https://pubmed.ncbi.nlm.nih.gov/24486250/.
DROP: A reading comprehension benchmark requiring discrete reasoning over paragraphs. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pp. 2368--2378, Minneapolis, Minnesota, June 2019. Association for Computational Linguistics. doi: 10.18653/v1/N19-1246. https://aclanthology.org/N19-1246.
RoFT: A tool for evaluating human detection of machine-generated text. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pp. 189--196, Online, October 2020. Association for Computational Linguistics. doi: 10.18653/v1/2020.emnlp-demos.25. https://aclanthology.org/2020.emnlp-demos.25.
To test machine comprehension, start by defining comprehension. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pp. 7839--7859, Online, July 2020. Association for Computational Linguistics. doi: 10.18653/v1/2020.acl-main.701. https://aclanthology.org/2020.acl-main.701.
FEQA: A question answering evaluation framework for faithfulness assessment in abstractive summarization. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pp. 5055--5070, Online, July 2020. Association for Computational Linguistics. doi: 10.18653/v1/2020.acl-main.454. https://aclanthology.org/2020.acl-main.454.
Compositional morpheme embeddings with affixes as functions and stems as arguments. In Proceedings of the Workshop on the Relevance of Linguistic Structure in Neural Architectures for NLP, pp. 1--5, Melbourne, Australia, July 2018. Association for Computational Linguistics. doi: 10.18653/v1/W18-2901. https://aclanthology.org/W18-2901.
Cryptonite: A cryptic crossword benchmark for extreme ambiguity in language. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pp. 4186--4192, Online and Punta Cana, Dominican Republic, November 2021. Association for Computational Linguistics. doi: 10.18653/v1/2021.emnlp-main.344. https://aclanthology.org/2021.emnlp-main.344.
Semantic relatedness of Wikipedia concepts -- benchmark data and a working solution. In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), Miyazaki, Japan, May 2018. European Language Resources Association. https://aclanthology.org/L18-1408.
emoji2vec: Learning emoji representations from their description. In Proceedings of The Fourth International Workshop on Natural Language Processing for Social Media, pp. 48--54, Austin, TX, USA, November 2016. Association for Computational Linguistics. doi: 10.18653/v1/W16-6208. https://aclanthology.org/W16-6208.
Learning to learn programs from examples: Going beyond program structure. In Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, IJCAI-17, pp. 1638--1645, 2017. doi: 10.24963/ijcai.2017/227. https://doi.org/10.24963/ijcai.2017/227.
Question answering as an automatic evaluation metric for news article summarization. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pp. 3938--3948, Minneapolis, Minnesota, June 2019. Association for Computational Linguistics. doi: 10.18653/v1/N19-1395. https://aclanthology.org/N19-1395.
Hierarchical neural story generation. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 889--898, Melbourne, Australia, July 2018. Association for Computational Linguistics. doi: 10.18653/v1/P18-1082. https://aclanthology.org/P18-1082.
Beyond English-centric multilingual machine translation. Journal of Machine Learning Research, 22(107):1--48, 2021. http://jmlr.org/papers/v22/20-1307.html.
Humor detection via an internal and external neural network. Neurocomputing, 394:105--111, 2020. doi: https://doi.org/10.1016/j.neucom.2020.02.030. https://www.sciencedirect.com/science/article/pii/S0925231220302058.
Using millions of emoji occurrences to learn any-domain representations for detecting sentiment, emotion and sarcasm. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 2017. doi: 10.18653/v1/d17-1169. https://doi.org/10.18653/v1/D17-1169.
Susan T. Fiske. Controlling other people: The impact of power on stereotyping. American Psychologist, 48:621--628, 1993. doi: 10.1037/0003-066X.48.6.621. https://doi.org/10.1037/0003-066X.48.6.621.
The Cattell-Horn-Carroll theory of cognitive abilities. In Encyclopedia of Special Education. John Wiley & Sons, Ltd, 2014. doi: https://doi.org/10.1002/9781118660584.ese0431. https://onlinelibrary.wiley.com/doi/abs/10.1002/9781118660584.ese0431.
An introduction to inductive programming. Artificial Intelligence Review, 29:45--62, 2008. doi: 10.1007/s10462-009-9108-7. https://doi.org/10.1007/s10462-009-9108-7.
Connectionism and cognitive architecture: A critical analysis. Cognition, 28(1):3--71, 1988. doi: https://doi.org/10.1016/0010-0277(88)90031-5. https://www.sciencedirect.com/science/article/pii/0010027788900315.
Whodunnit? Crime drama as a case for natural language understanding. Transactions of the Association for Computational Linguistics, 6:1--15, 2018. doi: 10.1162/tacla00001. https://aclanthology.org/Q18-1001.
Martins Frolovs. Teaching GPT-2 transformer a sense of humor: How to fine-tune large transformer models on a single GPU in PyTorch. Towards Data Science, Medium, 2019. https://towardsdatascience.com/teaching-gpt-2-a-sense-of-humor-fine-tuning-large-transformer-models-on-a-single-gpu-in-pytorch-59e8cec40912.
Neural metaphor detection in context. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pp. 607--613, Brussels, Belgium, October-November 2018. Association for Computational Linguistics. doi: 10.18653/v1/D18-1060. https://aclanthology.org/D18-1060.
EleutherAI/lm-evaluation-harness: v0.2.0, March 2022. https://doi.org/10.5281/zenodo.6332975.
The TUNA-REG challenge 2009: Overview and evaluation results. In Proceedings of the 12th European Workshop on Natural Language Generation (ENLG 2009), pp. 174--182, Athens, Greece, March 2009. Association for Computational Linguistics. https://aclanthology.org/W09-0629.
SyntaxGym: An online platform for targeted evaluation of language models. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations, pp. 70--76, Online, July 2020. Association for Computational Linguistics. doi: 10.18653/v1/2020.acl-demos.10. https://aclanthology.org/2020.acl-demos.10.
RealToxicityPrompts: Evaluating neural toxic degeneration in language models. In Findings of the Association for Computational Linguistics: EMNLP 2020, pp. 3356--3369, Online, November 2020. Association for Computational Linguistics. doi: 10.18653/v1/2020.findings-emnlp.301. https://aclanthology.org/2020.findings-emnlp.301.
The GEM benchmark: Natural language generation, its evaluation and metrics. In Proceedings of the 1st Workshop on Natural Language Generation, Evaluation, and Metrics (GEM 2021), pp. 96--120, Online, August 2021. Association for Computational Linguistics. doi: 10.18653/v1/2021.gem-1.10. https://aclanthology.org/2021.gem-1.10.
Conversational implicatures in English dialogue: Annotated dataset. Procedia Computer Science, 171:2316--2323, 2020. doi: https://doi.org/10.1016/j.procs.2020.04.251. https://www.sciencedirect.com/science/article/pii/S1877050920312436. Special issue: Third International Conference on Computing and Network Communications (CoCoNet’19).
Injecting numerical reasoning skills into language models. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pp. 946--958, Online, July 2020a. Association for Computational Linguistics. doi: 10.18653/v1/2020.acl-main.89. https://aclanthology.org/2020.acl-main.89.
Did Aristotle Use a Laptop? A Question Answering Benchmark with Implicit Reasoning Strategies. Transactions of the Association for Computational Linguistics, 9:346--361, 04 2021. doi: 10.1162/tacla00370. https://doi.org/10.1162/tacl_a_00370.
Irony detection in a multilingual context. In Joemon M. Jose, Emine Yilmaz, João Magalhães, Pablo Castells, Nicola Ferro, Mário J. Silva, and Flávio Martins (eds.), Advances in Information Retrieval. ECIR 2020. Lecture Notes in Computer Science, volume 12036. Springer, Cham, 2020. https://link.springer.com/chapter/10.1007/978-3-030-45442-5_18.
Color naming across languages reflects color use. Proceedings of the National Academy of Sciences, 114(40):10785--10790, 2017. doi: 10.1073/pnas.1619666114. https://www.pnas.org/doi/abs/10.1073/pnas.1619666114.
Arthur S. Goldberger. Structural equation methods in the social sciences. Econometrica, 40(6):979--1001, 1972. http://www.jstor.org/stable/1913851.
Lipstick on a pig: Debiasing methods cover up systematic gender biases in word embeddings but do not remove them. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pp. 609--614, Minneapolis, Minnesota, June 2019. Association for Computational Linguistics. doi: 10.18653/v1/N19-1061. https://aclanthology.org/N19-1061.
Identifying sarcasm in Twitter: A closer look. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 581--586, Portland, Oregon, USA, June 2011. Association for Computational Linguistics. https://aclanthology.org/P11-2102.
Pragmatic language interpretation as probabilistic inference. Trends in Cognitive Sciences, 20:818--829, 2016. doi: 10.1016/j.tics.2016.08.005. https://doi.org/10.1016/j.tics.2016.08.005.
Topical-chat: Towards knowledge-grounded open-domain conversations. In Proc. Interspeech 2019, pp. 1891--1895, 2019. doi: 10.21437/Interspeech.2019-3079. https://www.isca-speech.org/archive/interspeech_2019/gopalakrishnan19_interspeech.html.
Are neural open-domain dialog systems robust to speech recognition errors in the dialog history? An empirical study. In Proc. Interspeech 2020, pp. 911--915, 2020. doi: 10.21437/Interspeech.2020-1508. https://www.isca-speech.org/archive/interspeech_2020/gopalakrishnan20_interspeech.html.
Andrew S. Gordon. Choice of plausible alternatives (COPA), 2010. https://people.ict.usc.edu/~gordon/copa.html.
English gigaword. Linguistic Data Consortium, Philadelphia, 4(1):34, 2003. doi: 10.35111/0z6y-q265. https://doi.org/10.35111/0z6y-q265.
Hybrid computing using a neural network with dynamic external memory. Nature, 538:471--476, 2016. doi: 10.1038/nature20101. https://doi.org/10.1038/nature20101.
Progress report on program-understanding systems (AIM-240), 1974. http://infolab.stanford.edu/pub/cstr/reports/cs/tr/74/444/CS-TR-74-444.pdf.
Cordell Green. Application of theorem proving to problem solving. In Bonnie Lynn Webber and Nils J. Nilsson (eds.), Readings in Artificial Intelligence, pp. 202--222. Morgan Kaufmann, 1981. doi: 10.1016/B978-0-934613-03-3.50019-2. https://www.sciencedirect.com/science/article/pii/B9780934613033500192.
In defense of a dogma. The Philosophical Review, 65(2):141--158, 1956. http://www.jstor.org/stable/2182828.
Colorless green recurrent networks dream hierarchically. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pp. 1195--1205, New Orleans, Louisiana, June 2018. Association for Computational Linguistics. doi: 10.18653/v1/N18-1108. https://aclanthology.org/N18-1108.
Inductive programming meets the real world. Commun. ACM, 58(11):90--99, Oct. 2015. doi: 10.1145/2736282. https://doi.org/10.1145/2736282.
Program synthesis. Foundations and Trends in Programming Languages, 4(1–2):1--119, 2017a. doi: 10.1561/2500000010. http://dx.doi.org/10.1561/2500000010.
Program Synthesis. NOW, Boston, 2017b. https://www.microsoft.com/en-us/research/wp-content/uploads/2017/10/program_synthesis_now.pdf.
On calibration of modern neural networks. In Doina Precup and Yee Whye Teh (eds.), Proceedings of the 34th International Conference on Machine Learning, volume 70 of Proceedings of Machine Learning Research, pp. 1321--1330. PMLR, 06--11 Aug. 2017. https://proceedings.mlr.press/v70/guo17a.html.
Disfl-QA: A benchmark dataset for understanding disfluencies in question answering. In Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, pp. 3309--3319, Online, August 2021. Association for Computational Linguistics. doi: 10.18653/v1/2021.findings-acl.293. https://aclanthology.org/2021.findings-acl.293.
Samuel Gyasi Obeng. The proverb as a mitigating and politeness strategy in Akan discourse. Anthropological Linguistics, 38(3):521--549, 1996. http://www.jstor.org/stable/30028601.
The argument reasoning comprehension task: Identification and reconstruction of implicit warrants. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pp. 1930--1940, New Orleans, Louisiana, June 2018. Association for Computational Linguistics. doi: 10.18653/v1/N18-1175. https://aclanthology.org/N18-1175.
Michael Hahn. Theoretical limitations of self-attention in neural sequence models. Transactions of the Association for Computational Linguistics, 8:156--171, Dec. 2020. doi: 10.1162/tacl_a_00306. https://doi.org/10.1162/tacl_a_00306.
It’s all in the name: Mitigating gender bias with name-based counterfactual data substitution. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pp. 5267--5275, Hong Kong, China, November 2019. Association for Computational Linguistics. doi: 10.18653/v1/D19-1530. https://aclanthology.org/D19-1530.
Maria Hanzén. When in Rome, do as the Romans do: Proverbs as a part of EFL teaching. Master’s thesis, Jönköping University, School of Education and Communication, Jönköping, 2007. http://www.diva-portal.org/smash/get/diva2:3499/fulltext01.pdf.
Francesca G.E. Happé. An advanced test of theory of mind: Understanding of story characters’ thoughts and feelings by able autistic, mentally handicapped, and normal children and adults. Journal of Autism and Developmental Disorders, 24:129--154, 1994. https://link.springer.com/article/10.1007/BF02172093.
The MovieLens datasets: History and context. ACM Trans. Interact. Intell. Syst., 5(4), Dec. 2015. doi: 10.1145/2827872. https://doi.org/10.1145/2827872.
Policy-driven neural response generation for knowledge-grounded dialog systems. In Proceedings of the 13th International Conference on Natural Language Generation, pp. 412--421, Dublin, Ireland, Dec. 2020. Association for Computational Linguistics. https://aclanthology.org/2020.inlg-1.46.
A survey on recent approaches for natural language processing in low-resource scenarios. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 2545--2568, Online, June 2021. Association for Computational Linguistics. doi: 10.18653/v1/2021.naacl-main.201. https://aclanthology.org/2021.naacl-main.201.
Women also snowboard: Overcoming bias in captioning models. In Proceedings of the European Conference on Computer Vision (ECCV), pp. 771--787, Cham, 2018. Springer. https://www.ecva.net/papers/eccv_2018/papers_ECCV/papers/Lisa_Anne_Hendricks_Women_also_Snowboard_ECCV_2018_paper.pdf.
A baseline for detecting misclassified and out-of-distribution examples in neural networks. In International Conference on Learning Representations, 2017. https://openreview.net/forum?id=Hkg4TI9xl.
Using pre-training can improve model robustness and uncertainty. In Kamalika Chaudhuri and Ruslan Salakhutdinov (eds.), Proceedings of the 36th International Conference on Machine Learning, volume 97 of Proceedings of Machine Learning Research, pp. 2712--2721. PMLR, 09--15 June 2019. https://proceedings.mlr.press/v97/hendrycks19a.html.
Measuring massive multitask language understanding. In International Conference on Learning Representations, 2021b. https://openreview.net/forum?id=d7KBjmI3GmQ.
The weirdest people in the world? Behavioral and Brain Sciences, 33(2-3):61--83, 2010. doi: 10.1017/S0140525X0999152X. https://www.cambridge.org/core/journals/behavioral-and-brain-sciences/article/abs/weirdest-people-in-the-world/BF84F7517D56AFF7B7EB58411A554C17.
Teaching machines to read and comprehend. In C. Cortes, N. Lawrence, D. Lee, M. Sugiyama, and R. Garnett (eds.), Advances in Neural Information Processing Systems, volume 28. Curran Associates, Inc., 2015. https://proceedings.neurips.cc/paper/2015/hash/afdec7005cc9f14302cd0474fd0f3c96-Abstract.html.
TaPas: Weakly supervised table parsing via pre-training. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pp. 4320--4333, Online, July 2020. Association for Computational Linguistics. doi: 10.18653/v1/2020.acl-main.398. https://aclanthology.org/2020.acl-main.398.
Keith J. Holyoak. Analogy and relational reasoning. In Keith J. Holyoak and Robert G. Morrison (eds.), The Oxford Handbook of Thinking and Reasoning. Oxford University Press, Oxford, 2012. https://www.oxfordhandbooks.com/view/10.1093/oxfordhb/9780199734689.001.0001/oxfordhb-9780199734689-e-13.
Alexandra Horowitz. Smelling themselves: Dogs investigate their own odours longer when modified in an “olfactory mirror” test. Behavioural Processes, 143:17--24, 2017. doi: 10.1016/j.beproc.2017.08.001. https://www.sciencedirect.com/science/article/pii/S0376635717300104.
Learning to solve arithmetic word problems with verb categorization. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 523--533, Doha, Qatar, October 2014. Association for Computational Linguistics. doi: 10.3115/v1/D14-1058. https://aclanthology.org/D14-1058.
Yufang Hou. Bridging anaphora resolution as question answering. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pp. 1428--1438, Online, July 2020. Association for Computational Linguistics. doi: 10.18653/v1/2020.acl-main.132. https://aclanthology.org/2020.acl-main.132.
Global inference for bridging anaphora resolution. In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 907--917, Atlanta, Georgia, June 2013. Association for Computational Linguistics. https://aclanthology.org/N13-1111.
China Household Management Research Center, Ministry of Public Security. National name report 2018. 2019. http://news.cpd.com.cn/n18151/201901/t20190130_830962.html (Accessed 3 March 2021).
China Household Management Research Center, Ministry of Public Security. National name report 2019. 2020. https://www.mps.gov.cn/n2254314/n6409334/c6874817/content.html (Accessed 3 March 2021).
China Household Management Research Center, Ministry of Public Security. National name report 2020. 2021. https://www.mps.gov.cn/n2253534/n2253535/c7725981/content.html (Accessed 3 March 2021).
Introduction to Paremiology: A Comprehensive Guide to Proverb Studies. De Gruyter Open, Warsaw, 2015. https://www.degruyter.com/document/doi/10.2478/9783110410167/html.
Lexical semantic relatedness with random graph walks. In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), pp. 581--589, Prague, Czech Republic, June 2007. Association for Computational Linguistics. https://aclanthology.org/D07-1061.
Compositionality decomposed: How do neural networks generalise? Journal of Artificial Intelligence Research, 67:757--795, 2020. doi: 10.1613/jair.1.11674. https://doi.org/10.1613/jair.1.11674.
Can self-awareness be taught? Monkeys pass the mirror test -- again. Proceedings of the National Academy of Sciences, 114(13):3281--3283, 2017. doi: 10.1073/pnas.1701676114. https://www.pnas.org/doi/abs/10.1073/pnas.1701676114.
OpenRefine. https://openrefine.org/.
Instagram Engineering. Emojineering part 1: Machine learning for emoji trends. Medium, 1 May 2015. https://instagram-engineering.com/emojineering-part-1-machine-learning-for-emoji-trendsmachine-learning-for-emoji-trends-7f5f9cb979ad.
Automatic detection of generated text is easiest when humans are fooled. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pp. 1808--1822, Online, July 2020. Association for Computational Linguistics. doi: 10.18653/v1/2020.acl-main.164. https://aclanthology.org/2020.acl-main.164.
Learning to execute instructions in a Minecraft dialogue. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pp. 2589--2602, Online, July 2020. Association for Computational Linguistics. doi: 10.18653/v1/2020.acl-main.232. https://aclanthology.org/2020.acl-main.232.
Are natural language inference models IMPPRESsive? Learning IMPlicature and PRESupposition. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pp. 8690--8705, Online, July 2020. Association for Computational Linguistics. doi: 10.18653/v1/2020.acl-main.768. https://aclanthology.org/2020.acl-main.768.
Semantic similarity based on corpus statistics and lexical taxonomy. In Proceedings of the 10th Research on Computational Linguistics International Conference, pp. 19--33, Taipei, Taiwan, August 1997. The Association for Computational Linguistics and Chinese Language Processing (ACLCLP). https://aclanthology.org/O97-1002.
Do you know that Florence is packed with visitors? Evaluating state-of-the-art models of speaker commitment. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pp. 4208--4213, Florence, Italy, July 2019. Association for Computational Linguistics. doi: 10.18653/v1/P19-1412. https://aclanthology.org/P19-1412.
How can we know what language models know? Transactions of the Association for Computational Linguistics, 8:423--438, 2020. doi: 10.1162/tacl_a_00324. https://doi.org/10.1162/tacl_a_00324.
Harnessing context incongruity for sarcasm detection. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers), pp. 757--762, Beijing, China, July 2015. Association for Computational Linguistics. doi: 10.3115/v1/P15-2124. https://aclanthology.org/P15-2124.
Automatic sarcasm detection: A survey. ACM Comput. Surv., 50(5), Sep. 2017. doi: 10.1145/3124420. https://doi.org/10.1145/3124420.
Template guided text generation for task-oriented dialogue. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 6505--6520, Online, November 2020. Association for Computational Linguistics. doi: 10.18653/v1/2020.emnlp-main.527. https://aclanthology.org/2020.emnlp-main.527.
Immanuel Kant. Critique of Pure Reason. The Cambridge Edition of the Works of Immanuel Kant, edited by Paul Guyer and Allen W. Wood. Cambridge University Press, 1781/1787. doi: 10.1017/CBO9780511804649. https://doi.org/10.1017/CBO9780511804649.
Immanuel Kant. Prolegomena to Any Future Metaphysics. Cambridge Texts in the History of Philosophy, edited by Gary Hatfield. Cambridge University Press, 2nd edition, 1783. doi: 10.1017/CBO9780511808517. https://doi.org/10.1017/CBO9780511808517.
Andrej Karpathy. The unreasonable effectiveness of recurrent neural networks. Andrej Karpathy’s blog, 21 May 2015. http://karpathy.github.io/2015/05/21/rnn-effectiveness/.
Lauri Karttunen. Simple and phrasal implicatives. In *SEM 2012: The First Joint Conference on Lexical and Computational Semantics -- Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation (SemEval 2012), pp. 124--131, Montréal, Canada, 7-8 June 2012. Association for Computational Linguistics. https://aclanthology.org/S12-1020.
Os Keyes. The misgendering machines: Trans/HCI implications of automatic gender recognition. In Proceedings of the ACM on human-computer interaction, volume 2, New York, NY, USA, Nov. 2018. Association for Computing Machinery. doi: 10.1145/3274357. https://doi.org/10.1145/3274357.
How do humans teach: On curriculum learning and teaching dimension. In J. Shawe-Taylor, R. Zemel, P. Bartlett, F. Pereira, and K.Q. Weinberger (eds.), Advances in Neural Information Processing Systems, volume 24. Curran Associates, Inc., 2011. https://proceedings.neurips.cc/paper/2011/file/f9028faec74be6ec9b852b0a542e2f39-Paper.pdf.
UNIFIEDQA: Crossing format boundaries with a single QA system. In Findings of the Association for Computational Linguistics: EMNLP 2020, pp. 1896--1907, Online, November 2020b. Association for Computational Linguistics. doi: 10.18653/v1/2020.findings-emnlp.171. https://aclanthology.org/2020.findings-emnlp.171.
Dynabench: Rethinking benchmarking in NLP. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 4110--4124, Online, June 2021. Association for Computational Linguistics. doi: 10.18653/v1/2021.naacl-main.324. https://aclanthology.org/2021.naacl-main.324.
Cooperation and codenames: Understanding natural language processing via codenames. In Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment, volume 15, pp. 160--166, Menlo Park, CA, Oct. 2019. Association for the Advancement of Artificial Intelligence. https://ojs.aaai.org/index.php/AIIDE/article/view/5239.
Evaluating approaches to personalizing language models. In Proceedings of the 12th Language Resources and Evaluation Conference, pp. 2461--2469, Marseille, France, May 2020. European Language Resources Association. https://aclanthology.org/2020.lrec-1.299.
Recurrent Neural Networks in Linguistic Theory: Revisiting Pinker and Prince (1988) and the Past Tense Debate. Transactions of the Association for Computational Linguistics, 6:651--665, Dec. 2018. ISSN 2307-387X. doi: 10.1162/tacl_a_00247. https://doi.org/10.1162/tacl_a_00247.
Emanuel Kitzelmann. Inductive programming: A survey of program synthesis techniques. In Ute Schmid, Emanuel Kitzelmann, and Rinus Plasmeijer (eds.), Approaches and Applications of Inductive Programming, pp. 50--73, Berlin, 2010. Springer. doi: 10.1007/978-3-642-11931-6. https://doi.org/10.1007/978-3-642-11931-6.
A surprisingly robust trick for the Winograd schema challenge. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pp. 4837--4842, Florence, Italy, July 2019. Association for Computational Linguistics. doi: 10.18653/v1/P19-1478. https://aclanthology.org/P19-1478.
The NarrativeQA reading comprehension challenge. Transactions of the Association for Computational Linguistics, 6:317--328, 2018. doi: 10.1162/tacl_a_00023. https://aclanthology.org/Q18-1023.
MultiEmo: Multilingual, multilevel, multidomain sentiment analysis corpus of consumer reviews. In Maciej Paszynski, Dieter Kranzlmüller, Valeria V. Krzhizhanovskaya, Jack J. Dongarra, and Peter M. A. Sloot (eds.), Computational Science -- ICCS 2021, pp. 297--312, Cham, 2021. Springer. doi: 10.1007/978-3-030-77964-1_24. https://doi.org/10.1007/978-3-030-77964-1_24.
Counterlogicals as counterconventionals. Journal of Philosophical Logic, 50:673--704, 2021. doi: 10.1007/s10992-020-09581-6. https://doi.org/10.1007/s10992-020-09581-6.
Self-Aware Computing Systems. Springer, Cham, 2017. https://link.springer.com/book/10.1007/978-3-319-47474-8.
All the news that’s fit to fabricate: AI-generated text as a tool of media misinformation. SSRN, 24 Sep 2020. doi: 10.2139/ssrn.3525002. http://dx.doi.org/10.2139/ssrn.3525002.
Quality at a glance: An audit of web-crawled multilingual datasets. Transactions of the Association for Computational Linguistics, 10:50--72, Jan. 2022. doi: 10.1162/tacl_a_00447. https://doi.org/10.1162/tacl_a_00447.
Evaluating the factual consistency of abstractive text summarization. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 9332--9346, Online, November 2020. Association for Computational Linguistics. doi: 10.18653/v1/2020.emnlp-main.750. https://aclanthology.org/2020.emnlp-main.750.
SentencePiece: A simple and language independent subword tokenizer and detokenizer for neural text processing. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pp. 66--71, Brussels, Belgium, November 2018. Association for Computational Linguistics. doi: 10.18653/v1/D18-2012. https://aclanthology.org/D18-2012.
Natural questions: A benchmark for question answering research. Transactions of the Association for Computational Linguistics, 7:453--466, Aug. 2019. doi: 10.1162/tacl_a_00276. https://doi.org/10.1162/tacl_a_00276.
Kevin Lacker. Giving GPT-3 a Turing test. Kevin Lacker’s blog, July 2020. https://lacker.io/ai/2020/07/06/giving-gpt-3-a-turing-test.html.
RACE: Large-scale ReAding comprehension dataset from examinations. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pp. 785--794, Copenhagen, Denmark, September 2017. Association for Computational Linguistics. doi: 10.18653/v1/D17-1082. https://aclanthology.org/D17-1082.
Building machines that learn and think like people. Behavioral and Brain Sciences, 40:e253, 2017. doi: 10.1017/S0140525X16001837. https://www.cambridge.org/core/journals/behavioral-and-brain-sciences/article/building-machines-that-learn-and-think-like-people/A9535B1D745A0377E16C590E14B94993.
The emergence of number and syntax units in LSTM language models. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pp. 11--20, Minneapolis, Minnesota, June 2019. Association for Computational Linguistics. doi: 10.18653/v1/N19-1002. https://aclanthology.org/N19-1002.
Revisiting the evaluation of theory of mind through question answering. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pp. 5872--5877, Hong Kong, China, November 2019. Association for Computational Linguistics. doi: 10.18653/v1/D19-1598. https://aclanthology.org/D19-1598.
Language models as fact checkers? In Proceedings of the Third Workshop on Fact Extraction and VERification (FEVER), pp. 36--41, Online, July 2020. Association for Computational Linguistics. doi: 10.18653/v1/2020.fever-1.5. https://aclanthology.org/2020.fever-1.5.
Towards few-shot fact-checking via perplexity. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 1971--1981, Online, June 2021. Association for Computational Linguistics. doi: 10.18653/v1/2021.naacl-main.158. https://aclanthology.org/2021.naacl-main.158.
Solving logic puzzles: From robust processing to precise semantics. In Proceedings of the 2nd Workshop on Text Meaning and Interpretation, pp. 9--16, Barcelona, Spain, July 2004. Association for Computational Linguistics. https://aclanthology.org/W04-0902.
Zero-shot relation extraction via reading comprehension. In Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017), pp. 333--342, Vancouver, Canada, August 2017. Association for Computational Linguistics. doi: 10.18653/v1/K17-1034. https://aclanthology.org/K17-1034.
TR9856: A multi-word term relatedness benchmark. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers), pp. 419--424, Beijing, China, July 2015. Association for Computational Linguistics. doi: 10.3115/v1/P15-2069. https://aclanthology.org/P15-2069.
MLQA: Evaluating cross-lingual extractive question answering. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pp. 7315--7330, Online, July 2020a. Association for Computational Linguistics. doi: 10.18653/v1/2020.acl-main.653. https://aclanthology.org/2020.acl-main.653.
UNQOVERing stereotyping biases via underspecified questions. In Findings of the Association for Computational Linguistics: EMNLP 2020, pp. 3475--3489, Online, November 2020a. Association for Computational Linguistics. doi: 10.18653/v1/2020.findings-emnlp.311. https://aclanthology.org/2020.findings-emnlp.311.
DailyDialog: A manually labelled multi-turn dialogue dataset. In Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pp. 986--995, Taipei, Taiwan, November 2017. Asian Federation of Natural Language Processing. https://aclanthology.org/I17-1099.
DELPHI: Accurate deep ensemble model for protein interaction sites prediction. Bioinformatics, 37(7):896--904, 08 2020b. doi: 10.1093/bioinformatics/btaa750. https://doi.org/10.1093/bioinformatics/btaa750.
A meaning-based statistical English math word problem solver. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pp. 652--662, New Orleans, Louisiana, June 2018. Association for Computational Linguistics. doi: 10.18653/v1/N18-1060. https://aclanthology.org/N18-1060.
Towards debiasing sentence representations. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pp. 5502--5515, Online, July 2020a. Association for Computational Linguistics. doi: 10.18653/v1/2020.acl-main.488. https://aclanthology.org/2020.acl-main.488.
Learning to contrast the counterfactual samples for robust visual question answering. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 3285--3292, Online, November 2020b. Association for Computational Linguistics. doi: 10.18653/v1/2020.emnlp-main.265. https://aclanthology.org/2020.emnlp-main.265.
Birds have four legs?! NumerSense: probing numerical commonsense knowledge of pre-trained language models. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 6862--6868, Online, November 2020. Association for Computational Linguistics. doi: 10.18653/v1/2020.emnlp-main.557. https://aclanthology.org/2020.emnlp-main.557.
Reasoning over paragraph effects in situations. In Proceedings of the 2nd Workshop on Machine Reading for Question Answering, pp. 58--62, Hong Kong, China, November 2019. Association for Computational Linguistics. doi: 10.18653/v1/D19-5808. https://aclanthology.org/D19-5808.
Assessing the ability of LSTMs to learn syntax-sensitive dependencies. Transactions of the Association for Computational Linguistics, 4:521--535, Dec. 2016. doi: 10.1162/tacl_a_00115. https://doi.org/10.1162/tacl_a_00115.
SemEval-2015 task 5: QA TempEval - evaluating temporal information understanding with question answering. In Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015), pp. 792--800, Denver, Colorado, June 2015. Association for Computational Linguistics. doi: 10.18653/v1/S15-2134. https://aclanthology.org/S15-2134.
Gender bias in neural natural language processing. In Vivek Nigam, Tajana Ban Kirigin, Carolyn Talcott, Joshua Guttman, Stepan Kuznetsov, Boon Thau Loo, and Mitsuhiro Okada (eds.), Logic, Language, and Security. Springer, Cham, 2020. https://www.springerprofessional.de/en/gender-bias-in-neural-natural-language-processing/18531692.
What’s in the box? An analysis of undesirable content in the Common Crawl corpus. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers), pp. 182--189, Online, August 2021. Association for Computational Linguistics. doi: 10.18653/v1/2021.acl-short.24. https://aclanthology.org/2021.acl-short.24.
A survey of reinforcement learning informed by natural language. In Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, IJCAI-19, pp. 6309--6317, 2019. https://www.ijcai.org/proceedings/2019/0880.pdf.
Automatic prediction of discourse connectives. In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), Miyazaki, Japan, May 2018. European Language Resources Association. https://aclanthology.org/L18-1260.
A BERT-based approach for automatic humor detection and scoring. In Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2019), pp. 197--202, 2019. http://ceur-ws.org/Vol-2421/HAHA_paper_8.pdf.
GPT-3, bloviator: OpenAI’s language generator has no idea what it’s talking about. MIT Technology Review, 22 August 2020. https://www.technologyreview.com/2020/08/22/1007539/gpt3-openai-language-generator-artificial-intelligence-ai-opinion/.
The Penn Treebank: Annotating predicate argument structure. In Human Language Technology: Proceedings of a Workshop held at Plainsboro, New Jersey, March 8-11, 1994, 1994. https://aclanthology.org/H94-1020.
Collective classification for fine-grained information status. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 795--804, Jeju Island, Korea, July 2012. Association for Computational Linguistics. https://aclanthology.org/P12-1084.
Inclusive data visualization for people with disabilities: A call to action. Interactions, 28(3):47--51, Apr. 2021. doi: 10.1145/3457875. https://doi.org/10.1145/3457875.
Research community dynamics behind popular AI benchmarks. Nature Machine Intelligence, 3(7):581--589, 2021. doi: 10.1038/s42256-021-00339-6. https://doi.org/10.1038/s42256-021-00339-6.
Targeted syntactic evaluation of language models. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pp. 1192--1202, Brussels, Belgium, October-November 2018. Association for Computational Linguistics. doi: 10.18653/v1/D18-1151. https://aclanthology.org/D18-1151.
Suicide risk assessment with multi-level dual-context language and BERT. In Proceedings of the Sixth Workshop on Computational Linguistics and Clinical Psychology, pp. 39--44, Minneapolis, Minnesota, June 2019. Association for Computational Linguistics. doi: 10.18653/v1/W19-3005. https://aclanthology.org/W19-3005.
On measuring social biases in sentence encoders. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pp. 622--628, Minneapolis, Minnesota, June 2019. Association for Computational Linguistics. doi: 10.18653/v1/N19-1063. https://aclanthology.org/N19-1063.
Andrew Mayne. OpenAI API alchemy: Emoji storytelling. Andrew Mayne blog, 24 June 2020. https://andrewmayneblog.wordpress.com/2020/06/24/open-ai-alchemy-emoji-storytelling/.
Context based spelling correction. Information Processing & Management, 27(5):517--522, 1991. doi: 10.1016/0306-4573(91)90066-U. https://www.sciencedirect.com/science/article/pii/030645739190066U.
The application of convolution neural network based cell segmentation during cryopreservation. Cryobiology, 85:95--104, 2018. doi: 10.1016/j.cryobiol.2018.09.003. https://www.sciencedirect.com/science/article/pii/S0011224018301937.
Does Syntax Need to Grow on Trees? Sources of Hierarchical Inductive Bias in Sequence-to-Sequence Networks. Transactions of the Association for Computational Linguistics, 8:125--140, Jan. 2020. ISSN 2307-387X. doi: 10.1162/tacl_a_00304. https://doi.org/10.1162/tacl_a_00304.
USR: An unsupervised and reference free evaluation metric for dialog generation. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pp. 681--707, Online, July 2020. Association for Computational Linguistics. doi: 10.18653/v1/2020.acl-main.64. https://aclanthology.org/2020.acl-main.64.
Christine Palm Meister. Phraseologie des Schwedischen. In H. Burger et al. (eds.), Phraseologie/Phraseology, volume 2, pp. 673--681. De Gruyter Mouton, 2007. doi: 10.1515/9783110190762.673. https://doi.org/10.1515/9783110190762.673.
Interactive optimal teaching with unknown learners. In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, IJCAI-18, pp. 2567--2573, 2018. doi: 10.24963/ijcai.2018/356. https://doi.org/10.24963/ijcai.2018/356.
Temporal information extraction for question answering using syntactic dependencies in an LSTM-based architecture. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pp. 887--896, Copenhagen, Denmark, September 2017. Association for Computational Linguistics. doi: 10.18653/v1/D17-1092. https://aclanthology.org/D17-1092.
Wolfgang Mieder. "Andere Zeiten, andere Lehren": Sprach- und kulturgeschichtliche Betrachtungen zum Sprichwort. In K. Steyer (ed.), Wortverbindungen - mehr oder weniger fest, pp. 415--438. De Gruyter, Berlin, 2019. doi: 10.1515/9783110622768-020. https://doi.org/10.1515/9783110622768-020.
Making computers laugh: Investigations in automatic humor recognition. In Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing, pp. 531--538, Vancouver, British Columbia, Canada, October 2005. Association for Computational Linguistics. https://aclanthology.org/H05-1067.
The effect of natural distribution shift on question answering models. In Hal Daumé III and Aarti Singh (eds.), Proceedings of the 37th International Conference on Machine Learning, volume 119 of Proceedings of Machine Learning Research, pp. 6905--6916. PMLR, 13--18 July 2020. https://proceedings.mlr.press/v119/miller20a.html.
Automatic disambiguation of English puns. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pp. 719--729, Beijing, China, July 2015. Association for Computational Linguistics. doi: 10.3115/v1/P15-1070. https://aclanthology.org/P15-1070.
SemEval-2017 task 7: Detection and interpretation of English puns. In Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017), pp. 58--68, Vancouver, Canada, August 2017. Association for Computational Linguistics. doi: 10.18653/v1/S17-2005. https://aclanthology.org/S17-2005.
An effective, low-cost measure of semantic relatedness obtained from Wikipedia links. In Proceedings of the AAAI Workshop on Wikipedia and Artificial Intelligence: An Evolving Synergy, pp. 25--30, Menlo Park, 2008. Association for the Advancement of Artificial Intelligence. https://www.aaai.org/Papers/Workshops/2008/WS-08-15/WS08-15-005.pdf.
Republic of China Ministry of the Interior. National name statistical analysis, 2018. https://www.ris.gov.tw/documents/data/5/2/107namestat.pdf (Accessed 3 March 2021).
Natural reference to objects in a visual domain. In Proceedings of the 6th International Natural Language Generation Conference. Association for Computational Linguistics, July 2010. https://aclanthology.org/W10-4210.
Generating expressions that refer to visible objects. In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 1174--1184, Atlanta, Georgia, June 2013. Association for Computational Linguistics. https://aclanthology.org/N13-1137.
CLaC at CLPsych 2019: Fusion of neural features and predicted class probabilities for suicide risk assessment based on online posts. In Proceedings of the Sixth Workshop on Computational Linguistics and Clinical Psychology, pp. 34--38, Minneapolis, Minnesota, June 2019. Association for Computational Linguistics. doi: 10.18653/v1/W19-3004. https://aclanthology.org/W19-3004.
Introducing the LCC metaphor datasets. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC’16), pp. 4221--4227, Portorož, Slovenia, May 2016. European Language Resources Association. https://aclanthology.org/L16-1668.
Lexical cohesion computed by thesaural relations as an indicator of the structure of text. Computational Linguistics, 17(1):21--48, 1991. https://aclanthology.org/J91-1002.
Structure here, bias there: Hierarchical generalization by jointly learning syntactic transformations. In Proceedings of the Society for Computation in Linguistics 2021, pp. 125--135, Online, February 2021. Association for Computational Linguistics. https://aclanthology.org/2021.scil-1.12.
Applying the naïve Bayes classifier with kernel density estimation to the prediction of protein–protein interaction sites. Bioinformatics, 26(15):1841--1848, 06 2010. doi: 10.1093/bioinformatics/btq302. https://doi.org/10.1093/bioinformatics/btq302.
Gregory L. Murphy. Comprehending complex concepts. Cognitive Science, 12(4):529--562, 1988. doi: 10.1207/s15516709cog1204_2. https://onlinelibrary.wiley.com/doi/abs/10.1207/s15516709cog1204_2.
Obtaining well calibrated probabilities using Bayesian binning. In Proceedings of the AAAI Conference on Artificial Intelligence, pp. 2901--2907, Menlo Park, CA, 2015. Association for the Advancement of Artificial Intelligence. https://www.aaai.org/ocs/index.php/AAAI/AAAI15/paper/view/9667.
Don’t give me the details, just the summary! Topic-aware convolutional neural networks for extreme summarization. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pp. 1797--1807, Brussels, Belgium, October-November 2018. Association for Computational Linguistics. doi: 10.18653/v1/D18-1206. https://aclanthology.org/D18-1206.
Participatory research for low-resourced machine translation: A case study in African languages. In Findings of the Association for Computational Linguistics: EMNLP 2020, pp. 2144--2160, Online, November 2020. Association for Computational Linguistics. doi: 10.18653/v1/2020.findings-emnlp.195. https://aclanthology.org/2020.findings-emnlp.195.
Evaluating theory of mind in question answering. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pp. 2392--2400, Brussels, Belgium, October--November 2018. Association for Computational Linguistics. doi: 10.18653/v1/D18-1261. https://aclanthology.org/D18-1261.
Posterior calibration and exploratory analysis for natural language processing models. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pp. 1587--1598, Lisbon, Portugal, September 2015. Association for Computational Linguistics. doi: 10.18653/v1/D15-1182. https://aclanthology.org/D15-1182.
DisSent: Learning sentence representations from explicit discourse relations. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pp. 4497--4510, Florence, Italy, July 2019. Association for Computational Linguistics. doi: 10.18653/v1/P19-1442. https://aclanthology.org/P19-1442.
Proverb comprehension as a function of reading proficiency in preadolescents. Language, Speech, and Hearing Services in Schools, 32:90, 04 2001. doi: 10.1044/0161-1461(2001/009). https://www.researchgate.net/publication/285246680_Proverb_Comprehension_as_a_Function_of_Reading_Proficiency_in_Preadolescents.
Generating natural anagrams: Towards language generation under hard combinatorial constraints. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pp. 6408--6412, Hong Kong, China, November 2019. Association for Computational Linguistics. doi: 10.18653/v1/D19-1674. https://aclanthology.org/D19-1674.
"The things that we have to do": Ethics and instrumentality in humanitarian communication. Global Media and Communication, 9(1):53--70, 2013. doi: 10.1177/1742766512463040. https://doi.org/10.1177/1742766512463040.
Effects of directionality in deductive reasoning, I. The comprehension of single relational premises. Journal of Experimental Psychology: Learning, Memory, and Cognition, 26(6):1702--1712, 2000. doi: 10.1037/0278-7393.26.6.1702. https://doi.org/10.1037/0278-7393.26.6.1702.
Effects of directionality in deductive reasoning, II. Premise integration and conclusion evaluation. The Quarterly Journal of Experimental Psychology Section A, 58(7):1225--1247, 2005. doi: 10.1080/02724980443000566. https://doi.org/10.1080/02724980443000566.
The Working Committee on the Revision of the National Standard Occupational Classification. Standard Occupational Classification of the People’s Republic of China. China Labour and Social Security Publishing House, 2015. http://www.jiangmen.gov.cn/bmpd/jmsrlzyhshbzj/zwfw/bmjd/jdks/content/post_2334804.html (Accessed 4 June 2022).
iSarcasm: A dataset of intended sarcasm. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pp. 1279--1289, Online, July 2020. Association for Computational Linguistics. doi: 10.18653/v1/2020.acl-main.118. https://aclanthology.org/2020.acl-main.118.
Can you trust your model’s uncertainty? Evaluating predictive uncertainty under dataset shift. In H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett (eds.), Advances in Neural Information Processing Systems, volume 32. Curran Associates, Inc., 2019. https://proceedings.neurips.cc/paper/2019/hash/8558cb408c1d76621371888657d2eb1d-Abstract.html.
Understanding factuality in abstractive summarization with FRANK: A benchmark for factuality metrics. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 4812--4829, Online, June 2021. Association for Computational Linguistics. doi: 10.18653/v1/2021.naacl-main.383. https://aclanthology.org/2021.naacl-main.383.
Are NLP models really able to solve simple math word problems? In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 2080--2094, Online, June 2021. Association for Computational Linguistics. doi: 10.18653/v1/2021.naacl-main.168. https://aclanthology.org/2021.naacl-main.168.
Anthony M. Paul. Figurative language. Philosophy & Rhetoric, 3(4):225--248, 1970. http://www.jstor.org/stable/40237206.
Don’t patronize me! An annotated dataset with patronizing and condescending language towards vulnerable communities. In Proceedings of the 28th International Conference on Computational Linguistics, pp. 5891--5902, Barcelona, Spain (Online), December 2020. International Committee on Computational Linguistics. doi: 10.18653/v1/2020.coling-main.518. https://aclanthology.org/2020.coling-main.518.
Data cleaning: A case study with OpenRefine and Trifacta Wrangler. In Martin Shepperd, Fernando Brito e Abreu, Alberto Rodrigues da Silva, and Ricardo Pérez-Castillo (eds.), Quality of Information and Communications Technology, pp. 32--40, Cham, 2020. Springer. doi: 10.1007/978-3-030-58793-2_3. https://doi.org/10.1007/978-3-030-58793-2_3.
Steve Piantadosi. Fleet system, 2020. https://github.com/piantado/Fleet.
Robert Plutchik. A general psychoevolutionary theory of emotion. In Robert Plutchik and Henry Kellerman (eds.), Theories of Emotion, pp. 3--33. Academic Press, 1980. doi: 10.1016/B978-0-12-558701-3.50007-7. https://www.sciencedirect.com/science/article/pii/B9780125587013500077.
Knowledge derived from Wikipedia for computing semantic relatedness. Journal of Artificial Intelligence Research, 30:181--212, 2007. https://jair.org/index.php/jair/article/view/10513.
SemEval 2015, task 7: Diachronic text evaluation. In Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015), pp. 870--878, Denver, Colorado, June 2015. Association for Computational Linguistics. doi: 10.18653/v1/S15-2147. https://aclanthology.org/S15-2147.
MELD: A multimodal multi-party dataset for emotion recognition in conversations. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pp. 527--536, Florence, Italy, July 2019. Association for Computational Linguistics. doi: 10.18653/v1/P19-1050. https://aclanthology.org/P19-1050.
A transformer-based approach to irony and sarcasm detection. Neural Computing and Applications, 32:17309--17320, 2020. doi: 10.1007/s00521-020-05102-3. https://link.springer.com/article/10.1007/s00521-020-05102-3.
Joking riddles: A developmental index of children’s humor. Developmental Psychology, 11:210--216, 1975. doi: 10.1037/h0076455. https://doi.org/10.1037/h0076455.
The specification language TimeML, 2004. http://xml.coverpages.org/TimeML-SpecLang200401.pdf.
Qimingtong. What are the most popular names Chinese parents give their babies? A perspective from big data, 2016. https://www.qimingtong.com/article/0 (Accessed 3 March 2021).
TIMEDIAL: Temporal commonsense reasoning in dialog. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pp. 7066--7076, Online, August 2021. Association for Computational Linguistics. doi: 10.18653/v1/2021.acl-long.549. https://aclanthology.org/2021.acl-long.549.
Willard V.O. Quine. Main trends in recent philosophy: Two dogmas of empiricism. The Philosophical Review, 60(1):20--43, 1951. http://www.jstor.org/stable/2181906.
The North American computational linguistics olympiad (NACLO). In Proceedings of the Third Workshop on Issues in Teaching Computational Linguistics, pp. 87--96, Columbus, Ohio, June 2008. Association for Computational Linguistics. https://aclanthology.org/W08-0211.
Language models are unsupervised multitask learners. OpenAI blog, 1(8):9, 2019. https://d4mucfpksywv.cloudfront.net/better-language-models/language-models.pdf.
Word sense disambiguation: A unified evaluation framework and empirical comparison. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers, pp. 99--110, Valencia, Spain, April 2017. Association for Computational Linguistics. https://aclanthology.org/E17-1010.
Resolving complex cases of definite pronouns: The Winograd schema challenge. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pp. 777--789, Jeju Island, Korea, July 2012. Association for Computational Linguistics. https://aclanthology.org/D12-1071.
A survey on computational metaphor processing. ACM Computing Surveys, 53(2), Mar. 2020. doi: 10.1145/3373265. https://doi.org/10.1145/3373265.
SQuAD: 100,000+ questions for machine comprehension of text. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pp. 2383--2392, Austin, Texas, November 2016. Association for Computational Linguistics. doi: 10.18653/v1/D16-1264. https://aclanthology.org/D16-1264.
Know what you don’t know: Unanswerable questions for SQuAD. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pp. 784--789, Melbourne, Australia, July 2018. Association for Computational Linguistics. doi: 10.18653/v1/P18-2124. https://aclanthology.org/P18-2124.
Med-BERT: Pretrained contextualized embeddings on large-scale structured electronic health records for disease prediction. npj Digital Medicine, 4(1):86, 2021. doi: 10.1038/s41746-021-00455-y. https://doi.org/10.1038/s41746-021-00455-y.
Towards scalable multi-domain conversational agents: The schema-guided dialogue dataset. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 34, pp. 8689--8696, Menlo Park, CA, Apr. 2020. Association for the Advancement of Artificial Intelligence. doi: 10.1609/aaai.v34i05.6394. https://ojs.aaai.org/index.php/AAAI/article/view/6394.
CoQA: A conversational question answering challenge. Transactions of the Association for Computational Linguistics, 7:249--266, 2019. doi: 10.1162/tacl_a_00266. https://aclanthology.org/Q19-1016.
Multi-prototype vector-space models of word meaning. In Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pp. 109--117, Los Angeles, California, June 2010. Association for Computational Linguistics. https://aclanthology.org/N10-1013.
He Ren and Quan Yang. Neural joke generation, 2017. https://web.stanford.edu/class/archive/cs/cs224n/cs224n.1174/reports/2760332.pdf.
Philip Resnik. Semantic similarity in a taxonomy: An information-based measure and its application to problems of ambiguity in natural language. Journal of Artificial Intelligence Research, 11:95--130, 1999. doi: 10.1613/jair.514. https://doi.org/10.1613/jair.514.
Choice of plausible alternatives: An evaluation of commonsense causal reasoning. AAAI Spring Symposium, 2011. http://commonsensereasoning.org/2011/papers/Roemmele.pdf.
How well do NLI models capture verb veridicality? In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pp. 2230--2240, Hong Kong, China, November 2019. Association for Computational Linguistics. doi: 10.18653/v1/D19-1228. https://aclanthology.org/D19-1228.
XTREME-R: Towards more challenging and nuanced multilingual evaluation. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pp. 10215--10245, Online and Punta Cana, Dominican Republic, November 2021. Association for Computational Linguistics. doi: 10.18653/v1/2021.emnlp-main.802. https://aclanthology.org/2021.emnlp-main.802.
Gender bias in coreference resolution. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers), pp. 8--14, New Orleans, Louisiana, June 2018. Association for Computational Linguistics. doi: 10.18653/v1/N18-2002. https://aclanthology.org/N18-2002.
Comparing conventions. In Joseph Rhyne et al. (eds.), Proceedings of Semantics and Linguistic Theory, pp. 294--313, Washington, D.C., 2020. Linguistic Society of America. doi: 10.3765/salt.v30i0.4820. https://doi.org/10.3765/salt.v30i0.4820.
Number-space mapping in the newborn chick resembles humans’ mental number line. Science, 347(6221):534--536, 2015. doi: 10.1126/science.aaa1379. https://www.science.org/doi/abs/10.1126/science.aaa1379.
Joshua S. Rule. The child as hacker: Building more human-like models of learning. PhD thesis, Massachusetts Institute of Technology, Cambridge, MA, 2020. https://hdl.handle.net/1721.1/129232.
The child as hacker. Trends in Cognitive Sciences, 24(11):900--915, 2020. doi: 10.1016/j.tics.2020.07.005. https://www.sciencedirect.com/science/article/pii/S1364661320301741.
A neural attention model for abstractive sentence summarization. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pp. 379--389, Lisbon, Portugal, September 2015. Association for Computational Linguistics. doi: 10.18653/v1/D15-1044. https://aclanthology.org/D15-1044.
Artificial Intelligence: A Modern Approach. Pearson, Hoboken, 2002. http://aima.cs.berkeley.edu/.
PuzzLing Machines: A challenge on learning from small data. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pp. 1241--1254, Online, July 2020. Association for Computational Linguistics. doi: 10.18653/v1/2020.acl-main.115. https://aclanthology.org/2020.acl-main.115.
WINOGRANDE: An adversarial Winograd schema challenge at scale. In The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI-20, pp. 8732--8734, Menlo Park, CA, 2020. Association for the Advancement of Artificial Intelligence. https://ojs.aaai.org/index.php/AAAI/article/view/6399/6255.
Automatic detection of satire in Twitter: A psycholinguistic-based approach. Knowledge-Based Systems, 128:20--33, 2017. doi: 10.1016/j.knosys.2017.04.009. https://www.sciencedirect.com/science/article/pii/S0950705117301855.
Masked language model scoring. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pp. 2699--2712, Online, July 2020. Association for Computational Linguistics. doi: 10.18653/v1/2020.acl-main.240. https://aclanthology.org/2020.acl-main.240.
Temporal reasoning in natural language processing: A survey. International Journal of Computer Applications, 1(4):53--57, 2010. https://www.ijcaonline.org/journal/number4/pxc387209.pdf.
Evan Sandhaus. The New York Times annotated corpus LDC2008T19. Linguistic Data Consortium, 2008. https://catalog.ldc.upenn.edu/LDC2008T19.
Multitask prompted training enables zero-shot task generalization. In International Conference on Learning Representations, 2022. https://openreview.net/forum?id=9Vrb9D0WI4.
Social IQa: Commonsense reasoning about social interactions. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pp. 4463--4473, Hong Kong, China, November 2019. Association for Computational Linguistics. doi: 10.18653/v1/D19-1454. https://aclanthology.org/D19-1454.
Social bias frames: Reasoning about social and power implications of language. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pp. 5477--5490, Online, July 2020. Association for Computational Linguistics. doi: 10.18653/v1/2020.acl-main.486. https://aclanthology.org/2020.acl-main.486.
Get your vitamin C! Robust fact verification with contrastive evidence. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 624--643, Online, June 2021a. Association for Computational Linguistics. doi: 10.18653/v1/2021.naacl-main.52. https://aclanthology.org/2021.naacl-main.52.
Programming puzzles. In Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 1), 2021b. https://openreview.net/forum?id=fe_hCc4RBrg.
Megan Scudellari. Cryopreservation aims to engineer novel ways to freeze, store, and thaw organs. Proceedings of the National Academy of Sciences, 114(50):13060--13062, 2017. doi: 10.1073/pnas.1717588114. https://www.pnas.org/doi/abs/10.1073/pnas.1717588114.
Get to the point: Summarization with pointer-generator networks. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 1073--1083, Vancouver, Canada, July 2017. Association for Computational Linguistics. doi: 10.18653/v1/P17-1099. https://www.aclweb.org/anthology/P17-1099.
Revisiting low-resource neural machine translation: A case study. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pp. 211--221, Florence, Italy, July 2019. Association for Computational Linguistics. doi: 10.18653/v1/P19-1021. https://aclanthology.org/P19-1021.
Diagram understanding in geometry questions. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 28, Menlo Park, CA, Jun. 2014. Association for the Advancement of Artificial Intelligence. https://ojs.aaai.org/index.php/AAAI/article/view/9146.
Counterfactual learning in networks: An empirical study of model dependence, 2021. https://www.cs.uic.edu/~elena/pubs/shahid-why19.pdf.
Janelle Shane. All your questions answered. AI Weirdness, 20 June 2020. https://www.aiweirdness.com/all-your-questions-answered-20-06-17/.
Inferring LISP programs from examples. In IJCAI’75: Proceedings of the 4th International Joint Conference on Artificial Intelligence, volume 1, pp. 260--267. Artificial Intelligence Laboratory, Cambridge, MA, 1975. doi: 10.7916/D89K4K6X. https://academiccommons.columbia.edu/doi/10.7916/D89K4K6X.
The woman worked as a babysitter: On biases in language generation. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pp. 3407--3412, Hong Kong, China, November 2019. Association for Computational Linguistics. doi: 10.18653/v1/D19-1339. https://aclanthology.org/D19-1339.
Expert, crowdsourced, and machine assessment of suicide risk via online postings. In Proceedings of the Fifth Workshop on Computational Linguistics and Clinical Psychology: From Keyboard to Clinic, pp. 25--36, New Orleans, LA, June 2018. Association for Computational Linguistics. doi: 10.18653/v1/W18-0603. https://aclanthology.org/W18-0603.
Abu Awal Md Shoeb and Gerard de Melo. EmoTag1200: Understanding the association between emojis and emotions. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 8957--8967, Online, November 2020. Association for Computational Linguistics. doi: 10.18653/v1/2020.emnlp-main.720. https://aclanthology.org/2020.emnlp-main.720.
Ekaterina Shutova. Automatic metaphor interpretation as a paraphrasing task. In Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pp. 1029--1037, Los Angeles, California, June 2010. Association for Computational Linguistics. https://aclanthology.org/N10-1147.
Metaphor corpus annotated for source-target domain mappings. In Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC’10), Valletta, Malta, May 2010. European Language Resources Association. http://www.lrec-conf.org/proceedings/lrec2010/pdf/612_Paper.pdf.
Mining discourse markers for unsupervised sentence representation learning. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pp. 3477--3486, Minneapolis, Minnesota, June 2019. Association for Computational Linguistics. doi: 10.18653/v1/N19-1351. https://aclanthology.org/N19-1351.
DiscSense: Automated semantic analysis of discourse markers. In Proceedings of the 12th Language Resources and Evaluation Conference, pp. 991--999, Marseille, France, May 2020. European Language Resources Association. https://aclanthology.org/2020.lrec-1.125.
Zero-shot recommendation as language modeling. In Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, and Vinay Setty (eds.), Advances in Information Retrieval: 44th European Conference on IR Research, ECIR 2022, Stavanger, Norway, April 10--14, 2022, Proceedings, Part II, pp. 223--230, Cham, 2022. Springer. doi: 10.1007/978-3-030-99739-7_26. https://doi.org/10.1007/978-3-030-99739-7_26.
SPRINGS: Prediction of protein-protein interaction sites using artificial neural networks. Journal of Proteomics & Computational Biology, 1:7, 2014. https://www.avensonline.org/fulltextarticles/JPCB-2572-8679-01-0001.html.
Predicting a correct program in programming by example. In Daniel Kroening and Corina S. Păsăreanu (eds.), Computer Aided Verification, pp. 398--414, Cham, 2015. Springer International Publishing. doi: 10.1007/978-3-319-21690-4_23. https://doi.org/10.1007/978-3-319-21690-4_23.
COM2SENSE: A commonsense reasoning benchmark with complementary sentences. In Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, pp. 883--898, Online, August 2021. Association for Computational Linguistics. doi: 10.18653/v1/2021.findings-acl.78. https://aclanthology.org/2021.findings-acl.78.
Closing brackets with recurrent neural networks. In Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP, pp. 232--239, Brussels, Belgium, November 2018. Association for Computational Linguistics. doi: 10.18653/v1/W18-5425. https://aclanthology.org/W18-5425.
Recursive deep models for semantic compositionality over a sentiment treebank. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pp. 1631--1642, Seattle, Washington, USA, October 2013. Association for Computational Linguistics. https://aclanthology.org/D13-1170.
Early detection of freeze damage in navel orange fruit using nondestructive low intensity ultrasound coupled with machine learning. Food Analytical Methods, 14:1140--1149, 2021. https://doi.org/10.1007/s12161-020-01942-w.
Evaluating gender bias in machine translation. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pp. 1679--1684, Florence, Italy, July 2019. Association for Computational Linguistics. doi: 10.18653/v1/P19-1164. https://aclanthology.org/P19-1164.
Neural networks -- a model of Boolean functions. In 5th International Workshop on Boolean Problems, Freiburg, September 2002. https://www.researchgate.net/publication/246931125_Neural_Networks_-_A_Model_of_Boolean_Functions.
Andreas Stöckl. Watching a language model learning chess. In Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021), pp. 1369--1379, Held Online, September 2021. INCOMA Ltd. https://aclanthology.org/2021.ranlp-1.153.
Prerequisite skills for reading comprehension: Multi-perspective analysis of MCTest datasets and systems. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 31, Menlo Park, CA, Feb. 2017. Association for the Advancement of Artificial Intelligence. https://ojs.aaai.org/index.php/AAAI/article/view/10957.
Executing instructions in situated collaborative interactions. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pp. 2119--2130, Hong Kong, China, November 2019. Association for Computational Linguistics. doi: 10.18653/v1/D19-1218. https://aclanthology.org/D19-1218.
Evolution and impact of bias in human and machine learning algorithm interaction. PLOS ONE, 15(8):1--39, 08 2020. doi: 10.1371/journal.pone.0235502. https://doi.org/10.1371/journal.pone.0235502.
Rich Sutton. The bitter lesson. Incomplete Ideas, 2019. http://www.incompleteideas.net/IncIdeas/BitterLesson.html.
LSTM networks can perform dynamic counting. In Proceedings of the Workshop on Deep Learning and Formal Languages: Building Bridges, pp. 44--54, Florence, Italy, August 2019a. Association for Computational Linguistics. doi: 10.18653/v1/W19-3905. https://aclanthology.org/W19-3905.
ChePT -- applying deep neural transformer models to chess move prediction and self-commentary, 2021. https://web.stanford.edu/class/cs224n/reports/final_reports/report087.pdf.
You reap what you sow: On the challenges of bias evaluation under multilingual settings. In Proceedings of the ACL Workshop on Challenges & Perspectives in Creating LLMs, 2022. https://openreview.net/forum?id=rK-7NhfSIW5.
CommonsenseQA: A question answering challenge targeting commonsense knowledge. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pp. 4149--4158, Minneapolis, Minnesota, June 2019b. Association for Computational Linguistics. doi: 10.18653/v1/N19-1421. https://aclanthology.org/N19-1421.
The teaching size: Computable teachers and learners for universal languages. Machine Learning, 108:1653--1675, 2019. doi: 10.1007/s10994-019-05821-2. https://doi.org/10.1007/s10994-019-05821-2.
Analog retrieval by constraint satisfaction. Artificial Intelligence, 46(3):259--310, 1990. doi: 10.1016/0004-3702(90)90018-U. https://www.sciencedirect.com/science/article/pii/000437029090018U.
Representing numbers in NLP: A survey and a vision. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 644--656, Online, June 2021. Association for Computational Linguistics. doi: 10.18653/v1/2021.naacl-main.53. https://aclanthology.org/2021.naacl-main.53.
Learning to interpret natural language commands through human-robot dialog. In Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence, IJCAI-15, pp. 1923--1929, 2015. https://www.aaai.org/ocs/index.php/IJCAI/IJCAI15/paper/view/10957/10931.
Judith Jarvis Thomson. Killing, letting die, and the trolley problem. The Monist, 59(2):204--217, 1976. doi: 10.5840/monist197659224. https://doi.org/10.5840/monist197659224.
Survey on collaborative filtering, content-based filtering and hybrid recommendation system. International Journal of Computer Applications, 110:31--36, 2015. doi: 10.5120/19308-0760. https://www.ijcaonline.org/archives/volume110/number4/19308-0760.
FEVER: A large-scale dataset for fact extraction and VERification. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pp. 809--819, New Orleans, Louisiana, June 2018. Association for Computational Linguistics. doi: 10.18653/v1/N18-1074. https://aclanthology.org/N18-1074.
Xiaoyu Tong. Metaphor paraphrasing and word sense disambiguation: Toward a new approach to automated metaphor, 2021. https://scripties.uba.uva.nl/download?fid=681664.
Recent advances in neural metaphor processing: A linguistic, cognitive and social perspective. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 4673--4686, Online, June 2021. Association for Computational Linguistics. doi: 10.18653/v1/2021.naacl-main.372. https://aclanthology.org/2021.naacl-main.372.
Correlation-based network analysis combined with machine learning techniques highlight the role of the GABA shunt in Brachypodium sylvaticum freezing tolerance. Scientific Reports, 10:4489, 2020. doi: 10.1038/s41598-020-61081-4. https://doi.org/10.1038/s41598-020-61081-4.
Metaphor detection with cross-lingual model transfer. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 248--258, Baltimore, Maryland, June 2014. Association for Computational Linguistics. doi: 10.3115/v1/P14-1024. https://aclanthology.org/P14-1024.
Alan M. Turing. Computing machinery and intelligence. Mind, LIX(236):433--460, October 1950. doi: 10.1093/mind/LIX.236.433. https://doi.org/10.1093/mind/LIX.236.433.
Dating documents using graph convolution networks. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 1605--1615, Melbourne, Australia, July 2018. Association for Computational Linguistics. doi: 10.18653/v1/P18-1149. https://aclanthology.org/P18-1149.
Temporal reasoning in natural language inference. In Findings of the Association for Computational Linguistics: EMNLP 2020, pp. 4070--4078, Online, November 2020. Association for Computational Linguistics. doi: 10.18653/v1/2020.findings-emnlp.363. https://aclanthology.org/2020.findings-emnlp.363.
Fill in the BLANC: Human-free quality estimation of document summaries. In Proceedings of the First Workshop on Evaluation and Comparison of NLP Systems, pp. 11--20, Online, November 2020. Association for Computational Linguistics. doi: 10.18653/v1/2020.eval4nlp-1.2. https://aclanthology.org/2020.eval4nlp-1.2.
Attention is all you need. In I. Guyon, U. Von Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett (eds.), Advances in Neural Information Processing Systems, volume 30, pp. 5998--6008. Curran Associates, Inc., 2017. https://proceedings.neurips.cc/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf.
The use of spatial relations in referring expression generation. In Proceedings of the Fifth International Natural Language Generation Conference, pp. 59--67, Salt Fork, Ohio, USA, June 2008. Association for Computational Linguistics. https://aclanthology.org/W08-1109.
Grandmaster level in StarCraft II using multi-agent reinforcement learning. Nature, 575:350--354, 2019. doi: 10.1038/s41586-019-1724-z. https://doi.org/10.1038/s41586-019-1724-z.
Computational argumentation quality assessment in natural language. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers, pp. 176--187, Valencia, Spain, April 2017. Association for Computational Linguistics. https://aclanthology.org/E17-1017.
Does GPT-2 know your phone number? Berkeley Artificial Intelligence Research blog, 20 Dec. 2020. https://bair.berkeley.edu/blog/2020/12/20/lmmem/.
GLUE: A multi-task benchmark and analysis platform for natural language understanding. In Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP, pp. 353--355, Brussels, Belgium, November 2018. Association for Computational Linguistics. doi: 10.18653/v1/W18-5446. https://aclanthology.org/W18-5446.
SuperGLUE: A stickier benchmark for general-purpose language understanding systems. In H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett (eds.), Advances in Neural Information Processing Systems, volume 32. Curran Associates, Inc., 2019a. https://proceedings.neurips.cc/paper/2019/hash/4496bf24afe7fab6f046bf4923da8de6-Abstract.html.
Asking and answering questions to evaluate the factual consistency of summaries. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pp. 5008--5020, Online, July 2020a. Association for Computational Linguistics. doi: 10.18653/v1/2020.acl-main.450. https://aclanthology.org/2020.acl-main.450.
GPT-J-6B: A 6 billion parameter autoregressive language model, May 2021. https://github.com/kingoflolz/mesh-transformer-jax.
Continuity of topic, interaction, and query: Learning to quote in online conversations. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 6640--6650, Online, November 2020b. Association for Computational Linguistics. doi: 10.18653/v1/2020.emnlp-main.538. https://aclanthology.org/2020.emnlp-main.538.
Learning language games through interaction. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 2368--2378, Berlin, Germany, August 2016. Association for Computational Linguistics. doi: 10.18653/v1/P16-1224. https://aclanthology.org/P16-1224.
It’s going to be okay: Measuring access to support in online communities. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pp. 33--45, Brussels, Belgium, October-November 2018. Association for Computational Linguistics. doi: 10.18653/v1/D18-1004. https://aclanthology.org/D18-1004.
TalkDown: A corpus for condescension detection in context. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pp. 3711--3719, Hong Kong, China, November 2019. Association for Computational Linguistics. doi: 10.18653/v1/D19-1385. https://aclanthology.org/D19-1385.
Humor detection: A transformer gets the last laugh. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pp. 3621--3625, Hong Kong, China, November 2019. Association for Computational Linguistics. doi: 10.18653/v1/D19-1372. https://aclanthology.org/D19-1372.
Lexicosyntactic inference in neural models. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pp. 4717--4724, Brussels, Belgium, October-November 2018. Association for Computational Linguistics. doi: 10.18653/v1/D18-1501. https://aclanthology.org/D18-1501.
A broad-coverage challenge corpus for sentence understanding through inference. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pp. 1112--1122, New Orleans, Louisiana, June 2018. Association for Computational Linguistics. doi: 10.18653/v1/N18-1101. https://aclanthology.org/N18-1101.
Cognitive and emotional demands of black humour processing: The role of intelligence, aggressiveness and mood. Cognitive Processing, 18:159--167, 2017. doi: 10.1007/s10339-016-0789-y. https://doi.org/10.1007/s10339-016-0789-y.
Language models are few-shot multilingual learners. In Proceedings of the 1st Workshop on Multilingual Representation Learning, pp. 1--15, Punta Cana, Dominican Republic, November 2021. Association for Computational Linguistics. doi: 10.18653/v1/2021.mrl-1.1. https://aclanthology.org/2021.mrl-1.1.
Terry Winograd. Understanding natural language. Cognitive Psychology, 3(1):1--191, 1972. doi: 10.1016/0010-0285(72)90002-3. https://www.sciencedirect.com/science/article/pii/0010028572900023.
Thomas Wolf. Some additional experiments extending the tech report “Assessing BERT’s syntactic abilities” by Yoav Goldberg, 2019. https://huggingface.co/bert-syntax/extending-bert-syntax.pdf.
AutoQA: From databases to QA semantic parsers with only synthetic training data. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 422--434, Online, November 2020b. Association for Computational Linguistics. doi: 10.18653/v1/2020.emnlp-main.31. https://aclanthology.org/2020.emnlp-main.31.
Incorporating latent meanings of morphological compositions to enhance word embeddings. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 1232--1242, Melbourne, Australia, July 2018. Association for Computational Linguistics. doi: 10.18653/v1/P18-1114. https://aclanthology.org/P18-1114.
mT5: A massively multilingual pre-trained text-to-text transformer. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 483--498, Online, June 2021b. Association for Computational Linguistics. doi: 10.18653/v1/2021.naacl-main.41. https://aclanthology.org/2021.naacl-main.41.
Humor recognition and humor anchor extraction. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pp. 2367--2376, Lisbon, Portugal, September 2015. Association for Computational Linguistics. doi: 10.18653/v1/D15-1284. https://aclanthology.org/D15-1284.
Scott Cheng-Hsin Yang and Patrick Shafto. Explainable artificial intelligence via Bayesian teaching. Workshop on Teaching Machines, Robots, and Humans, NIPS 2017, 2017. http://shaftolab.com/assets/papers/yangShafto_NIPS_2017_machine_teaching.pdf.
WikiWalk: Random walks on Wikipedia for semantic relatedness. In Proceedings of the 2009 Workshop on Graph-based Methods for Natural Language Processing (TextGraphs-4), pp. 41--49, Suntec, Singapore, August 2009. Association for Computational Linguistics. https://aclanthology.org/W09-3206.
Spider: A large-scale human-labeled dataset for complex and cross-domain semantic parsing and text-to-SQL task. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pp. 3911--3921, Brussels, Belgium, October-November 2018. Association for Computational Linguistics. doi: 10.18653/v1/D18-1425. https://aclanthology.org/D18-1425.
CoSQL: A conversational text-to-SQL challenge towards cross-domain natural language interfaces to databases. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pp. 1962--1979, Hong Kong, China, November 2019a. Association for Computational Linguistics. doi: 10.18653/v1/D19-1204. https://aclanthology.org/D19-1204.
SParC: Cross-domain semantic parsing in context. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pp. 4511--4523, Florence, Italy, July 2019b. Association for Computational Linguistics. doi: 10.18653/v1/P19-1443. https://aclanthology.org/P19-1443.
Learning the Dyck language with attention-based Seq2Seq models. In Proceedings of the 2019 ACL Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP, pp. 138--146, Florence, Italy, August 2019c. Association for Computational Linguistics. doi: 10.18653/v1/W19-4815. https://aclanthology.org/W19-4815.
Eliezer Yudkowsky. Artificial intelligence as a positive and negative factor in global risk. In Nick Bostrom and Milan M. Ćirković (eds.), Global Catastrophic Risks, pp. 308--345. Oxford University Press, Oxford, 2008. https://web.archive.org/web/20210125025955/https://intelligence.org/files/AIPosNegFactor.pdf.
Figure me out: A gold standard dataset for metaphor interpretation. In Proceedings of the 12th Language Resources and Evaluation Conference, pp. 5810--5819, Marseille, France, May 2020. European Language Resources Association. https://aclanthology.org/2020.lrec-1.712.
WinoWhy: A deep diagnosis of essential commonsense knowledge for answering Winograd schema challenge. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pp. 5736--5745, Online, July 2020c. Association for Computational Linguistics. doi: 10.18653/v1/2020.acl-main.508. https://aclanthology.org/2020.acl-main.508.
Reasoning about goals, steps, and temporal ordering with WikiHow. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 4630--4639, Online, November 2020d. Association for Computational Linguistics. doi: 10.18653/v1/2020.emnlp-main.374. https://aclanthology.org/2020.emnlp-main.374.
Tweet sarcasm detection using deep neural network. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, pp. 2449--2460, Osaka, Japan, December 2016. The COLING 2016 Organizing Committee. https://aclanthology.org/C16-1231.
Irony detection via sentiment-based transfer learning. Information Processing & Management, 56(5):1633--1644, 2019b. doi: 10.1016/j.ipm.2019.04.006. https://www.sciencedirect.com/science/article/pii/S0306457318307428.
Gender bias in coreference resolution: Evaluation and debiasing methods. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers), pp. 15--20, New Orleans, Louisiana, June 2018. Association for Computational Linguistics. doi: 10.18653/v1/N18-2003. https://aclanthology.org/N18-2003.
Learning to ask unanswerable questions for machine reading comprehension. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pp. 4238--4248, Florence, Italy, July 2019. Association for Computational Linguistics. doi: 10.18653/v1/P19-1415. https://aclanthology.org/P19-1415.
Xiaojin Zhu. Machine teaching: An inverse problem to machine learning and an approach toward optimal education. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 29, Menlo Park, CA, Mar. 2015. Association for the Advancement of Artificial Intelligence. https://ojs.aaai.org/index.php/AAAI/article/view/9761.
Alan Zucconi. The secrets of colour interpolation, 6 Jan. 2016. https://www.alanzucconi.com/2016/01/06/colour-interpolation/.