Unsupervised Sentiment Analysis of Plastic Surgery Social Media Posts (2307.02640v1)

Published 5 Jul 2023 in cs.CL

Abstract: The massive collection of user posts across social media platforms remains largely untapped for AI use cases because of the sheer volume and velocity of the textual data. Natural language processing (NLP) is a subfield of AI that leverages bodies of documents, known as corpora, to train computers in human-like language understanding. By using a word-ranking method, term frequency-inverse document frequency (TF-IDF), to create features across documents, it is possible to perform unsupervised analytics: machine learning (ML) that groups documents without a human manually labeling the data. For large datasets with thousands of features, t-distributed stochastic neighbor embedding (t-SNE), k-means clustering, and latent Dirichlet allocation (LDA) are employed to learn top words and generate topics for a combined Reddit and Twitter corpus. Using extremely simple deep learning models, this study demonstrates that the results of the unsupervised analysis allow a computer to predict negative, positive, or neutral user sentiment towards plastic surgery from a tweet or subreddit post with almost 90% accuracy. Furthermore, the model achieves higher accuracy on the unsupervised sentiment task than on a rudimentary supervised document classification task. Therefore, unsupervised learning may be considered a viable option for labeling social media documents for NLP tasks.
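The first two unsupervised steps the abstract describes, building TF-IDF features and grouping documents with k-means, can be sketched in plain Python as follows. This is not the paper's code. It assumes a naive lowercase whitespace tokenizer, an unsmoothed IDF of log(N / df), and a deterministic farthest-first seeding for k-means in place of random initialization. The toy posts and every function name (`tokenize`, `tfidf_matrix`, `farthest_first_init`, `kmeans`) are hypothetical. t-SNE, LDA, and the sentiment classifier are not shown.

```python
from __future__ import annotations

import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Naive lowercase whitespace tokenizer (assumption: the paper's preprocessing is not described here).
    return text.lower().split()


def tfidf_matrix(docs: list[str]) -> tuple[list[str], list[list[float]]]:
    """Return (vocab, rows) where rows[i][j] is the TF-IDF weight of vocab[j] in docs[i]."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    n_docs = len(docs)
    # Document frequency: how many documents contain each term at least once.
    df = Counter(t for toks in tokenized for t in set(toks))
    # Unsmoothed IDF, log(N / df): a term present in every document gets weight 0.
    idf = {t: math.log(n_docs / df[t]) for t in vocab}
    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks) or 1  # guard against empty posts
        # Term frequency (count / document length) scaled by the term's IDF.
        rows.append([counts[t] / total * idf[t] for t in vocab])
    return vocab, rows


def squared_distance(a: list[float], b: list[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first_init(points: list[list[float]], k: int) -> list[list[float]]:
    # Deterministic seeding (assumption): start at the first point, then repeatedly
    # add the point farthest from every centroid chosen so far.
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(squared_distance(p, c) for c in centroids))
        centroids.append(list(far))
    return centroids


def kmeans(points: list[list[float]], k: int, max_iter: int = 100):
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid re-estimation."""
    centroids = farthest_first_init(points, k)
    labels: list[int] | None = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments unchanged, so the centroids are stable too
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical toy posts: two favorable, two unfavorable.
docs = [
    "love my new results",
    "love these results",
    "regret botched surgery",
    "regret painful surgery",
]
vocab, rows = tfidf_matrix(docs)
labels, _ = kmeans(rows, k=2)
print(labels)  # [0, 0, 1, 1]
```

In practice a library such as scikit-learn (`TfidfVectorizer`, `KMeans`) would usually replace these hand-rolled functions. Note that `TfidfVectorizer` applies a smoothed IDF and L2 row normalization by default, so its weights differ from the ones computed here.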
