Open Access 2024 | Original Paper | Book Chapter

Automated Topic Analysis with Large Language Models

Authors: Andrei Kirilenko, Svetlana Stepchenkova

Published in: Information and Communication Technologies in Tourism 2024

Publisher: Springer Nature Switzerland


Abstract

Topic modeling is a popular method in tourism data analysis. Many authors have applied various approaches to summarize the main themes of travel blogs, reviews, video diaries, and similar media. One common shortcoming of these methods is their severe limitation in working with short documents, such as blog readers’ feedback (reactions). In the past few years, a new crop of large language models (LLMs), such as ChatGPT, has become available to researchers. We investigate LLM capability in extracting the main themes of viewers’ reactions to popular videos of a rural China destination, which explore the cultural, technological, and natural heritage of the countryside. We compare the extracted topics and model accuracy with the results of the traditional Latent Dirichlet Allocation approach. Overall, the LLM results are more accurate, more specific, and better at separating discussion topics.

1 Introduction

The history of automated annotation of textual documents starts in the 1960s, when Borko and Bernick [1] applied exploratory factor analysis to the unsupervised classification of scientific publication abstracts. Nowadays, dozens of models have been developed and applied to extract topics from texts [2, 3]. In tourism, and in the social sciences in general, the most popular approach [4] is Latent Dirichlet Allocation (LDA), developed by Blei [5]. Meanwhile, LDA has important restrictions, which are usually ignored by authors. First, LDA relies on estimating the parameters of the document-topic and topic-word distributions, which requires documents of ample length that encapsulate a diverse mixture of topics. Second, the LDA algorithm requires a substantial corpus of textual data to estimate the underlying topic distributions precisely. Lastly, discordant or extraneous documents within the corpus, which are common in social media, degrade the quality of the inferred topics. Even when all these assumptions are met, LDA topic models are criticized for inherent instability and for the challenge of defining the “optimal” number of target topics.
In the past few years, a new crop of large language models (LLMs), such as Google’s BERT [6], has become increasingly popular, owing their success to their ability to capture context instead of considering document words in isolation. In the tourism domain, the TourBERT topic model was pre-trained on tourist reviews and on descriptions of tourist services, attractions, and sights [7], though we are not aware of any publication in a tourism journal that utilizes it.
The explosive development of the LLM field, which drew public attention after ChatGPT became freely available through a web-based interface, has led to the exploration of LLM topic extraction capabilities following a set of instructions (prompts). A new discipline known as prompt engineering explores the ability of LLMs to learn new tasks from examples provided as input (prompts). The key concepts of prompt engineering are precise setting of the context, such as providing relevant facts; providing elaborate instructions; conditioning LLM behavior by, e.g., providing examples; controlling for data biases; iterative refinement of LLM responses; and, finally, result validation [8, 9].
Emerging studies hint at the feasibility of using LLM prompt engineering for topic modeling [10–12]. In this respect, LLMs have numerous advantages over the previous generation of topic models: they leverage the general knowledge obtained during pre-training to infer a comment’s topics even when the data is incomplete or ambiguous; they can infer the topic of short comments by transferring knowledge from similar domains; and they are robust to noise in the data. They can handle misspellings, grammatical errors, and inconsistent punctuation, which are common in noisy documents, by capitalizing on the surrounding context and their understanding of language patterns [8, 9].
This paper is, to the best of our knowledge, the first attempt to apply an LLM (GPT-3) to the extraction of topics from a set of online feedback (reactions) left by blog readers. A typical reaction is short (one sentence) and noisy (it contains cultural references, slang, and typos), which makes topic extraction with traditional methods challenging. We compare the extracted topics with the results of a traditional LDA model trained on the same dataset.

2 Data and Methodology

The specific setting is online reviews of the famous Chinese social media influencer Li Ziqi, who holds a Guinness World Record for the “most subscribers for a Chinese language channel on YouTube”. The focus of Li Ziqi’s videos is on rural China; their depiction of a simple yet beautiful traditional way of life evidently impacts potential tourists wishing to “visit LIZIQI’S world”. We collected all Weibo and YouTube reactions to the four most popular of Li Ziqi’s videos, reflective of her areas of interest: the rural way of life; traditional self-made culture; food and cooking; and China’s contribution to world civilization. The collected data was cleaned, and short reactions (shorter than three words) were removed. In total, 1,852 reactions in English were collected on YouTube. On Weibo, 2,980 reactions in simplified Chinese were collected and translated to English with Google Translate. The quality of the translation was verified by a native speaker.
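The cleaning step above can be sketched as a short filter; the function and variable names here are illustrative assumptions, not taken from the study’s actual code:

```python
# Sketch of the cleaning step: drop reactions shorter than three words.
def clean_reactions(reactions, min_words=3):
    """Keep only reactions with at least `min_words` whitespace-separated tokens."""
    return [r.strip() for r in reactions if len(r.split()) >= min_words]

sample = ["Wow", "So beautiful", "I wish I could live like this", "Amazing video, love it"]
print(clean_reactions(sample))  # only the two reactions with three or more words remain
```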
The collected data was then processed in batches of circa 2,000 words to fit GPT-3’s input limits, using the following prompt: “Find the most common and prominent topics covered in the {text}. For each topic that you find print the number of occurrences of this topic.” Here, {text} represents a block of reactions. The identified topics were then merged using GPT-3, resulting in 18 major topics. Finally, the reactions were mapped back to the topics following prompt engineering best practices (abridged):
  • goal = “match review to the best fitting review topic from a list of topics”
  • steps = “1. Break the list of reviews onto separate reviews; 2. For each review find two best matching review topics from the list of review topics separated by the ‘;’ sign; 3. When there are no well-matching topics, assume that the topic is ‘Other’; 4. Print the review followed by the best matching topics”
  • actAs = “a classifier assigning a class label to a data input”
  • format = “a table with reviews in the first column …”
  • prompt = “Your goal is to {goal}, acting as {actAs}. To achieve this, take a systematic approach by: {steps}. Present your response in markdown format, following the structure: {format}. The list of review topics are as follows: {topics_str}”.
  • “The list of reviews is as follows: {text}”
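The batching and prompt assembly described above can be sketched as follows. The template pieces (goal, actAs, steps, format) follow the abridged prompt; the batch size, helper names, and truncated step text are illustrative assumptions:

```python
# Group reactions into blocks of roughly `max_words` words to fit model input limits.
def batch_reactions(reactions, max_words=2000):
    batches, current, count = [], [], 0
    for r in reactions:
        n = len(r.split())
        if current and count + n > max_words:
            batches.append("\n".join(current))
            current, count = [], 0
        current.append(r)
        count += n
    if current:
        batches.append("\n".join(current))
    return batches

# Prompt components, following the abridged template in the text.
GOAL = "match review to the best fitting review topic from a list of topics"
ACT_AS = "a classifier assigning a class label to a data input"
STEPS = ("1. Break the list of reviews onto separate reviews; "
         "2. For each review find two best matching review topics from the list "
         "of review topics separated by the ';' sign; "
         "3. When there are no well-matching topics, assume that the topic is 'Other'; "
         "4. Print the review followed by the best matching topics")
FORMAT = "a table with reviews in the first column ..."

def build_prompt(topics, text):
    """Assemble the classification prompt for one batch of reviews."""
    topics_str = "; ".join(topics)
    return (f"Your goal is to {GOAL}, acting as {ACT_AS}. "
            f"To achieve this, take a systematic approach by: {STEPS}. "
            f"Present your response in markdown format, following the structure: {FORMAT}. "
            f"The list of review topics are as follows: {topics_str}. "
            f"The list of reviews is as follows: {text}")
```

Each assembled prompt would then be sent to the model one batch at a time, and the returned review-to-topic tables merged.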
For comparison, we used an identical set of reactions to extract topics with LDA. The data was pre-processed following the best practices of topic modeling: stop word removal, bigram tokenization, and lemmatization. Then, LDA topic modeling was completed for numbers of topics varying from 5 to 25. A 13-topic solution was selected for its best interpretability.
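The first two pre-processing steps can be sketched with the standard library alone; in practice, libraries such as gensim and NLTK supply these plus lemmatization and the LDA model itself. The stop-word list here is an illustrative subset, not the one used in the study:

```python
import re

# Illustrative subset of a stop-word list.
STOP_WORDS = {"i", "the", "a", "an", "in", "this", "is", "to", "of", "and"}

def tokenize(text):
    """Lowercase, extract alphabetic tokens, and drop stop words."""
    words = re.findall(r"[a-z]+", text.lower())
    return [w for w in words if w not in STOP_WORDS]

def add_bigrams(tokens):
    """Append underscore-joined bigrams, as produced by e.g. gensim's Phrases."""
    return tokens + [f"{a}_{b}" for a, b in zip(tokens, tokens[1:])]

tokens = tokenize("I admire the Chinese culture in this video")
print(add_bigrams(tokens))  # unigrams followed by bigrams such as 'chinese_culture'
```

The resulting token lists would then be turned into a bag-of-words corpus and fitted with an LDA model for each candidate number of topics from 5 to 25.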

3 Results

Table 1 presents the LLM topics, together with validation outcomes. The quality of topic modeling was validated by a bilingual expert on a stratified random sample of 360 reactions (20 per topic). The overall accuracy of topic modeling by the LLM was found to be 97.7%. The most important reason for the high accuracy is improved recognition of short texts. Note that 30% of reviews were classified into the “Other” category and were not rated. In a similar way, we performed validation of the LDA topics (Table 2). For each document, LDA returns a mix of topics; we validated the topic with the highest probability, and only when this probability exceeded 0.5. One can interpret this decision as assigning documents not strongly related to any topic to the category “Other” (42% of the dataset) and removing them from the validation process. The overall accuracy of topic assignment was 58%.
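The LDA assignment rule described above (keep the top topic only when its probability exceeds 0.5, otherwise assign “Other”) can be sketched as follows; the function name and input shape are illustrative assumptions:

```python
# Assign a document to its top LDA topic, or to "Other" below the threshold.
def assign_topic(doc_topics, threshold=0.5):
    """doc_topics: list of (topic_id, probability) pairs for one document."""
    topic, prob = max(doc_topics, key=lambda tp: tp[1])
    return topic if prob > threshold else "Other"

print(assign_topic([(0, 0.7), (1, 0.2), (2, 0.1)]))    # 0
print(assign_topic([(0, 0.4), (1, 0.35), (2, 0.25)]))  # Other
```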
Table 1. Topic validation outcomes, LLM.

Topic | Weibo | YT | Overall
Admiration & praise for Li Ziqi | 86% | 100% | 93%
Curiosity about Li Ziqi's background | 100% | 100% | 100%
Desire to learn from Li Ziqi & replicate her creations | 92% | 91% | 92%
Enthusiasm and support as a fan | 100% | 100% | 100%
Li Ziqi's beauty & resemblance to a princess | 92% | 88% | 90%
Li Ziqi's genuineness, sincerity, & trustworthiness | 100% | 100% | 100%
Li Ziqi's impact on viewers | 100% | 100% | 100%
Li Ziqi's role model status | 100% | 100% | 100%
Animals (specifically dogs & sheep) | 100% | 100% | 100%
Beauty and aesthetics of traditional life & products | 92% | 100% | 96%
Desire to live a peaceful, natural, simple, self-sufficient life | 100% | 100% | 100%
Nature & rural life | 100% | 100% | 100%
Nostalgia & childhood memories | 100% | 100% | 100%
Li Ziqi's connection with her grandmother | 100% | 92% | 96%
Chinese traditional crafts & skills | 100% | 92% | 96%
Chinese traditional culture & heritage | 100% | 91% | 96%
Art of calligraphy | 100% | 100% | 100%
Food & cooking | 100% | 100% | 100%
Table 2. Topic validation accuracy, LDA.

Topic words | Topic name | Acc.
chinese; little; culture; admire; chinese culture; need; inherit; ability; music; inherit chinese | Chinese traditional culture & heritage | 35%
life; live; place; wish; thank; beauty; nature; love; perfect; start | Beauty of living with nature | 55%
love; cute; like; feel; sheep; lamb; follow; powerful; puppy; skill | Cute dogs & sheep | 40%
girl; amazing; treasure; china; miss; think; home; sister; life; make | L. is amazing, treasure | 50%
know; sister; happy; want; marry; fairy; snack; qiqi; good; want know | L. is fairy-like, I want to marry her | 65%
work; great; hard; lady; young; quot; malaysia; hard work; young lady; share | L. is hard working | 40%
want; house; make; time; fruit; grow; live; tree; build; candied | Interest in grounds, visiting, marriage | 60%
look; paper; make; traditional; popcorn; chinese; brush; super; inkstone; wonderful | Traditional culture, craft, and cooking | 75%
woman; make; awesome; world; best; mother; real; cook; amazing; feel | Admiration & praise for L. | 65%
good; thing; amaze; person; good good; heart; life; hungry; make; mickey | Expressions of enthusiasm | 65%
beautiful; talented; strong; woman; make; people; wool; amazing; ancient; process | L. is beautiful, talented, and strong | 75%
bamboo; time; grandma; make; hand; long; wear; glove; child; sofa | Traditional crafts, wear gloves! | 55%
come; fairy; people; kind; update; kind fairy; mango; dislike; help; night | General support from fans | 75%

4 Discussion

Given that social media reactions tend to be short, it is not surprising that LDA topic modeling accuracy was moderate (58%); in comparison, LLM accuracy was excellent (98%). Meanwhile, even though LDA performance in assigning documents to specific topics was unimpressive, the overall set of topics is similar between LDA and the LLM. It includes themes related to Chinese culture, crafts, the beauty of living with nature, pets, and variations of expressions of praise towards the influencer. Note that the LLM-derived topics are much more specific and easier to comprehend, and they did not require a tedious interpretation process.
To the best of our knowledge, this is the first attempt to use an LLM for topic extraction in the tourism domain; a much wider effort is needed to draw solid conclusions about the best practices and limitations of the methodology. The field of prompt engineering has existed for only one year. Nevertheless, in our view, the application of LLMs to topic modeling in the tourism domain has very high potential. Our next plans are to explore LLM capabilities in the analysis of textual and pictorial tourism data, with the goals of understanding the limitations and formulating best practices.
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://​creativecommons.​org/​licenses/​by/​4.​0/​), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
References
1. Borko, H., Bernick, M.: Automatic document classification. J. ACM 10, 151–162 (1963)
2. Churchill, R., Singh, L.: The evolution of topic modeling. ACM Comput. Surv. 54, 1–35 (2022)
3. Vayansky, I., Kumar, S.A.: A review of topic modeling methods. Inf. Syst. 94, 101582 (2020)
4. Egger, R., Yu, J.: A topic modeling comparison between LDA, NMF, Top2Vec, and BERTopic to demystify Twitter posts. Front. Sociol. 7, 886498 (2022)
5. Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent Dirichlet allocation. J. Mach. Learn. Res. 3, 993–1022 (2003)
6. Devlin, J., Chang, M.-W., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. arXiv:1810.04805 (2018)
7.
8. Ekin, S.: Prompt Engineering for ChatGPT: A Quick Guide to Techniques, Tips, and Best Practices (2023)
9.
10.
11. Kublik, S., Saboo, S.: GPT-3. O'Reilly Media, Sebastopol (2022)
12. Rijcken, E., Scheepers, F., Zervanou, K., Spruit, M., Mosteiro, P., Kaymak, U.: Towards interpreting topic models with ChatGPT. Presented at the 20th World Congress of the International Fuzzy Systems Association (2023)
Metadata
Title: Automated Topic Analysis with Large Language Models
Authors: Andrei Kirilenko, Svetlana Stepchenkova
Copyright year: 2024
DOI: https://doi.org/10.1007/978-3-031-58839-6_3
