Open Access 2024 | Original Paper | Book Chapter

Automated Topic Analysis with Large Language Models

Authors: Andrei Kirilenko, Svetlana Stepchenkova

Published in: Information and Communication Technologies in Tourism 2024

Publisher: Springer Nature Switzerland


Abstract

Topic modeling is a popular method in tourism data analysis. Many authors have applied various approaches to summarize the main themes of travel blogs, reviews, video diaries, and similar media. One common shortcoming of these methods is their severe limitation in working with short documents, such as blog readers’ feedback (reactions). In the past few years, a new crop of large language models (LLMs), such as ChatGPT, has become available to researchers. We investigate LLM capability in extracting the main themes of viewers’ reactions to popular videos of a rural China destination, which explore the cultural, technological, and natural heritage of the countryside. We compare the extracted topics and model accuracy with the results of the traditional Latent Dirichlet Allocation approach. Overall, the LLM results are more accurate, more specific, and better at separating discussion topics.

1 Introduction

The history of automated annotation of textual documents starts in the 1960s, when Borko and Bernick [1] applied exploratory factor analysis to the unsupervised classification of scientific publication abstracts. Nowadays, dozens of models have been developed and applied to extract topics from texts [2, 3]. In tourism, and in the social sciences in general, the most popular approach [4] is Latent Dirichlet Allocation (LDA), developed by Blei [5]. Meanwhile, LDA has important restrictions, which are usually ignored by authors. First, LDA relies on estimating the parameters of the document-topic and topic-word distributions, which requires documents of ample length that encapsulate a diverse mixture of topics. Second, the LDA algorithm requires a substantial corpus of textual data to estimate the underlying topic distributions precisely. Lastly, discordant or extraneous documents within the corpus, which are common in social media, degrade the quality of the inferred topics. Even when all these assumptions are met, LDA topic models are criticized for inherent instability and for the challenge of defining the “optimal” number of target topics.
In the past few years, a new crop of large language models (LLMs), such as Google’s BERT [6], has become increasingly popular, owing their success to their ability to capture context instead of considering document words in isolation. In the tourism domain, the TourBERT topic model was pre-trained on tourist reviews and on descriptions of tourist services, attractions, and sights [7], though we are not aware of any publication in a tourism journal that utilizes it.
The explosive development of the LLM field, which drew public attention after ChatGPT became freely available through a web-based interface, has led to the exploration of LLM topic extraction capabilities following a set of instructions (prompts). A new discipline known as prompt engineering explores the ability of LLMs to learn new tasks from examples provided as input (prompts). The key concepts of prompt engineering are precise setting of the context, such as providing relevant facts; providing elaborate instructions; conditioning LLM behavior by, e.g., providing examples; controlling for data biases; iterative refinement of LLM responses; and, finally, result validation [8, 9].
Emerging studies hint at the feasibility of using LLM prompt engineering for topic modeling [10–12]. In this respect, LLMs have numerous advantages over the previous generation of topic models: they leverage the general knowledge obtained during pre-training to infer a comment’s topics even when the data is incomplete or ambiguous; they can infer the topic of short comments by transferring knowledge from similar domains; and they are robust to noise in the data. They can handle misspellings, grammatical errors, and inconsistent punctuation, which are common in noisy documents, by capitalizing on the surrounding context and their understanding of language patterns [8, 9].
This paper is, to the best of our knowledge, the first attempt to apply an LLM (GPT-3) to the extraction of topics from a set of online feedback (reactions) left by blog readers. A typical reaction is short (one sentence) and noisy (it contains cultural references, slang, and typos), which makes topic extraction with traditional methods challenging. We compare the extracted topics with the results of a traditional LDA model trained on the same dataset.

2 Data and Methodology

The specific setting is online reviews of the famous Chinese social media influencer Li Ziqi, who holds a Guinness World Record for the “most subscribers for a Chinese language channel on YouTube”. The focus of Li Ziqi’s videos is on rural China; their depiction of a simple yet beautiful traditional way of life evidently impacts potential tourists wishing to “visit LIZIQI’S world”. We collected all Weibo and YouTube reactions to the four most popular of Li Ziqi’s videos, reflective of her areas of interest: the rural way of life; traditional self-made culture; food and cooking; and China’s contribution to world civilization. The collected data was cleaned, and short reactions (shorter than three words) were removed. In total, 1,852 reactions in English were collected on YouTube. On Weibo, 2,980 reactions in simplified Chinese were collected and translated to English with Google Translate. The quality of the translation was verified by a native speaker.
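The cleaning step above can be sketched as a short filter; the function and variable names here are illustrative assumptions, not taken from the study’s actual code:

```python
# Sketch of the cleaning step: drop reactions shorter than three words.
def clean_reactions(reactions, min_words=3):
    """Keep only reactions with at least `min_words` whitespace-separated tokens."""
    return [r.strip() for r in reactions if len(r.split()) >= min_words]

sample = ["Wow", "So beautiful", "I wish I could live like this", "Amazing video, love it"]
print(clean_reactions(sample))  # only the two reactions with three or more words remain
```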
The collected data was then processed in batches of circa 2,000 words to fit GPT-3’s input limits, using the following prompt: “Find the most common and prominent topics covered in the {text}. For each topic that you find print the number of occurrences of this topic.” Here, {text} represents a block of reactions. The identified topics were then merged using GPT-3, resulting in 18 major topics. Finally, the reactions were mapped back to the topics following prompt engineering best practices (abridged):
  • goal = “match review to the best fitting review topic from a list of topics”
  • steps = “1. Break the list of reviews onto separate reviews; 2. For each review find two best matching review topics from the list of review topics separated by the ‘;’ sign; 3. When there are no well-matching topics, assume that the topic is ‘Other’; 4. Print the review followed by the best matching topics”
  • actAs = “a classifier assigning a class label to a data input”
  • format = “a table with reviews in the first column …”
  • prompt = “Your goal is to {goal}, acting as {actAs}. To achieve this, take a systematic approach by: {steps}. Present your response in markdown format, following the structure: {format}. The list of review topics are as follows: {topics_str}”.
  • “The list of reviews is as follows: {text}”
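The batching and prompt assembly described above can be sketched as follows. The template pieces (goal, actAs, steps, format) follow the abridged prompt; the batch size, helper names, and truncated step text are illustrative assumptions:

```python
# Group reactions into blocks of roughly `max_words` words to fit model input limits.
def batch_reactions(reactions, max_words=2000):
    batches, current, count = [], [], 0
    for r in reactions:
        n = len(r.split())
        if current and count + n > max_words:
            batches.append("\n".join(current))
            current, count = [], 0
        current.append(r)
        count += n
    if current:
        batches.append("\n".join(current))
    return batches

# Prompt components, following the abridged template in the text.
GOAL = "match review to the best fitting review topic from a list of topics"
ACT_AS = "a classifier assigning a class label to a data input"
STEPS = ("1. Break the list of reviews onto separate reviews; "
         "2. For each review find two best matching review topics from the list "
         "of review topics separated by the ';' sign; "
         "3. When there are no well-matching topics, assume that the topic is 'Other'; "
         "4. Print the review followed by the best matching topics")
FORMAT = "a table with reviews in the first column ..."

def build_prompt(topics, text):
    """Assemble the classification prompt for one batch of reviews."""
    topics_str = "; ".join(topics)
    return (f"Your goal is to {GOAL}, acting as {ACT_AS}. "
            f"To achieve this, take a systematic approach by: {STEPS}. "
            f"Present your response in markdown format, following the structure: {FORMAT}. "
            f"The list of review topics are as follows: {topics_str}. "
            f"The list of reviews is as follows: {text}")
```

Each assembled prompt would then be sent to the model one batch at a time, and the returned review-to-topic tables merged.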
For comparison, we used an identical set of reactions to extract topics with LDA. The data was pre-processed following the best practices of topic modeling: stop word removal, bigram tokenization, and lemmatization. Then, LDA topic modeling was completed for numbers of topics varying from 5 to 25. A 13-topic solution was selected for its best interpretability.
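The first two pre-processing steps can be sketched with the standard library alone; in practice, libraries such as gensim and NLTK supply these plus lemmatization and the LDA model itself. The stop-word list here is an illustrative subset, not the one used in the study:

```python
import re

# Illustrative subset of a stop-word list.
STOP_WORDS = {"i", "the", "a", "an", "in", "this", "is", "to", "of", "and"}

def tokenize(text):
    """Lowercase, extract alphabetic tokens, and drop stop words."""
    words = re.findall(r"[a-z]+", text.lower())
    return [w for w in words if w not in STOP_WORDS]

def add_bigrams(tokens):
    """Append underscore-joined bigrams, as produced by e.g. gensim's Phrases."""
    return tokens + [f"{a}_{b}" for a, b in zip(tokens, tokens[1:])]

tokens = tokenize("I admire the Chinese culture in this video")
print(add_bigrams(tokens))  # unigrams followed by bigrams such as 'chinese_culture'
```

The resulting token lists would then be turned into a bag-of-words corpus and fitted with an LDA model for each candidate number of topics from 5 to 25.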

3 Results

Table 1 presents the LLM topics, together with validation outcomes. The quality of topic modeling was validated by a bilingual expert on a stratified random sample of 360 reactions (20 per topic). The overall accuracy of topic modeling by the LLM was found to be 97.7%. The most important reason for the high accuracy is improved recognition of short texts. Note that 30% of reviews were classified into the “Other” category and were not rated. In a similar way, we performed validation of the LDA topics (Table 2). For each document, LDA returns a mix of topics; we validated the topic with the highest probability, and only when this probability exceeded 0.5. One can interpret this decision as assigning documents not strongly related to any topic to the category “Other” (42% of the dataset) and removing them from the validation process. The overall accuracy of topic assignment was 58%.
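The LDA assignment rule described above (keep the top topic only when its probability exceeds 0.5, otherwise assign “Other”) can be sketched as follows; the function name and input shape are illustrative assumptions:

```python
# Assign a document to its top LDA topic, or to "Other" below the threshold.
def assign_topic(doc_topics, threshold=0.5):
    """doc_topics: list of (topic_id, probability) pairs for one document."""
    topic, prob = max(doc_topics, key=lambda tp: tp[1])
    return topic if prob > threshold else "Other"

print(assign_topic([(0, 0.7), (1, 0.2), (2, 0.1)]))    # 0
print(assign_topic([(0, 0.4), (1, 0.35), (2, 0.25)]))  # Other
```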
Table 1. Topic validation outcomes, LLM.

Topic | Weibo | YT | Overall
Admiration & praise for Li Ziqi | 86% | 100% | 93%
Curiosity about Li Ziqi's background | 100% | 100% | 100%
Desire to learn from Li Ziqi & replicate her creations | 92% | 91% | 92%
Enthusiasm and support as a fan | 100% | 100% | 100%
Li Ziqi's beauty & resemblance to a princess | 92% | 88% | 90%
Li Ziqi's genuineness, sincerity, & trustworthiness | 100% | 100% | 100%
Li Ziqi's impact on viewers | 100% | 100% | 100%
Li Ziqi's role model status | 100% | 100% | 100%
Animals (specifically dogs & sheep) | 100% | 100% | 100%
Beauty and aesthetics of traditional life & products | 92% | 100% | 96%
Desire to live a peaceful, natural, simple, self-sufficient life | 100% | 100% | 100%
Nature & rural life | 100% | 100% | 100%
Nostalgia & childhood memories | 100% | 100% | 100%
Li Ziqi's connection with her grandmother | 100% | 92% | 96%
Chinese traditional crafts & skills | 100% | 92% | 96%
Chinese traditional culture & heritage | 100% | 91% | 96%
Art of calligraphy | 100% | 100% | 100%
Food & cooking | 100% | 100% | 100%
Table 2. Topic validation accuracy, LDA.

Topic words | Topic name | Acc.
chinese; little; culture; admire; chinese culture; need; inherit; ability; music; inherit chinese | Chinese traditional culture & heritage | 35%
life; live; place; wish; thank; beauty; nature; love; perfect; start | Beauty of living with nature | 55%
love; cute; like; feel; sheep; lamb; follow; powerful; puppy; skill | Cute dogs & sheep | 40%
girl; amazing; treasure; china; miss; think; home; sister; life; make | L. is amazing, treasure | 50%
know; sister; happy; want; marry; fairy; snack; qiqi; good; want know | L. is fairy-like, I want to marry her | 65%
work; great; hard; lady; young; quot; malaysia; hard work; young lady; share | L. is hard working | 40%
want; house; make; time; fruit; grow; live; tree; build; candied | Interest in grounds, visiting, marriage | 60%
look; paper; make; traditional; popcorn; chinese; brush; super; inkstone; wonderful | Traditional culture, craft, and cooking | 75%
woman; make; awesome; world; best; mother; real; cook; amazing; feel | Admiration & praise for L. | 65%
good; thing; amaze; person; good good; heart; life; hungry; make; mickey | Expressions of enthusiasm | 65%
beautiful; talented; strong; woman; make; people; wool; amazing; ancient; process | L. is beautiful, talented, and strong | 75%
bamboo; time; grandma; make; hand; long; wear; glove; child; sofa | Traditional crafts, wear gloves! | 55%
come; fairy; people; kind; update; kind fairy; mango; dislike; help; night | General support from fans | 75%

4 Discussion

Given that social media reactions tend to be short, it is not surprising that LDA topic modeling accuracy was moderate (58%); in comparison, LLM accuracy was excellent (98%). Meanwhile, even though LDA performance in assigning documents to specific topics was unimpressive, the overall set of topics is similar between LDA and the LLM. It includes themes related to Chinese culture, crafts, the beauty of living with nature, pets, and variations of expressions of praise towards the influencer. Note that the LLM-derived topics are much more specific and easier to comprehend, and they did not require a tedious interpretation process.
To the best of our knowledge, this is the first attempt to use an LLM for topic extraction in the tourism domain; a much wider effort is needed to draw solid conclusions about the best practices and limitations of the methodology. The field of prompt engineering has existed for only one year. Nevertheless, in our view, the application of LLMs to topic modeling in the tourism domain has very high potential. Our next plans are to explore LLM capabilities in the analysis of textual and pictorial tourism data, with the goals of understanding the limitations and formulating best practices.
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://​creativecommons.​org/​licenses/​by/​4.​0/​), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
References
1. Borko, H., Bernick, M.: Automatic document classification. J. ACM 10, 151–162 (1963)
2. Churchill, R., Singh, L.: The evolution of topic modeling. ACM Comput. Surv. 54, 1–35 (2022)
3. Vayansky, I., Kumar, S.A.: A review of topic modeling methods. Inf. Syst. 94, 101582 (2020)
4. Egger, R., Yu, J.: A topic modeling comparison between LDA, NMF, Top2Vec, and BERTopic to demystify Twitter posts. Front. Sociol. 7, 886498 (2022)
5. Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent Dirichlet allocation. J. Mach. Learn. Res. 3, 993–1022 (2003)
6. Devlin, J., Chang, M.-W., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. arXiv:1810.04805 (2018)
7.
8. Ekin, S.: Prompt Engineering for ChatGPT: A Quick Guide to Techniques, Tips, and Best Practices (2023)
9.
10.
11. Kublik, S., Saboo, S.: GPT-3. O'Reilly Media, Sebastopol (2022)
12. Rijcken, E., Scheepers, F., Zervanou, K., Spruit, M., Mosteiro, P., Kaymak, U.: Towards interpreting topic models with ChatGPT. Presented at the 20th World Congress of the International Fuzzy Systems Association (2023)
Metadata
Title: Automated Topic Analysis with Large Language Models
Authors: Andrei Kirilenko, Svetlana Stepchenkova
Copyright year: 2024
DOI: https://doi.org/10.1007/978-3-031-58839-6_3
