Improving deep embedded clustering for intent mining with Jensen-Shannon divergence and Sophia optimizer

Nguyễn Quỳnh Khánh Hà

Please use this identifier to cite or link to this item: https://digital.lib.ueh.edu.vn/handle/UEH/72835

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	Đặng Ngọc Hoàng Thành	en_US
dc.contributor.author	Nguyễn Quỳnh Khánh Hà	en_US
dc.contributor.other	Nguyễn Quốc Việt	en_US
dc.contributor.other	Nguyễn Nhật Quang	en_US
dc.date.accessioned	2024-11-19T04:14:54Z	-
dc.date.available	2024-11-19T04:14:54Z	-
dc.date.issued	2024	-
dc.identifier.uri	https://digital.lib.ueh.edu.vn/handle/UEH/72835	-
dc.description.abstract	Discovering customer intents from written or spoken language plays a vital role in natural language understanding and automated dialogue response. However, labeling intents for new domains from the ground up is a daunting and time-consuming process that often requires extensive manual effort from domain experts. To address this challenge, this paper proposes an unsupervised approach for discovering intents and automatically producing meaningful intent labels from a collection of unlabeled utterances in the banking domain. In the first stage, we deploy Deep Embedded Clustering (DEC) to simultaneously learn feature representations and cluster assignments, producing a set of coherent clusters in which the utterances within each cluster share the same intent. We modify the joint loss function of DEC to preserve local structure, an approach known as Improved Deep Embedded Clustering with Local Structure Preservation. Importantly, we explore the use of a state-of-the-art optimization technique, the Sophia optimizer, and employ the Jensen-Shannon divergence as the measure of similarity in the clustering algorithm. We empirically show that our proposed modification achieves state-of-the-art results in terms of NMI score, surpassing all prior unsupervised DEC architectures. In the second stage, intent labels for each cluster are automatically generated by extracting the ACTION-OBJECT pair from each utterance using a dependency parser. The proposed unsupervised approach is capable of automatically generating meaningful intent labels while obtaining high evaluation scores in utterance clustering and intent discovery. Although initially developed to build an intent model for conversational systems, this framework can also be adapted for short text clustering in various general applications.	en_US
dc.format.medium	64 p.	en_US
dc.language.iso	en	en_US
dc.publisher	University of Economics Ho Chi Minh City	en_US
dc.relation.ispartofseries	UEH Young Researcher Award 2024	en_US
dc.title	Improving deep embedded clustering for intent mining with Jensen-Shannon divergence and Sophia optimizer	en_US
dc.type	Research Paper	en_US
ueh.speciality	Data Science and Artificial Intelligence	en_US
ueh.award	C Prize	en_US
item.languageiso639-1	en	-
item.cerifentitytype	Publications	-
item.grantfulltext	reserved	-
item.openairetype	Research Paper	-
item.fulltext	Full texts	-
item.openairecristype	http://purl.org/coar/resource_type/c_18cf	-
Appears in Collections:	UEH Young Researchers

Files in This Item:

File	Description	Size	Format
DetaiNCKHSV32206.pdf		5.08 MB	Adobe PDF
