Please use this identifier to cite or link to this item: https://digital.lib.ueh.edu.vn/handle/UEH/72835
Full metadata record
DC Field | Value | Language
dc.contributor.advisor | Đặng Ngọc Hoàng Thành | en_US
dc.contributor.author | Nguyễn Quỳnh Khánh Hà | en_US
dc.contributor.other | Nguyễn Quốc Việt | en_US
dc.contributor.other | Nguyễn Nhật Quang | en_US
dc.date.accessioned | 2024-11-19T04:14:54Z | -
dc.date.available | 2024-11-19T04:14:54Z | -
dc.date.issued | 2024 | -
dc.identifier.uri | https://digital.lib.ueh.edu.vn/handle/UEH/72835 | -
dc.description.abstract | Discovering customer intents from written or spoken language plays a vital role in natural language understanding and automated dialogue response. However, labeling intents for new domains from the ground up is a daunting and time-consuming process that often requires extensive manual effort from domain experts. To address this challenge, this paper proposes an unsupervised approach for discovering intents and automatically producing meaningful intent labels from a collection of unlabeled utterances in the banking domain. In the first stage, we deploy Deep Embedded Clustering (DEC) to simultaneously learn feature representations and cluster assignments, producing a set of coherent clusters in which the utterances within each cluster share the same intent. To improve performance, we modify the joint loss function of DEC to preserve local structure (an approach known as Improved Deep Embedded Clustering with Local Structure Preservation). Importantly, we explore the use of a state-of-the-art optimization technique called Sophia Optimizer and employ the Jensen-Shannon divergence as a measure of similarity in the clustering algorithm. We empirically show that our proposed modification achieves state-of-the-art results in terms of NMI score, surpassing all prior unsupervised DEC architectures. In the second stage, intent labels for each cluster are automatically generated by extracting the ACTION-OBJECT pair from each utterance using a dependency parser. The proposed unsupervised approach is capable of automatically generating meaningful intent labels while obtaining high evaluation scores in utterance clustering and intent discovery. Although initially developed to build an intent model for conversational systems, this framework can also be adapted for short text clustering in various general applications. | en_US
dc.format.medium | 64 p. | en_US
dc.language.iso | en | en_US
dc.publisher | University of Economics Ho Chi Minh City | en_US
dc.relation.ispartofseries | UEH Young Researcher Award 2024 | en_US
dc.title | Improving deep embedded clustering for intent mining with Jensen-Shannon divergence and Sophia optimizer | en_US
dc.type | Research Paper | en_US
ueh.speciality | Data Science and Artificial Intelligence | en_US
ueh.award | C Prize | en_US
item.languageiso639-1 | en | -
item.cerifentitytype | Publications | -
item.grantfulltext | reserved | -
item.openairetype | Research Paper | -
item.fulltext | Full texts | -
item.openairecristype | http://purl.org/coar/resource_type/c_18cf | -
Appears in Collections: UEH Young Researchers
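
Illustrative note on the abstract: the record does not say exactly where the Jensen-Shannon divergence enters the clustering objective. The sketch below assumes it replaces the Kullback-Leibler term that standard DEC computes between the target distribution and the soft cluster assignments. The function names, array shapes, and random inputs are hypothetical and are not taken from the paper.

```python
import numpy as np


def soft_assign(z, centroids, alpha=1.0):
    """Student's t soft assignment of embeddings to centroids, as in standard DEC."""
    # Squared Euclidean distance between every embedding and every centroid.
    d2 = ((z[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    q = (1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0)
    return q / q.sum(axis=1, keepdims=True)


def target_distribution(q):
    """Standard DEC target: square q, divide by soft cluster frequency, renormalise."""
    w = q ** 2 / q.sum(axis=0)
    return w / w.sum(axis=1, keepdims=True)


def kl(p, q, eps=1e-12):
    """Row-wise Kullback-Leibler divergence KL(p || q), clipped for stability."""
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    return (p * np.log(p / q)).sum(axis=1)


def js_divergence(p, q):
    """Row-wise Jensen-Shannon divergence: symmetric and bounded by ln 2."""
    m = 0.5 * (p + q)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Hypothetical data: 6 utterance embeddings of dimension 4, 3 cluster centroids.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(6, 4))
centroids = rng.normal(size=(3, 4))

q = soft_assign(embeddings, centroids)
p = target_distribution(q)

# Assumed clustering loss: mean JS divergence between target and soft assignments.
clustering_loss = js_divergence(p, q).mean()
```

Unlike the KL divergence, the Jensen-Shannon divergence is symmetric in its two arguments and never exceeds ln 2, which is one plausible reason to use it as a similarity measure between the two distributions.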
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.