INTERNATIONAL JOURNAL
Journal of Theoretical and Applied Information Technology
Advances in digital technology and the World Wide Web have led to an increase in digital documents used for purposes such as publishing and digital libraries. This phenomenon raises awareness of the need for effective techniques that help with the search and retrieval of text. One of the most needed tasks is clustering, which automatically categorizes documents into meaningful groups. Clustering is an important task in data mining and machine learning. The accuracy of clustering depends heavily on the choice of text representation method. Traditional text representation methods model documents as bags of words using term frequency-inverse document frequency (TF-IDF). This method ignores the relationships between words and their meanings in the document. As a result, the sparsity and semantic problems that are prevalent in textual documents remain unresolved. In this study, the problems of sparsity and semantics are reduced by proposing a graph-based text representation method, namely a dependency graph, with the aim of improving the accuracy of document clustering. The dependency graph representation scheme is created by combining syntactic and semantic analysis. A sample of the 20 Newsgroups dataset was used in this study. The findings show that the proposed text representation method leads to more accurate document clustering results.
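The limitation of the bag-of-words model described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the function name `tfidf_vectors`, the toy sentences, and the unsmoothed weighting tf × log(N/df) are assumptions chosen for illustration (several TF-IDF variants exist), and every document is assumed to be non-empty.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, TF-IDF vectors) for whitespace-tokenized documents.

    Hypothetical helper for illustration only; uses tf * log(N / df)
    with no smoothing and assumes every document is non-empty.
    """
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vectors.append(
            [(counts[t] / len(toks)) * math.log(n_docs / df[t]) for t in vocab]
        )
    return vocab, vectors


# Toy corpus (made up): the first two sentences differ only in word order.
docs = [
    "the dog bit the man",
    "the man bit the dog",
    "graph based text representation",
]
vocab, vectors = tfidf_vectors(docs)

# Word order, and therefore who did what to whom, is lost.
same_vector = vectors[0] == vectors[1]

# Share of zero entries across all vectors, a rough measure of sparsity.
zero_fraction = sum(v == 0.0 for vec in vectors for v in vec) / (
    len(vocab) * len(vectors)
)
print(same_vector, zero_fraction)
```

In this toy corpus the two sentences with opposite meanings receive identical vectors, and half of all vector entries are zero even with only eight vocabulary terms. A representation that keeps the relations between words, such as the dependency graph proposed in the study, is one way to retain the information that is lost here.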
| JI03220034 | 004.0285 JATIT J JurnalInternasional | STMIK Library (International Journal) | Available |
No other versions available