Comparison of Classification for Indonesian Language News Documents Using Recurrent Neural Network (RNN) and Long Short Term Memory (LSTM) Algorithms

Authors

  • Christian Sri Kusuma Aditya, Universitas Muhammadiyah Malang
  • Muh Ridha Agam, Universitas Muhammadiyah Malang
  • Andhika Rezky Fadillah, Universitas Muhammadiyah Malang
  • Briansyah Setio Wiyono, Universitas Muhammadiyah Malang

DOI:

https://doi.org/10.36423/index.v6i2.1888

Abstract

Online news has developed very rapidly, and the resulting high volume of text documents is driven by activity from many different news sources. Because so much news is published on these websites, articles are sometimes posted under the wrong category, most likely because of human error. Grouping online news correctly matters for users who search for news by category, so an intelligent system that can classify online news automatically is needed. This research evaluates two deep learning techniques, LSTM and RNN, and compares them with results from previous studies that used the Naive Bayes Classifier (NBC) algorithm. The system is evaluated on an Indonesia News Corpus of 2,100 documents in 7 categories, collected by crawling national online news portals. Because the number of documents per news category is imbalanced, SMOTE (Synthetic Minority Over-sampling Technique) is also integrated. The averaged empirical results show that RNN with SMOTE reaches an accuracy of 95.2% and LSTM with SMOTE reaches 97.8%; both outperform the NBC method, which reaches 73.2%.
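The abstract does not say which feature representation SMOTE was applied to or with which parameters. As a minimal sketch, assuming each document has already been turned into a fixed-length numeric vector (for example TF-IDF weights or averaged word embeddings), the core SMOTE step creates a synthetic sample by picking a minority-class vector, choosing one of its k nearest neighbours from the same class, and interpolating between the two at a random point. The names `smote` and `balance`, the default `k=5`, and the fixed `seed` are illustrative assumptions, not the authors' code; in practice a library implementation such as the `SMOTE` class in imbalanced-learn would normally be used.

```python
import math
import random


def euclidean(a, b):
    # Euclidean distance between two equal-length feature vectors
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def smote(minority, n_synthetic, k=5, seed=0):
    """Generate n_synthetic new vectors for one minority class (sketch).

    minority: list of feature vectors (lists of floats) from that class.
    Each synthetic vector lies on the segment between a randomly chosen
    minority vector and one of its k nearest same-class neighbours.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two samples in the class")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    # Precompute the k nearest neighbours of each sample, excluding itself
    neighbours = []
    for i, x in enumerate(minority):
        others = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: euclidean(x, minority[j]),
        )
        neighbours.append(others[:k])
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        j = rng.choice(neighbours[i])
        gap = rng.random()  # interpolation factor in [0, 1)
        x, nn = minority[i], minority[j]
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nn)])
    return synthetic


def balance(dataset, k=5, seed=0):
    """Oversample every class up to the size of the largest class.

    dataset: dict mapping a (hypothetical) category label to its vectors.
    """
    target = max(len(v) for v in dataset.values())
    balanced = {}
    for label, samples in dataset.items():
        extra = target - len(samples)
        balanced[label] = samples + (smote(samples, extra, k, seed) if extra else [])
    return balanced
```

Here `balance` equalises the class counts by oversampling each category up to the largest one; whether the paper balanced to the largest class or used another target ratio is not stated in the abstract.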
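The RNN versus LSTM comparison comes down to how each network updates its hidden state while reading a document token by token. A vanilla RNN keeps a single state, h_t = tanh(W x_t + U h_{t-1} + b), whereas an LSTM adds a separate cell state that is updated additively through forget, input, and output gates, which helps it retain information over long sequences and mitigates vanishing gradients. The pure-Python sketch below shows one step of each cell, a loop that reads a sequence of word vectors, and a softmax that could turn a score vector into probabilities over the 7 news categories. All weights, dimensions, and function names are hypothetical; the paper's architecture, embeddings, and hyperparameters are not given in the abstract, and a real model would be built and trained with a framework such as Keras or PyTorch.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def matvec(W, v):
    # W is a list of rows; returns the matrix-vector product W @ v
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def add(*vs):
    # Element-wise sum of several equal-length vectors
    return [sum(t) for t in zip(*vs)]


def rnn_step(x, h_prev, W, U, b):
    """Vanilla (Elman) RNN step: h_t = tanh(W x_t + U h_{t-1} + b)."""
    return [math.tanh(z) for z in add(matvec(W, x), matvec(U, h_prev), b)]


def lstm_step(x, h_prev, c_prev, params):
    """One LSTM step; params maps gate name 'f', 'i', 'o', 'g' to (W, U, b)."""
    def pre(gate):
        W, U, b = params[gate]
        return add(matvec(W, x), matvec(U, h_prev), b)

    f = [sigmoid(z) for z in pre("f")]    # forget gate: how much old memory to keep
    i = [sigmoid(z) for z in pre("i")]    # input gate: how much new content to write
    o = [sigmoid(z) for z in pre("o")]    # output gate: how much memory to expose
    g = [math.tanh(z) for z in pre("g")]  # candidate cell content
    # Additive cell-state update, the key difference from the vanilla RNN
    c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c_prev, i, g)]
    h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
    return h, c


def run_lstm(xs, hidden, params):
    # Read a sequence of word vectors and return the final hidden state
    h, c = [0.0] * hidden, [0.0] * hidden
    for x in xs:
        h, c = lstm_step(x, h, c, params)
    return h


def softmax(z):
    # Numerically stable softmax over class scores
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]
```

To classify a document, the final hidden state from `run_lstm` would be passed through a trained output layer with one unit per category and then `softmax`; that output layer and the training loop are omitted from this sketch.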

Published

2024-11-30