A Comparative Study of Sentiment Analysis Methods for Detecting Fake Reviews in E-Commerce

Maneerat Puttarattanamanee, Laor Boongasame, Karanrat Thammarak

Abstract


E-commerce has grown in popularity, especially during the COVID-19 pandemic. Product reviews left by earlier customers strongly influence consumers' purchasing decisions. As a result, fake reviews, written dishonestly by humans or generated by computers, are produced to increase product sales, and they mislead and harm consumers. The goal of this research is to examine and evaluate the performance of various methods for identifying fake reviews. The research uses the well-known and widely used Amazon Review Data (2018) dataset. The first 10 product categories on Amazon.com with favorable feedback are presented in the data section. Fundamental data preparation procedures, such as special character trimming, bag of words, and TF-IDF, are then applied. The models are then trained to create a dataset for detecting fake reviews. This research compares the performance of four models, GPT-2, NBSVM, BiLSTM, and RoBERTa, and tunes their hyperparameters to find the optimal values. RoBERTa performs best overall, with an accuracy of 97%, followed by NBSVM at 95%, BiLSTM at 92%, and GPT-2 at 82%. The research also calculates the Area Under the Curve (AUC) for each model: 0.9976 for RoBERTa, 0.9888 for NBSVM, 0.9753 for BiLSTM, and 0.9226 for GPT-2. RoBERTa has the highest AUC, close to 1, so among the models compared it provides the most accurate predictions for detecting fake reviews, which is the main focus of this research.
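
The preparation steps and the AUC metric named in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names `clean_text`, `tfidf`, and `roc_auc`, the sample reviews, and the scores are hypothetical, and the smoothed IDF formula is one common variant chosen here as an assumption.

```python
import math
import re
from collections import Counter


def clean_text(text):
    """Lowercase, replace special characters with spaces, and tokenize."""
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    return text.split()


def tfidf(docs):
    """Return one {term: weight} dict per tokenized document.

    Weight = raw term count * smoothed IDF, ln((1 + N) / (1 + df)) + 1.
    The smoothing is an assumption, not taken from the paper.
    """
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)  # bag-of-words counts for this document
        weights.append({
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return weights


def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical reviews and classifier scores, for illustration only.
reviews = ["Great product!!! Works great.", "Terrible... broke in 2 days :("]
tokens = [clean_text(r) for r in reviews]
vectors = tfidf(tokens)
auc = roc_auc([1, 0, 1, 0], [0.9, 0.4, 0.35, 0.1])
```

An AUC of 1 means every fake review is scored above every genuine one, which is why values such as 0.9976 indicate near-perfect ranking. The pairwise form used here is quadratic in the number of samples, so it suits small illustrations rather than full datasets.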

 

Doi: 10.28991/HIJ-2023-04-02-08


Keywords


Fake Reviews Detection; GPT-2; NBSVM; BiLSTM; RoBERTa.



Copyright (c) 2023 Maneerat Puttarattanamanee, Laor Boongasame, Karanrat Thammarak