Research on Machine Learning Techniques for Unmasking Online Negative Speech

Kazi Asif Ferdous

doi:10.54097/5xbmas96

Keywords:

Negative speech detection; Banglish; Natural Language Processing; Online safety; PA; QDA; Machine Learning; Algorithms; Ensemble Model; Dropout; Embedding

Abstract

The proliferation of online platforms has facilitated global communication but has also led to a concerning rise in hate speech across digital spaces. This research addresses the challenge of detecting hate speech in Banglish, a blend of Bangla and English that presents unique linguistic and cultural complexities. We propose an approach that uses machine learning techniques to identify and mitigate hate speech in this bilingual context. Our methodology involves data collection, preprocessing, feature extraction, and the evaluation of multiple machine learning models: SVM, KNN, Naive Bayes, Decision Trees, Logistic Regression, Random Forest, AdaBoost, Bagging, Extra Trees, Gradient Boosting, XGBoost, and QDA. The dataset, curated to capture Banglish nuances, was balanced using Random Over Sampler to address class imbalance. Among the evaluated models, Naive Bayes was the top performer, achieving an accuracy of 78.33%, precision of 81.95%, recall of 74.40%, F1 score of 77.88%, specificity of 82.51%, and an AUC-ROC score of 86.18%. The study highlights the effectiveness of traditional machine learning models in handling high-dimensional sparse data and provides a foundation for developing robust moderation tools to foster safer digital environments for Banglish-speaking communities. The research bridges a linguistic gap in hate speech detection and contributes to broader efforts to combat online hate speech across diverse linguistic contexts.
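
Two steps named in the abstract, random oversampling and the threshold-based evaluation metrics, can be illustrated with a minimal standard-library sketch. This is not the authors' code: the function names `random_oversample` and `binary_metrics`, the seed, and the toy inputs are assumptions made for illustration, and the abstract does not say which library or settings were actually used. AUC-ROC is omitted because it needs predicted scores rather than hard labels.

```python
import random


def random_oversample(texts, labels, seed=0):
    """Balance classes by duplicating randomly chosen samples of smaller classes.

    Hypothetical helper: a plain-Python stand-in for a random oversampler.
    """
    rng = random.Random(seed)
    by_class = {}
    for text, label in zip(texts, labels):
        by_class.setdefault(label, []).append(text)
    # Every class is grown to the size of the largest class.
    target = max(len(items) for items in by_class.values())
    out_texts, out_labels = [], []
    for label, items in by_class.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for text in items + extra:
            out_texts.append(text)
            out_labels.append(label)
    return out_texts, out_labels


def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, F1, and specificity from hard labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    def ratio(num, den):
        # Return 0.0 instead of raising when a denominator is empty.
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "precision": precision,
        "recall": recall,
        "f1": ratio(2 * precision * recall, precision + recall),
        "specificity": ratio(tn, tn + fp),
    }


if __name__ == "__main__":
    # Placeholder strings, not samples from the paper's dataset.
    texts = ["sample_a", "sample_b", "sample_c", "sample_d", "sample_e"]
    labels = [0, 0, 0, 0, 1]
    balanced_texts, balanced_labels = random_oversample(texts, labels)
    print(sorted(balanced_labels))
    print(binary_metrics([1, 1, 1, 0, 0], [1, 1, 0, 0, 1]))
```

In a sketch like this, oversampling would normally be applied to the training split only, so that duplicated samples do not leak into the data used for evaluation; whether the study did so is not stated in the abstract.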
