Corpus Construction for Arabic Question Answering Subjectivity Classification

Date

2023

Publisher

University of Ghardaia

Abstract

Subjectivity and sentiment analysis have gained significant attention in Natural Language Processing (NLP) because they extract and classify subjective information expressed in text. Although extensive research has been conducted on major languages such as English, Arabic, with its dialectal variations, still lacks sufficient resources and research in this domain. This study addresses that scarcity by constructing an extensive Arabic Question-Answering (QA) corpus designed specifically for subjectivity analysis. Corpus construction involves three steps: data collection through web scraping, data cleaning to ensure quality, and annotation, in which subjectivity labels are assigned by two models that we developed by fine-tuning two pre-trained models, XLM-RoBERTa and AraBERT. The availability of this corpus can stimulate further research, drive advancements in Arabic NLP, and support applications in sentiment analysis and opinion mining.
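
The data cleaning step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not list the actual cleaning rules, so everything here is an assumption: the function names `clean_text` and `clean_pairs` are hypothetical, and the chosen rules (dropping URLs, stripping Arabic diacritics and tatweel, collapsing whitespace, discarding empty or duplicate question-answer pairs) are common choices for Arabic text rather than the authors' documented procedure.

```python
import re

# Assumed cleaning rules, not taken from the thesis.
# Arabic diacritics (tashkeel) U+064B..U+0652 plus tatweel U+0640.
_DIACRITICS = re.compile(r"[\u064B-\u0652\u0640]")
_URL = re.compile(r"https?://\S+")
_WHITESPACE = re.compile(r"\s+")


def clean_text(text: str) -> str:
    """Drop URLs, strip diacritics and tatweel, and collapse whitespace."""
    text = _URL.sub(" ", text)
    text = _DIACRITICS.sub("", text)
    return _WHITESPACE.sub(" ", text).strip()


def clean_pairs(pairs):
    """Clean (question, answer) pairs; skip empty and duplicate pairs."""
    seen = set()
    cleaned = []
    for question, answer in pairs:
        question, answer = clean_text(question), clean_text(answer)
        if not question or not answer:
            continue  # one side became empty after cleaning
        if (question, answer) in seen:
            continue  # exact duplicate of an earlier cleaned pair
        seen.add((question, answer))
        cleaned.append((question, answer))
    return cleaned
```

Under these assumptions, scraped pairs would pass through `clean_pairs` before the fine-tuned XLM-RoBERTa and AraBERT models assign subjectivity labels.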

Keywords

Subjectivity analysis, sentiment analysis, fine-tuning, AraBERT, XLM-RoBERTa.
