Depression Intensity Classification from Tweets Using FastText Based Weighted Soft Voting Ensemble

Access: Open. Language: English.

Rizwan, Muhammad; Mushtaq, Muhammad Faheem; Rafiq, Maryam; Mehmood, Arif; Diez, Isabel de la Torre; Gracia Villar, Mónica (monica.gracia@uneatlantico.es); Garay, Helena (helena.garay@uneatlantico.es); Ashraf, Imran (2024) Depression Intensity Classification from Tweets Using FastText Based Weighted Soft Voting Ensemble. Computers, Materials & Continua, 78 (2). pp. 2047-2066. ISSN 1546-2226

Text: TSP_CMC_37347.pdf (861kB)
Available under License Creative Commons Attribution.

Abstract

Predicting depression intensity from microblogs and social media posts has numerous benefits and applications, including the early prediction of psychological disorders and stress in individuals or the general public. A major challenge is that existing studies do not predict the intensity of depression in social media texts but perform only binary classification of depression; moreover, noisy data makes it difficult to identify true depression in social media text. This study begins by collecting relevant tweets and generating a corpus of 210,000 public tweets using Twitter's public application programming interfaces (APIs). A strategy is devised to reduce noise in the corpus by creating a list of relevant hashtags and keeping only depression-related tweets. Furthermore, an algorithm is developed to annotate the data into three depression classes, ‘Mild,’ ‘Moderate,’ and ‘Severe,’ based on the International Classification of Diseases-10 (ICD-10) depression diagnostic criteria. Different baseline classifiers are applied to the annotated dataset to get a preliminary idea of classification performance on the corpus. Next, a FastText-based model is applied and tuned with different preprocessing techniques and hyperparameter settings, which significantly increases classification performance over the baselines, to an F1 score of 84% and an accuracy of 90%. Finally, a FastText-based weighted soft voting ensemble (WSVE) is proposed to boost performance by combining FastText with several other classifiers and weighting each model according to its individual performance. The proposed WSVE outperformed all baselines as well as FastText alone, with an F1 score of 89% (5 percentage points higher than FastText alone) and an accuracy of 93% (3 percentage points higher than FastText alone). The proposed model better captures the contextual features of the relatively small sample class and aids the early detection of depression intensity from tweets with strong performance.
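
The weighted soft voting idea in the abstract can be illustrated with a minimal sketch. It assumes each classifier returns a probability vector over the three classes and that the weights are proportional to each model's validation F1 score; the abstract does not state the exact weighting rule, so this is an assumption. The model names, probabilities, and the non-FastText weights below are hypothetical values chosen for illustration, and the functions `weighted_soft_vote` and `predict_label` are not taken from the paper.

```python
from typing import Dict, List, Sequence

CLASSES = ["Mild", "Moderate", "Severe"]


def weighted_soft_vote(probas: Dict[str, Sequence[float]],
                       weights: Dict[str, float]) -> List[float]:
    """Return the weight-normalised average of per-model class probabilities."""
    total = sum(weights[name] for name in probas)
    n_classes = len(next(iter(probas.values())))
    combined = [0.0] * n_classes
    for name, p in probas.items():
        for k in range(n_classes):
            combined[k] += weights[name] * p[k] / total
    return combined


def predict_label(combined: Sequence[float],
                  classes: Sequence[str] = CLASSES) -> str:
    """Pick the class with the highest combined probability."""
    best = max(range(len(combined)), key=lambda k: combined[k])
    return classes[best]


# Hypothetical class probabilities from three models for a single tweet.
probas = {
    "fasttext": [0.20, 0.30, 0.50],
    "logreg": [0.50, 0.30, 0.20],
    "svm": [0.30, 0.40, 0.30],
}
# Assumed weights: 0.84 mirrors the FastText F1 reported in the abstract;
# the other two values are invented for illustration only.
weights = {"fasttext": 0.84, "logreg": 0.70, "svm": 0.75}

combined = weighted_soft_vote(probas, weights)
label = predict_label(combined)
print(label, [round(p, 3) for p in combined])
```

In this toy input an unweighted average would leave the three classes tied, so the weights alone decide the outcome in favour of the class preferred by the strongest model.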

Item Type: Article
Uncontrolled Keywords: Depression classification; deep learning; FastText; machine learning
Subjects: Subjects > Engineering
Subjects > Psychology
Divisions: Europe University of Atlantic > Research > Scientific Production
Fundación Universitaria Internacional de Colombia > Research > Scientific Production
Ibero-american International University > Research > Scientific Production
Ibero-american International University > Research > Articles and books
Universidad Internacional do Cuanza > Research > Scientific Production
Date Deposited: 14 Mar 2024 23:30
Last Modified: 14 Mar 2024 23:30
URI: https://repositorio.unib.org/id/eprint/11264


Document: /10290/1/Influence%20of%20E-learning%20training%20on%20the%20acquisition%20of%20competences%20in%20basketball%20coaches%20in%20Cantabria.pdf (language: en; access: open)

Influence of E-learning training on the acquisition of competences in basketball coaches in Cantabria

The main aim of this study was to analyse the influence of e-learning training on the acquisition of competences in basketball coaches in Cantabria. The current landscape of basketball coach training shows an increasing demand for innovative training models and emerging pedagogies, including e-learning-based methodologies. The study sample consisted of fifty students from these courses, all above 16 years of age (36 males, 14 females). Among them, 16% resided outside the autonomous community of Cantabria, 10% resided more than 50 km from the city of Santander, 36% between 10 and 50 km, 14% less than 10 km, and 24% resided within Santander city. Data were collected through a Google Forms survey distributed by the Cantabrian Basketball Federation to training course students. Participation was voluntary and anonymous. The survey, consisting of 56 questions, was validated by two sports and health doctors and two senior basketball coaches. The collected data were processed and analysed using Microsoft® Excel version 16.74, and the results were expressed in percentages. The analysis revealed that 24.60% of the students trained through the e-learning methodology considered themselves fully qualified as basketball coaches, contrasting with 10.98% of those trained via traditional face-to-face methodology. The results of the study provide insights into important characteristics that can be adjusted and improved within the investigated educational process. Moreover, the study concludes that e-learning training effectively qualifies basketball coaches in Cantabria.

Scientific Production

Josep Alemany Iturriaga (josep.alemany@uneatlantico.es), Álvaro Velarde-Sotres (alvaro.velarde@uneatlantico.es), Javier Jorge, Kamil Giglio

Document: /15198/1/nutrients-16-03859.pdf (language: en; access: open)

Carotenoids Intake and Cardiovascular Prevention: A Systematic Review

Background: Cardiovascular diseases (CVDs) encompass a variety of conditions that affect the heart and blood vessels. Carotenoids, a group of fat-soluble organic pigments synthesized by plants, fungi, algae, and some bacteria, may have a beneficial effect in reducing cardiovascular disease (CVD) risk. This study aims to examine and synthesize current research on the relationship between carotenoids and CVDs. Methods: A systematic review was conducted using MEDLINE and the Cochrane Library to identify relevant studies on the efficacy of carotenoid supplementation for CVD prevention. Interventional analytical studies (randomized and non-randomized clinical trials) published in English from January 2011 to February 2024 were included. Results: A total of 38 studies were included in the qualitative analysis. Of these, 17 epidemiological studies assessed the relationship between carotenoids and CVDs, 9 examined the effect of carotenoid supplementation, and 12 evaluated dietary interventions. Conclusions: Elevated serum carotenoid levels are associated with reduced CVD risk factors and inflammatory markers. Increasing the consumption of carotenoid-rich foods appears to be more effective than supplementation, though the specific effects of individual carotenoids on CVD risk remain uncertain.

Scientific Production

Sandra Sumalla Cano (sandra.sumalla@uneatlantico.es), Imanol Eguren García (imanol.eguren@uneatlantico.es), Álvaro Lasarte García, Thomas Prola (thomas.prola@uneatlantico.es), Raquel Martínez Díaz (raquel.martinez@uneatlantico.es), Iñaki Elío Pascual (inaki.elio@uneatlantico.es)

Document: /15441/1/journal.pone.0313835.pdf (language: en; access: open)

StackIL10: A stacking ensemble model for the improved prediction of IL-10 inducing peptides

Interleukin-10 (IL-10), a highly effective cytokine recognized for its anti-inflammatory properties, plays a critical role in the immune system. In addition to its well-documented capacity to mitigate inflammation, IL-10 can unexpectedly demonstrate pro-inflammatory characteristics under specific circumstances. This dual behavior emphasizes the need to identify IL-10-inducing peptides. To mitigate the drawbacks of manual identification, which include its high cost, this study introduces StackIL10, a stacking-based ensemble learning model that identifies IL-10-inducing peptides precisely and efficiently. Ten amino-acid-composition-based feature extraction approaches are considered. The StackIL10 stacking ensemble was constructed with five optimized machine learning algorithms (LGBM, RF, SVM, Decision Tree, and KNN) as base learners and Logistic Regression as the meta-learner; it reached an identification rate of 91.7%, an MCC of 0.833, and a specificity of 0.9078. Experiments were conducted to examine the impact of various enhancement techniques on the correctness of IL-10 prediction, including comparisons between single models and various combinations of stacking-based ensemble models. The proposed model was more effective than single models and produced satisfactory results, thereby improving the identification of peptides that induce IL-10.
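
For readers unfamiliar with the metrics quoted above, the following minimal sketch shows how the Matthews correlation coefficient (MCC) and specificity are computed from a binary confusion matrix. The counts are hypothetical and are not taken from the StackIL10 study; the function names are illustrative only.

```python
import math


def specificity(tn: int, fp: int) -> float:
    """True negative rate: TN / (TN + FP)."""
    return tn / (tn + fp)


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal total is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


# Hypothetical counts for a binary inducer / non-inducer classifier.
tp, tn, fp, fn = 90, 85, 15, 10

spec = specificity(tn, fp)
score = mcc(tp, tn, fp, fn)
print(f"specificity={spec:.4f}, MCC={score:.4f}")
```

MCC ranges from -1 to 1 and uses all four cells of the confusion matrix, which is why it is often reported alongside accuracy on imbalanced data.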

Scientific Production

Salman Sadullah Usmani, Izaz Ahmmed Tuhin, Md. Rajib Mia, Md. Monirul Islam, Imran Mahmud, Carlos Eduardo Uc Ríos (carlos.uc@unini.edu.mx), Henry Fabian Gongora (henry.gongora@uneatlantico.es), Imran Ashraf, Md. Abdus Samad

Document: /15444/1/s41598-024-79106-7.pdf (language: en; access: open)

Roman urdu hate speech detection using hybrid machine learning models and hyperparameter optimization

With the rapid increase of users on social media, cyberbullying and hate speech problems have arisen over the past years. Automatic hate speech detection (HSD) from text is an emerging research problem in natural language processing (NLP). Researchers have developed various approaches to automatic hate speech detection using different corpora in various languages; however, research on the Urdu language is rather scarce. This study aims to address the HSD task on Twitter using Roman Urdu text. The contribution of this research is the development of a hybrid model for Roman Urdu HSD, which has not been previously explored. The novel hybrid model integrates deep learning (DL) and transformer models for automatic feature extraction, combined with machine learning algorithms (MLAs) for classification. To further enhance model performance, we employ several hyperparameter optimization (HPO) techniques, including Grid Search (GS), Randomized Search (RS), and Bayesian Optimization with Gaussian Processes (BOGP). Evaluation is carried out on two publicly available benchmark Roman Urdu corpora: the HS-RU-20 corpus and the RUHSOLD hate speech corpus. Results demonstrate that the Multilingual BERT (MBERT) feature learner, paired with a Support Vector Machine (SVM) classifier and optimized using RS, achieves state-of-the-art performance. On the HS-RU-20 corpus, this model attained an accuracy of 0.93 and an F1 score of 0.95 for the Neutral-Hostile classification task, and an accuracy of 0.89 with an F1 score of 0.88 for the Hate Speech-Offensive task. On the RUHSOLD corpus, the same model achieved an accuracy of 0.95 and an F1 score of 0.94 for the Coarse-grained task, alongside an accuracy of 0.87 and an F1 score of 0.84 for the Fine-grained task. These results demonstrate the effectiveness of our hybrid approach for Roman Urdu hate speech detection.

Scientific Production

Waqar Ashiq, Samra Kanwal, Adnan Rafique, Muhammad Waqas, Tahir Khurshaid, Elizabeth Caro Montero (elizabeth.caro@uneatlantico.es), Alicia Bustamante Alonso (alicia.bustamante@uneatlantico.es), Imran Ashraf

Document: /14584/1/s41598-024-73664-6.pdf (language: en; access: open)

Performance of the 4C and SEIMC scoring systems in predicting mortality from onset to current COVID-19 pandemic in emergency departments

The evolution of the COVID-19 pandemic has been associated with variations in clinical presentation and severity. Similarly, prediction scores may undergo changes in their diagnostic accuracy. The aim of this study was to test the 30-day mortality predictive validity of the 4C and SEIMC scores during the sixth wave of the pandemic and to compare them with those of validation studies. This was a longitudinal retrospective observational study. COVID-19 patients who were admitted to the Emergency Department of a Spanish hospital from December 15, 2021, to January 31, 2022, were selected. A side-by-side comparison with the pivotal validation studies was subsequently performed. The main measures were 30-day mortality and the 4C and SEIMC scores. A total of 27,614 patients were considered in the study, including 22,361 from the 4C, 4,627 from the SEIMC and 626 from our hospital. The 30-day mortality rate was significantly lower than that reported in the validation studies. The AUCs were 0.931 (95% CI: 0.90–0.95) for 4C and 0.903 (95% CI: 0.86–0.93) for SEIMC, which were significantly greater than those obtained in the first wave. Despite the changes that have occurred during the coronavirus disease 2019 (COVID-19) pandemic, with a reduction in lethality, scorecard systems are currently still useful tools for detecting patients with poor disease risk, with better prognostic capacity.

Scientific Production

Pedro Ángel de Santos Castro, Carlos del Pozo Vegas, Leyre Teresa Pinilla Arribas, Daniel Zalama Sánchez, Ancor Sanz-García, Tony Giancarlo Vásquez del Águila, Pablo González Izquierdo, Sara de Santos Sánchez, Cristina Mazas Pérez-Oleaga (cristina.mazas@uneatlantico.es), Irma Dominguez Azpíroz (irma.dominguez@unini.edu.mx), Iñaki Elío Pascual (inaki.elio@uneatlantico.es), Francisco Martín-Rodríguez