Depression Intensity Classification from Tweets Using FastText Based Weighted Soft Voting Ensemble

Open access. Language: English.

Rizwan, Muhammad; Mushtaq, Muhammad Faheem; Rafiq, Maryam; Mehmood, Arif; Diez, Isabel de la Torre; Gracia Villar, Mónica (monica.gracia@uneatlantico.es); Garay, Helena (helena.garay@uneatlantico.es) and Ashraf, Imran (2024) Depression Intensity Classification from Tweets Using FastText Based Weighted Soft Voting Ensemble. Computers, Materials & Continua, 78 (2). pp. 2047-2066. ISSN 1546-2226

Text: TSP_CMC_37347.pdf (861 kB)
Available under License Creative Commons Attribution.

Abstract

Predicting depression intensity from microblogs and social media posts has numerous benefits and applications, including the early detection of psychological disorders and stress in individuals or the general public. A major challenge is that existing studies do not predict the intensity of depression in social media texts but only perform binary classification of depression; moreover, noisy data makes it difficult to identify genuine depression in social media text. This study begins by collecting relevant tweets and generating a corpus of 210,000 public tweets using Twitter's public application programming interfaces (APIs). To reduce noise in the corpus, a strategy is devised to retain only depression-related tweets by creating a list of relevant hashtags. Furthermore, an algorithm is developed to annotate the data into three depression classes, ‘Mild’, ‘Moderate’, and ‘Severe’, based on the International Classification of Diseases-10 (ICD-10) depression diagnostic criteria. Several baseline classifiers are applied to the annotated dataset to get a preliminary idea of classification performance on the corpus. A FastText-based model is then applied and tuned with different preprocessing techniques and hyperparameter settings, which raises depression classification performance to an 84% F1 score and 90% accuracy, a significant improvement over the baselines. Finally, a FastText-based weighted soft voting ensemble (WSVE) is proposed to boost performance by combining the model with several other classifiers and weighting each model according to its individual performance. The proposed WSVE outperformed all baselines as well as FastText alone, with an F1 score of 89% (5 percentage points higher than FastText alone) and an accuracy of 93% (3 percentage points higher than FastText alone). The proposed model better captures the contextual features of the relatively small sample class and aids the early detection of depression intensity from tweets.
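The weighted soft voting step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract only states that weights follow each model's individual performance, so the rule used here (weights proportional to a validation F1 score) is an assumption, as are the function names, the three hypothetical models, and all numeric values.

```python
from typing import List, Sequence

# Class labels taken from the abstract.
CLASSES = ["Mild", "Moderate", "Severe"]


def weights_from_scores(scores: Sequence[float]) -> List[float]:
    """Turn per-model scores into weights that sum to 1.

    Assumption: weights are proportional to a validation score such as F1.
    The abstract does not specify the exact weighting rule.
    """
    total = sum(scores)
    if total <= 0:
        raise ValueError("scores must sum to a positive value")
    return [s / total for s in scores]


def weighted_soft_vote(
    probabilities: Sequence[Sequence[float]], weights: Sequence[float]
) -> List[float]:
    """Combine per-model class probabilities for ONE tweet.

    probabilities[i][k] is model i's probability for class k.
    Returns the weighted average probability for each class.
    """
    if len(probabilities) != len(weights):
        raise ValueError("need exactly one weight per model")
    n_classes = len(probabilities[0])
    combined = [0.0] * n_classes
    for probs, w in zip(probabilities, weights):
        if len(probs) != n_classes:
            raise ValueError("all models must score the same classes")
        for k, p in enumerate(probs):
            combined[k] += w * p
    total_weight = sum(weights)
    return [c / total_weight for c in combined]


def predict_label(
    probabilities: Sequence[Sequence[float]],
    weights: Sequence[float],
    classes: Sequence[str] = CLASSES,
) -> str:
    """Return the class with the highest weighted average probability."""
    combined = weighted_soft_vote(probabilities, weights)
    best = max(range(len(combined)), key=combined.__getitem__)
    return classes[best]


# Hypothetical validation scores for three hypothetical models.
example_weights = weights_from_scores([0.8, 0.6, 0.6])

# Hypothetical class probabilities (Mild, Moderate, Severe) for one tweet.
example_probabilities = [
    [0.2, 0.3, 0.5],  # model A
    [0.5, 0.3, 0.2],  # model B
    [0.4, 0.4, 0.2],  # model C
]

print(predict_label(example_probabilities, example_weights))  # prints "Mild"
```

In this made-up example the highest-weighted model alone would pick ‘Severe’, but the weighted average of all three probability vectors favors ‘Mild’, which shows how soft voting uses each model's confidence rather than only its top choice.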

Document type: Article
Keywords: Depression classification; deep learning; FastText; machine learning
Subject classification: Subjects > Engineering
Subjects > Psychology
Divisions: Universidad Europea del Atlántico > Research > Scientific Production
Fundación Universitaria Internacional de Colombia > Research > Scientific Production
Universidad Internacional Iberoamericana México > Research > Scientific Production
Universidad Internacional Iberoamericana Puerto Rico > Research > Articles and Books
Universidad Internacional do Cuanza > Research > Scientific Production
Deposited: 14 Mar 2024 23:30
Last modified: 14 Mar 2024 23:30
URI: https://repositorio.unib.org/id/eprint/11264

Influence of E-learning training on the acquisition of competences in basketball coaches in Cantabria

Language: English. Access: open.

The main aim of this study was to analyse the influence of e-learning training on the acquisition of competences in basketball coaches in Cantabria. The current landscape of basketball coach training shows an increasing demand for innovative training models and emerging pedagogies, including e-learning-based methodologies. The study sample consisted of fifty students from these courses, all above 16 years of age (36 males, 14 females). Among them, 16% resided outside the autonomous community of Cantabria, 10% resided more than 50 km from the city of Santander, 36% between 10 and 50 km, 14% less than 10 km, and 24% resided within Santander city. Data were collected through a Google Forms survey distributed by the Cantabrian Basketball Federation to training course students. Participation was voluntary and anonymous. The survey, consisting of 56 questions, was validated by two sports and health doctors and two senior basketball coaches. The collected data were processed and analysed using Microsoft® Excel version 16.74, and the results were expressed in percentages. The analysis revealed that 24.60% of the students trained through the e-learning methodology considered themselves fully qualified as basketball coaches, contrasting with 10.98% of those trained via traditional face-to-face methodology. The results of the study provide insights into important characteristics that can be adjusted and improved within the investigated educational process. Moreover, the study concludes that e-learning training effectively qualifies basketball coaches in Cantabria.

Scientific Production

Authors: Josep Alemany Iturriaga (josep.alemany@uneatlantico.es), Álvaro Velarde-Sotres (alvaro.velarde@uneatlantico.es), Javier Jorge, Kamil Giglio

Efficient deep learning-based approach for malaria detection using red blood cell smears

Language: English. Access: open.

Malaria is an extremely malignant disease and is caused by the bites of infected female mosquitoes. This disease is not only infectious among humans, but among animals as well. Malaria causes mild symptoms like fever, headache, sweating and vomiting, and muscle discomfort; severe symptoms include coma, seizures, and kidney failure. The timely identification of malaria parasites is a challenging and chaotic endeavor for health staff. An expert technician examines the schematic blood smears of infected red blood cells through a microscope. The conventional methods for identifying malaria are not efficient. Machine learning approaches are effective for simple classification challenges but not for complex tasks. Furthermore, machine learning involves rigorous feature engineering to train the model and detect patterns in the features. On the other hand, deep learning works well with complex tasks and automatically extracts low- and high-level features from the images to detect disease. In this paper, EfficientNet, a deep learning-based approach for detecting malaria that uses red blood cell images, is proposed. Experiments are carried out and a performance comparison is made with pre-trained deep learning models. In addition, k-fold cross-validation is used to substantiate the results of the proposed approach. Experiments show that the proposed approach is 97.57% accurate in detecting malaria from red blood cell images and can be of practical benefit to healthcare staff.

Scientific Production

Authors: Muhammad Mujahid, Furqan Rustam, Rahman Shafique, Elizabeth Caro Montero (elizabeth.caro@uneatlantico.es), Eduardo René Silva Alvarado (eduardo.silva@funiber.org), Isabel de la Torre Diez, Imran Ashraf

Feature group partitioning: an approach for depression severity prediction with class balancing using machine learning algorithms

Language: English. Access: open.

In contemporary society, depression has emerged as a prominent mental disorder that exhibits exponential growth and exerts a substantial influence on premature mortality. Although numerous studies have applied machine learning methods to forecast signs of depression, only a limited number have taken into account the severity level as a multiclass variable. Besides, an equal distribution of data among all classes rarely occurs in practice, so the inevitable class imbalance for multiple variables is considered a substantial challenge in this domain. Furthermore, this research emphasizes the significance of addressing class imbalance issues in the context of multiple classes. We introduce a new approach, feature group partitioning (FGP), in the data preprocessing phase, which effectively reduces the dimensionality of features to a minimum. This study utilized synthetic oversampling techniques, specifically Synthetic Minority Over-sampling Technique (SMOTE) and Adaptive Synthetic (ADASYN), for class balancing. The dataset used in this research was collected from university students by administering the Burn Depression Checklist (BDC). For methodological modifications, we implemented heterogeneous ensemble learning stacking, homogeneous ensemble bagging, and five distinct supervised machine learning algorithms. The issue of overfitting was mitigated by evaluating the accuracy of the training, validation, and testing datasets. To justify the effectiveness of the prediction models, balanced accuracy, sensitivity, specificity, precision, and F1-score indices are used. Overall, a comprehensive analysis demonstrates the discrimination between the Conventional Depression Screening (CDS) and FGP approaches. In summary, the results show that the stacking classifier for FGP with SMOTE yields the highest balanced accuracy, at 92.81%. The empirical evidence demonstrates that the FGP approach, when combined with SMOTE, is able to produce better performance in predicting the severity of depression. Most importantly, the optimization of the training time of the FGP approach for all of the classifiers is a significant achievement of this research.

Scientific Production

Authors: Tumpa Rani Shaha, Momotaz Begum, Jia Uddin, Vanessa Yélamos Torres (vanessa.yelamos@funiber.org), Josep Alemany Iturriaga (josep.alemany@uneatlantico.es), Imran Ashraf, Md. Abdus Samad

A Comparison of the Clinical Characteristics of Short-, Mid-, and Long-Term Mortality in Patients Attended by the Emergency Medical Services: An Observational Study

Language: English. Access: open.

Aim: The development of predictive models for patients treated by emergency medical services (EMS) is on the rise in the emergency field. However, how these models evolve over time has not been studied. The objective of the present work is to compare the characteristics of patients who present mortality in the short, medium and long term, and to derive and validate a predictive model for each mortality time. Methods: A prospective multicenter study was conducted, which included adult patients with unselected acute illness who were treated by EMS. The primary outcome was noncumulative mortality from all causes by time windows including 30-day mortality, 31- to 180-day mortality, and 181- to 365-day mortality. Prehospital predictors included demographic variables, standard vital signs, prehospital laboratory tests, and comorbidities. Results: A total of 4830 patients were enrolled. The noncumulative mortalities at 30, 180, and 365 days were 10.8%, 6.6%, and 3.5%, respectively. The best predictive value was shown for 30-day mortality (AUC = 0.930; 95% CI: 0.919–0.940), followed by 180-day (AUC = 0.852; 95% CI: 0.832–0.871) and 365-day (AUC = 0.806; 95% CI: 0.778–0.833) mortality. Discussion: Rapid characterization of patients at risk of short-, medium-, or long-term mortality could help EMS to improve the treatment of patients suffering from acute illnesses.

Scientific Production

Authors: Rodrigo Enriquez de Salamanca Gambara, Ancor Sanz-García, Carlos del Pozo Vegas, Raúl López-Izquierdo, Irene Sánchez Soberón, Juan F. Delgado Benito, Raquel Martínez Díaz (raquel.martinez@uneatlantico.es), Cristina Mazas Pérez-Oleaga (cristina.mazas@uneatlantico.es), Nohora Milena Martínez López (nohora.martinez@uneatlantico.es), Irma Dominguez Azpíroz (irma.dominguez@unini.edu.mx), Francisco Martín-Rodríguez

Risk Factors for Eating Disorders in University Students: The RUNEAT Study

Language: English. Access: open.

The purpose of the study is to assess the risk of developing general eating disorders (ED), anorexia nervosa (AN), and bulimia nervosa (BN), as well as to examine the effects of gender, academic year, place of residence, faculty, and diet quality on that risk. Over two academic years, 129 first- and fourth-year Uneatlántico students were included in an observational descriptive study. The self-administered tests SCOFF, EAT-26, and BITE were used to determine the participants’ risk of developing ED. The degree of adherence to the Mediterranean diet (MD) was used to evaluate the quality of the diet. Data were collected at the beginning (T1) and at the end (T2) of the academic year. The main results were that at T1, 34.9% of participants were at risk of developing general ED, AN 3.9%, and BN 16.3%. At T2, these percentages were 37.2%, 14.7%, and 8.5%, respectively. At T2, the frequency of general ED in the female group was 2.5 times higher (OR: 2.55, 95% CI: 1.22–5.32, p = 0.012). The low-moderate adherence to the MD students’ group was 0.92 times less frequent than general ED at T2 (OR: 0.921, 95%CI: 0.385–2.20, p < 0.001). The most significant risk factor for developing ED is being a female in the first year of university. Moreover, it appears that the likelihood of developing ED generally increases during the academic year.

Scientific Production

Authors: Imanol Eguren García (imanol.eguren@uneatlantico.es), Sandra Sumalla Cano (sandra.sumalla@uneatlantico.es), Sandra Conde González, Anna Vila-Martí, Mercedes Briones Urbano (mercedes.briones@uneatlantico.es), Raquel Martínez Díaz (raquel.martinez@uneatlantico.es), Iñaki Elío Pascual (inaki.elio@uneatlantico.es)