A deep learning approach for Named Entity Recognition in Urdu language

Khan, Hikmat Ullah; Anam, Rimsha; Anwar, Muhammad Waqas; Jamal, Muhammad Hasan; Bajwa, Usama Ijaz; Diez, Isabel de la Torre; Silva Alvarado, Eduardo René (eduardo.silva@funiber.org); Soriano Flores, Emmanuel (emmanuel.soriano@uneatlantico.es) and Ashraf, Imran (2024) A deep learning approach for Named Entity Recognition in Urdu language. PLOS ONE, 19 (3). e0300725. ISSN 1932-6203

Text: journal.pone.0300725.pdf (1MB)
Available under License Creative Commons Attribution.

Official URL: http://doi.org/10.1371/journal.pone.0300725

Abstract

Named Entity Recognition (NER) is a natural language processing task that has been widely explored for many languages over the past decade, but it remains under-researched for Urdu because of the language's rich morphology and complexity. Existing state-of-the-art studies on Urdu NER use various deep learning approaches with automatic feature selection based on word embeddings. This paper presents a deep learning approach for Urdu NER that uses FastText and Floret word embeddings to capture each word's surrounding context for improved feature extraction. Publicly available pre-trained FastText and Floret embeddings for Urdu are used to generate feature vectors for four benchmark Urdu datasets. These features are then used as input to train various combinations of Long Short-Term Memory (LSTM), Bidirectional LSTM (BiLSTM), Gated Recurrent Unit (GRU), and Conditional Random Field (CRF) models. The results show that the proposed approach significantly outperforms existing state-of-the-art studies on Urdu NER, achieving an F-score of up to 0.98 when using BiLSTM+GRU with Floret embeddings. Error analysis shows a low classification error rate, ranging from 1.24% to 3.63% across the datasets, which indicates the robustness of the proposed approach.
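
As a rough illustration of why subword-based embeddings such as FastText and Floret suit a morphologically rich language, the sketch below splits a word into boundary-marked character n-grams and hashes each n-gram to a few rows of a compact table. The function names, the n-gram range, the bucket count, and the CRC32-based hashing are illustrative assumptions; they are not the libraries' actual hash functions or settings, and not the authors' configuration.

```python
import zlib


def char_ngrams(word, min_n=3, max_n=5):
    """Return character n-grams of `word`, wrapped in '<' and '>' boundary markers."""
    wrapped = f"<{word}>"
    grams = []
    for n in range(min_n, max_n + 1):
        for start in range(len(wrapped) - n + 1):
            grams.append(wrapped[start:start + n])
    return grams


def bucket_ids(gram, num_buckets=50_000, num_hashes=2):
    """Map one n-gram to `num_hashes` row indices of a compact embedding table.

    Assumption: CRC32 with different starting values stands in for the
    hash functions used by the real libraries.
    """
    data = gram.encode("utf-8")
    return [zlib.crc32(data, seed) % num_buckets for seed in range(num_hashes)]


if __name__ == "__main__":
    # Trigrams of an English word and of the Urdu word for "book".
    print(char_ngrams("where", 3, 3))
    print(char_ngrams("کتاب", 3, 3))
    print(bucket_ids("<wh"))
```

A word vector would then be built by summing the table rows selected for all of its n-grams, so related word forms that share n-grams end up with related vectors even when one form never appeared in training.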

Document Type:	Article
Subject classification:	Subjects > Engineering
Divisions:	Europe University of Atlantic > Research > Scientific Production; Ibero-american International University > Research > Scientific Production; Ibero-american International University > Research > Articles and Books; Universidad Internacional do Cuanza > Research > Scientific Production; Fundación Universitaria Internacional de Colombia > Research > Scientific Production
Deposited:	28 May 2024 23:30
Last Modified:	09 Dec 2024 23:30
URI:	https://repositorio.unib.org/id/eprint/12369


A novel approach for disease and pests detection in potato production system based on deep learning

Potato crops are vulnerable to diseases and pest infestation, which can reduce quality and cause significant yield losses. Timely detection of such diseases supports effective management decisions. For this purpose, a deep learning-based object detection framework is designed in this study to identify and classify major potato diseases and pests under real-world field conditions. A total of 2,688 field images were collected from two research farms in Punjab, Pakistan, across multiple growth stages and seasonal conditions. Excluding 285 symptom-free images from the earliest collection left 2,403 images, which were annotated into four biotic-stress classes: blight disease (n = 630), leaf spot disease (n = 370), leafroll virus (viral symptom complex; n = 888), and Colorado potato beetle (larvae/adults; n = 515), indicating class imbalance. Several state-of-the-art models were evaluated, including YOLOv8 variants (n/s/m), YOLOv7, YOLOv5, and Faster R-CNN, and the results are discussed in relation to recent potato disease classification studies involving cropped leaf images. Stratified splitting (70% training, 20% validation, 10% testing) was applied to preserve class distribution across all subsets. YOLOv8-medium achieves the best performance, with a mean average precision (mAP)@0.5 of 98% on the held-out test images. Five-fold cross-validation shows a stable mean mAP@0.5 of 97.8%, offering a balance between accuracy and inference time. Model robustness was evaluated using 5-fold cross-validation and repeated training with different random seeds, showing a low variance of ±0.4% mAP. The results are promising under real-world field conditions, while broader cross-region and cross-season validation is intended for future work.
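
The "@0.5" in mAP@0.5 refers to the Intersection over Union (IoU) threshold at which a predicted box counts as a correct detection. A minimal sketch of that matching rule follows; the box format `(x_min, y_min, x_max, y_max)`, the function names, and the class label strings are assumptions for illustration, not details taken from the study.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x_min, y_min, x_max, y_max)."""
    # Corners of the overlapping rectangle (if any).
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])

    # Width and height are clamped at zero when the boxes do not overlap.
    inter_w = max(0.0, ix_max - ix_min)
    inter_h = max(0.0, iy_max - iy_min)
    intersection = inter_w * inter_h

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def is_true_positive(pred_box, pred_class, gt_box, gt_class, threshold=0.5):
    """A prediction matches a ground-truth box if the class agrees and IoU >= threshold."""
    return pred_class == gt_class and iou(pred_box, gt_box) >= threshold


if __name__ == "__main__":
    print(iou((0, 0, 2, 2), (1, 1, 3, 3)))
    print(is_true_positive((0, 0, 10, 10), "blight", (0, 0, 10, 8), "blight"))
```

Average precision for each class is then computed from the ranked true and false positives, and mAP is the mean over the classes.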

Scientific Production

Ahmed Abbas, Saif Ur Rehman, Khalid Mahmood, Santos Gracia Villar (santos.gracia@uneatlantico.es), Luis Alonso Dzul López (luis.dzul@uneatlantico.es), Aseel Smerat, Imran Ashraf

An attention-based deep learning model for early detection of polyphagous shot hole borer infestations in plants

The Polyphagous Shot Hole Borer (PSHB) is a highly invasive beetle that has spread rapidly across agricultural and forestry landscapes in recent years. Its destructive spread has made it a major global threat, causing widespread and growing damage. Countries such as South Africa, the United States, and Australia have implemented extensive measures to control the spread of PSHB, including the establishment of specialized agricultural support centers for early detection. However, there is still a strong need to make PSHB detection more accessible, so that even non-experts can identify infestations at an early stage. Artificial Intelligence (AI) has shown great promise in plant disease detection, but a major challenge in the case of PSHB was the lack of a suitable dataset for training AI models. In the proposed work, we first created a dedicated dataset by collecting images of trees infested with PSHB. We applied a range of preprocessing techniques to refine the dataset and prepare it for AI applications. Building on this, we developed a novel AI-based method in which a deep learning model is trained using a multi-convolutional layer network combined with a Fourier transformation layer. Additionally, an attention mechanism and advanced feature extraction techniques were incorporated to further boost model performance. The proposed approach achieved a top accuracy of 92.3% in detecting PSHB infestations, showing the potential of AI to offer a simple, efficient, and accurate solution for early detection.

Scientific Production

Rabbiya Younas, Hafiz Muhammad Raza ur Rehman, Gyu Sang Choi, Ángel Gabriel Kuc Castilla (angel.kuc@uneatlantico.es), Carlos Eduardo Uc Ríos (carlos.uc@unini.edu.mx), Imran Ashraf

Correction: Enhancing fault detection in new energy vehicles via novel ensemble approach

In the original version of this Article, Umair Shahid was incorrectly listed as a corresponding author. The correct corresponding authors for this Article are Imran Ashraf and Kashif Munir. Correspondence and requests for materials should be addressed to ashrafimran@live.com and kashif.munir@kfueit.edu.pk.

Scientific Production

Iqra Akhtar, Mahnoor Nabeel, Umair Shahid, Kashif Munir, Ali Raza, Irene Delgado Noya (irene.delgado@uneatlantico.es), Santos Gracia Villar (santos.gracia@uneatlantico.es), Imran Ashraf

Benchmarking multiple instance learning architectures from patches to pathology for prostate cancer detection and grading using attention-based weak supervision

Histopathological evaluation is necessary for the diagnosis and grading of prostate cancer, which is still one of the most common cancers in men globally. Traditional evaluation is time-consuming, prone to inter-observer variability, and challenging to scale. The clinical usefulness of current AI systems is limited by the need for comprehensive pixel-level annotations. The objective of this research is to conduct a large-scale benchmarking study of a weakly supervised deep learning framework that minimizes the need for annotation and ensures interpretability for automated prostate cancer diagnosis and International Society of Urological Pathology (ISUP) grading using whole slide images (WSIs). This study rigorously tested six cutting-edge multiple instance learning (MIL) architectures (CLAM-MB, CLAM-SB, ILRA-MIL, AC-MIL, AMD-MIL, WiKG-MIL), three feature encoders (ResNet50, CTransPath, UNI2), and four patch extraction techniques (varying sizes and overlap) using the PANDA dataset (10,616 WSIs), yielding 72 experimental configurations. The methodology used distributed cloud computing to process over 31 million tissue patches, implementing advanced attention mechanisms to ensure clinical interpretability through Grad-CAM visualizations. The optimum configuration (UNI2 encoder with ILRA-MIL, 256 × 256 patches, 50% overlap) achieved 78.75% accuracy and 90.12% quadratic weighted kappa (QWK), outperforming traditional methods and approaching expert pathologist-level diagnostic capability. Overlapping smaller patches offered the best balance of spatial resolution and contextual information, while domain-specific foundation models performed noticeably better than generic encoders. This work is the first large-scale, comprehensive comparison of weakly supervised MIL methods for prostate cancer diagnosis and grading. The proposed approach has excellent clinical diagnostic performance, scalability, practical feasibility through cloud computing, and interpretability using visualization tools.
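
Quadratic weighted kappa (QWK), the agreement metric reported above, penalizes a prediction more heavily the further it falls from the true ordinal grade. The sketch below shows the standard formula in plain Python; the function name and the assumption that grades are encoded as integers from 0 to `num_classes - 1` are illustrative and not taken from the study.

```python
def quadratic_weighted_kappa(y_true, y_pred, num_classes):
    """QWK between two lists of integer grades in the range 0..num_classes-1."""
    n = len(y_true)

    # Observed confusion matrix: rows are true grades, columns are predictions.
    observed = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        observed[t][p] += 1

    hist_true = [sum(row) for row in observed]
    hist_pred = [sum(observed[i][j] for i in range(num_classes))
                 for j in range(num_classes)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(num_classes):
        for j in range(num_classes):
            # Quadratic penalty: grows with the squared grade distance.
            weight = (i - j) ** 2 / (num_classes - 1) ** 2
            # Expected count under independence of the two gradings.
            expected = hist_true[i] * hist_pred[j] / n
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * expected

    # The expected term is zero only when both gradings use one identical
    # grade throughout, which is perfect agreement.
    if weighted_expected == 0:
        return 1.0
    return 1.0 - weighted_observed / weighted_expected


if __name__ == "__main__":
    truth = [0, 1, 2, 3, 4, 5]
    print(quadratic_weighted_kappa(truth, truth, num_classes=6))
    print(quadratic_weighted_kappa(truth, [0, 1, 2, 3, 4, 4], num_classes=6))
```

A value of 1 means perfect agreement, 0 means agreement no better than chance, and negative values mean systematic disagreement.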

Scientific Production

Naveed Anwer Butt, Dilawaiz Sarwat, Irene Delgado Noya (irene.delgado@uneatlantico.es), Kilian Tutusaus (kilian.tutusaus@uneatlantico.es), Nagwan Abdel Samee, Imran Ashraf

A Systematic Literature Review on Integrated Deep Learning and Multi-Agent Vision-Language Frameworks for Pathology Image Analysis and Report Generation

This systematic literature review (SLR) investigates the integration of deep learning (DL), vision-language models (VLMs), and multi-agent systems in the analysis of pathology images and automated report generation. The rapid advancement of whole-slide imaging (WSI) technologies has posed new challenges in pathology, especially due to the scale and complexity of the data. DL techniques in general and convolutional neural networks (CNNs) and transformers in particular have significantly enhanced image analysis tasks including segmentation, classification, and detection. However, these models often lack generalizability to generate coherent, clinically relevant text, thus necessitating the integration of VLMs and large language models (LLMs). This review examines the effectiveness of VLMs and LLMs in bridging the gap between visual data and clinical text, focusing on their potential for automating the generation of pathology reports. Additionally, multi-agent systems, which leverage specialized artificial intelligence (AI) agents to collaboratively perform diagnostic tasks, are explored for their contributions to improving diagnostic accuracy and scalability. Through a synthesis of recent studies, this review highlights the successes, challenges, and future directions of these AI technologies in pathology diagnostics, offering a comprehensive foundation for the development of integrated, AI-driven diagnostic workflows.

Scientific Production

Usama Ali, Imran Shafi, Jamil Ahmad, Arlette Zárate Cáceres, Thania Chio Montero, Hafiz Muhammad Raza ur Rehman, Imran Ashraf
