Real Word Spelling Error Detection and Correction for Urdu Language
Article
Subjects > Engineering
Europe University of Atlantic > Research > Scientific Production
Fundación Universitaria Internacional de Colombia > Research > Scientific Production
Ibero-american International University > Research > Scientific Production
Ibero-american International University > Research > Articles and Books
Universidad Internacional do Cuanza > Research > Scientific Production
Open
English
Non-word and real-word errors are generally two types of spelling errors. Non-word errors are misspelled words that are nonexistent in the lexicon while real-word errors are misspelled words that exist in the lexicon but are used out of context in a sentence. Lexicon-based lookup approach is widely used for non-word errors but it is incapable of handling real-word errors as they require contextual information. Contrary to the English language, real-word error detection and correction for low-resourced languages like Urdu is an unexplored area. This paper presents a real-word spelling error detection and correction approach for the Urdu language. We develop an extensive lexicon of 593,738 words and use this lexicon to develop a dataset for real-word errors comprising 125562 sentences and 2,552,735 words. Based on the developed lexicon and dataset, we then develop a contextual spell checker that detects and corrects real-word errors. For the real-word error detection phase, word-gram features are used along with five machine learning classifiers, achieving a precision, recall, and F1-score of 0.84,0.79, and 0.81 respectively. We also test the proposed approach with a 40% error density. For real-word error correction, the Damerau-Levenshtein distance is used along with the n-gram model for further ranking of the suggested candidate words, achieving an accuracy of up to 83.67%.
metadata
Aziz, Romila; Anwar, Muhammad Waqas; Jamal, Muhammad Hasan; Bajwa, Usama Ijaz; Kuc Castilla, Ángel Gabriel; Uc-Rios, Carlos; Bautista Thompson, Ernesto and Ashraf, Imran
mail
UNSPECIFIED, UNSPECIFIED, UNSPECIFIED, UNSPECIFIED, UNSPECIFIED, carlos.uc@unini.edu.mx, ernesto.bautista@unini.edu.mx, UNSPECIFIED
(2023)
Real Word Spelling Error Detection and Correction for Urdu Language.
IEEE Access.
p. 1.
ISSN 2169-3536
|
Text
Real_Word_Spelling_Error_Detection_and_Correction_for_Urdu_Language.pdf Available under License Creative Commons Attribution Non-commercial No Derivatives. Download (3MB) |
Abstract
Non-word and real-word errors are generally two types of spelling errors. Non-word errors are misspelled words that are nonexistent in the lexicon while real-word errors are misspelled words that exist in the lexicon but are used out of context in a sentence. Lexicon-based lookup approach is widely used for non-word errors but it is incapable of handling real-word errors as they require contextual information. Contrary to the English language, real-word error detection and correction for low-resourced languages like Urdu is an unexplored area. This paper presents a real-word spelling error detection and correction approach for the Urdu language. We develop an extensive lexicon of 593,738 words and use this lexicon to develop a dataset for real-word errors comprising 125562 sentences and 2,552,735 words. Based on the developed lexicon and dataset, we then develop a contextual spell checker that detects and corrects real-word errors. For the real-word error detection phase, word-gram features are used along with five machine learning classifiers, achieving a precision, recall, and F1-score of 0.84,0.79, and 0.81 respectively. We also test the proposed approach with a 40% error density. For real-word error correction, the Damerau-Levenshtein distance is used along with the n-gram model for further ranking of the suggested candidate words, achieving an accuracy of up to 83.67%.
| Document Type: | Article |
|---|---|
| Keywords: | Real-word errors, spelling correction, spelling detection, spell checker |
| Subject classification: | Subjects > Engineering |
| Divisions: | Europe University of Atlantic > Research > Scientific Production Fundación Universitaria Internacional de Colombia > Research > Scientific Production Ibero-american International University > Research > Scientific Production Ibero-american International University > Research > Articles and Books Universidad Internacional do Cuanza > Research > Scientific Production |
| Deposited: | 14 Sep 2023 23:30 |
| Last Modified: | 14 Sep 2023 23:30 |
| URI: | https://repositorio.unib.org/id/eprint/8800 |
Actions (login required)
![]() |
View Object |
<a class="ep_document_link" href="/28319/1/s41598-026-45575-1_reference.pdf"><img class="ep_doc_icon" alt="[img]" src="/style/images/fileicons/text.png" border="0"/></a>
en
open
A novel approach for disease and pests detection in potato production system based on deep learning
Vulnerability of potato crops to diseases and pest infestation can affect its quality and lead to significant yield losses. Timely detection of such diseases can help take effective decisions. For this purpose, a deep learning-based object detection framework is designed in this study to identify and classify major potato diseases and pests under real-world field conditions. A total of 2,688 field images were collected from two research farms in Punjab, Pakistan, across multiple growth stages in various seasonal conditions. Excluding 285 symptoms-free images from the earliest collection led to 2,403 images which were annotated into four biotic-stress classes: blight disease (n = 630), leaf spot disease (n = 370), leafroll virus (viral symptom complex; n = 888), and Colorado potato beetle (larvae/adults; n = 515), indicating class imbalance. Several state-of-the-art models were used including YOLOv8 variants (n/s/m), YOLOv7, YOLOv5, and Faster R-CNN, and the results are discussed in relation to recent potato disease classification studies involving cropped leaf images. Stratified splitting (70% training, 20% validation, 10% testing) was applied to preserve class distribution across all subsets. YOLOv8-medium achieve the best performance with mean average precision (mAP)@0.5 of 98% on the held-out test images. Results for stable 5-fold cross-validation show a mean mAP@0.5 of 97.8%, which offers a balance between accuracy and inference time. Model robustness was evaluated using 5-fold cross-validation and repeated training with different random seeds, showing a low variance of ±0.4% mAP. Results demonstrate promising outcomes under the real-world field conditions, while, broader cross-region and cross-season validation is intended for the future.
Ahmed Abbas mail , Saif Ur Rehman mail , Khalid Mahmood mail , Santos Gracia Villar mail santos.gracia@uneatlantico.es, Luis Alonso Dzul López mail luis.dzul@uneatlantico.es, Aseel Smerat mail , Imran Ashraf mail ,
Abbas
<a href="/28569/1/s12870-026-08847-6_reference.pdf" class="ep_document_link"><img class="ep_doc_icon" alt="[img]" src="/style/images/fileicons/text.png" border="0"/></a>
en
open
The Polyphagous Shot Hole Borer (PSHB) is a highly invasive beetle that has been spreading like an epidemic across agricultural and forestry landscapes in recent years. Its rapid and destructive spread has turned it into a major global threat, causing widespread damage that continues to grow with time. Countries like South Africa, the United States, and Australia have implemented extensive measures to control the spread of PSHB, including the establishment of specialized agricultural support centers for early detection. However, there is still a strong need to make PSHB detection more accessible, allowing even non-experts to easily identify infections at an early stage. Artificial Intelligence (AI) has shown great promise in plant disease detection, but a major challenge in the case of PSHB was the lack of a suitable dataset for training AI models. In the proposed work, we first created a dedicated dataset by collecting images of trees infected with PSHB. We applied a range of preprocessing techniques to refine the dataset and prepare it for AI applications. Building on this, we developed a novel AI-based method, where we trained a deep learning model using a multi-convolutional layer network combined with a Fourier transformation layer. Additionally, an attention mechanism and advanced feature extraction techniques were incorporated to further boost model performance. As a result, the proposed approach achieved an impressive top accuracy of 92.3% in detecting PSHB infections, showing the potential of AI to offer a simple, efficient, and highly accurate solution for early disease detection.
Rabbiya Younas mail , Hafiz Muhammad Raza ur Rehman mail , Gyu Sang Choi mail , Ángel Gabriel Kuc Castilla mail angel.kuc@uneatlantico.es, Carlos Eduardo Uc Ríos mail carlos.uc@unini.edu.mx, Imran Ashraf mail ,
Younas
<a class="ep_document_link" href="/28572/1/s41598-026-47906-8.pdf"><img class="ep_doc_icon" alt="[img]" src="/style/images/fileicons/text.png" border="0"/></a>
en
open
Correction: Enhancing fault detection in new energy vehicles via novel ensemble approach
In the original version of this Article, Umair Shahid was incorrectly listed as a corresponding author. The correct corresponding authors for this Article are Imran Ashraf and Kashif Munir. Correspondence and request for materials should be addressed to ashrafimran@live.com and kashif.munir@kfueit.edu.pk.
Iqra Akhtar mail , Mahnoor Nabeel mail , Umair Shahid mail , Kashif Munir mail , Ali Raza mail , Irene Delgado Noya mail irene.delgado@uneatlantico.es, Santos Gracia Villar mail santos.gracia@uneatlantico.es, Imran Ashraf mail ,
Akhtar
<a class="ep_document_link" href="/27825/1/s41598-026-39196-x_reference.pdf"><img class="ep_doc_icon" alt="[img]" src="/style/images/fileicons/text.png" border="0"/></a>
en
open
Histopathological evaluation is necessary for the diagnosis and grading of prostate cancer, which is still one of the most common cancers in men globally. Traditional evaluation is time-consuming, prone to inter-observer variability, and challenging to scale. The clinical usefulness of current AI systems is limited by the need for comprehensive pixel-level annotations. The objective of this research is to develop and evaluate a large-scale benchmarking study on a weakly supervised deep learning framework that minimizes the need for annotation and ensures interpretability for automated prostate cancer diagnosis and International Society of Urological Pathology (ISUP) grading using whole slide images (WSIs). This study rigorously tested six cutting-edge multiple instance learning (MIL) architectures (CLAM-MB, CLAM-SB, ILRA-MIL, AC-MIL, AMD-MIL, WiKG-MIL), three feature encoders (ResNet50, CTransPath, UNI2), and four patch extraction techniques (varying sizes and overlap) using the PANDA dataset (10,616 WSIs), yielding 72 experimental configurations. The methodology used distributed cloud computing to process over 31 million tissue patches, implementing advanced attention mechanisms to ensure clinical interpretability through Grad-CAM visualizations. The optimum configuration (UNI2 encoder with ILRA-MIL, 256 256 patches, 50% overlap) achieved 78.75% accuracy and 90.12% quadratic weighted kappa (QWK), outperforming traditional methods and approaching expert pathologist-level diagnostic capability. Overlapping smaller patches offered the best balance of spatial resolution and contextual information, while domain-specific foundation models performed noticeably better than generic encoders. This work is the first large-scale, comprehensive comparison of weekly supervised MIL methods for prostate cancer diagnosis and grading. The proposed approach has excellent clinical diagnostic performance, scalability, practical feasibility through cloud computing, and interpretability using visualization tools.
Naveed Anwer Butt mail , Dilawaiz Sarwat mail , Irene Delgado Noya mail irene.delgado@uneatlantico.es, Kilian Tutusaus mail kilian.tutusaus@uneatlantico.es, Nagwan Abdel Samee mail , Imran Ashraf mail ,
Butt
<a class="ep_document_link" href="/27915/1/csbj.0023.pdf"><img class="ep_doc_icon" alt="[img]" src="/style/images/fileicons/text.png" border="0"/></a>
en
open
This systematic literature review (SLR) investigates the integration of deep learning (DL), vision-language models(VLMs), and multi-agent systems in the analysis of pathology images and automated report generation. The rapidadvancement of whole-slide imaging (WSI) technologies has posed new challenges in pathology, especially due to thescale and complexity of the data. DL techniques in general and convolutional neural networks (CNNs) and transform-ers in particular have significantly enhanced image analysis tasks including segmentation, classification, and detection.However, these models often lack generalizability to generate coherent, clinically relevant text, thus necessitating theintegration of VLMs and large language models (LLMs). This review examines the effectiveness of VLMs and LLMsin bridging the gap between visual data and clinical text, focusing on their potential for automating the generationof pathology reports. Additionally, multi-agent systems, which leverage specialized artificial intelligence (AI) agentsto collaboratively perform diagnostic tasks, are explored for their contributions to improving diagnostic accuracy andscalability. Through a synthesis of recent studies, this review highlights the successes, challenges, and future direc-tions of these AI technologies in pathology diagnostics, offering a comprehensive foundation for the development ofintegrated, AI-driven diagnostic workflows.
Usama Ali mail , Imran Shafi mail , Jamil Ahmad mail , Arlette Zárate Cáceres mail , Thania Chio Montero mail , Hafiz Muhammad Raza ur Rehman mail , Imran Ashraf mail ,
Ali
