eprintid: 11065
rev_number: 9
eprint_status: archive
userid: 2
dir: disk0/00/01/10/65
datestamp: 2024-02-29 23:30:17
lastmod: 2024-02-29 23:30:19
status_changed: 2024-02-29 23:30:17
type: article
metadata_visibility: show
creators_name: Jamil, Azhar
creators_name: Rehman, Saif Ur
creators_name: Mahmood, Khalid
creators_name: Gracia Villar, Mónica
creators_name: Prola, Thomas
creators_name: Diez, Isabel De La Torre
creators_name: Samad, Md Abdus
creators_name: Ashraf, Imran
creators_id:
creators_id:
creators_id:
creators_id: monica.gracia@uneatlantico.es
creators_id: thomas.prola@uneatlantico.es
creators_id:
creators_id:
creators_id:
title: Deep Learning Approaches for Image Captioning: Opportunities, Challenges and Future Potential
ispublished: pub
subjects: uneat_eng
divisions: uneatlantico_produccion_cientifica
divisions: uninimx_produccion_cientifica
divisions: uninipr_produccion_cientifica
divisions: unic_produccion_cientifica
divisions: uniromana_produccion_cientifica
full_text_status: public
keywords: Image captioning, deep learning, image processing, artificial intelligence
abstract: Generative intelligence relies heavily on the integration of vision and language. Much of the research has focused on image captioning, which involves describing images with meaningful sentences. Typically, a vision encoder and a language model are employed to generate sentences that describe the visual content. These components have advanced substantially over the years through the incorporation of object regions, attributes, multi-modal connections, attention mechanisms, and early-fusion approaches such as bidirectional encoder representations from transformers (BERT). This research offers a reference to the body of literature, identifies emerging trends in an area that blends computer vision and natural language processing to maximize their complementary effects, and highlights the most significant technological improvements in architectures employed for image captioning. It also discusses various problem variants and open challenges. This comparison allows for an objective assessment of different techniques, architectures, and training strategies by identifying the most significant technical innovations, and offers valuable insights into the current landscape of image captioning research.
date: 2024-02
publication: IEEE Access
pagerange: 1-1
id_number: doi:10.1109/ACCESS.2024.3365528
refereed: TRUE
issn: 2169-3536
official_url: http://doi.org/10.1109/ACCESS.2024.3365528
access: open
language: en
citation: Jamil, Azhar; Rehman, Saif Ur; Mahmood, Khalid; Gracia Villar, Mónica; Prola, Thomas; Diez, Isabel De La Torre; Samad, Md Abdus and Ashraf, Imran (2024) Deep Learning Approaches for Image Captioning: Opportunities, Challenges and Future Potential. IEEE Access. p. 1. ISSN 2169-3536
document_url: http://repositorio.unib.org/id/eprint/11065/1/Deep_Learning_Approaches_for_Image_Captioning_Opportunities_Challenges_and_Future_Potential.pdf
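For illustration, below is a minimal sketch of the vision-encoder-plus-language-model pipeline the abstract describes: an image encoder produces a feature embedding, and a language decoder generates a caption conditioned on it. This is not code from the paper; the class name CaptionModel, layer sizes, and vocabulary size are hypothetical, and modern systems typically substitute a pretrained backbone (e.g. a ResNet or ViT) and a transformer decoder with attention.

import torch
import torch.nn as nn

class CaptionModel(nn.Module):
    def __init__(self, vocab_size=10000, embed_dim=256, hidden_dim=512):
        super().__init__()
        # Vision encoder: a small CNN standing in for a pretrained backbone.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, embed_dim),
        )
        # Language model: an LSTM decoder conditioned on the image embedding.
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, vocab_size)

    def forward(self, images, captions):
        # Prepend the image embedding as the first "token" of the sequence,
        # so the decoder generates each word conditioned on the visual input.
        img = self.encoder(images).unsqueeze(1)   # (B, 1, E)
        tok = self.embed(captions)                # (B, T, E)
        seq = torch.cat([img, tok], dim=1)        # (B, T+1, E)
        out, _ = self.lstm(seq)
        return self.head(out)                     # logits over the vocabulary

model = CaptionModel()
logits = model(torch.randn(2, 3, 224, 224), torch.randint(0, 10000, (2, 12)))
print(logits.shape)  # torch.Size([2, 13, 10000])

Feeding the image embedding as the initial sequence element is one common conditioning choice; the attentive and early-fusion variants surveyed in the paper instead let the decoder attend to region-level features at every step.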