Automatic Image Caption Generation: study and implementation
Date
2021
Publisher
université Ghardaia
Abstract
Artificial Intelligence (AI) is moving increasingly towards multimodal learning, which involves building
systems that can process information from multiple sources, such as text, images, or audio. Image captioning
is one of the main visual-linguistic tasks; it requires generating a caption for a given image. The challenge
is to create a unified Deep Learning (DL) model capable of describing an image in a correct sentence. To do
so, we need to understand the proper way to represent text in a certain space. We use the Transformer, a
recent architecture that brings a new concept to the sequence-to-sequence mechanism, and we take advantage
of modern GPUs to process data faster and more efficiently. Along this path, we experimented
with a Transformer-based approach and applied it to the image captioning problem using the MS COCO dataset.
Keywords
Multimodal Learning, Image captioning, Deep Learning (DL), Transformer, Sequence to sequence, MS-COCO
