This project implements an image captioning system that combines a CNN-based image encoder, an LSTM-based decoder, and a soft attention mechanism. The model is trained and evaluated on the Flickr8k dataset.
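The core of soft attention is simple: at each decoding step, the decoder's hidden state is scored against every image-region feature, the scores are softmaxed into weights, and the context vector is the weighted average of the regions. A minimal sketch in plain Python (dot-product scores stand in for the small MLP a real model would use; all names here are illustrative):

```python
import math

def soft_attention(features, hidden):
    """Attention-weighted context vector over image-region features.

    features: list of region feature vectors
    hidden:   decoder hidden state (same dimensionality as each feature)
    Scores are plain dot products here; a real model learns a scoring MLP.
    """
    scores = [sum(f_i * h_i for f_i, h_i in zip(f, hidden)) for f in features]
    # Softmax over regions -> attention weights that sum to 1
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Context vector: weighted average of the region features
    dim = len(features[0])
    context = [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]
    return weights, context

# Toy example: 3 image regions with 2-dim features
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
hidden = [1.0, 0.0]
weights, context = soft_attention(features, hidden)
# Regions aligned with the hidden state receive higher weight
```

Because the weights are differentiable, the whole attention step trains end-to-end with the rest of the network.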
The architecture follows the principles outlined in the reference paper, including:

- Use of cross-entropy loss
- Training with the Adam optimizer
- Evaluation using standard metrics such as BLEU-4, METEOR, and CIDEr
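The training objective is per-token cross-entropy over the vocabulary: at each step the decoder's logits are softmaxed and the negative log-probability of the ground-truth word is accumulated. A small illustrative sketch (the vocabulary size and logit values are made up):

```python
import math

def cross_entropy(logits, target_index):
    """Per-token cross-entropy: -log softmax(logits)[target_index].

    Uses the log-sum-exp trick for numerical stability.
    """
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum - logits[target_index]

# Toy decoding step: vocabulary of 4 words, ground-truth word is index 2
logits = [0.5, 0.1, 2.0, -1.0]
loss = cross_entropy(logits, 2)
# The caption loss is the sum (or mean) of this quantity over all time steps
```

In the full model this loss is summed over the caption and minimized with Adam; the evaluation metrics (BLEU-4, METEOR, CIDEr) are computed separately on generated captions, not optimized directly.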
While this notebook does not implement full transformer-based models (such as ViT or DeiT), it provides a solid baseline and a foundation for further exploration of vision-language tasks.