Transformer-Based English-to-French Translation Model
This project implements a custom Transformer model from scratch for English-to-French machine translation, using TensorFlow and Keras. The entire workflow lives in a single Jupyter notebook that covers data preprocessing, model architecture, training, and inference.
Key Features:
Custom Transformer Implementation: Manually built the encoder-decoder architecture from multi-head attention, positional encoding, and feed-forward layers, rather than relying on a prebuilt Transformer implementation.
Dataset Preprocessing: Loaded parallel English-French sentence pairs, tokenized them with the Keras TextVectorization layer, and padded the resulting sequences to a consistent length.
Training Pipeline: Trained the model with the Adam optimizer and a sparse categorical cross-entropy loss, using masks so that padding tokens are ignored in both the loss calculation and the attention computation.
Inference Mechanism: Developed a custom translate function that uses greedy decoding to generate French translations from English input sentences.
Performance Insight: The model learns simple translations, and its architecture closely follows the structure described in Vaswani et al.'s "Attention Is All You Need."
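The greedy decoding behind the translate function can be sketched as follows. This is a minimal, framework-free illustration under stated assumptions, not the notebook's actual code: greedy_decode, toy_model, START_ID, END_ID, and max_len are hypothetical names, and toy_model only stands in for the trained Transformer, which would instead return logits over the French vocabulary.

```python
from typing import Callable, List, Sequence

START_ID = 1  # hypothetical id of the start-of-sequence token
END_ID = 2    # hypothetical id of the end-of-sequence token


def greedy_decode(
    next_token_logits: Callable[[Sequence[int], Sequence[int]], Sequence[float]],
    source_ids: Sequence[int],
    max_len: int = 20,
) -> List[int]:
    """Generate target ids one at a time, always taking the top-scoring token."""
    target_ids = [START_ID]
    for _ in range(max_len):
        # Score every vocabulary entry given the source and the tokens so far.
        logits = next_token_logits(source_ids, target_ids)
        # Greedy choice: the index of the largest logit.
        next_id = max(range(len(logits)), key=logits.__getitem__)
        if next_id == END_ID:
            break
        target_ids.append(next_id)
    return target_ids[1:]  # drop the start token


def toy_model(source_ids: Sequence[int], target_ids: Sequence[int]) -> List[float]:
    """Stand-in for the trained model: copies the source, then emits END_ID."""
    vocab_size = 10  # assumed toy vocabulary size
    logits = [0.0] * vocab_size
    step = len(target_ids) - 1  # number of tokens generated so far
    next_id = source_ids[step] if step < len(source_ids) else END_ID
    logits[next_id] = 1.0
    return logits


translation_ids = greedy_decode(toy_model, source_ids=[5, 7, 3])
print(translation_ids)
```

In a real pipeline the callable would presumably wrap a forward pass of the trained model, with the source sentence vectorized beforehand and the returned ids mapped back to French words through the target vocabulary. Greedy decoding is simple and fast, but because it commits to the single best token at every step it can miss translations that a beam search would find.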