• Collected an audio dataset from YouTube, resampled the audio, normalized chunk amplitudes, and split recordings into
segments at silence boundaries; transcribed the segments with Whisper-large-v3.
• Fine-tuned XTTS-v2 on the dataset, achieving 93% accuracy; uploaded the model and checkpoints to my Hugging Face profile.
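
The amplitude normalization and silence-based splitting mentioned above could look roughly like the sketch below. It is a minimal illustration on plain lists of samples, not the pipeline actually used for this project: the function names, the `threshold`, `min_silence`, and `target_peak` parameters, and their default values are all assumptions made for the example, and a real pipeline would more likely rely on an audio library.

```python
from typing import List, Tuple


def peak_normalize(samples: List[float], target_peak: float = 0.95) -> List[float]:
    """Scale samples so the largest absolute value equals target_peak."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # all-silent input: nothing to scale
    gain = target_peak / peak
    return [s * gain for s in samples]


def split_on_silence(
    samples: List[float],
    threshold: float = 0.02,  # hypothetical amplitude below which a sample is "quiet"
    min_silence: int = 4,     # hypothetical number of quiet samples that ends a segment
) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs (end exclusive) of non-silent segments."""
    segments: List[Tuple[int, int]] = []
    start = None    # index where the current segment began, if any
    quiet_run = 0   # length of the current run of quiet samples
    for i, s in enumerate(samples):
        if abs(s) >= threshold:
            if start is None:
                start = i
            quiet_run = 0
        elif start is not None:
            quiet_run += 1
            if quiet_run >= min_silence:
                # Close the segment just after its last loud sample.
                segments.append((start, i - quiet_run + 1))
                start, quiet_run = None, 0
    if start is not None:
        # Input ended mid-segment: trim any trailing quiet samples.
        segments.append((start, len(samples) - quiet_run))
    return segments


if __name__ == "__main__":
    audio = [0.0, 0.0, 0.5, -0.4, 0.0, 0.0, 0.0, 0.0, 0.3, 0.2, 0.0]
    for begin, end in split_on_silence(audio):
        print(begin, end, peak_normalize(audio[begin:end]))
```

Each returned index pair marks one chunk that could then be normalized on its own and passed to a transcription model.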