In e-commerce, users often struggle to find products with text-based search because product descriptions are vague or inconsistent. Content-based image retrieval (CBIR) addresses this by letting users search with images instead of keywords. The challenge, however, is that images are described by low-level features (color, texture, shape) that don’t map directly to high-level product categories (e.g., "running shoes" or "formal dresses"). This mismatch is called the semantic gap, and it leads to inaccurate search results.
We used multi-task learning (MTL) to narrow this gap and improve retrieval accuracy. Instead of training a separate model for each task (e.g., color detection, texture recognition, category classification), MTL trains a single model on all of the tasks at once, so they share one learned representation.
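To make the "single model, multiple tasks" idea concrete, here is a minimal sketch of one common MTL layout: a shared encoder followed by one small output head per task. The summary does not describe the actual architecture, so the layout, the class name `MultiTaskModel`, the layer sizes, and the head sizes are all illustrative assumptions; only the three task names come from the text.

```python
import random

def linear(weights, x):
    """Dense layer without bias: one output value per weight row."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]

def relu(values):
    return [max(0.0, v) for v in values]

def init_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

class MultiTaskModel:
    """Hypothetical layout: one shared encoder, one linear head per task."""

    def __init__(self, in_dim, hidden_dim, head_sizes, seed=0):
        rng = random.Random(seed)
        # Shared parameters: every task's training signal would update these.
        self.shared = init_matrix(hidden_dim, in_dim, rng)
        # Task-specific parameters: one output layer per task.
        self.heads = {
            name: init_matrix(n_out, hidden_dim, rng)
            for name, n_out in head_sizes.items()
        }

    def forward(self, features):
        # The shared embedding is computed once and reused by every head.
        embedding = relu(linear(self.shared, features))
        outputs = {name: linear(w, embedding) for name, w in self.heads.items()}
        return embedding, outputs

# Assumed sizes: 8 input features, 4-dim embedding, arbitrary class counts.
model = MultiTaskModel(
    in_dim=8,
    hidden_dim=4,
    head_sizes={"color": 3, "texture": 5, "category": 10},
)
embedding, outputs = model.forward([0.1 * i for i in range(8)])
```

In a retrieval setting, the shared `embedding` is the kind of vector one would typically compare between a query image and catalog images, while the per-task heads are only needed during training.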
Adaptive Task Weighting: We adjusted each task's weight dynamically during training, so that harder-to-learn tasks received more emphasis.
Gradient Surgery: We reduced negative task interference (when the gradients of different tasks conflict) by adjusting the conflicting gradients before each update, so that the tasks can learn effectively together.
Efficiency Boost: Our method made training 3x faster, while improving retrieval accuracy by 7.4% compared to existing techniques.
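The adaptive weighting and gradient surgery steps listed above can be sketched as follows. The summary does not say which variants were used, so both functions are assumptions: `adaptive_weights` uses a simple loss-ratio heuristic (a task whose loss has dropped less gets a larger weight), and `project_conflicts` follows the PCGrad-style projection, which removes from each task gradient the component that points against another task's gradient. The function names and the toy numbers are hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def adaptive_weights(current_losses, initial_losses):
    """Assumed heuristic: weight each task by its remaining relative loss."""
    ratios = {t: current_losses[t] / initial_losses[t] for t in current_losses}
    total = sum(ratios.values())
    n_tasks = len(ratios)
    # Normalize so the weights sum to the number of tasks.
    return {t: n_tasks * r / total for t, r in ratios.items()}

def project_conflicts(grads):
    """PCGrad-style surgery on a list of per-task gradient vectors.

    If a gradient has a negative dot product with another task's original
    gradient, the conflicting component is projected out. PCGrad visits the
    other tasks in random order; a fixed order is used here for repeatability.
    """
    projected = []
    for i, grad in enumerate(grads):
        g = list(grad)
        for j, other in enumerate(grads):
            if i == j:
                continue
            overlap = dot(g, other)
            if overlap < 0:  # the two tasks pull in opposing directions
                scale = overlap / dot(other, other)
                g = [a - scale * b for a, b in zip(g, other)]
        projected.append(g)
    return projected

def combine(grads, weights):
    """Weighted sum of per-task gradients into a single update direction."""
    size = len(grads[0])
    return [sum(w * g[k] for g, w in zip(grads, weights)) for k in range(size)]

# Toy example with two tasks whose gradients conflict (dot product is -1).
task_names = ["color", "category"]
grads = [[1.0, 0.0], [-1.0, 1.0]]
weights = adaptive_weights(
    current_losses={"color": 0.2, "category": 0.8},
    initial_losses={"color": 1.0, "category": 1.0},
)
fixed = project_conflicts(grads)
update = combine(fixed, [weights[name] for name in task_names])
```

In this toy case the "category" task has made less progress, so it receives the larger weight, and after projection neither task's gradient points against the other task's original gradient.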