Big Data Engineering Course Project | Real-Time Recommendation System
I’m pleased to share our final project for the Big Data Engineering course, where we designed and implemented a transparent, user-controlled Reddit recommendation system.
The main goal of this project was to provide an alternative to traditional social media algorithms that often operate as black boxes and prioritize engagement over user well-being. Our system gives users full control by allowing them to explicitly define interest weights, without behavioral tracking or hidden machine-learning models.
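To make the idea of explicit interest weights concrete, here is a minimal sketch of what deterministic, weight-based ranking could look like. It is illustrative Python, not the project's actual code: the `Post` type, the `score` and `recommend` functions, and the topic names are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class Post:
    """Hypothetical post record: an id plus the topics it is tagged with."""
    post_id: str
    topics: FrozenSet[str]


def score(post: Post, weights: Dict[str, float]) -> float:
    """Sum the user-defined weights of the topics a post covers.

    Topics are visited in sorted order so the floating-point sum is
    identical on every run.
    """
    return sum(weights.get(topic, 0.0) for topic in sorted(post.topics))


def recommend(posts: List[Post], weights: Dict[str, float], limit: int = 10) -> List[str]:
    """Rank posts by score (highest first), breaking ties by post id.

    No behavioral signals are used: the same posts and the same weights
    always produce the same ordering. Posts with a non-positive score
    are left out.
    """
    scored = [(score(p, weights), p.post_id) for p in posts]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [post_id for s, post_id in scored if s > 0][:limit]


if __name__ == "__main__":
    posts = [
        Post("a", frozenset({"scala", "kafka"})),
        Post("b", frozenset({"cooking"})),
        Post("c", frozenset({"kafka"})),
        Post("d", frozenset({"scala"})),
    ]
    user_weights = {"scala": 1.0, "kafka": 0.5}
    print(recommend(posts, user_weights))
```

Because the ranking depends only on the posts and the weights the user has set, anyone can trace why a given post appeared where it did.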
Key Highlights:
End-to-end real-time data pipeline using Apache Kafka and Spark Structured Streaming
Deterministic and privacy-first recommendation logic
Trending topic extraction using n-grams and Count-Min Sketch for memory-efficient analytics
Scalable architecture from data ingestion to storage and visualization
Clear separation between streaming, storage, and application layers
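For readers unfamiliar with the trending-topic highlight, the sketch below shows how n-grams and a Count-Min Sketch can work together. It is an illustrative Python example under assumed parameters (the `CountMinSketch` class, the `ngrams` helper, the width and depth values, and the sample titles are all hypothetical), not the Scala/Spark implementation used in the project.

```python
import hashlib
from typing import List


class CountMinSketch:
    """Fixed-size frequency table; estimates never undercount."""

    def __init__(self, width: int = 1024, depth: int = 4) -> None:
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, item: str, row: int) -> int:
        # One salted hash per row stands in for `depth` independent hashes.
        digest = hashlib.blake2b(
            item.encode("utf-8"),
            digest_size=8,
            salt=row.to_bytes(8, "little"),
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, item: str, count: int = 1) -> None:
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count

    def estimate(self, item: str) -> int:
        # Collisions can only inflate a cell, so the minimum across rows
        # is the tightest available upper bound on the true count.
        return min(self.table[row][self._index(item, row)]
                   for row in range(self.depth))


def ngrams(tokens: List[str], n: int) -> List[str]:
    """Return all contiguous n-token phrases, joined by single spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


if __name__ == "__main__":
    titles = [
        "apache kafka streaming tips",
        "getting started with apache kafka",
        "spark structured streaming basics",
    ]
    sketch = CountMinSketch()
    for title in titles:
        for phrase in ngrams(title.split(), 2):
            sketch.add(phrase)
    print(sketch.estimate("apache kafka"))
```

Memory use is fixed at `width * depth` counters regardless of how many distinct phrases arrive, which is what makes the structure attractive for streaming workloads; the trade-off is that estimates may be slightly too high, but never too low.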
Technologies Used:
Apache Kafka • Apache Spark Structured Streaming • MongoDB • Scala • Node.js • Next.js • Probabilistic Data Structures