A PySpark project that processes and analyzes web server log files to uncover insights into user behavior and system performance. The pipeline parses string and date/time fields, applies window functions, and uses user-defined functions (UDFs) to compute metrics such as the most visited pages, hourly and daily traffic patterns, user sessions, error rates, and top IP addresses.
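As a rough illustration of the string and date/time handling involved, the sketch below parses a single log line in plain Python. It assumes the logs follow the Apache Common Log Format, which this description does not specify. The names `LOG_PATTERN` and `parse_log_line` are hypothetical and not taken from the project.

```python
import re
from datetime import datetime
from typing import Optional

# Assumption: Apache Common Log Format, for example
# 127.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326
LOG_PATTERN = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)


def parse_log_line(line: str) -> Optional[dict]:
    """Extract the fields needed for traffic and error metrics.

    Returns None when the line does not match the assumed format.
    """
    match = LOG_PATTERN.match(line)
    if match is None:
        return None

    # Parse the bracketed timestamp, including its UTC offset.
    timestamp = datetime.strptime(match.group("ts"), "%d/%b/%Y:%H:%M:%S %z")
    status = int(match.group("status"))

    return {
        "ip": match.group("ip"),
        "timestamp": timestamp,
        "hour": timestamp.hour,        # bucket for hourly traffic patterns
        "method": match.group("method"),
        "path": match.group("path"),   # key for most-visited-page counts
        "status": status,
        "is_error": status >= 400,     # 4xx and 5xx count toward the error rate
    }
```

In a Spark job, a function like this could be wrapped with `pyspark.sql.functions.udf` and an explicit return schema. Built-in functions such as `regexp_extract` and `to_timestamp` are usually faster where they are sufficient.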