Developed a Python application for detecting and removing duplicate images within a dataset. Used perceptual image hashing (pHash, dHash, wHash, and average hash (avHash)) from the `imagehash` library, and parallelized hash computation with multiprocessing to reduce processing time on large datasets.
– Automated loading and hashing of images using the `PIL` and `imagehash` libraries.
– Used multiprocessing to parallelize hash calculation across CPU cores.
– Detected duplicate images by comparing hashes against configurable similarity thresholds.
– Removed identified duplicates from the dataset using OS-level file operations.
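
The steps above could look roughly like the following minimal sketch. It is an illustration under stated assumptions, not the project's actual code: the function names (`hash_image`, `hash_all`, `find_duplicates`, `remove_files`), the default threshold of 5 bits, the choice of pHash over the other three hashes, and the greedy pairwise comparison are all hypothetical.

```python
import os
from multiprocessing import Pool


def hash_image(path):
    """Return (path, perceptual hash), or (path, None) if the file is unreadable."""
    # Third-party imports live inside the worker so the comparison helpers
    # below can be used without Pillow or imagehash installed.
    from PIL import Image
    import imagehash

    try:
        with Image.open(path) as img:
            # pHash is an arbitrary choice here; dhash, whash, or
            # average_hash could be swapped in the same way.
            return path, imagehash.phash(img)
    except OSError:
        return path, None


def hash_all(paths, workers=None):
    """Hash images in parallel; workers=None uses one process per CPU core."""
    with Pool(processes=workers) as pool:
        results = pool.map(hash_image, paths)
    return [(path, h) for path, h in results if h is not None]


def find_duplicates(hashed, threshold=5):
    """Return paths whose hash is within `threshold` bits of an earlier image.

    Assumption: a simple greedy pass that keeps the first image of each
    near-duplicate group. This is O(n^2) in the worst case.
    """
    kept, duplicates = [], []
    for path, h in hashed:
        # Subtracting two ImageHash objects yields their Hamming distance.
        if any(h - k <= threshold for k in kept):
            duplicates.append(path)
        else:
            kept.append(h)
    return duplicates


def remove_files(paths):
    """Delete the given files from disk."""
    for path in paths:
        os.remove(path)
```

A typical call chain would be `remove_files(find_duplicates(hash_all(paths), threshold=5))`, run under an `if __name__ == "__main__":` guard so that worker processes start cleanly on platforms that spawn rather than fork. A lower threshold flags only near-identical images, while a higher one also catches resized or recompressed copies at the cost of more false positives.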
Freelancer name | Abdallah A. |