Problem
Large media libraries accumulate exact and near-duplicate images, making cleanup difficult without a scalable similarity workflow.
Approach
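- Represent each image as an embedding vector so that similarity reflects visual content rather than file names or timestamps.
- Prioritize scalability to large libraries, accurate matching, and a review workflow that people can realistically work through.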
- Design for iterative validation before destructive cleanup decisions.
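The embedding-based grouping described above can be sketched as follows. This is a minimal illustration, not the project's actual implementation. It assumes each image has already been converted to an embedding vector by some model (the source does not say which). The function names, the 0.95 cosine-similarity threshold, and the brute-force pairwise comparison are all hypothetical choices. The sketch only reports candidate groups and deletes nothing, which leaves the final decision to a review step.

```python
import math
from itertools import combinations


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def group_near_duplicates(
    embeddings: dict[str, list[float]], threshold: float = 0.95
) -> list[list[str]]:
    """Group images whose pairwise similarity meets a hypothetical threshold.

    Uses union-find over a brute-force O(n^2) comparison. Returns only groups
    with two or more members, as candidates for human review (nothing is deleted).
    """
    parent = {name: name for name in embeddings}

    def find(x: str) -> str:
        # Walk to the root, compressing the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for a, b in combinations(sorted(embeddings), 2):
        if cosine_similarity(embeddings[a], embeddings[b]) >= threshold:
            union(a, b)

    groups: dict[str, list[str]] = {}
    for name in sorted(embeddings):
        groups.setdefault(find(name), []).append(name)
    return [members for members in groups.values() if len(members) > 1]


if __name__ == "__main__":
    # Toy 3-dimensional embeddings standing in for real model output.
    embeddings = {
        "beach.jpg": [0.9, 0.1, 0.0],
        "beach_copy.jpg": [0.89, 0.11, 0.01],
        "city.jpg": [0.0, 0.2, 0.98],
    }
    print(group_near_duplicates(embeddings))  # → [['beach.jpg', 'beach_copy.jpg']]
```

Because union-find merges transitively, a chain of pairwise matches can place two fairly different images in the same group, which is one reason to keep validation ahead of any deletion. For a large library, the pairwise loop would likely need to be replaced with an approximate nearest-neighbor index, and exact byte-level duplicates can be found more cheaply beforehand by hashing file contents.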
Outcome
This project is useful supporting evidence of applied ML breadth, but the write-up still needs repository links, screenshots, and details on scale, similarity thresholds, and evaluation.