SISAP 2021 - Special Sessions. Semantics-Based Search
Organized by Richard Connor, University of St Andrews, Scotland, UK; Alan Dearle, University of St Andrews, Scotland, UK; Lia Morra, Politecnico di Torino, Turin, Italy; Lucia Vadicamo, CNR Pisa, Pisa, Italy
Most work in similarity search takes place in a domain far removed from the actual semantics of a space. Given a metric and a set of values, there is a wealth of research on how to perform search efficiently. However, almost without exception, the question of how well the results map to the “real” notion of closeness that the metric is intended to model is ignored. In part, this is due to the lack of an unbiased ground truth, which is extremely difficult to establish in large collections. Within this context, large-scale near-duplicate detection provides a realistic and challenging task on which different techniques can be compared. On the one hand, it entails a subtle differentiation between actual near duplicates and image pairs that are visually similar but not semantically related. On the other hand, since the number of image pairs grows quadratically with the size of the collection, it requires search techniques that are both effective and computationally efficient.
MirFlickr1M is a collection of one million images. Collected for research purposes as a benchmark for image tagging, the original selection of images was guided by certain factors, but no checking for similarity was performed at the time. By chance, however, a large number of near-duplicate clusters do occur, along with a small set of identical image clusters. These similar images exist for “natural” reasons. For example, some are images of the same highly predictable subject and context (such as the moon); some are alterations of other images within the collection, produced by cropping, re-hueing and so on; and some are successive shots taken in quick succession from a single camera.
In the MirFlickr Near Duplicate (MFND) dataset, around 10,000 known similar clusters have been identified and checked by the proposers of this Session. There is strong statistical evidence that this is the large majority of all the near-duplicate images within the set. The remainder of the collection therefore contains over 10^11 visually similar, but not semantically related, image pairs. Hence, differentiating actual near-duplicate images from visually similar images provides a subtle and challenging task.
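As a rough check on the scale involved, the number of unordered image pairs in a collection of n images is n(n - 1)/2. The short sketch below evaluates this for a one-million-image collection; the function name `count_unordered_pairs` is purely illustrative and is not part of any MFND tooling.

```python
def count_unordered_pairs(n: int) -> int:
    """Number of distinct unordered pairs among n items: n * (n - 1) / 2."""
    return n * (n - 1) // 2


# Assumption: the collection holds exactly one million images (MirFlickr1M).
collection_size = 1_000_000
total_pairs = count_unordered_pairs(collection_size)

print(f"{total_pairs:,}")  # 499,999,500,000
```

The total is roughly 5 x 10^11 and counts every pair in the collection, so it is an upper bound on the number of pairs that are not near duplicates. It also shows why exhaustive pairwise comparison is impractical at this scale.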
Purpose of the Session
Papers submitted to this special session must follow the regular paper submission and author guidelines of SISAP 2021 (please check out the submission guidelines). Papers will be submitted in PDF format through EasyChair; please be sure to select “Special Session: Semantics-Based Search” in the appropriate field of the submission form.