March 15, 2024
Does the Performance of Text-to-Image Retrieval Models Generalize Beyond Captions-as-a-Query?
Text-to-image retrieval (T2I) is the task of retrieving all images relevant to a keyword query. Popular datasets for text-to-image retrieval, such as Flickr30k, VG, or MS-COCO, use annotated image captions, e.g., “a man playing with a kid”, as a...