Text-image retrieval (T2I) is the task of retrieving all images relevant to a keyword query. Popular datasets for text-image retrieval, such as Flickr30k, Visual Genome (VG), and MS-COCO, use annotated image captions, e.g., “a man playing with a kid”, as surrogates for queries. With such surrogate queries, current multi-modal machine learning models, such as CLIP and BLIP, perform remarkably well.
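As a rough illustration of the task, retrieval is commonly framed as ranking images by the similarity between a query embedding and each image embedding. The sketch below assumes that framing; the embedding vectors, file names, and the `cosine` and `retrieve` helpers are hypothetical and made up for illustration. In practice the vectors would come from a model such as CLIP or BLIP rather than being written by hand.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def retrieve(query_vec, image_vecs, k=2):
    """Return the names of the k images most similar to the query."""
    ranked = sorted(
        image_vecs.items(),
        key=lambda item: cosine(query_vec, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:k]]


# Hypothetical 3-dimensional embeddings; real models use hundreds of dimensions.
images = {
    "man_with_kid.jpg": [0.9, 0.1, 0.0],
    "dog_on_beach.jpg": [0.1, 0.9, 0.1],
    "city_at_night.jpg": [0.0, 0.2, 0.9],
}

# Hypothetical embedding of the caption-style query "a man playing with a kid".
query = [0.8, 0.2, 0.1]

top = retrieve(query, images, k=2)
print(top)
```

With these toy vectors the query lies closest to the first image, so it is ranked first.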