Semantic-enhanced discriminative embedding learning for cross-modal retrieval




Publisher
Springer Journals
Copyright
Copyright © The Author(s), under exclusive licence to Springer-Verlag London Ltd., part of Springer Nature 2022
ISSN
2192-6611
eISSN
2192-662X
DOI
10.1007/s13735-022-00237-6

Abstract

Cross-modal retrieval involves retrieving text from a query image and vice versa. Most existing methods leverage attention mechanisms to build advanced encoding networks and use ranking losses to reduce the modality gap. Although these methods achieve remarkable performance, they still suffer from drawbacks that hinder the model from learning discriminative semantic embeddings. For example, the attention mechanism may assign larger weights to irrelevant parts than to relevant ones, which prevents the model from learning a discriminative attention distribution. In addition, traditional ranking losses can disregard relatively discriminative information due to the lack of appropriate hardest-negative mining and information-weighting schemes. To alleviate these issues, this paper proposes a novel semantic-enhanced discriminative embedding learning method that strengthens the discriminative ability of the model and consists of three modules. The attention-guided erasing module makes the attention model focus on relevant parts and reduces interference from irrelevant parts by erasing non-attention regions. The large-scale negative sampling module leverages momentum-updated memory banks to expand the pool of negative samples, which increases the probability that the hardest negatives are sampled. The weighted InfoNCE loss module applies a weighting scheme that assigns larger weights to harder pairs. We evaluate the proposed modules by integrating them into three existing cross-modal retrieval models. Extensive experiments demonstrate that integrating each proposed module into the existing models steadily improves the performance of all models.
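The weighted InfoNCE idea sketched in the abstract can be illustrated as follows. This is a minimal NumPy sketch, not the paper's exact formulation: the function name, the softmax-over-negatives weighting, and the shapes are illustrative assumptions. The core point it shows is that each negative's contribution to the InfoNCE denominator is scaled by a weight that grows with its similarity to the query, so harder negatives dominate the loss.

```python
import numpy as np

def weighted_infonce(query, positive, memory_bank, temperature=0.07):
    """Sketch of a weighted InfoNCE-style loss (hypothetical formulation).

    query:       (B, d) query embeddings
    positive:    (B, d) matched embeddings from the other modality
    memory_bank: (K, d) negative embeddings drawn from a memory bank
    """
    def cos(a, b):
        # Cosine similarity between every row of `a` and every row of `b`.
        return a @ b.T / (np.linalg.norm(a, axis=-1, keepdims=True)
                          * np.linalg.norm(b, axis=-1, keepdims=True).T)

    pos_sim = cos(query, positive).diagonal()   # (B,)  similarity to the match
    neg_sim = cos(query, memory_bank)           # (B, K) similarity to negatives

    # Hypothetical weighting scheme: softmax over negative similarities, so a
    # harder (more similar) negative receives a larger weight. Scaling by K
    # keeps the uniform-weight case equivalent to plain InfoNCE.
    w = np.exp(neg_sim) / np.exp(neg_sim).sum(axis=1, keepdims=True)
    K = memory_bank.shape[0]

    num = np.exp(pos_sim / temperature)
    den = num + (K * w * np.exp(neg_sim / temperature)).sum(axis=1)
    return float(np.mean(-np.log(num / den)))
```

With uniform weights (w = 1/K) the expression reduces to the standard InfoNCE loss; the weighting only redistributes mass toward the negatives the model currently confuses with the query.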

Journal

International Journal of Multimedia Information Retrieval, Springer Journals

Published: Sep 1, 2022

Keywords: Cross-modal retrieval; Semantic enhanced; Erasing; Metric learning
