Video concept detection by audio-visual grouplets

Wei Jiang; Alexander Loui

doi:10.1007/s13735-012-0020-6

Publisher: Springer Journals
Copyright: Copyright © 2012 by Springer-Verlag London Limited
Subject: Computer Science; Information Systems Applications (incl. Internet); Multimedia Information Systems; Computer Science, general; Image Processing and Computer Vision; Information Storage and Retrieval; Data Mining and Knowledge Discovery
ISSN: 2192-6611
eISSN: 2192-662X

Abstract

We investigate general concept classification in unconstrained videos through joint audio-visual analysis. We propose an audio-visual grouplet (AVG) representation based on the statistical temporal interactions between audio and visual content. Each AVG contains a set of audio and visual codewords that are grouped together according to their strong temporal correlations in videos, and each AVG carries distinctive audio-visual cues that represent the video content. Using entire AVGs as building elements, video concepts can be classified more robustly than with traditional vocabularies of discrete audio or visual codewords. Specifically, we conduct coarse-level foreground/background separation in both the audio and visual channels and discover four types of AVGs by exploring mixed-and-matched temporal audio-visual correlations among four factors: visual foreground, visual background, audio foreground, and audio background. All four types of AVGs provide discriminative audio-visual patterns for classifying various semantic concepts. To use the AVGs effectively for improved concept classification, we further develop a distance metric learning algorithm. Based on the AVG structure, the algorithm uses an iterative quadratic programming formulation to learn optimal distances between data points under the large-margin nearest-neighbor setting. Various grouplet-based distances can be computed from individual AVGs, and our distance metric learning algorithm aggregates these grouplet-based distances for final classification. We extensively evaluate our method on the large-scale Columbia Consumer Video set. Experiments demonstrate that the AVG-based audio-visual representation achieves consistent and significant performance improvements over other state-of-the-art approaches.
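The abstract outlines two mechanisms. First, audio and visual codewords are grouped by their temporal correlation. Second, per-grouplet distances are combined and weighted under a large-margin nearest-neighbor criterion. The following is a minimal sketch of both ideas, not the authors' method, and it rests on several assumptions.

- Pearson correlation of per-window codeword counts stands in for the paper's temporal-correlation measure, which the abstract does not specify.
- Each grouplet is simplified to one audio codeword plus its correlated visual codewords.
- Foreground/background separation is not modeled.
- The per-grouplet distance is squared Euclidean.
- The weights are fixed inputs. The iterative quadratic program that learns them in the paper is not reproduced; the sketch only evaluates an LMNN-style objective for given weights.

All function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length series (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def form_grouplets(audio_ts, visual_ts, threshold=0.5):
    """Hypothetical grouping rule: pair each audio codeword with every visual
    codeword whose occurrence series correlates with it above `threshold`.

    audio_ts, visual_ts: dict codeword_id -> per-time-window occurrence counts.
    Returns a list of frozensets, each holding one audio id plus its visual partners.
    """
    grouplets = []
    for a_id, a_series in audio_ts.items():
        partners = {v_id for v_id, v_series in visual_ts.items()
                    if pearson(a_series, v_series) > threshold}
        if partners:  # audio codewords with no correlated visual partner are dropped
            grouplets.append(frozenset({a_id} | partners))
    return grouplets


def grouplet_distance(x, y, grouplet):
    """Squared Euclidean distance between two bag-of-words dicts,
    restricted to the codewords of one grouplet."""
    return sum((x.get(c, 0.0) - y.get(c, 0.0)) ** 2 for c in grouplet)


def combined_distance(x, y, grouplets, weights):
    """Aggregate the grouplet-based distances with non-negative weights."""
    return sum(w * grouplet_distance(x, y, g) for g, w in zip(grouplets, weights))


def lmnn_objective(videos, labels, targets, grouplets, weights, margin=1.0, c=1.0):
    """LMNN-style loss for fixed weights: pull each video toward its target
    neighbours and penalise differently labelled 'impostors' that come within
    `margin` of a target neighbour. targets: dict i -> list of target indices."""
    pull, push = 0.0, 0.0
    for i, js in targets.items():
        for j in js:
            d_ij = combined_distance(videos[i], videos[j], grouplets, weights)
            pull += d_ij
            for l, label in enumerate(labels):
                if label != labels[i]:
                    d_il = combined_distance(videos[i], videos[l], grouplets, weights)
                    push += max(0.0, margin + d_ij - d_il)
    return pull + c * push


# Toy usage: "a0" rises and falls with "v0", so they form one grouplet.
audio_ts = {"a0": [0, 3, 0, 4], "a1": [2, 2, 2, 2]}
visual_ts = {"v0": [1, 5, 0, 6], "v1": [3, 0, 4, 0]}
grouplets = form_grouplets(audio_ts, visual_ts)
videos = [{"a0": 2, "v0": 1}, {"a0": 2, "v0": 2}, {"a0": 0, "v0": 5}]
labels = ["dog", "dog", "music"]
print(grouplets)  # [frozenset({'a0', 'v0'})] (set element order may vary)
print(lmnn_objective(videos, labels, {0: [1]}, grouplets, [1.0]))  # 1.0
```

In this toy case, video 2 lies far from video 0 (distance 20), so it contributes no hinge penalty. A learner in the paper's setting would adjust the weights to keep such impostors outside the margin across the whole training set.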

Journal

International Journal of Multimedia Information Retrieval – Springer Journals

Published: Sep 7, 2012
