We investigate general concept classification in unconstrained videos by joint audio-visual analysis. An audio-visual grouplet (AVG) representation is proposed, based on analyzing the statistical temporal audio-visual interactions. Each AVG contains a set of audio and visual codewords that are grouped together according to their strong temporal correlations in videos, and each AVG carries unique audio-visual cues to represent the video content. By using entire AVGs as building elements, video concepts can be classified more robustly than with traditional vocabularies of discrete audio or visual codewords. Specifically, we conduct coarse-level foreground/background separation in both the audio and visual channels, and discover four types of AVGs by exploring mixed-and-matched temporal audio-visual correlations among the following factors: visual foreground, visual background, audio foreground, and audio background. All of these types of AVGs provide discriminative audio-visual patterns for classifying various semantic concepts. To use the AVGs effectively for improved concept classification, a distance metric learning algorithm is further developed. Based on the AVG structure, the algorithm uses an iterative quadratic programming formulation to learn the optimal distances between data points according to the large-margin nearest-neighbor setting. Various types of grouplet-based distances can be computed using individual AVGs, and through our distance metric learning algorithm these grouplet-based distances can be aggregated for final classification. We extensively evaluate our method on the large-scale Columbia consumer video set. Experiments demonstrate that the AVG-based audio-visual representation achieves consistent and significant performance improvements compared with other state-of-the-art approaches.
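As a rough illustration of the aggregation step described above, the following Python sketch computes one distance per grouplet and combines them with non-negative weights. It is only a sketch under stated assumptions, not the authors' method: the histogram representation, the Euclidean per-grouplet distance, the codeword names, and the weight values are all hypothetical, and the weights are fixed by hand here, whereas the abstract says they are learned by an iterative quadratic program in the large-margin nearest-neighbor setting.

```python
from math import sqrt


def grouplet_distance(hist_a, hist_b, grouplet):
    """Euclidean distance between two codeword histograms,
    restricted to the codewords of a single grouplet.

    Assumption: each video is a dict mapping codeword -> count, and a
    missing codeword counts as zero. The abstract does not specify the
    per-grouplet distance; Euclidean is used purely for illustration.
    """
    return sqrt(sum((hist_a.get(w, 0.0) - hist_b.get(w, 0.0)) ** 2
                    for w in grouplet))


def aggregated_distance(hist_a, hist_b, grouplets, weights):
    """Weighted sum of per-grouplet distances.

    Assumption: the weights stand in for what a metric learning step
    would produce; they are supplied by hand in this sketch.
    """
    if len(grouplets) != len(weights):
        raise ValueError("need exactly one weight per grouplet")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    return sum(w * grouplet_distance(hist_a, hist_b, g)
               for g, w in zip(grouplets, weights))


# Hypothetical codewords: "a_" marks audio, "v_" marks visual.
video_a = {"a_speech": 3, "v_face": 4}
video_b = {"v_grass": 2}

# Two hypothetical grouplets, each a set of co-occurring codewords.
grouplets = [("a_speech", "v_face"), ("v_grass",)]
weights = [0.5, 1.0]

print(aggregated_distance(video_a, video_b, grouplets, weights))
```

Restricting each distance to the codewords of one grouplet keeps the contribution of every audio-visual pattern separate, so the weights can emphasize the grouplets that matter most for a given concept.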
International Journal of Multimedia Information Retrieval – Springer Journals
Published: Sep 7, 2012