Get 20M+ Full-Text Papers For Less Than $1.50/day. Start a 14-Day Trial for You or Your Team.

Learn More →

Current challenges and visions in music recommender systems research

Current challenges and visions in music recommender systems research Music recommender systems (MRSs) have experienced a boom in recent years, thanks to the emergence and success of online streaming services, which nowadays make available almost all music in the world at the user’s fingertip. While today’s MRSs considerably help users to find interesting music in these huge catalogs, MRS research is still facing substantial challenges. In particular when it comes to build, incorporate, and evaluate recommendation strategies that integrate information beyond simple user–item interactions or content-based descriptors, but dig deep into the very essence of listener needs, preferences, and intentions, MRS research becomes a big endeavor and related publications quite sparse. The purpose of this trends and survey article is twofold. We first identify and shed light on what we believe are the most pressing challenges MRS research is facing, from both academic and industry perspectives. We review the state of the art toward solving these challenges and discuss its limitations. Second, we detail possible future directions and visions we contemplate for the further evolution of the field. The article should therefore serve two purposes: giving the interested reader an overview of current challenges in MRS research and providing guidance for young researchers by identifying interesting, yet under-researched, directions in the field. Keywords Music recommender systems · Challenges · Automatic playlist continuation · User-centric computing 1 Introduction This research was supported in part by the Center for Intelligent Research in music recommender systems (MRSs) has recently Information Retrieval. Any opinions, findings and conclusions or experienced a substantial gain in interest both in academia recommendations expressed in this material are those of the authors and in industry [162]. Thanks to music streaming services and do not necessarily reflect those of the sponsors. like Spotify, Pandora, or Apple Music, music aficionados Markus Schedl B are nowadays given access to tens of millions music pieces. markus.schedl@jku.at By filtering this abundance of music items, thereby limiting Hamed Zamani choice overload [20], MRSs are often very successful to sug- zamani@cs.umass.edu gest songs that fit their users’ preferences. However, such Ching-Wei Chen systems are still far from being perfect and frequently pro- cw@spotify.com duce unsatisfactory recommendations. This is partly because Yashar Deldjoo of the fact that users’ tastes and musical needs are highly yashar.deldjoo@polimi.it dependent on a multitude of factors, which are not considered Mehdi Elahi in sufficient depth in current MRS approaches, which are typ- meelahi@unibz.it ically centered on the core concept of user–item interactions, or sometimes content-based item descriptors. In contrast, we Department of Computational Perception, Johannes Kepler University Linz, Linz, Austria Department of Computer Science, Politecnico di Milano, Center for Intelligent Information Retrieval, Milan, Italy University of Massachusetts Amherst, Amherst, USA Free University of Bozen-Bolzano, Bolzano, Italy Spotify USA Inc., New York, USA 123 96 International Journal of Multimedia Information Retrieval (2018) 7:95–116 argue that satisfying the users’ musical entertainment needs uation, and properly evaluating music recommender systems. requires taking into account intrinsic, extrinsic, and con- We review the state of the art of the respective tasks and its textual aspects of the listeners [2], as well as more decent current limitations. interaction information. For instance, personality and emo- tional state of the listeners (intrinsic) [71,147]aswellas 2.1 Particularities of music recommendation their activity (extrinsic) [75,184] are known to influence musical tastes and needs. So are users’ contextual factors Before we start digging deeper into these challenges, we including weather conditions, social surrounding, or places would first like to highlight the major aspects that make music of interest [2,100]. Also the composition and annotation of a recommendation a particular endeavor and distinguishes it music playlist or a listening session reveal information about from recommending other items, such as movies, books, or which songs go well together or are suited for a certain occa- products. These aspects have been adapted and extended sion [126,194]. Therefore, researchers and designers of MRS from a tutorial on music recommender systems [161], co- should reconsider their users in a holistic way in order to build presented by one of the authors at the ACM Recommender systems tailored to the specificities of each user. 2 Systems 2017 conference. Against this background, in this trends and survey arti- Duration of items In traditional movie recommendation, the cle, we elaborate on what we believe to be among the most items of interest have a typical duration of 90 min or more. In pressing current challenges in MRS research, by discussing book recommendation, the consumption time is commonly the respective state of the art and its restrictions (Sect. 2). Not even much longer. In contrast, the duration of music items being able to touch all challenges exhaustively, we focus on usually ranges between 3 and 5 min (except maybe for clas- cold start, automatic playlist continuation, and evaluation of sical music). Because of this, music items may be considered MRS. While these problems are to some extent prevalent in more disposable. other recommendation domains too, certain characteristics of music pose particular challenges in these contexts. Among Magnitude of items The size of common commercial music them are the short duration of items (compared to movies), catalogs is in the range of tens of millions music pieces, while the high emotional connotation of music, and the acceptance movie streaming services have to deal with much smaller of users for duplicate recommendations. In the second part, catalog sizes, typically thousands up to tens of thousands we present our visions for future directions in MRS research 3 of movies and series. Scalability is therefore a much more (Sect. 3). More precisely, we elaborate on the topics of important issue in music recommendation than in movie rec- psychologically inspired music recommendation (consider- ommendation. ing human personality and emotion), situation-aware music Sequential consumption Unlike movies, music pieces are recommendation, and culture-aware music recommendation. most frequently consumed sequentially, more than one at We conclude this article with a summary and identification of a time, i.e., in a listening session or playlist. This yields a possible starting points for the interested researcher to face number of challenges for a MRS, which relate to identifying the discussed challenges (Sect. 4). the right arrangement of items in a recommendation list. The composition of the authors allows to take academic as well as industrial perspectives, which are both reflected Recommendation of previously recommended items Recom- in this article. Furthermore, we would like to highlight that mending the same music piece again, at a later point in time, particularly the ideas presented as Challenge 2: Automatic may be appreciated by the user of a MRS, in contrast to a playlist continuation in Sect. 2 play an important role in the movie or product recommender, where repeated recommen- task definition, organization, and execution of the ACM Rec- dations are usually not preferred. ommender Systems Challenge 2018 which focuses on this Consumption behavior Music is often consumed passively, use case. This article may therefore also serve as an entry in the background. While this is not a problem per se, it point for potential participants in this challenge. can affect preference elicitation. In particular when using implicit feedback to infer listener preferences, the fact that 2 Grand challenges http://www.cp.jku.at/tutorials/mrs_recsys_2017. In the following, we identify and detail a selection of 3 Spotify reports about 30 million songs in 2017 (https://press.spotify. the grand challenges, which we believe the research field com/at/about); Amazon’s advanced search for books reports 10 mil- lion hardcover and 30 million paperback books in 2017 (https://www. of music recommender systems is currently facing, i.e., amazon.com/Advanced-Search-Books/b?node=241582011), whereas overcoming the cold start problem, automatic playlist contin- Netflix, in contrast, offers about 5,500 movies and TV series as of 2016 (http://time.com/4272360/the-number-of-movies-on-netflix- http://www.recsyschallenge.com/2018. is-dropping-fast). 123 International Journal of Multimedia Information Retrieval (2018) 7:95–116 97 a listener is not paying attention to the music (therefore, is a highly challenging task and usually neglected in cur- e.g., not skipping a song) might be wrongly interpreted as rent MRS, for which reason we discuss emotion-aware MRS a positive signal. as one of the main future directions in MRS research, cf. Sect. 3.1. Listening intent and purpose Music serves various purposes for people and hence shapes their intent to listen to it. This Listening context Situational or contextual aspects [15,48] should be taken into account when building a MRS. In exten- have a strong influence on music preference, consump- sive literature and empirical studies, Schäfer et al. [155] tion, and interaction behavior. For instance, a listener will distilled three fundamental intents of music listening out of likely create a different playlist when preparing for a roman- 129 distinct music uses and functions: self-awareness, social tic dinner than when warming-up with friends to go out relatedness, and arousal and mood regulation. Self-awareness on a Friday night [75]. The most frequently considered is considered as a very private relationship with music listen- types of context include location (e.g., listening at work- ing. The self-awareness dimension “helps people think about place, when commuting, or relaxing at home) [100] and who they are, who they would like to be, and how to cut their time (typically categorized into, for example, morning, after- own path” [154]. Social relatedness [153] describes the use noon, and evening) [31]. Context may, in addition, also of music to feel close to friends and to express identity and relate to the listener’s activity [184], weather [140], or values to others. Mood regulation is concerned with manag- the use of different listening devices, e.g., earplugs on ing emotions, which is a critical issue when it comes to the a smartphone vs. hi-fi stereo at home [75], to name a well-being of humans [77,110,176]. In fact, several studies few. Since music listening is also a highly social activ- found that mood and emotion regulation is the most impor- ity, investigating the social context of the listeners is tant purpose why people listen to music [18,96,122,155], for crucial to understand their listening preferences and behav- which reason we discuss the particular role emotions play ior [45,134]. The importance of considering such con- when listening to music separately below. textual factors in MRS research is acknowledged by dis- cussing situation-aware MRS as a trending research direc- Emotions Music is known to evoke very strong emotions. tion, cf. Sect. 3.2. This is a mutual relationship, though, since also the emo- tions of users affect musical preferences [17,77,144]. Due to this strong relationship between music and emotions, the problem of automatically describing music in terms of emo- tion words is an active research area, commonly refereed 2.2 Challenge 1: Cold start problem to as music emotion recognition (MER), e.g., [14,103,187]. Even though MER can be used to tag music by emotion Problem definition One of the major problems of recom- terms, how to integrate this information into MRS is a highly mender systems in general [64,151], and music recommender complicated task, for three reasons. First, MER approaches systems in particular [99,119]isthe cold start problem, i.e., commonly neglect the distinction between intended emotion when a new user registers to the system or a new item is (i.e., the emotion the composer, songwriter, or performer added to the catalog and the system does not have sufficient had in mind when creating or performing the piece), per- data associated with these items/users. In such a case, the ceived emotion (i.e., the emotion recognized while listening), system cannot properly recommend existing items to a new and induced emotion that is felt by the listener. Second, the user (new user problem) or recommend a new item to the preference for a certain kind of emotionally laden music existing users (new item problem) [3,62,99,164]. piece depends on whether the user wants to enhance or to Another subproblem of cold start is the sparsity problem modulate her mood. Third, emotional changes often occur which refers to the fact that the number of given ratings is within the same music piece, whereas tags are commonly much lower than the number of possible ratings, which is extracted for the whole piece. Matching music and lis- particularly likely when the number of users and items is teners in terms of emotions therefore requires to model large. The inverse of the ratio between given and possible rat- the listener’s musical preference as a time-dependent func- ings is called sparsity. High sparsity translates into low rating tion of their emotional experiences, also considering the coverage, since most users tend to rate only a tiny fraction intended purpose (mood enhancement or regulation). This of items. The effect is that recommendations often become unreliable [99]. Typical values of sparsity are quite close to Please note that the terms “emotion” and “mood” have different mean- 100% in most real-world recommender systems. In the music ings in psychology, whereas they are commonly used as synonyms in domain, this is a particularly substantial problem. Dror et music information retrieval (MIR) and recommender systems research. al. [51], for instance, analyzed the Yahoo! Music dataset, In psychology, in contrast, “emotion” refers to a short-time reaction to which as of time of writing represents the largest music rec- a particular stimulus, whereas “mood” refers to a longer-lasting state without relation to a specific stimulus. ommendation dataset. They report a sparsity of 99.96%. For 123 98 International Journal of Multimedia Information Retrieval (2018) 7:95–116 comparison, the Netflix dataset of movies has a sparsity of is performed in the training and testing phases of the system, “only” 98.82%. and the extracted feature vectors can be used off-the-shelf in the subsequent processing stage; for example, they can be State of the art used to compute similarities between items in a one-to-one A number of approaches have already been proposed to fashion at testing time. In contrast, in (2) first a model is built tackle the cold start problem in the music recommendation from all features extracted in the training phase, whose main domain, foremost content-based approaches, hybridization, role is to map the features into a new (acoustic) space in cross-domain recommendation, and active learning. which the similarities between items are better represented and exploited. An example of approach (1) is the block-level Content-based recommendation (CB) algorithms do not feature framework [167,168], which creates a feature vec- require ratings of users other than the target user. Therefore, tor of about 10,000 dimensions, independently for each song as long as some pieces of information about the user’s own in the given music collection. This vector describes aspects preferences are available, such techniques can be used in cold such as spectral patterns, recurring beats, and correlations start scenarios. Furthermore, in the most severe case, when between frequency bands. An example of strategy (2) is to a new item is added to the catalog, content-based methods create a low-dimensional i-vector representation from the enable recommendations, because they can extract features Mel-frequency cepstral coefficients (MFCCs), which model from the new item and use them to make recommendations. musical timbre to some extent [56]. To this end, a univer- It is noteworthy that while collaborative filtering (CF) sys- sal background model is created from the MFCC vectors of tems have cold start problems both for new users and new the whole music collection, using a Gaussian mixture model items, content-based systems have only cold start problems (GMM). Performing factor analysis on a representation of for new users [5]. the GMM eventually yields i-vectors. As for the new item problem, a standard approach is to In scenarios where some form of semantic labels, e.g., extract a number of features that define the acoustic prop- genres or musical instruments, are available, it is possi- erties of the audio signal and use content-based learning ble to build models that learn the intermediate mapping of the user interest (user profile learning) in order to effect between low-level audio features and semantic representa- recommendations. Feature extraction is typically done auto- tions using machine learning techniques, and subsequently matically, but can also be effected manually by musical use the learned models for prediction. A good point of ref- experts, as in the case of Pandora’s Music Genome Project. erence for such semantic-inferred approaches can be found Pandora uses up to 450 specific descriptors per song, such in [19,36]. as “aggressive female vocalist,” “prominent backup vocals,” An alternative technique to tackle the new item problem “abstract lyrics,” or “use of unusual harmonies.” Regard- is hybridization. A review of different hybrid and ensemble less of whether the feature extraction process is performed recommender systems can be found in [6,26]. In [50], the automatically or manually, this approach is advantageous authors propose a music recommender system which com- not only to address the new item problem but also because bines an acoustic CB and an item-based CF recommender. an accurate feature representation can be highly predicative For the content-based component, it computes acoustic fea- of users’ tastes and interests which can be leveraged in the tures including spectral properties, timbre, rhythm, and pitch. subsequent information filtering stage [5]. An advantage of The content-based component then assists the collaborative music to video is that features in music are limited to a sin- filtering recommender in tackling the cold start problem since gle audio channel, compared to audio and visual channels the features of the former are automatically derived via audio for videos adding a level complexity to the content analysis content analysis. of videos explored individually or multimodal in different The solution proposed in [189] is a hybrid recommender research works [46,47,59,128]. system that combines CF and acoustic CB strategies also by Automatic feature extraction from audio signals can be feature hybridization. However, in this work the feature-level done in two main manners: (1) by extracting a feature vec- hybridization is not performed in the original feature domain. tor from each item individually, independent of other items, Instead, a set of latent variables referred to as conceptual or (2) by considering the cross-relation between items in the genre are introduced, whose role is to provide a common training dataset. The difference is that in (1) the same process shared feature space for the two recommenders and enable hybridization. The weights associated with the latent vari- Note that Dror et al.’s analysis was conducted in 2011. Even though ables reflect the musical taste of the target user and are learned the general character (rating matrices for music items being sparser than during the training stage. those of movie items) remained the same, the actual numbers for today’s catalogs are likely slightly different. In [169], the authors propose a hybrid recommender sys- http://www.pandora.com/about/mgp. tem incorporating item–item CF and acoustic CB based on http://enacademic.com/dic.nsf/enwiki/3224302. similarity metric learning. The proposed metric learning is 123 International Journal of Multimedia Information Retrieval (2018) 7:95–116 99 an optimization model that aims to learn the weights asso- music domain, the authors of [4] provide an interesting liter- ciated with the audio content features (when combined in a ature review of similar user-specific models. linear fashion) so that a degree of consistency between CF- While hybridization can therefore alleviate the cold start based similarity and the acoustic CB similarity measure is problem to a certain extent, as seen in the examples above, established. The optimization problem can be solved using respective approaches are often complex, computationally quadratic programming techniques. expensive, and lack transparency [27]. In particular, results Another solution to cold start is cross-domain recommen- of hybrids employing latent factor models are typically hard dation techniques, which aim at improving recommendations to understand for humans. in one domain (here music) by making use of information A major problem with cross-domain recommender sys- about the user preferences in an auxiliary domain [28,67]. tems is their need for data that connects two or more target Hence, the knowledge of the preferences of the user is domains, e.g., books, movies, and music [29]. In order for transferred from an auxiliary domain to the music domain, such approaches to work properly, items, users, or both there- resulting in a more complete and accurate user model. Sim- fore need to overlap to a certain degree [40]. In the absence ilarly, it is also possible to integrate additional pieces of of such overlap, relationships between the domains must be information about the (new) users, which are not directly established otherwise, e.g., by inferring semantic relation- ships between items in different domains or assuming similar related to music, such as their personality, in order to improve the estimation of the user’s music preferences. Several stud- rating patterns of users in the involved domains. However, ies conducted on user personality characteristics support the whether respective approaches are capable of transferring conjecture that it may be useful to exploit this information in knowledge between domains is disputed [39]. A related issue music recommender systems [69,73,86,130,147]. For a more in cross-domain recommendation is that there is a lack of detailed literature review of cross-domain recommendation, established datasets with clear definitions of domains and we refer to [29,68,102]. recommendation scenarios [102]. Because of this, the major- In addition to the aforementioned approaches, active ity of existing work on cross-domain RS uses some type of learning has shown promising results in dealing with the cold conventional recommendation dataset transformation to suit start problem in single domain [60,146] or cross-domain rec- it for their need. ommendation scenario [136,192]. Active learning addresses Finally, also active learning techniques suffer from a this problem at its origin by identifying and eliciting (high number of issues. First of all, the typical active learning tech- quality) data that can represent the preferences of users bet- niques propose to a user to rate the items that the system ter than by what they provide themselves. Such a system has predicted to be interesting for them, i.e., the items with therefore interactively demands specific user feedback to highest predicted ratings. This indeed is a default strategy in maximize the improvement of system performance. recommender systems for eliciting ratings since users tend to rate what has been recommended to them. Even when users Limitations The state-of-the-art approaches elaborated on browse the item catalog, they are more likely to rate items above are restricted by certain limitations. When using which they like or are interested in, rather than those items content-based filtering, for instance, almost all existing that they dislike or are indifferent to. Indeed, it has been approaches rely on a number of predefined audio features shown that doing so creates a strong bias in the collected that have been used over and over again, including spectral rating data as the database gets populated disproportionately features, MFCCs, and a great number of derivatives [106]. with high ratings. This in turn may substantially influence However, doing so assumes that (all) these features are pre- the prediction algorithm and decrease the recommendation dictive of the user’s music taste, while in practice it has been accuracy [63]. shown that the acoustic properties that are important for the Moreover, not all the active learning strategies are neces- perception of music are highly subjective [132]. Furthermore, sarily personalized. The users differ very much in the amount listeners’ different tastes and levels of interest in different of information they have about the items, their preferences, pieces of music influence perception of item similarity [158]. and the way they make decisions. Hence, it is clearly inef- This subjectiveness demands for CB recommenders that ficient to request all the users to rate the same set of items, incorporate personalization in their mathematical model. For because many users may have a very limited knowledge, example, in [65] the authors propose a hybrid (CB+CF) ignore many items, and will therefore not provide ratings recommender model, namely regression-based latent factor for these items. Properly designed active learning techniques models (RLFM). In [4], the authors propose a user-specific should take this into account and propose different items feature-based similarity model (UFSM), which defines a sim- to different users to rate. This can be highly beneficial and ilarity function for each user, leading to a high degree of increase the chance of acquiring ratings of higher quality personalization. Although not designed specifically for the [57]. 123 100 International Journal of Multimedia Information Retrieval (2018) 7:95–116 Moreover, the traditional interaction model designed for pelling playlists without needing to have extensive musical active learning in recommender systems can support build- familiarity. ing the initial profile of a user mainly in the sign-up process. A large part of the APC task is to accurately infer the This is done by generating a user profile by requesting the intended purpose of a given playlist. This is challenging not user to rate a set of selected items [30]. On the other hand, only because of the broad range of these intended purposes the users must be able to also update their profile by provid- (when they even exist), but also because of the diversity in the ing more ratings anytime they are willing to. This requires underlying features or characteristics that might be needed the system to adopt a conversational interaction model [30], to infer those purposes. e.g., by exploiting novel interactive design elements in the Related to Challenge 1, an extreme cold start scenario for user interface [38], such as explanations that can describe the this task is where a playlist is created with some metadata benefits of providing more ratings and motivating the user to (e.g., the title of a playlist), but no song has been added to the do so. playlist. This problem can be cast as an ad hoc information Finally, it is important to note that in an up-and-running retrieval task, where the task is to rank songs in response to recommender system, the ratings are given by users not a user-provided metadata query. only when requested by the system (active learning) but The APC task can also potentially benefit from user profil- also when a user voluntarily explores the item catalog and ing, e.g., making use of previous playlists and the long-term rates some familiar items (natural acquisition of ratings) listening history of the user. We call this personalized playlist [30,61,63,127,146]. While this could have a huge impact on continuation. the performance of the system, it has been mostly ignored by According to a study carried out in 2016 by the Music the majority of the research works in the field of active learn- Business Association as part of their Music Biz Consumer ing for recommender systems. Indeed, almost all research Insights program, playlists accounted for 31% of music lis- works have been based on a rather non-realistic assumption tening time among listeners in the USA, more than albums that the only source for collecting new ratings is through the (22%), but less than single tracks (46%). Other studies, con- system requests. Therefore, it is crucial to take into account ducted by MIDiA, show that 55% of streaming music a more realistic scenario when studying the active learning service subscribers create music playlists, with some stream- techniques in recommender systems, which can better picture ing services such as Spotify currently hosting over 2 billion 11 12 how the system evolves over time when ratings are provided playlists. In a 2017 study conducted by Nielsen, it was by users [143,146]. found that 58% of users in the USA create their own playlists, 32% share them with others. Studies like these suggest a 2.3 Challenge 2: Automatic playlist continuation growing importance of playlists as a mode of music con- sumption, and as such, the study of APG and APC has never Problem definition In its most generic definition, a playlist been more relevant. is simply a sequence of tracks intended to be listened to State of the art APG has been studied ever since digi- together. The task of automatic playlist generation (APG) tal multimedia transmission made huge catalogs of music then refers to the automated creation of these sequences of available to users. Bonnin and Jannach provide a compre- tracks. In this context, the ordering of songs in a playlist to hensive survey of this field in [21]. In it, the authors frame generate is often highlighted as a characteristics of APG, the APG task as the creation of a sequence of tracks that which is a highly complex endeavor. Some authors have fulfill some “target characteristics” of a playlist, given some therefore proposed approaches based on Markov chains to “background knowledge” of the characteristics of the catalog model the transitions between songs in playlists, e.g., [32, of tracks from which the playlist tracks are drawn. Existing 125]. While these approaches have been shown to outper- APG systems tackle both of these problems in many different form approaches agnostic of the song order in terms of ways. log-likelihood, recent research has found little evidence that the exact order of songs actually matters to users [177], while the ensemble of songs in a playlist [181] and direct song- https://musicbiz.org/news/playlists-overtake-albums-listenership- to-song transitions [92] do matter. says-loop-study. Considered a variation of APG, the task of automatic https://musicbiz.org/resources/tools/music-biz-consumer-insights/ playlist continuation (APC) consists of adding one or more consumer-insights-portal. tracks to a playlist in a way that fits the same target charac- https://www.midiaresearch.com/blog/announcing-midias-state-of- teristics of the original playlist. This has benefits in both the the-streaming-nation-2-report. listening and creation of playlists: users can enjoy listening to https://press.spotify.com/us/about. continuous sessions beyond the end of a finite-length playlist, http://www.nielsen.com/us/en/insights/reports/2017/music-360- while also finding it easier to create longer, more com- 2017-highlights.html. 123 International Journal of Multimedia Information Retrieval (2018) 7:95–116 101 In early approaches [9,10,135]the target characteristics tant criterion for playlist quality. In another recent user of the playlist are specified as multiple explicit constraints, study [177] conducted by Tintarev et al. the authors found that which include musical attributes or metadata such as artist, many participants did not care about the order of tracks in rec- tempo, and style. In others, the target characteristics are a ommended playlists, sometimes they did not even notice that single seed track [121] or a start and an end track [9,32,74]. there is a particular order. However, this study was restricted Other approaches create a circular playlist that comprises to 20 participants who used the Discover Weekly service of all tracks in a given music collection, in such a way that Spotify. consecutive songs are as similar as possible [105,142]. In Another challenge for APC is evaluation: in other words, other works, playlists are created based on the context of the how to assess the quality of a playlist. Evaluation in gen- listener, either as single source [157] or in combination with eral is discussed in more detail in the next section, but there content-based similarity [35,149]. are specific questions around evaluation of playlists that A common approach to build the background knowledge should be pointed out here. As Bonnin and Jannach [21] of the music catalog for playlist generation is using machine put it, the ultimate criterion for this is user satisfaction,but learning techniques to extract that knowledge from manually that is not easy to measure. In [125], McFee and Lanck- curated playlists. The assumption here is that curators of these riet categorize the main approaches to APG evaluation as human evaluation, semantic cohesion, and sequence pre- playlists are encoding rich latent information about which tracks go together to create a satisfying listening experience diction. Human evaluation comes closest to measuring user for an intended purpose. Some proposed APG and APC sys- satisfaction directly, but suffers from problems of scale and tems are trained on playlists from sources such as online reproducibility. Semantic cohesion as a quality metric is radio stations [32,123], online playlist websites [126,181], easily measurable and reproducible, but assumes that users and music streaming services [141]. In the study by Pichl et prefer playlists where tracks are similar along a particu- al. [141], the names of playlists on Spotify were analyzed to lar semantic dimension, which may not always be true, create contextual clusters, which were then used to improve see, for instance, the studies carried out by Slaney and recommendations. White [172] and by Lee [115]. Sequence prediction casts An approach to specifically address song ordering within APC as an information retrieval task, but in the domain of playlists is the use of generative models that are trained on music, an inaccurate prediction needs not be a bad recom- hand-curated playlists. McFee and Lanckriet [125] represent mendation, and this again leads to a potential disconnect songs by metadata, familiarity, and audio content features, between this metric and the ultimate criterion of user sat- adopting ideas from statistical natural language process- isfaction. ing. They train various Markov chains to model transitions Investigating which factors are potentially important for between songs. Similarly, Chen et al. [32] propose a logistic a positive user perception of a playlist, Lee conducted a Markov embedding to model song transitions. This is similar qualitative user study [115], investigating playlists that had to matrix decomposition methods and results in an embed- been automatically created based on content-based similar- ding of songs in Euclidean space. In contrast to McFee and ity. They made several interesting observations. A concern Lanckriet’s model, Chen et al.’s model does not use any audio frequently raised by participants was that of consecutive features. songs being too similar, and a general lack of variety. However, different people had different interpretations of Limitations While some work on automated playlist con- variety, e.g., variety in genres or styles vs. different artists tinuation highlights the special characteristics of playlists, in the playlist. Similarly, different criteria were mentioned i.e., their sequential order, it is not well understood to which when listeners judged the coherence of songs in a playlist, extent and in which cases taking into account the order of including lyrical content, tempo, and mood. When cre- tracks in playlists helps create better models for recommen- ating playlists, participants mentioned that similar lyrics, dation. For instance, in [181] Vall et al. recently demonstrated a common theme (e.g., music to listen to in the train), on two datasets of hand-curated playlists that the song order story (e.g., music for the Independence Day), or era (e.g., seems to be negligible for accurate playlist continuation when rock music from the 1980s) are important and that tracks a lot of popular songs are present. On the other hand, the not complying negatively effect the flow of the playlist. authors argue that order does matter when creating playlists These aspects can be extended by responses of partici- with tracks from the long tail. Another study by McFee and pants in a study conducted by Cunningham et al. [42], Lanckriet [126] also suggests that transition effects play an important role in modeling playlist continuity. This is in line The ranking of criteria (from most to least important) was: homo- with a study presented by Kamehkhosh et al. in [92], in which geneity, artist diversity, transition, popularity, lyrics, order, and fresh- users identified song order as being the second but last impor- ness. https://www.spotify.com/discoverweekly. 123 102 International Journal of Multimedia Information Retrieval (2018) 7:95–116 who further identified the following categories of playlists: 2.4 Challenge 3: Evaluating music recommender same artist, genre, style, or orchestration, playlists for a systems certain event or activity (e.g., party or holiday), romance (e.g., love songs or breakup songs), playlists intended to Problem definition Having its roots in machine learning send a message to their recipient (e.g., protest songs), (cf. rating prediction) and information retrieval (cf. “retriev- and challenges or puzzles (e.g., cover songs liked more ing” items based on implicit “queries” given by user prefer- than the original or songs whose title contains a question ences), the field of recommender systems originally adopted mark). evaluation metrics from these neighboring fields. In fact, Lee also found that personal preferences play a major accuracy and related quantitative measures, such as preci- role. In fact, already a single song that is very much liked sion, recall, or error measures (between predicted and true or hated by a listener can have a strong influence on how ratings), are still the most commonly employed criteria to they judge the entire playlist [115]. This seems particularly judge the recommendation quality of a recommender sys- true if it is a highly disliked song [44]. Furthermore, a good tem [11,78]. In addition, novel measures that are tailored to mix of familiar and unknown songs was often mentioned as the recommendation problem have emerged in recent years. an important requirement for a good playlist. Supporting the These so-called beyond-accuracy measures [98] address discovery of interesting new songs, still contextualized by the particularities of recommender systems and gauge, for familiar ones, increases the likelihood of realizing a serendip- instance, the utility, novelty, or serendipity of an item. How- itous encounter in a playlist [160,193]. Finally, participants ever, a major problem with these kinds of measures is that also reported that their familiarity with a playlist’s genre or they integrate factors that are hard to describe mathemati- theme influenced their judgment of its quality. In general, cally, for instance, the aspect of surprise in case of serendipity listeners were more picky about playlists whose tracks they measures. For this reason, there sometimes exist a variety of were familiar with or they liked a lot. different definitions to quantify the same beyond-accuracy Supported by the studies summarized above, we argue aspect. that the question of what makes a great playlist is highly State of the art In the following, we discuss performance subjective and further depends on the intent of the creator measures which are most frequently reported when evalu- or listener. Important criteria when creating or judging a ating recommender systems. An overview of these is given playlist include track similarity/coherence, variety/diversity, in Table 1. They can be roughly categorized into accuracy- but also the user’s personal preferences and familiarity with related measures, such as prediction error (e.g., MAE and the tracks, as well as the intention of the playlist cre- RMSE) or standard IR measures (e.g., precision and recall), ator. Unfortunately, current automatic approaches to playlist and beyond-accuracy measures, such as diversity, novelty, continuation are agnostic of the underlying psychologi- and serendipity. Furthermore, while some of the metrics cal and sociological factors that influence the decision of quantify the ability of recommender systems to find good which songs users choose to include in a playlist. Since items, e.g., precision and recall, others consider the ranking knowing about such factors is vital to understand the of items and therefore assess the system’s ability to position intent of the playlist creator, we believe that algorithmic good recommendations at the top of the recommendation list, methods for APC need to holistically learn such aspects e.g., MAP, NDCG, or MPR. from manually created playlists and integrate respective intent models. However, we are aware that in today’s era Mean absolute error (MAE) is one of the most common met- rics for evaluating the prediction power of recommender where billions of playlists are shared by users of online streaming services, a large-scale analysis of psycholog- algorithms. It computes the average absolute deviation between the predicted ratings and the actual ratings provided ical and sociological background factors is impossible. Nevertheless, in the absence of explicit information about by users [81]. Indeed, MAE indicates how close the rating predictions generated by an MRS are to the real user ratings. user intent, a possible starting point to create intent mod- els might be the metadata associated with user-generated MAE is computed as follows: playlists, such as title or description. To foster this kind of research, the playlists provided in the dataset for the ACM Recommender Systems Challenge 2018 include playlist MAE = |r −ˆr | (1) u,i u,i |T | titles. r ∈T u,i where r and rˆ , respectively, denote the actual and the u,i u,i https://press.spotify.com/us/about. predicted ratings of item i for user u. MAE sums over the https://recsys-challenge.spotify.com. absolute prediction errors for all ratings in a test set T . 123 International Journal of Multimedia Information Retrieval (2018) 7:95–116 103 Table 1 Evaluation measures Measure Abbreviation Type Ranking-aware commonly used for recommender systems Mean absolute error MAE Error/accuracy No Root-mean-square error RMSE Error/accuracy No Precision at top K recommendations P@K Accuracy No Recall at top K recommendations R@K Accuracy No Mean average precision at top K recommendations MAP@K Accuracy Yes Normalized discounted cumulative gain NDCG Accuracy Yes Half-life utility HLU Accuracy Yes Mean percentile rank MPR Accuracy Yes Spread – Beyond No Coverage – Beyond No Novelty – Beyond No Serendipity – Beyond No Diversity – Beyond No u. The overall P@K is then computed by averaging P @K values for all users in the test set. Root-mean-square error (RMSE) is another similar metric that is computed as: Mean average precision at top K recommendations (MAP@K) is a rank-based metric that computes the overall precision of the system at different lengths of recommendation lists. MAP is computed as the arithmetic mean of the average precision RMSE =  (r −ˆr ) . (2) u,i u,i |T | over the entire set of users in the test set. Average precision r ∈T u,i for the top K recommendations (AP@K ) is defined as fol- lows: It is an extension to MAE in that the error term is squared, which penalizes larger differences between predicted and AP@K = P@i · rel(i ) (4) true ratings more than smaller ones. This is motivated by the i =1 assumption that, for instance, a rating prediction of 1 when the true rating is 4 is much more severe than a prediction of th where rel(i ) is an indicator signaling if the i recommended 3 for the same item. item is relevant, i.e., rel(i ) = 1, or not, i.e., rel(i ) = 0; N is the total number of relevant items. Note that MAP implic- Precision at top K recommendations (P@K) is a common itly incorporates recall, because it also considers the relevant metric that measures the accuracy of the system in command- items not in the recommendation list. ing relevant items. In order to compute P@K , for each user, the top K recommended items whose ratings also appear Recall at top K recommendations (R@K) is presented here in the test set T are considered. This metric was originally for the sake of completeness, even though it is not a crucial designed for binary relevance judgments. Therefore, in case measure from a consumer’s perspective. Indeed, the listener of availability of relevance information at different levels, is typically not interested in being recommended all or a large such as a five-point Likert scale, the labels should be bina- number of relevant items, rather in having good recommen- rized, e.g., considering the ratings greater than or equal to 4 dations at the top of the recommendation list. For a user u, (out of 5) as relevant. For each user u, P @K is computed R @K is defined as: as follows: We should note that in the recommender systems community, another variation of average precision is gaining popularity recently, formally 1 K |L ∩ L | u u defined by: AP@K = P@i · rel(k) in which N is the i =1 min(K ,N ) P @K = (3) total number of relevant items and K is the size of recommendation |L | list. The motivation behind the minimization term is to prevent the AP scores to be unfairly suppressed when the number of recommendations is too low to capture all the relevant items. This variation of MAP was where L is the set of relevant items for user u in the test popularized by Kaggle competitions [97] about recommender systems set T and L denotes the recommended set containing the and has been used in several other research works, consider for exam- K items in T with the highest predicted ratings for the user ple [8,124]. 123 104 International Journal of Multimedia Information Retrieval (2018) 7:95–116 |L ∩ L | u u max (r − d, 0) u,i R @K = (5) u HLU = (8) |L | (rank −1)/(h−1) u,i i =1 where L is the set of relevant items of user u in the test set where r and rank denote the rating and the rank of item T and L denotes the recommended set containing the K u,i u,i i for user u, respectively, in the recommendation list of length items in T with the highest predicted ratings for the user u. N; d represents a default rating (e.g., average rating); and h The overall R@K is calculated by averaging R @K values is the half-time, calculated as the rank of a music item in the for all the users in the test set. list, such that the user can eventually listen to it with a 50% Normalized discounted cumulative gain (NDCG)isamea- chance. HLU can be further normalized by the maximum sure for the ranking quality of the recommendations. This utility (similar to NDCG), and the final HLU is the average metric has originally been proposed to evaluate the effec- over the half-time utilities obtained for all users in the test set. tiveness of information retrieval systems [93]. It is nowadays A larger HLU may correspond to a superior recommendation also frequently used for evaluating music recommender sys- performance. tems [120,139,185]. Assuming that the recommendations for Mean percentile rank (MPR) estimates the users’ satisfaction user u are sorted according to the predicted rating values in with items in the recommendation list and is computed as the descending order. DCG is defined as follows: average of the percentile rank for each test item within the ranked list of recommended items for each user [89]. The u,i percentile rank of an item is the percentage of items whose DCG = (6) log (i + 1) position in the recommendation list is equal to or lower than i =1 the position of the item itself. Formally, the percentile rank PR for user u is defined as: where r is the true rating (as found in test set T ) for the item u,i ranked at position i for user u, and N is the length of the rec- ommendation list. Since the rating distribution depends on r · rank u,i u,i r ∈T u,i PR =  (9) the users’ behavior, the DCG values for different users are u,i r ∈T u,i not directly comparable. Therefore, the cumulative gain for each user should be normalized. This is done by computing the ideal DCG for user u, denoted as IDCG , which is the where r is the true rating (as found in test set T ) for item u,i DCG value for the best possible ranking, obtained by order- i ratedbyuser u and rank is the percentile rank of item i u,i ing the items by true ratings in descending order. Normalized within the ordered list of recommendations for user u.MPRis discounted cumulative gain for user u is then calculated as: then the arithmetic mean of the individual PR values over all users. A randomly ordered recommendation list has an DCG u expected MPR value of 50%. A smaller MPR value is there- NDCG = . (7) IDCG fore assumed to correspond to a superior recommendation performance. Finally, the overall normalized discounted cumulative gain Spread is a metric of how well the recommender algorithm NDCG is computed by averaging NDCG over the entire can spread its attention across a larger set of items [104]. In set of users. more detail, spread is the entropy of the distribution of the items recommended to the users in the test set. It is formally In the following, we present common quantitative eval- defined as: uation metrics, which have been particularly designed or adopted to assess recommender systems performance, even though some of them have their origin in information retrieval spread =− P(i ) log P(i ) (10) and machine learning. The first two (HLU and MRR) still i ∈I belong to the category of accuracy-related measures, while the subsequent ones capture beyond-accuracy aspects where I represents the entirety of items in the dataset Half-life utility (HLU) measures the utility of a recommenda- and P(i ) = count (i )/  count (i ), such that count (i ) i ∈I tion list for a user with the assumption that the likelihood of denotes the total number of times that a given item i showed viewing/choosing a recommended item by the user exponen- up in the recommendation lists. It may be infeasible to expect tially decays with the item’s position in the ranking [24,137]. an algorithm to achieve the perfect spread (i.e., recommend- Formally written, HLU for user u is defined as: ing each item an equal number of times) without avoiding 123 International Journal of Multimedia Information Retrieval (2018) 7:95–116 105 irrelevant recommendations or unfulfillable rating requests. of them, nor to be able to understand or rate them. Therefore, Accordingly, moderate spread values are usually preferable. moderate values indicate better performances [104]. Coverage of a recommender system is defined as the propor- Serendipity aims at evaluating MRS based on the relevant and tion of items over which the system is capable of generating surprising recommendations. While the need for serendip- recommendations [81]: ity is commonly agreed upon [82], the question of how to measure the degree of serendipity for a recommendation list is controversial. This particularly holds for the question of |T | whether the factor of surprise implies that items must be coverage = (11) novel to the user [98]. On a general level, serendipity of a |T | recommendation list L provided to a user u can be defined as: where |T | is the size of the test set and |T | is the number of unexp usef ul L ∩ L ratings in T for which the system can predict a value. This is u u serendipity(L ) = (13) particularly important in cold start situations, when recom- |L | mender systems are not able to accurately predict the ratings unexp usef ul of new users or new items and hence obtain low coverage. where L and L denote subsets of L that contain, u u Recommender systems with lower coverage are therefore respectively, recommendations unexpected to and useful for limited in the number of items they can recommend. A sim- the user. The usefulness of an item is commonly assessed by ple remedy to improve low coverage is to implement some explicitly asking users or taking user ratings as proxy [98]. default recommendation strategy for an unknown user–item The unexpectedness of an item is typically quantified by entry. For example, we can consider the average rating of some measure of distance from expected items, i.e., items that users for an item as an estimate of its rating. This may come are similar to the items already rated by the user. In the context at the price of accuracy, and therefore, the trade-off between of MRS, Zhang et al. [193] propose an “unserendipity” mea- coverage and accuracy needs to be considered in the evalua- sure that is defined as the average similarity between the items tion process [7]. in the user’s listening history and the new recommendations. Similarity between two items in this case is calculated by Novelty measures the ability of a recommender system to an adapted cosine measure that integrates co-liking informa- recommend new items that the user did not know about tion, i.e., number of users who like both items. It is assumed before [1]. A recommendation list may be accurate, but if that lower values correspond to more surprising recommen- it contains a lot of items that are not novel to a user, it is not dations, since lower values indicate that recommendations necessarily a useful list [193]. deviate from the user’s traditional behavior [193]. While novelty should be defined on an individual user level, considering the actual freshness of the recommended Diversity is another beyond-accuracy measure as already dis- items, it is common to use the self-information of the recom- cussed in the limitations part of Challenge 1. It gauges the mended items relative to their global popularity: extent to which recommended items are different from each other, where difference can relate to various aspects, e.g., musical style, artist, lyrics, or instrumentation, just to name 1 − log pop a few. Similar to serendipity, diversity can be defined in sev- novelty = (12) |U | N u∈Ui ∈L eral ways. One of the most common is to compute pairwise distance between all items in the recommendation set, either averaged [196]orsummed[173]. In the former case, the where pop is the popularity of item i measured as percent- diversity of a recommendation list L is calculated as follows: age of users who rated i, L is the recommendation list of the top N recommendations for user u [193,195]. The above dist i , j i ∈L j ∈L\i di versity(L) = (14) definition assumes that the likelihood of the user selecting a |L|· (|L|− 1) previously unknown item is proportional to its global popu- where dist is the some distance function defined between larity and is used as an approximation of novelty. In order to i , j items i and j. Common choices are inverse cosine similar- obtain more accurate information about novelty or freshness, ity [150], inverse Pearson correlation [183], or Hamming explicit user feedback is needed, in particular since the user distance [101]. might have listened to an item through other channels before. It is often assumed that the users prefer recommendation lists with more novel items. However, if the presented items When it comes to the task of evaluating playlist recom- are too novel, then the user is unlikely to have any knowledge mendation, where the goal is to assess the capability of the 123 106 International Journal of Multimedia Information Retrieval (2018) 7:95–116 recommender in providing proper transitions between subse- Addressing both objective and subjective evaluation cri- quent songs, the conventional error or accuracy metrics may teria, Knijnenburg et al. [108] propose a holistic frame- not be able to capture this property. There is hence a need for work for user-centric evaluation of recommender systems. sequence-aware evaluation measures. For example, consider Figure 1 provides an overview of the components. The objec- the scenario where a user who likes both classical and rock tive system aspects (OSAs) are considered unbiased factors music is recommended a rock music right after she has lis- of the RS, including aspects of the user interface, comput- tened to a classic piece. Even though both music styles are in ing time of the algorithm, or number of items shown to the agreement with her taste, the transition between songs plays user. They are typically easy to specify or compute. The an important role toward user satisfaction. In such a situation, OSAs influence the subjective system aspects (SSAs), which given a currently played song and in the presence of several are caused by momentary, primary evaluative feelings while equally likely good options to be played next, a RS may interacting with the system [80]. This results in a different be inclined to rank songs based on their popularity. Hence, perception of the system by different users. SSAs are there- other metrics such as average log-likelihood have been pro- fore highly individual aspects and typically assessed by user posed to better model the transitions [33,34]. In this regard, questionnaires. Examples of SSA include general appeal of when the goal is to suggest a sequence of items, alternative the system, usability, and perceived recommendation diver- multi-metric evaluation approaches are required to take into sity or novelty. The aspect of experience (EXP) describes consideration multiple quality factors. Such evaluation met- the user’s attitude toward the system and is commonly also rics can consider the ranking order of the recommendations investigated by questionnaires. It addresses the user’s per- or the internal coherence or diversity of the recommended ception of the interaction with the system. The experience is list as a whole. In many scenarios, adoption of such quality highly influenced by the other components, which means metrics can lead to a trade-off with accuracy which should changing any of the other components likely results in a be balanced by the RS algorithm [145]. change of EXP aspects. Experience can be broken down into the evaluation of the system, the decision process, and Limitations As of today, the vast majority of evaluation the final decisions made, i.e., the outcome. The interaction approaches in recommender systems research focus on quan- (INT) aspects describe the observable behavior of the user, titative measures, either accuracy-like or beyond-accuracy, time spent viewing an item, as well as clicking or purchas- which are often computed in offline studies. ing behavior. In a music context, examples further include Doing so has the advantage of facilitating the reproducibil- liking a song or adding it to a playlist. Therefore, interac- ity of evaluation results. However, limiting the evaluation to tions aspects belong to the objective measures and are usually quantitative measures means to forgo another important fac- determined via logging by the system. Finally, Knijnenburg tor, which is user experience. In other words, in the absence et al.’s framework mentions personal characteristics (PC) of user-centric evaluations, it is difficult to extend the claims and situational characteristics (SC), which influence the user to the more important objective of the recommender system experience. PC include aspects that do not exist without the under evaluation, i.e., giving users a pleasant and useful per- user, such as user demographics, knowledge, or perceived sonalized experience [107]. control, while SC include aspects of the interaction context, Despite acknowledging the need for more user-centric such as when and where the system is used, or situation- evaluation strategies [158], the factor human, user, or, in specific trust or privacy concerns. Knijnenburg et al. [108] the case of MRS, listener is still way too often neglected also propose a questionnaire to asses the factors defined in or not properly addressed. For instance, while there exist their framework, for instance, perceived recommendation quantitative objective measures for serendipity and diver- quality, perceived system effectiveness, perceived recom- sity, as discussed above, perceived serendipity and diversity mendation variety, choice satisfaction, intention to provide can be highly different from the measured ones [182]as feedback, general trust in technology, and system-specific they are subjective user-specific concepts. This illustrates privacy concern. that even beyond-accuracy measures cannot fully capture the While this framework is a generic one, tailoring it to MRS real user satisfaction with a recommender system. On the would allow for user-centric evaluation thereof. In partic- other hand, approaches that address user experience (UX) ular, the aspects of personal and situational characteristics can be investigated to evaluate recommender systems. For should be adapted to the particularities of music listeners example, a MRS can be evaluated based on user engage- and listening situations, respectively, cf. Sect. 2.1.Tothis ment, which provides a restricted explanation of UX that end, researchers in MRS should consider the aspects relevant concentrates on judgment of product quality during inter- to the perception and preference of music, and their implica- action [79,118,133]. User satisfaction, user engagement, tions on MRS, which have been identified in several studies, and more generally user experience are commonly assessed e.g., [43,113,114,158,159]. In addition to the general ones through user studies [13,116,117]. mentioned by Knijnenburg et al., of great importance in the 123 International Journal of Multimedia Information Retrieval (2018) 7:95–116 107 music domain seem to be psychological factors, including has been shown that personality can influence the human affect and personality, social influence, musical training and decision-making process as well as the tastes and interests. experience, and physiological condition. Due to this direct relation, people with similar personality We believe that carefully and holistically evaluating MRS factors are very likely to share similar interests and tastes. by means of accuracy and beyond-accuracy, objective and Earlier studies conducted on the user personality char- subjective measures, in offline and online experiments, would acteristics support the potential benefits that personality lead to a better understanding of the listeners’ needs and information could have in recommender systems [22,23,58, requirements vis-à-vis MRS, and eventually a considerable 85,87,178,180]. As a known example, psychological stud- improvement of current MRS. ies [147] have shown that extravert people are likely to prefer the upbeat and conventional music. Accordingly, a personality-based MRS could use this information to bet- 3 Future directions and visions ter predict which songs are more likely than others to please extravert people [86]. Another example of poten- While the challenges identified in the previous section are tial usage is to exploit personality information in order to already researched on intensely, in the following, we provide compute similarity among users and hence identify the like- a more forward-looking analysis and discuss some MRS- minded users [178]. This similarity information could then be related trending topics, which we assume influential for the integrated into a neighborhood-based collaborative filtering next generation of MRS. All of them have in common that approach. their aim is to create more personalized recommendations. In order to use personality information in a recommender More precisely, we first outline how psychological constructs system, the system first has to elicit this information from such as personality and emotion could be integrated into the users, which can be done either explicitly or implicitly. MRS. Subsequently, we address situation-aware MRS and In the former case, the system can ask the user to com- argue for the need of multifaceted user models that describe plete a personality questionnaire using one of the personality contextual and situational preferences. To round off, we evaluation inventories, e.g., the ten- item personality inven- discuss the influence of users’ cultural background on recom- tory [76] or the big five inventory [94]. In the latter case, mendation preferences, which needs to be considered when the system can learn the personality by tracking and observ- building culture-aware MRS. ing users’ behavioral patterns, for instance, liking behavior on Facebook [111] or applying filters to images posted on Instagram [170]. Not too surprisingly, it has shown that sys- 3.1 Psychologically inspired music recommendation tems that explicitly elicit personality characteristics achieve Personality and emotion are important psychological con- superior recommendation outcomes, e.g., in terms of user sat- structs. While personality characteristics of humans are a isfaction, ease of use, and prediction accuracy [52]. On the predictable and stable measure that shapes human behaviors, downside, however, many users are not willing to fill in long emotions are short-term affective responses to a particular questionnaires before being able to use the RS. A way to alle- stimulus [179]. Both have been shown to influence music viate this problem is to ask users only the most informative tastes [71,154,159] and user requirements for MRS [69,73]. questions of a personality instrument [163]. Which questions However, in the context of (music) recommender systems, are most informative, though, first needs to be determined based on existing user data and is dependent on the recom- personality and emotion do not play a major role yet. Given the strong evidence that both influence listening prefer- mendation domain. Other studies showed that users are to some extent willing to provide further information in return ences [147,159] and the recent emergence of approaches to accurately predict them from user-generated data [111,170], for a better quality of recommendations [175]. we believe that psychologically inspired MRS is an upcom- Personality information can be used in various ways, ing area. particularly, to generate recommendations when traditional rating or consumption data is missing. Otherwise, the person- 3.1.1 Personality ality traits can be seen as an additional feature that extends the user profile, that can be used mainly to identify similar users In psychology research, personality is often defined as a “con- in neighborhood-based recommender systems or directly fed sistent behavior pattern and interpersonal processes originat- into extended matrix factorization models [67]. ing within the individual” [25]. This definition accounts for the individual differences in people’s emotional, interper- 3.1.2 Emotion sonal, experiential, attitudinal, and motivational styles [95]. Several prior works have studied the relation of decision The emotional state of the MRS user has a strong impact on making and personality factors. In [147], as an example, it his or her short-time musical preferences [99]. Vice versa, 123 108 International Journal of Multimedia Information Retrieval (2018) 7:95–116 Fig. 1 Evaluation framework of the user experience for recommender systems, according to [108] music has a strong influence on our emotional state. It there- presented one of the various categorical models (emotions fore does not come as a surprise that emotion regulation are described by distinct emotion words such as happiness, was identified as one of the main reasons why people lis- sadness, anger, or fear) [84,191]or dimensional models tentomusic [122,155]. As an example, people may listen to (emotions are described by scores with respect to two or completely different musical genres or styles when they are three dimensions, e.g., valence and arousal) [152]. For a sad in comparison with when they are happy. Indeed, prior more detailed elaboration on emotion models in the context research on music psychology discovered that people may of music, we refer to [159,186]. The implicit acquisition of choose the type of music which moderates their emotional emotional states can be effected, for instance, by analyzing condition [109]. More recent findings show that music can be user-generated text [49], speech [66], or facial expressions mainly chosen so as to augment the emotional situation per- in video [55]. ceived by the listener [131]. In order to build emotion-aware Emotion tagging in music The music piece itself can be MRS, it is therefore necessary to (i) infer the emotional state regarded as an emotion-laden content and in turn can be the listener is in, (ii) infer emotional concepts from the music described by emotion words. The task of automatically itself, and (iii) understand how these two interrelate. These assigning such emotion words to a music piece is an active three tasks are detailed below. research area, often refereed to as music emotion recogni- Eliciting the emotional state of the listener Similar to per- tion (MER), e.g., [14,91,103,187,188,191]. How to integrate sonality traits, the emotional state of a user can be elicited such emotion terms created by MER tools into a MRS is, explicitly or implicitly. In the former case, the user is typically however, not an easy task, for several reasons. First, early 123 International Journal of Multimedia Information Retrieval (2018) 7:95–116 109 MER approaches usually neglected the distinction between for instance, the music preference of a user would differ in intended emotion, perceived emotion, and induced or felt libraries and in gyms [35]. Therefore, considering location as emotion, cf. Sect. 2.1. Current MER approaches focus on a situation-specific signal could lead to substantial improve- perceived or induced emotions. However, musical content ments in the recommendation performance. Time of the day still contains various characteristics that affect the emotional is another situational signal that could be used for recom- state of the listener, such as lyrics, rhythm, and harmony, and mendation; for instance, the music a user would like to listen the way how they affect the emotional state is highly subjec- to in mornings differs from those in nights [41]. One situa- tive. This so even though research has detected a few general tional signal of particular importance in the music domain is rules, for instance, a musical piece that is in major key is social context since music tastes and consumption behaviors typically perceived brighter and happier than those in minor are deeply rooted in the users’ social identities and mutually key, or a piece in rapid tempo is perceived more exciting or affect each other [45,134]. For instance, it is very likely that more tense than slow tempo ones [112]. a user would prefer different music when being alone than when meeting friends. Such social factors should therefore Connecting listener emotions and music emotion tags Cur- be considered when building situation-aware MRS. Other rent emotion-based MRSs typically consider emotional situational signals that are sometimes exploited include the scores as contextual factors that characterize the situation user’s current activity [184], the weather [140], the user’s the user is experiencing. Hence, the recommender systems mood [129], and the day of the week [83]. Regarding time, exploit emotions in order to pre-filter the preferences of users there is also another factor to consider, which is that most or post-filter the generated recommendations. Unfortunately, music that was considered trendy years ago is now consid- this neglects the psychological background, in particular ered old. This implies that ratings for the same song or artist on the subjective and complex interrelationships between might strongly differ, not only between users, but in general expressed, perceived, and induced emotions [159], which as a function of time. To incorporate such aspects in MRS, it is of special importance in the music domain as music is would be crucial to record a timestamp for all ratings. known to evoke stronger emotions than, for instance, prod- It is worth noting that situational features have been ucts [161]. It has also been shown that personality influences proven to be strong signals in improving retrieval perfor- in which emotional state which kind of emotionally laden mance in search engines [16,190]. Therefore, we believe music is preferred by listeners [71]. Therefore, even if auto- that researching and building situation-aware music recom- mated MER approaches would be able to accurately predict mender systems should be one central topic in MRS research. the perceived or induced emotion of a given music piece, in While several situation-aware MRSs already exist, e.g., the absence of deep psychological listener profiles, match- [12,35,90,100,157,184], they commonly exploit only one or ing emotion annotations of items and listeners may not yield very few such situational signals, or are restricted to a cer- satisfying recommendations. This is so because how people tain usage context, e.g., music consumption in a car or in a judge music and which kind of music they prefer depends tourist scenario. Those systems that try to take a more com- to a large extent on their current psychological and cogni- prehensive view and consider a variety of different signals, tive states. We hence believe that the field of MRS should on the other hand, suffer from a low number of data instances embrace psychological theories, elicit the respective user- or users, rendering it very hard to build accurate context specific traits, and integrate them into recommender systems, models [75]. What is still missing, in our opinion, are (com- in order to build decent emotion-aware MRS. mercial) systems that integrate a variety of situational signals on a very large scale in order to truly understand the listen- 3.2 Situation-aware music recommendation ers needs and intents in any given situation and recommend music accordingly. While we are aware that data availability Most of the existing music recommender systems make and privacy concerns counteract the realization of such sys- recommendations solely based on a set of user-specific tems on a large commercial scale, we believe that MRS will and item-specific signals. However, in real-world scenarios, eventually integrate decent multifaceted user models inferred many other signals are available. These additional signals from contextual and situational factors. can be further used to improve the recommendation perfor- mance. A large subset of these additional signals includes 3.3 Culture-aware music recommendation situational signals. In more detail, the music preference of a user depends on the situation at the moment of recom- While most humans share an inclination to listen to music, mendation. Location is an example of situational signals; independent on their location or cultural background, the way music is performed, perceived, and interpreted evolves in a culture-specific manner. However, research in MRS seems Please note that music taste is a relatively stable characteristic, while music preferences vary depending on the context and listening intent. to be agnostic of this fact. In music information retrieval 123 110 International Journal of Multimedia Information Retrieval (2018) 7:95–116 (MIR) research, on the other hand, cultural aspects have should be defined on various levels though, not only coun- been studied to some extent in recent years, after preceding try borders. Other examples include having a joint historical (and still ongoing) criticisms of the predominance of Western background, speaking the same language, sharing the same music in this community. Arguably the most comprehensive beliefs or religion, and differences between urban vs. rural culture-specific research in this domain has been conducted cultures. Another aspect that relates to culture is a temporal as part of the CompMusic project, in which five non- one since certain cultural trends, e.g., what defines the “youth Western music traditions have been analyzed in detail in order culture,” are highly dynamic in a temporal and geographical to advance automatic description of music by emphasizing sense. We believe that MRS which are aware of such cross- cultural specificity. The analyzed music traditions included cultural differences and similarities in music perception and Indian Hindustani and Carnatic [53], Turkish Makam [54], taste, and are able to recommend music a listener in the same Arab-Andalusian [174], and Beijing Opera [148]. However, or another culture may like, would substantially benefit both the project’s focus was on music creation, content analy- users and providers of MRS. sis, and ethnomusicological aspects rather than on the music consumption side [37,165,166]. Recently, analyzing content- based audio features describing rhythm, timbre, harmony, and melody for a corpus of a larger variety of world and folk 4 Conclusions music with given country information, Panteli et al. found distinct acoustic patterns of the music created in individual In this trends and survey paper, we identified several grand countries [138]. They also identified geographical and cul- challenges the research field of music recommender systems tural proximities that are reflected in music features, looking (MRS) is facing. These are, among others, in the focus of at outliers and misclassifications in a classification experi- current research in the area of MRS. We discussed (1) the ments using country as target class. For instance, Vietnamese cold start problem of items and users, with its particularities music was often confused with Chinese and Japanese, South in the music domain, (2) the challenge of automatic playlist African with Botswanese. continuation, which is gaining importance due to the recently In contrast to this—meanwhile quite extensive—work on emerged user request of being recommended musical expe- culture-specific analysis of music traditions, little effort has riences rather than single tracks [161], and (3) the challenge been made to analyze cultural differences and patterns of of holistically evaluating music recommender systems, in music consumption behavior, which is, as we believe, a cru- particular, capturing aspects beyond accuracy. cial step to build culture-aware MRS. The few studies investi- In addition to the grand challenges, which are currently gating such cultural differences include [88], in which Hu and highly researched, we also presented a visionary outlook of Lee found differences in perception of moods between Amer- what we believe to be the most interesting future research ican and Chinese listeners. By analyzing the music listening directions in MRS. In particular, we discussed (1) psycholog- behavior of users from 49 countries, Ferwerda et al. found ically inspired MRS, which consider in the recommendation relationships between music listening diversity and Hofst- process factors such as listeners’ emotion and personality, ede’s cultural dimensions [70,72]. Skowron et al. used the (2) situation-aware MRS, which holistically model contex- same dimensions to predict genre preferences of listeners tual and environmental aspects of the music consumption with different cultural backgrounds [171]. Schedl analyzed a process, infer listener needs and intents, and eventually inte- large corpus of listening histories created by Last.fm users in grate these models at large scale in the recommendation 47 countries and identified distinct preference patterns [156]. process, and (3) culture-aware MRS, which exploit the fact Further analyses revealed countries closest to what can be that music taste highly depends on the cultural background of considered the global mainstream (e.g., the Netherlands, UK, the listener, where culture can be defined in manifold ways, and Belgium) and countries farthest from it (e.g., China, Iran, including historical, political, linguistic, or religious similar- and Slovakia). However, all of these works define culture in ities. terms of country borders, which often makes sense, but is We hope that this article helped pinpointing major chal- sometimes also problematic, for instance, in countries with lenges, highlighting recent trends, and identifying interest- large minorities of inhabitants with different cultures. ing research questions in the area of music recommender In our opinion, when building MRS, the analysis of cul- systems. Believing that research addressing the discussed tural patterns of music consumption behavior, subsequent challenges and trends will pave the way for the next genera- creation of respective cultural listener models, and their inte- tion of music recommender systems, we are looking forward gration into recommender systems are vital steps to improve to exciting, innovative approaches and systems that improve personalization and serendipity of recommendations. Culture user satisfaction and experience, rather than just accuracy measures. http://compmusic.upf.edu. 123 International Journal of Multimedia Information Retrieval (2018) 7:95–116 111 Acknowledgements Open access funding provided by Johannes Kepler 14. Barthet M, Fazekas G, Sandler M (2012) Multidisciplinary per- University Linz. We would like to thank all researchers in the fields of spectives on music emotion recognition: Implications for content recommender systems, information retrieval, music research, and mul- and context-based models. In: Proceedings of international sym- timedia, with whom we had the pleasure to discuss and collaborate posium on computer music modelling and retrieval, pp 492–507 in recent years, and whom in turn influenced and helped shaping this 15. Bauer C, Novotny A (2017) A consolidated view of context for article. Special thanks go to Peter Knees and Fabien Gouyon for the intelligent systems. J Ambient Intell Smart Environ 9(4):377–393. fruitful discussions while preparing the ACM Recommender Systems https://doi.org/10.3233/ais-170445 2017 tutorial on music recommender systems. In addition, we would like 16. Bennett PN, Radlinski F, White RW, Yilmaz E (2011) Inferring to thank the reviewers of our manuscript, who provided useful and con- and using location metadata to personalize web search. In: Pro- structive comments to improve the original draft and turn it into what it ceedings of the 34th international ACM SIGIR conference on is now. We would also like to thank Eelco Wiechert for providing addi- research and development in information retrieval, SIGIR’11. tional pointers to relevant literature. Furthermore, the many personal ACM, New York, NY, USA, pp 135–144. https://doi.org/10.1145/ discussions with actual users of MRS unveiled important shortcomings 2009916.2009938 of current approaches and in turn were considered in this article. 17. Bodner E, Iancu I, Gilboa A, Sarel A, Mazor A, Amir D (2007) Finding words for emotions: the reactions of patients with major Open Access This article is distributed under the terms of the Creative depressive disorder towards various musical excerpts. Arts Psy- Commons Attribution 4.0 International License (http://creativecomm chother 34(2):142–150 ons.org/licenses/by/4.0/), which permits unrestricted use, distribution, 18. Boer D, Fischer R (2010) Towards a holistic model of functions of and reproduction in any medium, provided you give appropriate credit music listening across cultures: a culturally decentred qualitative to the original author(s) and the source, provide a link to the Creative approach. Psychol Music 40(2):179–200 Commons license, and indicate if changes were made. 19. Bogdanov D, Haro M, Fuhrmann F, Xambó A, Gómez E, Herrera P (2013) Semantic audio content-based music recommendation and visualization based on user preference examples. Inf Process Manag 49(1):13–33 References 20. Bollen D, Knijnenburg BP, Willemsen MC, Graus M (2010) Understanding choice overload in recommender systems. In: Pro- 1. Adamopoulos P, Tuzhilin A (2015) On unexpectedness in recom- ceedings of the 4th ACM conference on recommender systems, mender systems: or how to better expect the unexpected. ACM Barcelona, Spain Trans Intell Syst Technol 5(4):54 21. Bonnin G, Jannach D (2015) Automated generation of music 2. Adomavicius G, Mobasher B, Ricci F, Tuzhilin A (2011) Context- playlists: survey and experiments. ACM Comput Surv 47(2):26 aware recommender systems. AI Mag 32:67–80 22. Braunhofer M, Elahi M, Ricci F (2014) Techniques for cold- 3. Adomavicius G, Tuzhilin A (2005) Toward the next generation of starting context-aware mobile recommender systems for tourism. Intelli Artif 8(2):129–143. https://doi.org/10.3233/IA-140069 recommender systems: a survey of the state-of-the-art and pos- 23. Braunhofer M, Elahi M, Ricci F (2015) User personality and the sible extensions. IEEE Trans Knowl Data Eng 17(6):734–749. new user problem in a context-aware point of interest recom- https://doi.org/10.1109/TKDE.2005.99 mender system. In: Tussyadiah I, Inversini A (eds) Information 4. Agarwal D, Chen BC (2009) Regression-based latent factor mod- and communication technologies in tourism 2015. Springer, els. In: Proceedings of the 15th ACM SIGKDD international Cham, pp 537–549 conference on knowledge discovery and data mining. ACM, pp 19–28 24. Breese JS, Heckerman D, Kadie C (1998) Empirical analysis of 5. Aggarwal CC (2016) Content-based recommender systems. In: predictive algorithms for collaborative filtering. In: Proceedings Recommender systems. Springer, pp 139–166 of the 14th conference on uncertainty in artificial intelligence. 6. Aggarwal CC (2016) Ensemble-based and hybrid recommender Morgan Kaufmann Publishers Inc., pp 43–52 systems. In: Recommender systems. Springer, pp 199–224 25. Burger JM (2010) Personality. Wadsworth Publishing, Belmont 7. Aggarwal CC (2016) Evaluating recommender systems. In: Rec- 26. Burke R (2002) Hybrid recommender systems: survey and exper- ommender systems. Springer, pp 225–254 iments. User Model User-Adap Interact 12(4):331–370 8. Aiolli F (2013) Efficient top-n recommendation for very large 27. Burke R (2007) Hybrid web recommender systems. Springer scale binary rated datasets. In: Proceedings of the 7th ACM con- Berlin Heidelberg, Berlin, pp 377–408. https://doi.org/10.1007/ ference on recommender systems. ACM, pp. 273–280 978-3-540-72079-9_12 9. Alghoniemy M, Tewfik A (2001) A network flow model for 28. Cantador I, Cremonesi P (2014) Tutorial on cross-domain recom- playlist generation. In: Proceedings of the IEEE international con- mender systems. In: Proceedings of the 8th ACM conference on ference on multimedia and expo (ICME), Tokyo, Japan recommender systems, RecSys’14. ACM, New York, NY, USA, 10. Alghoniemy M, Tewfik AH (2000) User-defined music sequence pp 401–402. https://doi.org/10.1145/2645710.2645777 retrieval. In: Proceedings of the eighth ACM international confer- 29. Cantador I, Fernández-Tobías I, Berkovsky S, Cremonesi P (2015) ence on multimedia, pp 356–358. ACM Cross-domain recommender systems. Springer, Boston, pp 919– 11. Baeza-Yates R, Ribeiro-Neto B (2011) Modern information 959. https://doi.org/10.1007/978-1-4899-7637-6_27 retrieval—the concepts and technology behind search, 2nd edn. 30. Carenini G, Smith J, Poole D (2003) Towards more conversational Addison-Wesley, Pearson and collaborative recommender systems. In: Proceedings of the 12. Baltrunas L, Kaminskas M, Ludwig B, Moling O, Ricci F, Lüke 8th international conference on intelligent user interfaces, IUI’03. KH, Schwaiger R (2011) InCarMusic: Context-Aware Music Rec- ACM, New York, NY, USA, pp. 12–18. https://doi.org/10.1145/ 604045.604052 ommendations in a Car. In: International conference on electronic 31. Cebrián T, Planagumà M, Villegas P, Amatriain X (2010) Music commerce and web technologies (EC-Web), Toulouse, France recommendations with temporal context awareness. In: Pro- 13. Barrington L, Oda R, Lanckriet GRG. Smarter than genius? ceedings of the 4th ACM conference on recommender systems Human evaluation of music recommender systems. In: Proceed- (RecSys), Barcelona, Spain ings of the 10th international society for music information retrieval conference, ISMIR 2009, Kobe International Conference 32. Chen S, Moore JL, Turnbull D, Joachims T (2012) Playlist pre- Center, Kobe, Japan, 26–30 October 2009, pp 357–362 diction via metric embedding. In: Proceedings of the 18th ACM 123 112 International Journal of Multimedia Information Retrieval (2018) 7:95–116 SIGKDD international conference on knowledge discovery and 49. Dey L, Asad MU, Afroz N, Nath RPD (2014) Emotion extraction data mining, KDD’12. ACM, New York, NY, USA, pp 714–722. from real time chat messenger. In: 2014 International conference https://doi.org/10.1145/2339530.2339643 on informatics, electronics vision (ICIEV), pp 1–5. https://doi. 33. Chen S, Moore JL, Turnbull D, Joachims T (2012) Playlist pre- org/10.1109/ICIEV.2014.6850785 diction via metric embedding. In: Proceedings of the 18th ACM 50. Donaldson J (2007) A hybrid social-acoustic recommendation SIGKDD international conference on knowledge discovery and system for popular music. In: Proceedings of the ACM conference data mining. ACM, pp 714–722 on recommender systems (RecSys), Minneapolis, MN, USA 34. Chen S, Xu J, Joachims T (2013) Multi-space probabilistic 51. Dror G, Koenigstein N, Koren Y, Weimer M (2011) The yahoo! sequence modeling. In: Proceedings of the 19th ACM SIGKDD music dataset and kdd-cup’11. In: Proceedings of the 2011 international conference on knowledge discovery and data min- international conference on KDD Cup 2011, vol 18, pp 3–18. ing. ACM, pp 865–873 JMLR.org 35. Cheng Z, Shen J (2014) Just-for-me: an adaptive personalization 52. Dunn G, Wiersema J, Ham J, Aroyo L (2009) Evaluating interface system for location-aware social music recommendation. In: Pro- variants on personality acquisition for recommender systems. In: ceedings of the 4th ACM international conference on multimedia Proceedings of the 17th international conference on user mod- retrieval (ICMR), Glasgow, UK eling, adaptation, and Personalization: formerly UM and AH, 36. Cheng Z, Shen J (2016) On effective location-aware music rec- UMAP’09. Springer, Berlin, Heidelberg, pp 259–270 ommendation. ACM Trans Inf Syst 34(2):13 53. Dutta S, Murthy HA (2014) Discovering typical motifs of a raga 37. Cornelis O, Six J, Holzapfel A, Leman M (2013) Evaluation from one-liners of songs in carnatic music. In: Proceedings of the and recommendation of pulse and tempo annotation in ethnic 15th international society for music information retrieval confer- music. J New Music Res 42(2):131–149. https://doi.org/10.1080/ ence (ISMIR), Taipei, Taiwan, pp 397–402 09298215.2013.812123 54. Dzhambazov G, Srinivasamurthy A, Sentürk ¸ S, Serra X (2016) 38. Cremonesi P, Elahi M, Garzotto F (2017) User interface patterns in On the use of note onsets for improved lyrics-to-audio alignment recommendation-empowered content intensive multimedia appli- in turkish makam music. In: 17th International society for music cations. Multimed Tools Appl 76(4):5275–5309. https://doi.org/ information retrieval conference (ISMIR 2016), New York, USA 10.1007/s11042-016-3946-5 55. Ebrahimi Kahou S, Michalski V, Konda K, Memisevic R, Pal 39. Cremonesi P, Quadrana M (2014) Cross-domain recommenda- C (2015) Recurrent neural networks for emotion recognition in tions without overlapping data: Myth or reality? In: Proceedings video. In: Proceedings of the 2015 ACM on international confer- of the 8th ACM conference on recommender systems, RecSys’14. ence on multimodal interaction, ICMI’15. ACM, New York, NY, ACM, New York, NY, USA, pp. 297–300. https://doi.org/10. USA, pp 467–474. https://doi.org/10.1145/2818346.2830596 1145/2645710.2645769 56. Eghbal-zadeh H, Lehner B, Schedl M, Widmer G (2015) I-Vectors 40. Cremonesi P, Tripodi A, Turrin R (2011) Cross-domain rec- for timbre-based music similarity and music artist classification. ommender systems. In: IEEE 11th international conference on In: Proceedings of the 16th international society for music infor- data mining workshops, pp 496–503. https://doi.org/10.1109/ mation retrieval conference (ISMIR), Malaga, Spain ICDMW.2011.57 57. Elahi M (2011) Adaptive active learning in recommender systems. 41. Cunningham S, Caulder S, Grout V (2008) Saturday night or User Model Adapt Pers 414–417 fever? Context-aware music playlists. In: Proceedings of the 3rd 58. Elahi M, Braunhofer M, Ricci F, Tkalcic M (2013) Personality- international audio mostly conference: sound in motion, Piteå, based active learning for collaborative filtering recommender sys- Sweden tems. In: AI* IA 2013: advances in artificial intelligence. Springer, 42. Cunningham SJ, Bainbridge D, Falconer A (2006) ‘More of an art pp 360–371. https://doi.org/10.1007/978-3-319-03524-6_31 than a science’: supporting the creation of playlists and mixes. In: 59. Elahi M, Deldjoo Y, Bakhshandegan Moghaddam F, Cella L, Proceedings of the 7th international conference on music infor- Cereda S, Cremonesi P (2017) Exploring the semantic gap for mation retrieval (ISMIR), Victoria, BC, Canada movie recommendations. In: Proceedings of the eleventh ACM 43. Cunningham SJ, Bainbridge D, Mckay D (2007) Finding new conference on recommender systems. ACM, pp 326–330 music: a diary study of everyday encounters with novel songs. In: 60. Elahi M, Repsys V, Ricci F (2011) Rating elicitation strategies for Proceedings of the 8th international conference on music infor- collaborative filtering. In: Huemer C, Setzer T (eds) EC-Web, Lec- mation retrieval, Vienna, Austria, pp 83–88 ture Notes in Business Information Processing, vol 85. Springer, 44. Cunningham SJ, Downie JS, Bainbridge D (2005) “The Pain, The pp 160–171. https://doi.org/10.1007/978-3-642-23014-1_14 Pain”: modelling music information behavior and the songs we 61. Elahi M, Ricci F, Rubens N (2012) Adapting to natural rat- hate. In: Proceedings of the 6th international conference on music ing acquisition with combined active learning strategies. In: information retrieval (ISMIR 2005), London, UK, pp 474–477 ISMIS’12: Proceedings of the 20th international conference on 45. Cunningham SJ, Nichols DM (2009) Exploring social music foundations of intelligent systems. Springer, Berlin, Heidelberg, behaviour: an investigation of music selection at parties. In: Pro- pp 254–263 ceedings of the 10th international society for music information 62. Elahi M, Ricci F, Rubens N (2014) Active learning in collabora- retrieval conference (ISMIR 2009), Kobe, Japan tive filtering recommender systems. In: Hepp M, Hoffner Y (eds) 46. Deldjoo Y, Cremonesi P, Schedl M, Quadrana M (2017) The E-commerce and web technologies, Lecture Notes in Business effect of different video summarization models on the quality Information Processing, vol 188. Springer, pp 113–124. https:// of video recommendation based on low-level visual features. In: doi.org/10.1007/978-3-319-10491-1_12 Proceedings of the 15th international workshop on content-based 63. Elahi M, Ricci F, Rubens N (2014) Active learning strategies for multimedia indexing. ACM, p. 20 rating elicitation in collaborative filtering: a system-wide perspec- 47. Deldjoo Y, Elahi M, Cremonesi P, Garzotto F, Piazzolla P, Quad- tive. ACM Trans Intell Syst Technol 5(1):13:1–13:33. https://doi. rana M (2016) Content-based video recommendation system org/10.1145/2542182.2542195 based on stylistic visual features. J Data Semant. https://doi.org/ 64. Elahi M, Ricci F, Rubens N (2016) A survey of active learning 10.1007/s13740-016-0060-9 in collaborative filtering recommender systems. Comput Sci Rev 48. Dey AK (2001) Understanding and using context. Pers Ubiquitous 20:29–50 Comput 5(1):4–7. https://doi.org/10.1007/s007790170019 123 International Journal of Multimedia Information Retrieval (2018) 7:95–116 113 65. Elbadrawy A, Karypis G (2015) User-specific feature-based simi- ing. In: Proceedings of the ACM conference on recommender larity models for top-n recommendation of new items. ACM Trans systems: workshop on music recommendation and discovery Intell Syst Technol 6(3):33 (WOMRAD 2010), pp 7–10 66. Erdal M, Kächele M, Schwenker F (2016) Emotion recognition 84. Hevner K (1935) Expression in music: a discussion of experimen- in speech with deep learning architectures. Springer, Cham, pp tal studies and theories. Psychol Rev 42:186–204 298–311. https://doi.org/10.1007/978-3-319-46182-3_25 85. Hu R, Pu P (2009) A comparative user study on rating vs. person- 67. Fernandez Tobias I, Braunhofer M, Elahi M, Ricci F, Ivan C ality quiz based preference elicitation methods. In: Proceedings (2016) Alleviating the new user problem in collaborative filter- of the 14th international conference on Intelligent user interfaces, ing by exploiting personality information. User Model User-Adap IUI’09. ACM, New York, NY, USA, pp 367–372. https://doi.org/ Interact (Personality in Personalized Systems). https://doi.org/10. 10.1145/1502650.1502702 1007/s11257-016-9172-z 86. Hu R, Pu P (2010) A study on user perception of personality- 68. Fernández-Tobías I, Cantador I, Kaminskas M, Ricci F (2012) based recommender systems. In: Bra PD, Kobsa A, Chin DN (eds) Cross-domain recommender systems: a survey of the state of the UMAP, Lecture Notes in Computer Science, vol 6075. Springer, art. In: Spanish conference on information retrieval, p 24 pp 291–302 69. Ferwerda B, Graus M, Vall A, Tkalci ˇ cˇ M, Schedl M (2016) The 87. Hu R, Pu P (2011) Enhancing collaborative filtering systems with influence of users’ personality traits on satisfaction and attrac- personality information. In: Proceedings of the fifth ACM confer- tiveness of diversified recommendation lists. In: Proceedings of ence on recommender systems, RecSys’11. ACM, New York, NY, the 4th workshop on emotions and personality in personalized USA, pp 197–204. https://doi.org/10.1145/2043932.2043969 services (EMPIRE 2016), Boston, USA 88. Hu X, Lee JH (2012) A cross-cultural study of music mood per- 70. Ferwerda B, Schedl M (2016) Investigating the relationship ception between American and Chinese listeners. In: Proceedings between diversity in music consumption behavior and cultural of the ISMIR dimensions: a cross-country analysis. In: Workshop on surprise, 89. Hu Y, Koren Y, Volinsky C (2008) Collaborative filtering for opposition, and obstruction in adaptive and personalized systems implicit feedback datasets. In: Proceedings of the 8th IEEE inter- 71. Ferwerda B, Schedl M, Tkalci ˇ cˇ M (2015) Personality & emotional national conference on data mining. IEEE, pp. 263–272 states: understanding users music listening needs. In: Extended 90. Hu Y, Ogihara M (2011) NextOne player: a music recommenda- proceedings of the 23rd international conference on user model- tion system based on user behavior. In: Proceedings of the 12th ing, adaptation and personalization (UMAP), Dublin, Ireland international society for music information retrieval conference 72. Ferwerda B, Vall A, Tkalci ˇ cˇ M, Schedl M (2016) Exploring music (ISMIR 2011), Miami, FL, USA diversity needs across countries. In: Proceedings of the UMAP 91. Huq A, Bello J, Rowe R (2010) Automated music emotion recog- 73. Ferwerda B, Yang E, Schedl M, Tkalci ˇ cˇ M (2015) Personal- nition: a systematic evaluation. J New Music Res 39(3):227–244 ity traits predict music taxonomy preferences. In: ACM CHI’15 92. Iman Kamehkhosh Dietmar Jannach GB (2018) How automated extended abstracts on human factors in computing systems, Seoul, recommendations affect the playlist creation behavior of users. In: Republic of Korea Joint proceedings of the 23rd ACM conference on intelligent user 74. Flexer A, Schnitzer D, Gasser M, Widmer G (2008) Playlist gen- interfaces (ACM IUI 2018) workshops: intelligent music inter- eration using start and end songs. In: Proceedings of the 9th faces for listening and creation (MILC), Tokyo, Japan international conference on music information retrieval (ISMIR), 93. Järvelin K, Kekäläinen J (2002) Cumulated gain-based evaluation Philadelphia, PA, USA of ir techniques. ACM Trans Inf Syst 20(4):422–446. https://doi. 75. Gillhofer M, Schedl M (2015) Iron maiden while jogging, debussy org/10.1145/582415.582418 for dinner? An analysis of music listening behavior in context. In: 94. John O, Srivastava S (1999) The big five trait taxonomy: history, Proceedings of the 21st international conference on multimedia measurement, and theoretical perspectives. In: Pervin LA, John modeling (MMM), Sydney, Australia OP (eds) Handbook of personality: theory and research, 510, 2nd 76. Gosling SD, Rentfrow PJ, Swann WB Jr (2003) A very brief edn. Guilford Press, New York, pp 102–138 measure of the big-five personality domains. J Res Personal 95. John OP, Srivastava S (1999) The big five trait taxonomy: his- 37(6):504–528 tory, measurement, and theoretical perspectives. In: Handbook of 77. Gross J (2007) Emotion regulation: conceptual and empirical personality: theory and research, vol 2, pp. 102–138 foundations. In: Gross J (ed) Handbook of emotion regulation, 96. Juslin PN, Sloboda J (2011) Handbook of music and emotion: 2nd edn. The Guilford Press, New York, pp 1–19 theory, research, applications. OUP, Oxford 78. Gunawardana A, Shani G (2015) Evaluating recommender sys- 97. Kaggle Official Homepage. https://www.kaggle.com. Accessed tems. In: Ricci F, Rokach L, Shapira B, Kantor PB (eds) 11 March 2018 Recommender systems handbook, chap. 8, 2nd edn. Springer, 98. Kaminskas M, Bridge D (2016) Diversity, serendipity, novelty, Heidelberg, pp 256–308 and coverage: a survey and empirical analysis of beyond-accuracy 79. Hart J, Sutcliffe AG, di Angeli A (2012) Evaluating user engage- objectives in recommender systems. ACM Trans Interact Intell ment theory. In: CHI conference on human factors in computing Syst 7(1):2:1–2:42. https://doi.org/10.1145/2926720 systems. Paper presented in workshop ’Theories behind UX 99. Kaminskas M, Ricci F (2012) Contextual music information Research and How They Are Used in Practice’ 6 May 2012 retrieval and recommendation: state of the art and challenges. 80. Hassenzahl M (2005) The thing and I: understanding the relation- Comput Sci Rev 6(2):89–119 ship between user and product. Springer, Dordrecht, pp 31–42. 100. Kaminskas M, Ricci F, Schedl M (2013) Location-aware music https://doi.org/10.1007/1-4020-2967-5_4 recommendation using auto-tagging and hybrid matching. In: Pro- 81. Herlocker JL, Konstan JA, Terveen LG, Riedl JT (2004) Evaluat- ceedings of the 7th ACM conference on recommender systems ing collaborative filtering recommender systems. ACM Trans Inf (RecSys), Hong Kong, China Syst 22(1):5–53. https://doi.org/10.1145/963770.963772 101. Kelly JP, Bridge D (2006) Enhancing the diversity of conversa- 82. Herlocker JL, Konstan JA, Terveen LG, Riedl JT (2004) Evaluat- tional collaborative recommendations: a comparison. Artif Intell ing collaborative filtering recommender systems. ACM Trans Inf Rev 25(1):79–95. https://doi.org/10.1007/s10462-007-9023-8 Syst 22(1):5–53 102. Khan MM, Ibrahim R, Ghani I (2017) Cross domain recom- 83. Herrera P, Resa Z, Sordo M (2010) Rocking around the clock eight mender systems: a systematic literature review. ACM Comput days a week: an exploration of temporal patterns of music listen- Surv 50(3):36 123 114 International Journal of Multimedia Information Retrieval (2018) 7:95–116 103. Kim YE, Schmidt EM, Migneco R, Morton BG, Richardson P, 121. Logan B (2002) Content-based playlist generation: exploratory Scott J, Speck J, Turnbull D (2010) Music emotion recognition: a experiments. In: Proceedings of the 3rd international symposium state of the art review. In: Proceedings of the international society on music information retrieval (ISMIR), Paris, France for music information retrieval conference 122. Lonsdale AJ, North AC (2011) Why do we listen to music? A 104. Kluver D, Konstan JA (2014) Evaluating recommender behavior uses and gratifications analysis. Br J Psychol 102(1):108–134 for new users. In: Proceedings of the 8th ACM conference on rec- 123. Maillet F, Eck D, Desjardins G, Lamere P et al (2009) Steerable ommender systems. ACM, pp 121–128. https://doi.org/10.1145/ playlist generation by learning song similarity from radio station 2645710.2645742 playlists. In: ISMIR, pp 345–350 105. Knees P, Pohle T, Schedl M, Widmer G (2006) Combining 124. McFee B, Bertin-Mahieux T, Ellis DP, Lanckriet GR (2012) The audio-based similarity with web-based data to accelerate auto- million song dataset challenge. In: Proceedings of the 21st inter- matic music playlist generation. In: Proceedings of the 8th national conference on world wide web. ACM, pp 909–916 ACM SIGMM international workshop on multimedia informa- 125. McFee B, Lanckriet G (2011) The natural language of playlists. tion retrieval (MIR), Santa Barbara, CA, USA In: Proceedings of the 12th international society for music infor- 106. Knees P, Schedl M (2016) Music similarity and retrieval: an mation retrieval conference (ISMIR 2011), Miami, FL, USA introduction to audio- and web-based strategies. The information 126. McFee B, Lanckriet G (2012) Hypergraph models of playlist retrieval series. Springer Berlin Heidelberg. https://books.google. dialects. In: Proceedings of the 13th international society for it/books?id=MdRhjwEACAAJ music information retrieval conference (ISMIR), Porto, Portugal 107. Knijnenburg BP, Willemsen MC (2015) Evaluating recommender 127. McNee SM, Lam SK, Konstan JA, Riedl J (2003) Interfaces for systems with user experiments. In: Recommender systems hand- eliciting new user preferences in recommender systems. In: Pro- book. Springer, pp 309–352 ceedings of the 9th international conference on user modeling, 108. Knijnenburg BP, Willemsen MC, Gantner Z, Soncu H, Newell C UM’03. Springer, Berlin, Heidelberg, pp. 178–187. http://dl.acm. (2012) Explaining the user experience of recommender systems. org/citation.cfm?id=1759957.1759988 User Model User-Adapt Interact 22(4–5):441–504 128. Mei T, Yang B, Hua XS, Li S (2011) Contextual video recommen- 109. Konecni VJ (1982) Social interaction and musical preference. In: dation by multimodal relevance and user feedback. ACM Trans The psychology of music, pp 497–516 Inf Syst 29(2):10 110. Koole SL (2009) The psychology of emotion regulation: an inte- 129. North A, Hargreaves D (1996) Situational influences on reported grative review. Cogn Emot 23:4–41 musical preference. Psychomusicol Music Mind Brain 15(1– 111. Kosinski M, Stillwell D, Graepel T (2013) Private traits and 2):30–45 attributes are predictable from digital records of human behav- 130. North A, Hargreaves D (2008) The social and applied psychology ior. Proc Natl Acad Sci 110(15):5802–5805 of music. Oxford University Press, Oxford 112. Kuo FF, Chiang MF, Shan MK, Lee SY (2005) Emotion-based 131. North AC, Hargreaves DJ (1996) Situational influences on music recommendation by association discovery from film music. reported musical preference. Psychomusicology A J Res Music In: Proceedings of the 13th annual ACM international conference Cogn 15(1–2):30 on multimedia. ACM, pp 507–510 132. Novello A, McKinney MF, Kohlrausch A (2006) Perceptual Eval- 113. Laplante A (2014) Improving music recommender systems: What uation of Music Similarity. In: Proceedings of the 7th international we can learn from research on music tastes? In: 15th International conference on music information retrieval (ISMIR), Victoria, BC, society for music information retrieval conference, Taipei, Taiwan Canada 114. Laplante A, Downie JS (2006) Everyday life music information- 133. O’Brien HL, Toms EG (2010) The development and evaluation of seeking behaviour of young adults. In: Proceedings of the 7th a survey to measure user engagement. J Am Soc Inf Sci Technol international conference on music information retrieval, Victoria 61(1):50–69. https://doi.org/10.1002/asi.v61:1 (BC), Canada 134. O’Hara K, Brown B (eds) (2006) Consuming music together: 115. Lee JH (2011) How similar is too similar? Exploring users’ per- social and collaborative aspects of music consumption technolo- ceptions of similarity in playlist evaluation. In: Proceedings of the gies, computer supported cooperative work, vol 35. Springer, 12th international society for music information retrieval confer- Dordrecht ence (ISMIR 2011), Miami, FL, USA 135. Pachet F, Roy P, Cazaly D (1999) A combinatorial approach to 116. Lee JH, Cho H, Kim YS (2016) Users’ music information needs content-based music selection. In: IEEE international conference and behaviors: design implications for music information retrieval on multimedia computing and systems, 1999, vol 1. IEEE, pp systems. J Assoc Inf Sci Technol 67(6):1301–1330 457–462 117. Lee JH, Wishkoski R, Aase L, Meas P, Hubbles C (2017) 136. Pagano R, Quadrana M, Elahi M, Cremonesi P (2017) Toward Understanding users of cloud music services: selection factors, active learning in cross-domain recommender systems. CoRR. management and access behavior, and perceptions. J Assoc Inf arXiv:1701.02021 Sci Technol 68(5):1186–1200 137. Pan R, Zhou Y, Cao B, Liu NN, Lukose R, Scholz M, Yang Q 118. Lehmann J, Lalmas M, Yom-Tov E, Dupret G (2012) Models (2008) One-class collaborative filtering. In: Proceedings of the of user engagement. In: Proceedings of the 20th international 8th IEEE international conference on data mining. IEEE, pp 502– conference on user modeling, adaptation, and personalization, 511 UMAP’12. Springer, Berlin, Heidelberg, pp 164–175. https://doi. 138. Panteli M, Benetos E, Dixon S (2016) Learning a feature space for org/10.1007/978-3-642-31454-4_14 similarity in world music. In: Proceedings of the 17th international 119. Li Q, Myaeng SH, Guan DH, Kim BM (2005) A probabilistic society for music information retrieval conference (ISMIR 2016), model for music recommendation considering audio features. In: New York, NY, USA Asia information retrieval symposium. Springer, pp 72–83 139. Park ST, Chu W (2009) Pairwise preference regression for cold- 120. Liu NN, Yang Q (2008) Eigenrank: a ranking-oriented approach start recommendation. In: Proceedings of the third ACM confer- to collaborative filtering. In: SIGIR’08: proceedings of the 31st ence on recommender systems, RecSys’09. ACM, New York, NY, annual international ACM SIGIR conference on research and USA, pp 21–28. https://doi.org/10.1145/1639714.1639720 development in information retrieval. ACM, New York, NY, USA, 140. Pettijohn T, Williams G, Carter T (2010) Music for the sea- pp 83–90. https://doi.org/10.1145/1390334.1390351 sons: seasonal music preferences in college students. Curr Psychol 29(4):328–345 123 International Journal of Multimedia Information Retrieval (2018) 7:95–116 115 141. Pichl M, Zangerle E, Specht G (2015) Towards a context-aware tra music. IEEE Trans Affect Comput. https://doi.org/10.1109/ music recommendation approach: what is hidden in the playlist TAFFC.2017.2663421 name? In: 2015 IEEE international conference on data mining 160. Schedl M, Hauger D, Schnitzer D (2012) A model for serendip- workshop (ICDMW). IEEE, pp 1360–1365 itous music retrieval. In: Proceedings of the 2nd workshop on 142. Pohle T, Knees P, Schedl M, Pampalk E, Widmer G (2007) “Rein- context-awareness in retrieval and recommendation (CaRR), Lis- venting the Wheel”: a novel approach to music player interfaces. bon, Portugal IEEE Trans Multimed 9:567–575 161. Schedl M, Knees P, Gouyon F (2017) New paths in music rec- 143. Pu P, Chen L, Hu R (2012) Evaluating recommender systems ommender systems research. In: Proceedings of the 11th ACM from the user’s perspective: survey of the state of the art. User conference on recommender systems (RecSys 2017), Como, Italy Model User-Adapt Interact 22(4–5):317–355. https://doi.org/10. 162. Schedl M, Knees P, McFee B, Bogdanov D, Kaminskas M (2015) 1007/s11257-011-9115-7 Music recommender systems. In: Ricci F, Rokach L, Shapira B, 144. Punkanen M, Eerola T, Erkkilä J (2011) Biased emotional recog- Kantor PB (eds) Recommender systems handbook, chap. 13, 2nd nition in depression: perception of emotions in music by depressed edn. Springer, Berlin, pp 453–492 patients. J Affect Disord 130(1–2):118–126 163. Schedl M, Melenhorst M, Liem CC, Martorell A, Mayor O, 145. Quadrana M, Cremonesi P, Jannach D (2018) Sequence-aware Tkalci ˇ cˇ M (2016) A personality-based adaptive system for visu- recommender systems. arXiv preprint arXiv:1802.08452 alizing classical music performances. In: Proceedings of the 146. Rashid AM, Karypis G, Riedl J (2008) Learning preferences 7th ACM multimedia systems conference (MMSys), Klagenfurt, of new users in recommender systems: an information theoretic Austria approach. SIGKDD Explor Newsl 10:90–100. https://doi.org/10. 164. Schein AI, Popescul A, Ungar LH, Pennock DM (2002) Meth- 1145/1540276.1540302 ods and metrics for cold-start recommendations. In: SIGIR’02: 147. Rentfrow PJ, Gosling SD (2003) The do re mi’s of everyday life: Proceedings of the 25th annual international ACM SIGIR con- the structure and personality correlates of music preferences. J ference on research and development in information retrieval. Personal Soc Psychol 84(6):1236–1256 ACM, New York, NY, USA, pp 253–260. https://doi.org/10.1145/ 148. Repetto RC, Serra X (2014) Creating a corpus of Jingju (Bei- 564376.564421 jing opera) music and possibilities for melodic analysis. In: 15th 165. Serra X (2014) Computational approaches to the art music tradi- International society for music information retrieval conference, tions of India and Turkey. J New Music Res 43(1):1–2. https:// Taipei, Taiwan, pp 313–318 doi.org/10.1080/09298215.2014.894083 149. Reynolds G, Barry D, Burke T, Coyle E (2007) Towards a per- 166. Serra X (2014) Creating research corpora for the computational sonal automatic music playlist generation algorithm: the need for study of music: the case of the compmusic project. In: AES 53rd contextual information. In: Proceedings of the 2nd international international conference on semantic audio. AES, AES, London, audio mostly conference: interaction with sound, Ilmenau, Ger- UK, pp 1–9 many, pp 84–89 167. Seyerlehner K, Schedl M, Pohle T, Knees P (2010) Using 150. Ribeiro MT, Lacerda A, Veloso A, Ziviani N (2012) Pareto- block-level features for genre classification, tag classification and efficient hybridization for multi-objective recommender systems. music similarity estimation. In: Extended abstract to the music In: Proceedings of the Sixth ACM conference on recommender information retrieval evaluation eXchange (MIREX 2010)/11th systems, RecSys’12. ACM, New York, NY, USA, pp 19–26. international society for music information retrieval conference https://doi.org/10.1145/2365952.2365962 (ISMIR 2010), Utrecht, the Netherlands 151. Rubens N, Elahi M, Sugiyama M, Kaplan D (2015) Active 168. Seyerlehner K, Widmer G, Schedl M, Knees P (2010) Automatic learning in recommender systems. In: Recommender systems music tag classification based on block-level features. In: Proceed- handbook—chapter 24: recommending active learning. Springer ings of the 7th sound and music computing conference (SMC), US, pp 809–846 Barcelona, Spain 152. Russell JA (1980) A circumplex model of affect. J Personal Soc 169. Shao B, Wang D, Li T, Ogihara M (2009) Music recommendation Psychol 39(6):1161–1178 based on acoustic features and user access patterns. IEEE Trans 153. Schäfer T, Auerswald F, Bajorat IK, Ergemlidze N, Frille K, Audio Speech Lang Process 17(8):1602–1611 Gehrigk J, Gusakova A, Kaiser B, Pätzold RA, Sanahuja A, Sari S, 170. Skowron M, Ferwerda B, Tkalci ˇ cˇ M, Schedl M (2016) Fusing Schramm A, Walter C, Wilker T (2016) The effect of social feed- social media cues: personality prediction from Twitter and Insta- back on music preference. Musicae Sci 20(2):263–268. https:// gram. In: Proceedings of the 25th international world wide web doi.org/10.1177/1029864915622054 conference (WWW), Montreal, Canada 154. Schäfer T, Mehlhorn C (2017) Can personality traits predict musi- 171. Skowron M, Lemmerich F, Ferwerda B, Schedl M (2017) Predict- cal style preferences? A meta-analysis. Personal Individ Differ ing genre preferences from cultural and socio-economic factors 116:265–273. https://doi.org/10.1016/j.paid.2017.04.061 for music retrieval. In: Proceedings of the ECIR 155. Schäfer T, Sedlmeier P, Stdtler C, Huron D (2013) The psycho- 172. Slaney M, White W (2006) Measuring playlist diversity for rec- logical functions of music listening. Front Psychol 4(511):1–34 ommendation systems. In: Proceedings of the 1st ACM workshop 156. Schedl M (2017) Investigating country-specific music prefer- on Audio and music computing multimedia. ACM, pp 77–82 ences and music recommendation algorithms with the LFM-1b 173. Smyth B, McClave P (2001) Similarity vs. diversity. In: Proceed- dataset. Int J Multimed Inf Retr 6(1):71–84. https://doi.org/10. ings of the 4th international conference on case-based reasoning: 1007/s13735-017-0118-y case-based reasoning research and development, ICCBR’01. 157. Schedl M, Breitschopf G, Ionescu B (2014) Mobile music genius: Springer, London, UK, pp 347–361. http://dl.acm.org/citation. reggae at the beach, metal on a Friday night? In: Proceedings of cfm?id=646268.758890 the 4th ACM international conference on multimedia retrieval 174. Sordo M, Chaachoo A, Serra X (2014) Creating corpora for com- (ICMR), Glasgow, UK putational research in arab-andalusian music. In: 1st International 158. Schedl M, Flexer A, Urbano J (2013) The neglected user in music workshop on digital libraries for musicology, London, UK, pp. 1– information retrieval research. J Intell Inf Syst 41:523–539 3. https://doi.org/10.1145/2660168.2660182 159. Schedl M, Gómez E, Trent ES, Tkalci ˇ cˇ M, Eghbal-Zadeh H, 175. Swearingen K, Sinha R (2001) Beyond algorithms: an hci perspec- Martorell A (2017) On the Interrelation between listener char- tive on recommender systems. In: ACM SIGIR 2001 workshop acteristics and the perception of emotions in classical orches- on recommender systems, vol 13, pp 1–11 123 116 International Journal of Multimedia Information Retrieval (2018) 7:95–116 176. Tamir M (2011) The maturing field of emotion regulation. Emot 186. Yang YH, Chen HH (2011) Music emotion recognition. CRC Rev 3:3–7 Press, Boca Raton 177. Tintarev N, Lofi C, Liem CC (2017) Sequences of diverse song 187. Yang YH, Chen HH (2012) Machine recognition of music emo- recommendations: an exploratory study in a commercial system. tion: a review. ACM Trans Intell Syst Technol 3(4):40 In: Proceedings of the 25th conference on user modeling, adapta- 188. Yang YH, Chen HH (2013) Machine recognition of music emo- tion and personalization, UMAP’17. ACM, New York, NY, USA, tion: a review. Trans Intell Syst Technol 3(3):40:1–40:30 pp 391–392. https://doi.org/10.1145/3079628.3079633 189. Yoshii K, Goto M, Komatani K, Ogata T, Okuno HG (2006) 178. Tkalcic M, Kosir A, Tasic J (2013) The ldos-peraff-1 corpus of Hybrid collaborative and content-based music recommendation facial-expression video clips with affective, personality and user- using probabilistic model with latent user preferences. In: ISMIR, interaction metadata. J Multimodal User Interfaces 7(1–2):143– vol 6, p 7th 155. https://doi.org/10.1007/s12193-012-0107-7 190. Zamani H, Bendersky M, Wang X, Zhang M (2017) Situational 179. Tkalci ˇ cˇ M, Quercia D, Graf S (2016) Preface to the special context for ranking in personal search. In: Proceedings of the 26th issue on personality in personalized systems. User Model User- international conference on world wide web, WWW’17. Interna- Adapt Interact 26(2):103–107. https://doi.org/10.1007/s11257- tional world wide web conferences steering committee, Republic 016-9175-9 and Canton of Geneva, Switzerland, pp 1531–1540. https://doi. 180. Uitdenbogerd A, Schyndel R (2002) A review of factors affecting org/10.1145/3038912.3052648 music recommender success. In: 3rd International conference on 191. Zentner M, Grandjean D, Scherer KR (2008) Emotions evoked music information retrieval, ISMIR 2002. IRCAM-Centre Pom- by the sound of music: characterization, classification, and mea- pidou, pp 204–208 surement. Emotion 8(4):494 181. Vall A, Quadrana M, Schedl M, Widmer G, Cremonesi P (2017) 192. Zhang Z, Jin X, Li L, Ding G, Yang Q (2016) Multi-domain active The importance of song context in music playlists. In: Proceedings learning for recommendation. In: AAAI, pp 2358–2364 of the poster track of the 11th ACM conference on recommender 193. Zhang YC, O Seaghdha D, Quercia D, Jambor T (2012) Auralist: systems (RecSys), Como, Italy introducing serendipity into music recommendation. In: Proceed- 182. Vargas S, Baltrunas L, Karatzoglou A, Castells P (2014) Coverage, ings of the 5th ACM international conference on web search and redundancy and size-awareness in genre diversity for recom- data mining (WSDM), Seattle, WA, USA mender systems. In: Proceedings of the 8th ACM conference on 194. Zheleva E, Guiver J, Mendes Rodrigues E, Milic-Frayling ´ N recommender systems, RecSys’14. ACM, New York, NY, USA, (2010) Statistical models of music-listening sessions in social pp 209–216. https://doi.org/10.1145/2645710.2645743 media. In: Proceedings of the 19th international conference on 183. Vargas S, Castells P (2011) Rank and relevance in novelty and world wide web (WWW), Raleigh, NC, USA, pp 1019–1028 diversity metrics for recommender systems. In: Proceedings of 195. Zhou T, Kuscsik Z, Liu JG, Medo M, Wakeling JR, Zhang YC the 5th ACM conference on recommender systems (RecSys), (2010) Solving the apparent diversity-accuracy dilemma of rec- Chicago, IL, USA ommender systems. Proc Natl Acad Sci 107(10):4511–4515 184. Wang X, Rosenblum D, Wang Y (2012) Context-aware mobile 196. Ziegler CN, McNee SM, Konstan JA, Lausen G (2005) Improving music recommendation for daily activities. In: Proceedings of the recommendation lists through topic diversification. In: Proceed- 20th ACM international conference on multimedia. ACM, Nara, ings of the 14th international conference on the world wide web. Japan, pp 99–108 ACM, pp 22–32 185. Weimer M, Karatzoglou A, Smola A (2008) Adaptive collabo- rative filtering. In: RecSys’08: proceedings of the 2008 ACM conference on recommender systems. ACM, New York, NY, USA, pp. 275–282. https://doi.org/10.1145/1454008.1454050 http://www.deepdyve.com/assets/images/DeepDyve-Logo-lg.png International Journal of Multimedia Information Retrieval Springer Journals

Current challenges and visions in music recommender systems research

Loading next page...
 
/lp/springer-journals/current-challenges-and-visions-in-music-recommender-systems-research-oHs1JG3Kgt
Publisher
Springer Journals
Copyright
Copyright © 2018 by The Author(s)
Subject
Computer Science; Multimedia Information Systems; Information Storage and Retrieval; Information Systems Applications (incl.Internet); Data Mining and Knowledge Discovery; Image Processing and Computer Vision; Database Management
ISSN
2192-6611
eISSN
2192-662X
DOI
10.1007/s13735-018-0154-2
Publisher site
See Article on Publisher Site

Abstract

Music recommender systems (MRSs) have experienced a boom in recent years, thanks to the emergence and success of online streaming services, which nowadays make available almost all music in the world at the user’s fingertip. While today’s MRSs considerably help users to find interesting music in these huge catalogs, MRS research is still facing substantial challenges. In particular when it comes to build, incorporate, and evaluate recommendation strategies that integrate information beyond simple user–item interactions or content-based descriptors, but dig deep into the very essence of listener needs, preferences, and intentions, MRS research becomes a big endeavor and related publications quite sparse. The purpose of this trends and survey article is twofold. We first identify and shed light on what we believe are the most pressing challenges MRS research is facing, from both academic and industry perspectives. We review the state of the art toward solving these challenges and discuss its limitations. Second, we detail possible future directions and visions we contemplate for the further evolution of the field. The article should therefore serve two purposes: giving the interested reader an overview of current challenges in MRS research and providing guidance for young researchers by identifying interesting, yet under-researched, directions in the field. Keywords Music recommender systems · Challenges · Automatic playlist continuation · User-centric computing 1 Introduction This research was supported in part by the Center for Intelligent Research in music recommender systems (MRSs) has recently Information Retrieval. Any opinions, findings and conclusions or experienced a substantial gain in interest both in academia recommendations expressed in this material are those of the authors and in industry [162]. Thanks to music streaming services and do not necessarily reflect those of the sponsors. like Spotify, Pandora, or Apple Music, music aficionados Markus Schedl B are nowadays given access to tens of millions music pieces. markus.schedl@jku.at By filtering this abundance of music items, thereby limiting Hamed Zamani choice overload [20], MRSs are often very successful to sug- zamani@cs.umass.edu gest songs that fit their users’ preferences. However, such Ching-Wei Chen systems are still far from being perfect and frequently pro- cw@spotify.com duce unsatisfactory recommendations. This is partly because Yashar Deldjoo of the fact that users’ tastes and musical needs are highly yashar.deldjoo@polimi.it dependent on a multitude of factors, which are not considered Mehdi Elahi in sufficient depth in current MRS approaches, which are typ- meelahi@unibz.it ically centered on the core concept of user–item interactions, or sometimes content-based item descriptors. In contrast, we Department of Computational Perception, Johannes Kepler University Linz, Linz, Austria Department of Computer Science, Politecnico di Milano, Center for Intelligent Information Retrieval, Milan, Italy University of Massachusetts Amherst, Amherst, USA Free University of Bozen-Bolzano, Bolzano, Italy Spotify USA Inc., New York, USA 123 96 International Journal of Multimedia Information Retrieval (2018) 7:95–116 argue that satisfying the users’ musical entertainment needs uation, and properly evaluating music recommender systems. requires taking into account intrinsic, extrinsic, and con- We review the state of the art of the respective tasks and its textual aspects of the listeners [2], as well as more decent current limitations. interaction information. For instance, personality and emo- tional state of the listeners (intrinsic) [71,147]aswellas 2.1 Particularities of music recommendation their activity (extrinsic) [75,184] are known to influence musical tastes and needs. So are users’ contextual factors Before we start digging deeper into these challenges, we including weather conditions, social surrounding, or places would first like to highlight the major aspects that make music of interest [2,100]. Also the composition and annotation of a recommendation a particular endeavor and distinguishes it music playlist or a listening session reveal information about from recommending other items, such as movies, books, or which songs go well together or are suited for a certain occa- products. These aspects have been adapted and extended sion [126,194]. Therefore, researchers and designers of MRS from a tutorial on music recommender systems [161], co- should reconsider their users in a holistic way in order to build presented by one of the authors at the ACM Recommender systems tailored to the specificities of each user. 2 Systems 2017 conference. Against this background, in this trends and survey arti- Duration of items In traditional movie recommendation, the cle, we elaborate on what we believe to be among the most items of interest have a typical duration of 90 min or more. In pressing current challenges in MRS research, by discussing book recommendation, the consumption time is commonly the respective state of the art and its restrictions (Sect. 2). Not even much longer. In contrast, the duration of music items being able to touch all challenges exhaustively, we focus on usually ranges between 3 and 5 min (except maybe for clas- cold start, automatic playlist continuation, and evaluation of sical music). Because of this, music items may be considered MRS. While these problems are to some extent prevalent in more disposable. other recommendation domains too, certain characteristics of music pose particular challenges in these contexts. Among Magnitude of items The size of common commercial music them are the short duration of items (compared to movies), catalogs is in the range of tens of millions music pieces, while the high emotional connotation of music, and the acceptance movie streaming services have to deal with much smaller of users for duplicate recommendations. In the second part, catalog sizes, typically thousands up to tens of thousands we present our visions for future directions in MRS research 3 of movies and series. Scalability is therefore a much more (Sect. 3). More precisely, we elaborate on the topics of important issue in music recommendation than in movie rec- psychologically inspired music recommendation (consider- ommendation. ing human personality and emotion), situation-aware music Sequential consumption Unlike movies, music pieces are recommendation, and culture-aware music recommendation. most frequently consumed sequentially, more than one at We conclude this article with a summary and identification of a time, i.e., in a listening session or playlist. This yields a possible starting points for the interested researcher to face number of challenges for a MRS, which relate to identifying the discussed challenges (Sect. 4). the right arrangement of items in a recommendation list. The composition of the authors allows to take academic as well as industrial perspectives, which are both reflected Recommendation of previously recommended items Recom- in this article. Furthermore, we would like to highlight that mending the same music piece again, at a later point in time, particularly the ideas presented as Challenge 2: Automatic may be appreciated by the user of a MRS, in contrast to a playlist continuation in Sect. 2 play an important role in the movie or product recommender, where repeated recommen- task definition, organization, and execution of the ACM Rec- dations are usually not preferred. ommender Systems Challenge 2018 which focuses on this Consumption behavior Music is often consumed passively, use case. This article may therefore also serve as an entry in the background. While this is not a problem per se, it point for potential participants in this challenge. can affect preference elicitation. In particular when using implicit feedback to infer listener preferences, the fact that 2 Grand challenges http://www.cp.jku.at/tutorials/mrs_recsys_2017. In the following, we identify and detail a selection of 3 Spotify reports about 30 million songs in 2017 (https://press.spotify. the grand challenges, which we believe the research field com/at/about); Amazon’s advanced search for books reports 10 mil- lion hardcover and 30 million paperback books in 2017 (https://www. of music recommender systems is currently facing, i.e., amazon.com/Advanced-Search-Books/b?node=241582011), whereas overcoming the cold start problem, automatic playlist contin- Netflix, in contrast, offers about 5,500 movies and TV series as of 2016 (http://time.com/4272360/the-number-of-movies-on-netflix- http://www.recsyschallenge.com/2018. is-dropping-fast). 123 International Journal of Multimedia Information Retrieval (2018) 7:95–116 97 a listener is not paying attention to the music (therefore, is a highly challenging task and usually neglected in cur- e.g., not skipping a song) might be wrongly interpreted as rent MRS, for which reason we discuss emotion-aware MRS a positive signal. as one of the main future directions in MRS research, cf. Sect. 3.1. Listening intent and purpose Music serves various purposes for people and hence shapes their intent to listen to it. This Listening context Situational or contextual aspects [15,48] should be taken into account when building a MRS. In exten- have a strong influence on music preference, consump- sive literature and empirical studies, Schäfer et al. [155] tion, and interaction behavior. For instance, a listener will distilled three fundamental intents of music listening out of likely create a different playlist when preparing for a roman- 129 distinct music uses and functions: self-awareness, social tic dinner than when warming-up with friends to go out relatedness, and arousal and mood regulation. Self-awareness on a Friday night [75]. The most frequently considered is considered as a very private relationship with music listen- types of context include location (e.g., listening at work- ing. The self-awareness dimension “helps people think about place, when commuting, or relaxing at home) [100] and who they are, who they would like to be, and how to cut their time (typically categorized into, for example, morning, after- own path” [154]. Social relatedness [153] describes the use noon, and evening) [31]. Context may, in addition, also of music to feel close to friends and to express identity and relate to the listener’s activity [184], weather [140], or values to others. Mood regulation is concerned with manag- the use of different listening devices, e.g., earplugs on ing emotions, which is a critical issue when it comes to the a smartphone vs. hi-fi stereo at home [75], to name a well-being of humans [77,110,176]. In fact, several studies few. Since music listening is also a highly social activ- found that mood and emotion regulation is the most impor- ity, investigating the social context of the listeners is tant purpose why people listen to music [18,96,122,155], for crucial to understand their listening preferences and behav- which reason we discuss the particular role emotions play ior [45,134]. The importance of considering such con- when listening to music separately below. textual factors in MRS research is acknowledged by dis- cussing situation-aware MRS as a trending research direc- Emotions Music is known to evoke very strong emotions. tion, cf. Sect. 3.2. This is a mutual relationship, though, since also the emo- tions of users affect musical preferences [17,77,144]. Due to this strong relationship between music and emotions, the problem of automatically describing music in terms of emo- tion words is an active research area, commonly refereed 2.2 Challenge 1: Cold start problem to as music emotion recognition (MER), e.g., [14,103,187]. Even though MER can be used to tag music by emotion Problem definition One of the major problems of recom- terms, how to integrate this information into MRS is a highly mender systems in general [64,151], and music recommender complicated task, for three reasons. First, MER approaches systems in particular [99,119]isthe cold start problem, i.e., commonly neglect the distinction between intended emotion when a new user registers to the system or a new item is (i.e., the emotion the composer, songwriter, or performer added to the catalog and the system does not have sufficient had in mind when creating or performing the piece), per- data associated with these items/users. In such a case, the ceived emotion (i.e., the emotion recognized while listening), system cannot properly recommend existing items to a new and induced emotion that is felt by the listener. Second, the user (new user problem) or recommend a new item to the preference for a certain kind of emotionally laden music existing users (new item problem) [3,62,99,164]. piece depends on whether the user wants to enhance or to Another subproblem of cold start is the sparsity problem modulate her mood. Third, emotional changes often occur which refers to the fact that the number of given ratings is within the same music piece, whereas tags are commonly much lower than the number of possible ratings, which is extracted for the whole piece. Matching music and lis- particularly likely when the number of users and items is teners in terms of emotions therefore requires to model large. The inverse of the ratio between given and possible rat- the listener’s musical preference as a time-dependent func- ings is called sparsity. High sparsity translates into low rating tion of their emotional experiences, also considering the coverage, since most users tend to rate only a tiny fraction intended purpose (mood enhancement or regulation). This of items. The effect is that recommendations often become unreliable [99]. Typical values of sparsity are quite close to Please note that the terms “emotion” and “mood” have different mean- 100% in most real-world recommender systems. In the music ings in psychology, whereas they are commonly used as synonyms in domain, this is a particularly substantial problem. Dror et music information retrieval (MIR) and recommender systems research. al. [51], for instance, analyzed the Yahoo! Music dataset, In psychology, in contrast, “emotion” refers to a short-time reaction to which as of time of writing represents the largest music rec- a particular stimulus, whereas “mood” refers to a longer-lasting state without relation to a specific stimulus. ommendation dataset. They report a sparsity of 99.96%. For 123 98 International Journal of Multimedia Information Retrieval (2018) 7:95–116 comparison, the Netflix dataset of movies has a sparsity of is performed in the training and testing phases of the system, “only” 98.82%. and the extracted feature vectors can be used off-the-shelf in the subsequent processing stage; for example, they can be State of the art used to compute similarities between items in a one-to-one A number of approaches have already been proposed to fashion at testing time. In contrast, in (2) first a model is built tackle the cold start problem in the music recommendation from all features extracted in the training phase, whose main domain, foremost content-based approaches, hybridization, role is to map the features into a new (acoustic) space in cross-domain recommendation, and active learning. which the similarities between items are better represented and exploited. An example of approach (1) is the block-level Content-based recommendation (CB) algorithms do not feature framework [167,168], which creates a feature vec- require ratings of users other than the target user. Therefore, tor of about 10,000 dimensions, independently for each song as long as some pieces of information about the user’s own in the given music collection. This vector describes aspects preferences are available, such techniques can be used in cold such as spectral patterns, recurring beats, and correlations start scenarios. Furthermore, in the most severe case, when between frequency bands. An example of strategy (2) is to a new item is added to the catalog, content-based methods create a low-dimensional i-vector representation from the enable recommendations, because they can extract features Mel-frequency cepstral coefficients (MFCCs), which model from the new item and use them to make recommendations. musical timbre to some extent [56]. To this end, a univer- It is noteworthy that while collaborative filtering (CF) sys- sal background model is created from the MFCC vectors of tems have cold start problems both for new users and new the whole music collection, using a Gaussian mixture model items, content-based systems have only cold start problems (GMM). Performing factor analysis on a representation of for new users [5]. the GMM eventually yields i-vectors. As for the new item problem, a standard approach is to In scenarios where some form of semantic labels, e.g., extract a number of features that define the acoustic prop- genres or musical instruments, are available, it is possi- erties of the audio signal and use content-based learning ble to build models that learn the intermediate mapping of the user interest (user profile learning) in order to effect between low-level audio features and semantic representa- recommendations. Feature extraction is typically done auto- tions using machine learning techniques, and subsequently matically, but can also be effected manually by musical use the learned models for prediction. A good point of ref- experts, as in the case of Pandora’s Music Genome Project. erence for such semantic-inferred approaches can be found Pandora uses up to 450 specific descriptors per song, such in [19,36]. as “aggressive female vocalist,” “prominent backup vocals,” An alternative technique to tackle the new item problem “abstract lyrics,” or “use of unusual harmonies.” Regard- is hybridization. A review of different hybrid and ensemble less of whether the feature extraction process is performed recommender systems can be found in [6,26]. In [50], the automatically or manually, this approach is advantageous authors propose a music recommender system which com- not only to address the new item problem but also because bines an acoustic CB and an item-based CF recommender. an accurate feature representation can be highly predicative For the content-based component, it computes acoustic fea- of users’ tastes and interests which can be leveraged in the tures including spectral properties, timbre, rhythm, and pitch. subsequent information filtering stage [5]. An advantage of The content-based component then assists the collaborative music to video is that features in music are limited to a sin- filtering recommender in tackling the cold start problem since gle audio channel, compared to audio and visual channels the features of the former are automatically derived via audio for videos adding a level complexity to the content analysis content analysis. of videos explored individually or multimodal in different The solution proposed in [189] is a hybrid recommender research works [46,47,59,128]. system that combines CF and acoustic CB strategies also by Automatic feature extraction from audio signals can be feature hybridization. However, in this work the feature-level done in two main manners: (1) by extracting a feature vec- hybridization is not performed in the original feature domain. tor from each item individually, independent of other items, Instead, a set of latent variables referred to as conceptual or (2) by considering the cross-relation between items in the genre are introduced, whose role is to provide a common training dataset. The difference is that in (1) the same process shared feature space for the two recommenders and enable hybridization. The weights associated with the latent vari- Note that Dror et al.’s analysis was conducted in 2011. Even though ables reflect the musical taste of the target user and are learned the general character (rating matrices for music items being sparser than during the training stage. those of movie items) remained the same, the actual numbers for today’s catalogs are likely slightly different. In [169], the authors propose a hybrid recommender sys- http://www.pandora.com/about/mgp. tem incorporating item–item CF and acoustic CB based on http://enacademic.com/dic.nsf/enwiki/3224302. similarity metric learning. The proposed metric learning is 123 International Journal of Multimedia Information Retrieval (2018) 7:95–116 99 an optimization model that aims to learn the weights asso- music domain, the authors of [4] provide an interesting liter- ciated with the audio content features (when combined in a ature review of similar user-specific models. linear fashion) so that a degree of consistency between CF- While hybridization can therefore alleviate the cold start based similarity and the acoustic CB similarity measure is problem to a certain extent, as seen in the examples above, established. The optimization problem can be solved using respective approaches are often complex, computationally quadratic programming techniques. expensive, and lack transparency [27]. In particular, results Another solution to cold start is cross-domain recommen- of hybrids employing latent factor models are typically hard dation techniques, which aim at improving recommendations to understand for humans. in one domain (here music) by making use of information A major problem with cross-domain recommender sys- about the user preferences in an auxiliary domain [28,67]. tems is their need for data that connects two or more target Hence, the knowledge of the preferences of the user is domains, e.g., books, movies, and music [29]. In order for transferred from an auxiliary domain to the music domain, such approaches to work properly, items, users, or both there- resulting in a more complete and accurate user model. Sim- fore need to overlap to a certain degree [40]. In the absence ilarly, it is also possible to integrate additional pieces of of such overlap, relationships between the domains must be information about the (new) users, which are not directly established otherwise, e.g., by inferring semantic relation- ships between items in different domains or assuming similar related to music, such as their personality, in order to improve the estimation of the user’s music preferences. Several stud- rating patterns of users in the involved domains. However, ies conducted on user personality characteristics support the whether respective approaches are capable of transferring conjecture that it may be useful to exploit this information in knowledge between domains is disputed [39]. A related issue music recommender systems [69,73,86,130,147]. For a more in cross-domain recommendation is that there is a lack of detailed literature review of cross-domain recommendation, established datasets with clear definitions of domains and we refer to [29,68,102]. recommendation scenarios [102]. Because of this, the major- In addition to the aforementioned approaches, active ity of existing work on cross-domain RS uses some type of learning has shown promising results in dealing with the cold conventional recommendation dataset transformation to suit start problem in single domain [60,146] or cross-domain rec- it for their need. ommendation scenario [136,192]. Active learning addresses Finally, also active learning techniques suffer from a this problem at its origin by identifying and eliciting (high number of issues. First of all, the typical active learning tech- quality) data that can represent the preferences of users bet- niques propose to a user to rate the items that the system ter than by what they provide themselves. Such a system has predicted to be interesting for them, i.e., the items with therefore interactively demands specific user feedback to highest predicted ratings. This indeed is a default strategy in maximize the improvement of system performance. recommender systems for eliciting ratings since users tend to rate what has been recommended to them. Even when users Limitations The state-of-the-art approaches elaborated on browse the item catalog, they are more likely to rate items above are restricted by certain limitations. When using which they like or are interested in, rather than those items content-based filtering, for instance, almost all existing that they dislike or are indifferent to. Indeed, it has been approaches rely on a number of predefined audio features shown that doing so creates a strong bias in the collected that have been used over and over again, including spectral rating data as the database gets populated disproportionately features, MFCCs, and a great number of derivatives [106]. with high ratings. This in turn may substantially influence However, doing so assumes that (all) these features are pre- the prediction algorithm and decrease the recommendation dictive of the user’s music taste, while in practice it has been accuracy [63]. shown that the acoustic properties that are important for the Moreover, not all the active learning strategies are neces- perception of music are highly subjective [132]. Furthermore, sarily personalized. The users differ very much in the amount listeners’ different tastes and levels of interest in different of information they have about the items, their preferences, pieces of music influence perception of item similarity [158]. and the way they make decisions. Hence, it is clearly inef- This subjectiveness demands for CB recommenders that ficient to request all the users to rate the same set of items, incorporate personalization in their mathematical model. For because many users may have a very limited knowledge, example, in [65] the authors propose a hybrid (CB+CF) ignore many items, and will therefore not provide ratings recommender model, namely regression-based latent factor for these items. Properly designed active learning techniques models (RLFM). In [4], the authors propose a user-specific should take this into account and propose different items feature-based similarity model (UFSM), which defines a sim- to different users to rate. This can be highly beneficial and ilarity function for each user, leading to a high degree of increase the chance of acquiring ratings of higher quality personalization. Although not designed specifically for the [57]. 123 100 International Journal of Multimedia Information Retrieval (2018) 7:95–116 Moreover, the traditional interaction model designed for pelling playlists without needing to have extensive musical active learning in recommender systems can support build- familiarity. ing the initial profile of a user mainly in the sign-up process. A large part of the APC task is to accurately infer the This is done by generating a user profile by requesting the intended purpose of a given playlist. This is challenging not user to rate a set of selected items [30]. On the other hand, only because of the broad range of these intended purposes the users must be able to also update their profile by provid- (when they even exist), but also because of the diversity in the ing more ratings anytime they are willing to. This requires underlying features or characteristics that might be needed the system to adopt a conversational interaction model [30], to infer those purposes. e.g., by exploiting novel interactive design elements in the Related to Challenge 1, an extreme cold start scenario for user interface [38], such as explanations that can describe the this task is where a playlist is created with some metadata benefits of providing more ratings and motivating the user to (e.g., the title of a playlist), but no song has been added to the do so. playlist. This problem can be cast as an ad hoc information Finally, it is important to note that in an up-and-running retrieval task, where the task is to rank songs in response to recommender system, the ratings are given by users not a user-provided metadata query. only when requested by the system (active learning) but The APC task can also potentially benefit from user profil- also when a user voluntarily explores the item catalog and ing, e.g., making use of previous playlists and the long-term rates some familiar items (natural acquisition of ratings) listening history of the user. We call this personalized playlist [30,61,63,127,146]. While this could have a huge impact on continuation. the performance of the system, it has been mostly ignored by According to a study carried out in 2016 by the Music the majority of the research works in the field of active learn- Business Association as part of their Music Biz Consumer ing for recommender systems. Indeed, almost all research Insights program, playlists accounted for 31% of music lis- works have been based on a rather non-realistic assumption tening time among listeners in the USA, more than albums that the only source for collecting new ratings is through the (22%), but less than single tracks (46%). Other studies, con- system requests. Therefore, it is crucial to take into account ducted by MIDiA, show that 55% of streaming music a more realistic scenario when studying the active learning service subscribers create music playlists, with some stream- techniques in recommender systems, which can better picture ing services such as Spotify currently hosting over 2 billion 11 12 how the system evolves over time when ratings are provided playlists. In a 2017 study conducted by Nielsen, it was by users [143,146]. found that 58% of users in the USA create their own playlists, 32% share them with others. Studies like these suggest a 2.3 Challenge 2: Automatic playlist continuation growing importance of playlists as a mode of music con- sumption, and as such, the study of APG and APC has never Problem definition In its most generic definition, a playlist been more relevant. is simply a sequence of tracks intended to be listened to State of the art APG has been studied ever since digi- together. The task of automatic playlist generation (APG) tal multimedia transmission made huge catalogs of music then refers to the automated creation of these sequences of available to users. Bonnin and Jannach provide a compre- tracks. In this context, the ordering of songs in a playlist to hensive survey of this field in [21]. In it, the authors frame generate is often highlighted as a characteristics of APG, the APG task as the creation of a sequence of tracks that which is a highly complex endeavor. Some authors have fulfill some “target characteristics” of a playlist, given some therefore proposed approaches based on Markov chains to “background knowledge” of the characteristics of the catalog model the transitions between songs in playlists, e.g., [32, of tracks from which the playlist tracks are drawn. Existing 125]. While these approaches have been shown to outper- APG systems tackle both of these problems in many different form approaches agnostic of the song order in terms of ways. log-likelihood, recent research has found little evidence that the exact order of songs actually matters to users [177], while the ensemble of songs in a playlist [181] and direct song- https://musicbiz.org/news/playlists-overtake-albums-listenership- to-song transitions [92] do matter. says-loop-study. Considered a variation of APG, the task of automatic https://musicbiz.org/resources/tools/music-biz-consumer-insights/ playlist continuation (APC) consists of adding one or more consumer-insights-portal. tracks to a playlist in a way that fits the same target charac- https://www.midiaresearch.com/blog/announcing-midias-state-of- teristics of the original playlist. This has benefits in both the the-streaming-nation-2-report. listening and creation of playlists: users can enjoy listening to https://press.spotify.com/us/about. continuous sessions beyond the end of a finite-length playlist, http://www.nielsen.com/us/en/insights/reports/2017/music-360- while also finding it easier to create longer, more com- 2017-highlights.html. 123 International Journal of Multimedia Information Retrieval (2018) 7:95–116 101 In early approaches [9,10,135]the target characteristics tant criterion for playlist quality. In another recent user of the playlist are specified as multiple explicit constraints, study [177] conducted by Tintarev et al. the authors found that which include musical attributes or metadata such as artist, many participants did not care about the order of tracks in rec- tempo, and style. In others, the target characteristics are a ommended playlists, sometimes they did not even notice that single seed track [121] or a start and an end track [9,32,74]. there is a particular order. However, this study was restricted Other approaches create a circular playlist that comprises to 20 participants who used the Discover Weekly service of all tracks in a given music collection, in such a way that Spotify. consecutive songs are as similar as possible [105,142]. In Another challenge for APC is evaluation: in other words, other works, playlists are created based on the context of the how to assess the quality of a playlist. Evaluation in gen- listener, either as single source [157] or in combination with eral is discussed in more detail in the next section, but there content-based similarity [35,149]. are specific questions around evaluation of playlists that A common approach to build the background knowledge should be pointed out here. As Bonnin and Jannach [21] of the music catalog for playlist generation is using machine put it, the ultimate criterion for this is user satisfaction,but learning techniques to extract that knowledge from manually that is not easy to measure. In [125], McFee and Lanck- curated playlists. The assumption here is that curators of these riet categorize the main approaches to APG evaluation as human evaluation, semantic cohesion, and sequence pre- playlists are encoding rich latent information about which tracks go together to create a satisfying listening experience diction. Human evaluation comes closest to measuring user for an intended purpose. Some proposed APG and APC sys- satisfaction directly, but suffers from problems of scale and tems are trained on playlists from sources such as online reproducibility. Semantic cohesion as a quality metric is radio stations [32,123], online playlist websites [126,181], easily measurable and reproducible, but assumes that users and music streaming services [141]. In the study by Pichl et prefer playlists where tracks are similar along a particu- al. [141], the names of playlists on Spotify were analyzed to lar semantic dimension, which may not always be true, create contextual clusters, which were then used to improve see, for instance, the studies carried out by Slaney and recommendations. White [172] and by Lee [115]. Sequence prediction casts An approach to specifically address song ordering within APC as an information retrieval task, but in the domain of playlists is the use of generative models that are trained on music, an inaccurate prediction needs not be a bad recom- hand-curated playlists. McFee and Lanckriet [125] represent mendation, and this again leads to a potential disconnect songs by metadata, familiarity, and audio content features, between this metric and the ultimate criterion of user sat- adopting ideas from statistical natural language process- isfaction. ing. They train various Markov chains to model transitions Investigating which factors are potentially important for between songs. Similarly, Chen et al. [32] propose a logistic a positive user perception of a playlist, Lee conducted a Markov embedding to model song transitions. This is similar qualitative user study [115], investigating playlists that had to matrix decomposition methods and results in an embed- been automatically created based on content-based similar- ding of songs in Euclidean space. In contrast to McFee and ity. They made several interesting observations. A concern Lanckriet’s model, Chen et al.’s model does not use any audio frequently raised by participants was that of consecutive features. songs being too similar, and a general lack of variety. However, different people had different interpretations of Limitations While some work on automated playlist con- variety, e.g., variety in genres or styles vs. different artists tinuation highlights the special characteristics of playlists, in the playlist. Similarly, different criteria were mentioned i.e., their sequential order, it is not well understood to which when listeners judged the coherence of songs in a playlist, extent and in which cases taking into account the order of including lyrical content, tempo, and mood. When cre- tracks in playlists helps create better models for recommen- ating playlists, participants mentioned that similar lyrics, dation. For instance, in [181] Vall et al. recently demonstrated a common theme (e.g., music to listen to in the train), on two datasets of hand-curated playlists that the song order story (e.g., music for the Independence Day), or era (e.g., seems to be negligible for accurate playlist continuation when rock music from the 1980s) are important and that tracks a lot of popular songs are present. On the other hand, the not complying negatively effect the flow of the playlist. authors argue that order does matter when creating playlists These aspects can be extended by responses of partici- with tracks from the long tail. Another study by McFee and pants in a study conducted by Cunningham et al. [42], Lanckriet [126] also suggests that transition effects play an important role in modeling playlist continuity. This is in line The ranking of criteria (from most to least important) was: homo- with a study presented by Kamehkhosh et al. in [92], in which geneity, artist diversity, transition, popularity, lyrics, order, and fresh- users identified song order as being the second but last impor- ness. https://www.spotify.com/discoverweekly. 123 102 International Journal of Multimedia Information Retrieval (2018) 7:95–116 who further identified the following categories of playlists: 2.4 Challenge 3: Evaluating music recommender same artist, genre, style, or orchestration, playlists for a systems certain event or activity (e.g., party or holiday), romance (e.g., love songs or breakup songs), playlists intended to Problem definition Having its roots in machine learning send a message to their recipient (e.g., protest songs), (cf. rating prediction) and information retrieval (cf. “retriev- and challenges or puzzles (e.g., cover songs liked more ing” items based on implicit “queries” given by user prefer- than the original or songs whose title contains a question ences), the field of recommender systems originally adopted mark). evaluation metrics from these neighboring fields. In fact, Lee also found that personal preferences play a major accuracy and related quantitative measures, such as preci- role. In fact, already a single song that is very much liked sion, recall, or error measures (between predicted and true or hated by a listener can have a strong influence on how ratings), are still the most commonly employed criteria to they judge the entire playlist [115]. This seems particularly judge the recommendation quality of a recommender sys- true if it is a highly disliked song [44]. Furthermore, a good tem [11,78]. In addition, novel measures that are tailored to mix of familiar and unknown songs was often mentioned as the recommendation problem have emerged in recent years. an important requirement for a good playlist. Supporting the These so-called beyond-accuracy measures [98] address discovery of interesting new songs, still contextualized by the particularities of recommender systems and gauge, for familiar ones, increases the likelihood of realizing a serendip- instance, the utility, novelty, or serendipity of an item. How- itous encounter in a playlist [160,193]. Finally, participants ever, a major problem with these kinds of measures is that also reported that their familiarity with a playlist’s genre or they integrate factors that are hard to describe mathemati- theme influenced their judgment of its quality. In general, cally, for instance, the aspect of surprise in case of serendipity listeners were more picky about playlists whose tracks they measures. For this reason, there sometimes exist a variety of were familiar with or they liked a lot. different definitions to quantify the same beyond-accuracy Supported by the studies summarized above, we argue aspect. that the question of what makes a great playlist is highly State of the art In the following, we discuss performance subjective and further depends on the intent of the creator measures which are most frequently reported when evalu- or listener. Important criteria when creating or judging a ating recommender systems. An overview of these is given playlist include track similarity/coherence, variety/diversity, in Table 1. They can be roughly categorized into accuracy- but also the user’s personal preferences and familiarity with related measures, such as prediction error (e.g., MAE and the tracks, as well as the intention of the playlist cre- RMSE) or standard IR measures (e.g., precision and recall), ator. Unfortunately, current automatic approaches to playlist and beyond-accuracy measures, such as diversity, novelty, continuation are agnostic of the underlying psychologi- and serendipity. Furthermore, while some of the metrics cal and sociological factors that influence the decision of quantify the ability of recommender systems to find good which songs users choose to include in a playlist. Since items, e.g., precision and recall, others consider the ranking knowing about such factors is vital to understand the of items and therefore assess the system’s ability to position intent of the playlist creator, we believe that algorithmic good recommendations at the top of the recommendation list, methods for APC need to holistically learn such aspects e.g., MAP, NDCG, or MPR. from manually created playlists and integrate respective intent models. However, we are aware that in today’s era Mean absolute error (MAE) is one of the most common met- rics for evaluating the prediction power of recommender where billions of playlists are shared by users of online streaming services, a large-scale analysis of psycholog- algorithms. It computes the average absolute deviation between the predicted ratings and the actual ratings provided ical and sociological background factors is impossible. Nevertheless, in the absence of explicit information about by users [81]. Indeed, MAE indicates how close the rating predictions generated by an MRS are to the real user ratings. user intent, a possible starting point to create intent mod- els might be the metadata associated with user-generated MAE is computed as follows: playlists, such as title or description. To foster this kind of research, the playlists provided in the dataset for the ACM Recommender Systems Challenge 2018 include playlist MAE = |r −ˆr | (1) u,i u,i |T | titles. r ∈T u,i where r and rˆ , respectively, denote the actual and the u,i u,i https://press.spotify.com/us/about. predicted ratings of item i for user u. MAE sums over the https://recsys-challenge.spotify.com. absolute prediction errors for all ratings in a test set T . 123 International Journal of Multimedia Information Retrieval (2018) 7:95–116 103 Table 1 Evaluation measures Measure Abbreviation Type Ranking-aware commonly used for recommender systems Mean absolute error MAE Error/accuracy No Root-mean-square error RMSE Error/accuracy No Precision at top K recommendations P@K Accuracy No Recall at top K recommendations R@K Accuracy No Mean average precision at top K recommendations MAP@K Accuracy Yes Normalized discounted cumulative gain NDCG Accuracy Yes Half-life utility HLU Accuracy Yes Mean percentile rank MPR Accuracy Yes Spread – Beyond No Coverage – Beyond No Novelty – Beyond No Serendipity – Beyond No Diversity – Beyond No u. The overall P@K is then computed by averaging P @K values for all users in the test set. Root-mean-square error (RMSE) is another similar metric that is computed as: Mean average precision at top K recommendations (MAP@K) is a rank-based metric that computes the overall precision of the system at different lengths of recommendation lists. MAP is computed as the arithmetic mean of the average precision RMSE =  (r −ˆr ) . (2) u,i u,i |T | over the entire set of users in the test set. Average precision r ∈T u,i for the top K recommendations (AP@K ) is defined as fol- lows: It is an extension to MAE in that the error term is squared, which penalizes larger differences between predicted and AP@K = P@i · rel(i ) (4) true ratings more than smaller ones. This is motivated by the i =1 assumption that, for instance, a rating prediction of 1 when the true rating is 4 is much more severe than a prediction of th where rel(i ) is an indicator signaling if the i recommended 3 for the same item. item is relevant, i.e., rel(i ) = 1, or not, i.e., rel(i ) = 0; N is the total number of relevant items. Note that MAP implic- Precision at top K recommendations (P@K) is a common itly incorporates recall, because it also considers the relevant metric that measures the accuracy of the system in command- items not in the recommendation list. ing relevant items. In order to compute P@K , for each user, the top K recommended items whose ratings also appear Recall at top K recommendations (R@K) is presented here in the test set T are considered. This metric was originally for the sake of completeness, even though it is not a crucial designed for binary relevance judgments. Therefore, in case measure from a consumer’s perspective. Indeed, the listener of availability of relevance information at different levels, is typically not interested in being recommended all or a large such as a five-point Likert scale, the labels should be bina- number of relevant items, rather in having good recommen- rized, e.g., considering the ratings greater than or equal to 4 dations at the top of the recommendation list. For a user u, (out of 5) as relevant. For each user u, P @K is computed R @K is defined as: as follows: We should note that in the recommender systems community, another variation of average precision is gaining popularity recently, formally 1 K |L ∩ L | u u defined by: AP@K = P@i · rel(k) in which N is the i =1 min(K ,N ) P @K = (3) total number of relevant items and K is the size of recommendation |L | list. The motivation behind the minimization term is to prevent the AP scores to be unfairly suppressed when the number of recommendations is too low to capture all the relevant items. This variation of MAP was where L is the set of relevant items for user u in the test popularized by Kaggle competitions [97] about recommender systems set T and L denotes the recommended set containing the and has been used in several other research works, consider for exam- K items in T with the highest predicted ratings for the user ple [8,124]. 123 104 International Journal of Multimedia Information Retrieval (2018) 7:95–116 |L ∩ L | u u max (r − d, 0) u,i R @K = (5) u HLU = (8) |L | (rank −1)/(h−1) u,i i =1 where L is the set of relevant items of user u in the test set where r and rank denote the rating and the rank of item T and L denotes the recommended set containing the K u,i u,i i for user u, respectively, in the recommendation list of length items in T with the highest predicted ratings for the user u. N; d represents a default rating (e.g., average rating); and h The overall R@K is calculated by averaging R @K values is the half-time, calculated as the rank of a music item in the for all the users in the test set. list, such that the user can eventually listen to it with a 50% Normalized discounted cumulative gain (NDCG)isamea- chance. HLU can be further normalized by the maximum sure for the ranking quality of the recommendations. This utility (similar to NDCG), and the final HLU is the average metric has originally been proposed to evaluate the effec- over the half-time utilities obtained for all users in the test set. tiveness of information retrieval systems [93]. It is nowadays A larger HLU may correspond to a superior recommendation also frequently used for evaluating music recommender sys- performance. tems [120,139,185]. Assuming that the recommendations for Mean percentile rank (MPR) estimates the users’ satisfaction user u are sorted according to the predicted rating values in with items in the recommendation list and is computed as the descending order. DCG is defined as follows: average of the percentile rank for each test item within the ranked list of recommended items for each user [89]. The u,i percentile rank of an item is the percentage of items whose DCG = (6) log (i + 1) position in the recommendation list is equal to or lower than i =1 the position of the item itself. Formally, the percentile rank PR for user u is defined as: where r is the true rating (as found in test set T ) for the item u,i ranked at position i for user u, and N is the length of the rec- ommendation list. Since the rating distribution depends on r · rank u,i u,i r ∈T u,i PR =  (9) the users’ behavior, the DCG values for different users are u,i r ∈T u,i not directly comparable. Therefore, the cumulative gain for each user should be normalized. This is done by computing the ideal DCG for user u, denoted as IDCG , which is the where r is the true rating (as found in test set T ) for item u,i DCG value for the best possible ranking, obtained by order- i ratedbyuser u and rank is the percentile rank of item i u,i ing the items by true ratings in descending order. Normalized within the ordered list of recommendations for user u.MPRis discounted cumulative gain for user u is then calculated as: then the arithmetic mean of the individual PR values over all users. A randomly ordered recommendation list has an DCG u expected MPR value of 50%. A smaller MPR value is there- NDCG = . (7) IDCG fore assumed to correspond to a superior recommendation performance. Finally, the overall normalized discounted cumulative gain Spread is a metric of how well the recommender algorithm NDCG is computed by averaging NDCG over the entire can spread its attention across a larger set of items [104]. In set of users. more detail, spread is the entropy of the distribution of the items recommended to the users in the test set. It is formally In the following, we present common quantitative eval- defined as: uation metrics, which have been particularly designed or adopted to assess recommender systems performance, even though some of them have their origin in information retrieval spread =− P(i ) log P(i ) (10) and machine learning. The first two (HLU and MRR) still i ∈I belong to the category of accuracy-related measures, while the subsequent ones capture beyond-accuracy aspects where I represents the entirety of items in the dataset Half-life utility (HLU) measures the utility of a recommenda- and P(i ) = count (i )/  count (i ), such that count (i ) i ∈I tion list for a user with the assumption that the likelihood of denotes the total number of times that a given item i showed viewing/choosing a recommended item by the user exponen- up in the recommendation lists. It may be infeasible to expect tially decays with the item’s position in the ranking [24,137]. an algorithm to achieve the perfect spread (i.e., recommend- Formally written, HLU for user u is defined as: ing each item an equal number of times) without avoiding 123 International Journal of Multimedia Information Retrieval (2018) 7:95–116 105 irrelevant recommendations or unfulfillable rating requests. of them, nor to be able to understand or rate them. Therefore, Accordingly, moderate spread values are usually preferable. moderate values indicate better performances [104]. Coverage of a recommender system is defined as the propor- Serendipity aims at evaluating MRS based on the relevant and tion of items over which the system is capable of generating surprising recommendations. While the need for serendip- recommendations [81]: ity is commonly agreed upon [82], the question of how to measure the degree of serendipity for a recommendation list is controversial. This particularly holds for the question of |T | whether the factor of surprise implies that items must be coverage = (11) novel to the user [98]. On a general level, serendipity of a |T | recommendation list L provided to a user u can be defined as: where |T | is the size of the test set and |T | is the number of unexp usef ul L ∩ L ratings in T for which the system can predict a value. This is u u serendipity(L ) = (13) particularly important in cold start situations, when recom- |L | mender systems are not able to accurately predict the ratings unexp usef ul of new users or new items and hence obtain low coverage. where L and L denote subsets of L that contain, u u Recommender systems with lower coverage are therefore respectively, recommendations unexpected to and useful for limited in the number of items they can recommend. A sim- the user. The usefulness of an item is commonly assessed by ple remedy to improve low coverage is to implement some explicitly asking users or taking user ratings as proxy [98]. default recommendation strategy for an unknown user–item The unexpectedness of an item is typically quantified by entry. For example, we can consider the average rating of some measure of distance from expected items, i.e., items that users for an item as an estimate of its rating. This may come are similar to the items already rated by the user. In the context at the price of accuracy, and therefore, the trade-off between of MRS, Zhang et al. [193] propose an “unserendipity” mea- coverage and accuracy needs to be considered in the evalua- sure that is defined as the average similarity between the items tion process [7]. in the user’s listening history and the new recommendations. Similarity between two items in this case is calculated by Novelty measures the ability of a recommender system to an adapted cosine measure that integrates co-liking informa- recommend new items that the user did not know about tion, i.e., number of users who like both items. It is assumed before [1]. A recommendation list may be accurate, but if that lower values correspond to more surprising recommen- it contains a lot of items that are not novel to a user, it is not dations, since lower values indicate that recommendations necessarily a useful list [193]. deviate from the user’s traditional behavior [193]. While novelty should be defined on an individual user level, considering the actual freshness of the recommended Diversity is another beyond-accuracy measure as already dis- items, it is common to use the self-information of the recom- cussed in the limitations part of Challenge 1. It gauges the mended items relative to their global popularity: extent to which recommended items are different from each other, where difference can relate to various aspects, e.g., musical style, artist, lyrics, or instrumentation, just to name 1 − log pop a few. Similar to serendipity, diversity can be defined in sev- novelty = (12) |U | N u∈Ui ∈L eral ways. One of the most common is to compute pairwise distance between all items in the recommendation set, either averaged [196]orsummed[173]. In the former case, the where pop is the popularity of item i measured as percent- diversity of a recommendation list L is calculated as follows: age of users who rated i, L is the recommendation list of the top N recommendations for user u [193,195]. The above dist i , j i ∈L j ∈L\i di versity(L) = (14) definition assumes that the likelihood of the user selecting a |L|· (|L|− 1) previously unknown item is proportional to its global popu- where dist is the some distance function defined between larity and is used as an approximation of novelty. In order to i , j items i and j. Common choices are inverse cosine similar- obtain more accurate information about novelty or freshness, ity [150], inverse Pearson correlation [183], or Hamming explicit user feedback is needed, in particular since the user distance [101]. might have listened to an item through other channels before. It is often assumed that the users prefer recommendation lists with more novel items. However, if the presented items When it comes to the task of evaluating playlist recom- are too novel, then the user is unlikely to have any knowledge mendation, where the goal is to assess the capability of the 123 106 International Journal of Multimedia Information Retrieval (2018) 7:95–116 recommender in providing proper transitions between subse- Addressing both objective and subjective evaluation cri- quent songs, the conventional error or accuracy metrics may teria, Knijnenburg et al. [108] propose a holistic frame- not be able to capture this property. There is hence a need for work for user-centric evaluation of recommender systems. sequence-aware evaluation measures. For example, consider Figure 1 provides an overview of the components. The objec- the scenario where a user who likes both classical and rock tive system aspects (OSAs) are considered unbiased factors music is recommended a rock music right after she has lis- of the RS, including aspects of the user interface, comput- tened to a classic piece. Even though both music styles are in ing time of the algorithm, or number of items shown to the agreement with her taste, the transition between songs plays user. They are typically easy to specify or compute. The an important role toward user satisfaction. In such a situation, OSAs influence the subjective system aspects (SSAs), which given a currently played song and in the presence of several are caused by momentary, primary evaluative feelings while equally likely good options to be played next, a RS may interacting with the system [80]. This results in a different be inclined to rank songs based on their popularity. Hence, perception of the system by different users. SSAs are there- other metrics such as average log-likelihood have been pro- fore highly individual aspects and typically assessed by user posed to better model the transitions [33,34]. In this regard, questionnaires. Examples of SSA include general appeal of when the goal is to suggest a sequence of items, alternative the system, usability, and perceived recommendation diver- multi-metric evaluation approaches are required to take into sity or novelty. The aspect of experience (EXP) describes consideration multiple quality factors. Such evaluation met- the user’s attitude toward the system and is commonly also rics can consider the ranking order of the recommendations investigated by questionnaires. It addresses the user’s per- or the internal coherence or diversity of the recommended ception of the interaction with the system. The experience is list as a whole. In many scenarios, adoption of such quality highly influenced by the other components, which means metrics can lead to a trade-off with accuracy which should changing any of the other components likely results in a be balanced by the RS algorithm [145]. change of EXP aspects. Experience can be broken down into the evaluation of the system, the decision process, and Limitations As of today, the vast majority of evaluation the final decisions made, i.e., the outcome. The interaction approaches in recommender systems research focus on quan- (INT) aspects describe the observable behavior of the user, titative measures, either accuracy-like or beyond-accuracy, time spent viewing an item, as well as clicking or purchas- which are often computed in offline studies. ing behavior. In a music context, examples further include Doing so has the advantage of facilitating the reproducibil- liking a song or adding it to a playlist. Therefore, interac- ity of evaluation results. However, limiting the evaluation to tions aspects belong to the objective measures and are usually quantitative measures means to forgo another important fac- determined via logging by the system. Finally, Knijnenburg tor, which is user experience. In other words, in the absence et al.’s framework mentions personal characteristics (PC) of user-centric evaluations, it is difficult to extend the claims and situational characteristics (SC), which influence the user to the more important objective of the recommender system experience. PC include aspects that do not exist without the under evaluation, i.e., giving users a pleasant and useful per- user, such as user demographics, knowledge, or perceived sonalized experience [107]. control, while SC include aspects of the interaction context, Despite acknowledging the need for more user-centric such as when and where the system is used, or situation- evaluation strategies [158], the factor human, user, or, in specific trust or privacy concerns. Knijnenburg et al. [108] the case of MRS, listener is still way too often neglected also propose a questionnaire to asses the factors defined in or not properly addressed. For instance, while there exist their framework, for instance, perceived recommendation quantitative objective measures for serendipity and diver- quality, perceived system effectiveness, perceived recom- sity, as discussed above, perceived serendipity and diversity mendation variety, choice satisfaction, intention to provide can be highly different from the measured ones [182]as feedback, general trust in technology, and system-specific they are subjective user-specific concepts. This illustrates privacy concern. that even beyond-accuracy measures cannot fully capture the While this framework is a generic one, tailoring it to MRS real user satisfaction with a recommender system. On the would allow for user-centric evaluation thereof. In partic- other hand, approaches that address user experience (UX) ular, the aspects of personal and situational characteristics can be investigated to evaluate recommender systems. For should be adapted to the particularities of music listeners example, a MRS can be evaluated based on user engage- and listening situations, respectively, cf. Sect. 2.1.Tothis ment, which provides a restricted explanation of UX that end, researchers in MRS should consider the aspects relevant concentrates on judgment of product quality during inter- to the perception and preference of music, and their implica- action [79,118,133]. User satisfaction, user engagement, tions on MRS, which have been identified in several studies, and more generally user experience are commonly assessed e.g., [43,113,114,158,159]. In addition to the general ones through user studies [13,116,117]. mentioned by Knijnenburg et al., of great importance in the 123 International Journal of Multimedia Information Retrieval (2018) 7:95–116 107 music domain seem to be psychological factors, including has been shown that personality can influence the human affect and personality, social influence, musical training and decision-making process as well as the tastes and interests. experience, and physiological condition. Due to this direct relation, people with similar personality We believe that carefully and holistically evaluating MRS factors are very likely to share similar interests and tastes. by means of accuracy and beyond-accuracy, objective and Earlier studies conducted on the user personality char- subjective measures, in offline and online experiments, would acteristics support the potential benefits that personality lead to a better understanding of the listeners’ needs and information could have in recommender systems [22,23,58, requirements vis-à-vis MRS, and eventually a considerable 85,87,178,180]. As a known example, psychological stud- improvement of current MRS. ies [147] have shown that extravert people are likely to prefer the upbeat and conventional music. Accordingly, a personality-based MRS could use this information to bet- 3 Future directions and visions ter predict which songs are more likely than others to please extravert people [86]. Another example of poten- While the challenges identified in the previous section are tial usage is to exploit personality information in order to already researched on intensely, in the following, we provide compute similarity among users and hence identify the like- a more forward-looking analysis and discuss some MRS- minded users [178]. This similarity information could then be related trending topics, which we assume influential for the integrated into a neighborhood-based collaborative filtering next generation of MRS. All of them have in common that approach. their aim is to create more personalized recommendations. In order to use personality information in a recommender More precisely, we first outline how psychological constructs system, the system first has to elicit this information from such as personality and emotion could be integrated into the users, which can be done either explicitly or implicitly. MRS. Subsequently, we address situation-aware MRS and In the former case, the system can ask the user to com- argue for the need of multifaceted user models that describe plete a personality questionnaire using one of the personality contextual and situational preferences. To round off, we evaluation inventories, e.g., the ten- item personality inven- discuss the influence of users’ cultural background on recom- tory [76] or the big five inventory [94]. In the latter case, mendation preferences, which needs to be considered when the system can learn the personality by tracking and observ- building culture-aware MRS. ing users’ behavioral patterns, for instance, liking behavior on Facebook [111] or applying filters to images posted on Instagram [170]. Not too surprisingly, it has shown that sys- 3.1 Psychologically inspired music recommendation tems that explicitly elicit personality characteristics achieve Personality and emotion are important psychological con- superior recommendation outcomes, e.g., in terms of user sat- structs. While personality characteristics of humans are a isfaction, ease of use, and prediction accuracy [52]. On the predictable and stable measure that shapes human behaviors, downside, however, many users are not willing to fill in long emotions are short-term affective responses to a particular questionnaires before being able to use the RS. A way to alle- stimulus [179]. Both have been shown to influence music viate this problem is to ask users only the most informative tastes [71,154,159] and user requirements for MRS [69,73]. questions of a personality instrument [163]. Which questions However, in the context of (music) recommender systems, are most informative, though, first needs to be determined based on existing user data and is dependent on the recom- personality and emotion do not play a major role yet. Given the strong evidence that both influence listening prefer- mendation domain. Other studies showed that users are to some extent willing to provide further information in return ences [147,159] and the recent emergence of approaches to accurately predict them from user-generated data [111,170], for a better quality of recommendations [175]. we believe that psychologically inspired MRS is an upcom- Personality information can be used in various ways, ing area. particularly, to generate recommendations when traditional rating or consumption data is missing. Otherwise, the person- 3.1.1 Personality ality traits can be seen as an additional feature that extends the user profile, that can be used mainly to identify similar users In psychology research, personality is often defined as a “con- in neighborhood-based recommender systems or directly fed sistent behavior pattern and interpersonal processes originat- into extended matrix factorization models [67]. ing within the individual” [25]. This definition accounts for the individual differences in people’s emotional, interper- 3.1.2 Emotion sonal, experiential, attitudinal, and motivational styles [95]. Several prior works have studied the relation of decision The emotional state of the MRS user has a strong impact on making and personality factors. In [147], as an example, it his or her short-time musical preferences [99]. Vice versa, 123 108 International Journal of Multimedia Information Retrieval (2018) 7:95–116 Fig. 1 Evaluation framework of the user experience for recommender systems, according to [108] music has a strong influence on our emotional state. It there- presented one of the various categorical models (emotions fore does not come as a surprise that emotion regulation are described by distinct emotion words such as happiness, was identified as one of the main reasons why people lis- sadness, anger, or fear) [84,191]or dimensional models tentomusic [122,155]. As an example, people may listen to (emotions are described by scores with respect to two or completely different musical genres or styles when they are three dimensions, e.g., valence and arousal) [152]. For a sad in comparison with when they are happy. Indeed, prior more detailed elaboration on emotion models in the context research on music psychology discovered that people may of music, we refer to [159,186]. The implicit acquisition of choose the type of music which moderates their emotional emotional states can be effected, for instance, by analyzing condition [109]. More recent findings show that music can be user-generated text [49], speech [66], or facial expressions mainly chosen so as to augment the emotional situation per- in video [55]. ceived by the listener [131]. In order to build emotion-aware Emotion tagging in music The music piece itself can be MRS, it is therefore necessary to (i) infer the emotional state regarded as an emotion-laden content and in turn can be the listener is in, (ii) infer emotional concepts from the music described by emotion words. The task of automatically itself, and (iii) understand how these two interrelate. These assigning such emotion words to a music piece is an active three tasks are detailed below. research area, often refereed to as music emotion recogni- Eliciting the emotional state of the listener Similar to per- tion (MER), e.g., [14,91,103,187,188,191]. How to integrate sonality traits, the emotional state of a user can be elicited such emotion terms created by MER tools into a MRS is, explicitly or implicitly. In the former case, the user is typically however, not an easy task, for several reasons. First, early 123 International Journal of Multimedia Information Retrieval (2018) 7:95–116 109 MER approaches usually neglected the distinction between for instance, the music preference of a user would differ in intended emotion, perceived emotion, and induced or felt libraries and in gyms [35]. Therefore, considering location as emotion, cf. Sect. 2.1. Current MER approaches focus on a situation-specific signal could lead to substantial improve- perceived or induced emotions. However, musical content ments in the recommendation performance. Time of the day still contains various characteristics that affect the emotional is another situational signal that could be used for recom- state of the listener, such as lyrics, rhythm, and harmony, and mendation; for instance, the music a user would like to listen the way how they affect the emotional state is highly subjec- to in mornings differs from those in nights [41]. One situa- tive. This so even though research has detected a few general tional signal of particular importance in the music domain is rules, for instance, a musical piece that is in major key is social context since music tastes and consumption behaviors typically perceived brighter and happier than those in minor are deeply rooted in the users’ social identities and mutually key, or a piece in rapid tempo is perceived more exciting or affect each other [45,134]. For instance, it is very likely that more tense than slow tempo ones [112]. a user would prefer different music when being alone than when meeting friends. Such social factors should therefore Connecting listener emotions and music emotion tags Cur- be considered when building situation-aware MRS. Other rent emotion-based MRSs typically consider emotional situational signals that are sometimes exploited include the scores as contextual factors that characterize the situation user’s current activity [184], the weather [140], the user’s the user is experiencing. Hence, the recommender systems mood [129], and the day of the week [83]. Regarding time, exploit emotions in order to pre-filter the preferences of users there is also another factor to consider, which is that most or post-filter the generated recommendations. Unfortunately, music that was considered trendy years ago is now consid- this neglects the psychological background, in particular ered old. This implies that ratings for the same song or artist on the subjective and complex interrelationships between might strongly differ, not only between users, but in general expressed, perceived, and induced emotions [159], which as a function of time. To incorporate such aspects in MRS, it is of special importance in the music domain as music is would be crucial to record a timestamp for all ratings. known to evoke stronger emotions than, for instance, prod- It is worth noting that situational features have been ucts [161]. It has also been shown that personality influences proven to be strong signals in improving retrieval perfor- in which emotional state which kind of emotionally laden mance in search engines [16,190]. Therefore, we believe music is preferred by listeners [71]. Therefore, even if auto- that researching and building situation-aware music recom- mated MER approaches would be able to accurately predict mender systems should be one central topic in MRS research. the perceived or induced emotion of a given music piece, in While several situation-aware MRSs already exist, e.g., the absence of deep psychological listener profiles, match- [12,35,90,100,157,184], they commonly exploit only one or ing emotion annotations of items and listeners may not yield very few such situational signals, or are restricted to a cer- satisfying recommendations. This is so because how people tain usage context, e.g., music consumption in a car or in a judge music and which kind of music they prefer depends tourist scenario. Those systems that try to take a more com- to a large extent on their current psychological and cogni- prehensive view and consider a variety of different signals, tive states. We hence believe that the field of MRS should on the other hand, suffer from a low number of data instances embrace psychological theories, elicit the respective user- or users, rendering it very hard to build accurate context specific traits, and integrate them into recommender systems, models [75]. What is still missing, in our opinion, are (com- in order to build decent emotion-aware MRS. mercial) systems that integrate a variety of situational signals on a very large scale in order to truly understand the listen- 3.2 Situation-aware music recommendation ers needs and intents in any given situation and recommend music accordingly. While we are aware that data availability Most of the existing music recommender systems make and privacy concerns counteract the realization of such sys- recommendations solely based on a set of user-specific tems on a large commercial scale, we believe that MRS will and item-specific signals. However, in real-world scenarios, eventually integrate decent multifaceted user models inferred many other signals are available. These additional signals from contextual and situational factors. can be further used to improve the recommendation perfor- mance. A large subset of these additional signals includes 3.3 Culture-aware music recommendation situational signals. In more detail, the music preference of a user depends on the situation at the moment of recom- While most humans share an inclination to listen to music, mendation. Location is an example of situational signals; independent on their location or cultural background, the way music is performed, perceived, and interpreted evolves in a culture-specific manner. However, research in MRS seems Please note that music taste is a relatively stable characteristic, while music preferences vary depending on the context and listening intent. to be agnostic of this fact. In music information retrieval 123 110 International Journal of Multimedia Information Retrieval (2018) 7:95–116 (MIR) research, on the other hand, cultural aspects have should be defined on various levels though, not only coun- been studied to some extent in recent years, after preceding try borders. Other examples include having a joint historical (and still ongoing) criticisms of the predominance of Western background, speaking the same language, sharing the same music in this community. Arguably the most comprehensive beliefs or religion, and differences between urban vs. rural culture-specific research in this domain has been conducted cultures. Another aspect that relates to culture is a temporal as part of the CompMusic project, in which five non- one since certain cultural trends, e.g., what defines the “youth Western music traditions have been analyzed in detail in order culture,” are highly dynamic in a temporal and geographical to advance automatic description of music by emphasizing sense. We believe that MRS which are aware of such cross- cultural specificity. The analyzed music traditions included cultural differences and similarities in music perception and Indian Hindustani and Carnatic [53], Turkish Makam [54], taste, and are able to recommend music a listener in the same Arab-Andalusian [174], and Beijing Opera [148]. However, or another culture may like, would substantially benefit both the project’s focus was on music creation, content analy- users and providers of MRS. sis, and ethnomusicological aspects rather than on the music consumption side [37,165,166]. Recently, analyzing content- based audio features describing rhythm, timbre, harmony, and melody for a corpus of a larger variety of world and folk 4 Conclusions music with given country information, Panteli et al. found distinct acoustic patterns of the music created in individual In this trends and survey paper, we identified several grand countries [138]. They also identified geographical and cul- challenges the research field of music recommender systems tural proximities that are reflected in music features, looking (MRS) is facing. These are, among others, in the focus of at outliers and misclassifications in a classification experi- current research in the area of MRS. We discussed (1) the ments using country as target class. For instance, Vietnamese cold start problem of items and users, with its particularities music was often confused with Chinese and Japanese, South in the music domain, (2) the challenge of automatic playlist African with Botswanese. continuation, which is gaining importance due to the recently In contrast to this—meanwhile quite extensive—work on emerged user request of being recommended musical expe- culture-specific analysis of music traditions, little effort has riences rather than single tracks [161], and (3) the challenge been made to analyze cultural differences and patterns of of holistically evaluating music recommender systems, in music consumption behavior, which is, as we believe, a cru- particular, capturing aspects beyond accuracy. cial step to build culture-aware MRS. The few studies investi- In addition to the grand challenges, which are currently gating such cultural differences include [88], in which Hu and highly researched, we also presented a visionary outlook of Lee found differences in perception of moods between Amer- what we believe to be the most interesting future research ican and Chinese listeners. By analyzing the music listening directions in MRS. In particular, we discussed (1) psycholog- behavior of users from 49 countries, Ferwerda et al. found ically inspired MRS, which consider in the recommendation relationships between music listening diversity and Hofst- process factors such as listeners’ emotion and personality, ede’s cultural dimensions [70,72]. Skowron et al. used the (2) situation-aware MRS, which holistically model contex- same dimensions to predict genre preferences of listeners tual and environmental aspects of the music consumption with different cultural backgrounds [171]. Schedl analyzed a process, infer listener needs and intents, and eventually inte- large corpus of listening histories created by Last.fm users in grate these models at large scale in the recommendation 47 countries and identified distinct preference patterns [156]. process, and (3) culture-aware MRS, which exploit the fact Further analyses revealed countries closest to what can be that music taste highly depends on the cultural background of considered the global mainstream (e.g., the Netherlands, UK, the listener, where culture can be defined in manifold ways, and Belgium) and countries farthest from it (e.g., China, Iran, including historical, political, linguistic, or religious similar- and Slovakia). However, all of these works define culture in ities. terms of country borders, which often makes sense, but is We hope that this article helped pinpointing major chal- sometimes also problematic, for instance, in countries with lenges, highlighting recent trends, and identifying interest- large minorities of inhabitants with different cultures. ing research questions in the area of music recommender In our opinion, when building MRS, the analysis of cul- systems. Believing that research addressing the discussed tural patterns of music consumption behavior, subsequent challenges and trends will pave the way for the next genera- creation of respective cultural listener models, and their inte- tion of music recommender systems, we are looking forward gration into recommender systems are vital steps to improve to exciting, innovative approaches and systems that improve personalization and serendipity of recommendations. Culture user satisfaction and experience, rather than just accuracy measures. http://compmusic.upf.edu. 123 International Journal of Multimedia Information Retrieval (2018) 7:95–116 111 Acknowledgements Open access funding provided by Johannes Kepler 14. Barthet M, Fazekas G, Sandler M (2012) Multidisciplinary per- University Linz. We would like to thank all researchers in the fields of spectives on music emotion recognition: Implications for content recommender systems, information retrieval, music research, and mul- and context-based models. In: Proceedings of international sym- timedia, with whom we had the pleasure to discuss and collaborate posium on computer music modelling and retrieval, pp 492–507 in recent years, and whom in turn influenced and helped shaping this 15. Bauer C, Novotny A (2017) A consolidated view of context for article. Special thanks go to Peter Knees and Fabien Gouyon for the intelligent systems. J Ambient Intell Smart Environ 9(4):377–393. fruitful discussions while preparing the ACM Recommender Systems https://doi.org/10.3233/ais-170445 2017 tutorial on music recommender systems. In addition, we would like 16. Bennett PN, Radlinski F, White RW, Yilmaz E (2011) Inferring to thank the reviewers of our manuscript, who provided useful and con- and using location metadata to personalize web search. In: Pro- structive comments to improve the original draft and turn it into what it ceedings of the 34th international ACM SIGIR conference on is now. We would also like to thank Eelco Wiechert for providing addi- research and development in information retrieval, SIGIR’11. tional pointers to relevant literature. Furthermore, the many personal ACM, New York, NY, USA, pp 135–144. https://doi.org/10.1145/ discussions with actual users of MRS unveiled important shortcomings 2009916.2009938 of current approaches and in turn were considered in this article. 17. Bodner E, Iancu I, Gilboa A, Sarel A, Mazor A, Amir D (2007) Finding words for emotions: the reactions of patients with major Open Access This article is distributed under the terms of the Creative depressive disorder towards various musical excerpts. Arts Psy- Commons Attribution 4.0 International License (http://creativecomm chother 34(2):142–150 ons.org/licenses/by/4.0/), which permits unrestricted use, distribution, 18. Boer D, Fischer R (2010) Towards a holistic model of functions of and reproduction in any medium, provided you give appropriate credit music listening across cultures: a culturally decentred qualitative to the original author(s) and the source, provide a link to the Creative approach. Psychol Music 40(2):179–200 Commons license, and indicate if changes were made. 19. Bogdanov D, Haro M, Fuhrmann F, Xambó A, Gómez E, Herrera P (2013) Semantic audio content-based music recommendation and visualization based on user preference examples. Inf Process Manag 49(1):13–33 References 20. Bollen D, Knijnenburg BP, Willemsen MC, Graus M (2010) Understanding choice overload in recommender systems. In: Pro- 1. Adamopoulos P, Tuzhilin A (2015) On unexpectedness in recom- ceedings of the 4th ACM conference on recommender systems, mender systems: or how to better expect the unexpected. ACM Barcelona, Spain Trans Intell Syst Technol 5(4):54 21. Bonnin G, Jannach D (2015) Automated generation of music 2. Adomavicius G, Mobasher B, Ricci F, Tuzhilin A (2011) Context- playlists: survey and experiments. ACM Comput Surv 47(2):26 aware recommender systems. AI Mag 32:67–80 22. Braunhofer M, Elahi M, Ricci F (2014) Techniques for cold- 3. Adomavicius G, Tuzhilin A (2005) Toward the next generation of starting context-aware mobile recommender systems for tourism. Intelli Artif 8(2):129–143. https://doi.org/10.3233/IA-140069 recommender systems: a survey of the state-of-the-art and pos- 23. Braunhofer M, Elahi M, Ricci F (2015) User personality and the sible extensions. IEEE Trans Knowl Data Eng 17(6):734–749. new user problem in a context-aware point of interest recom- https://doi.org/10.1109/TKDE.2005.99 mender system. In: Tussyadiah I, Inversini A (eds) Information 4. Agarwal D, Chen BC (2009) Regression-based latent factor mod- and communication technologies in tourism 2015. Springer, els. In: Proceedings of the 15th ACM SIGKDD international Cham, pp 537–549 conference on knowledge discovery and data mining. ACM, pp 19–28 24. Breese JS, Heckerman D, Kadie C (1998) Empirical analysis of 5. Aggarwal CC (2016) Content-based recommender systems. In: predictive algorithms for collaborative filtering. In: Proceedings Recommender systems. Springer, pp 139–166 of the 14th conference on uncertainty in artificial intelligence. 6. Aggarwal CC (2016) Ensemble-based and hybrid recommender Morgan Kaufmann Publishers Inc., pp 43–52 systems. In: Recommender systems. Springer, pp 199–224 25. Burger JM (2010) Personality. Wadsworth Publishing, Belmont 7. Aggarwal CC (2016) Evaluating recommender systems. In: Rec- 26. Burke R (2002) Hybrid recommender systems: survey and exper- ommender systems. Springer, pp 225–254 iments. User Model User-Adap Interact 12(4):331–370 8. Aiolli F (2013) Efficient top-n recommendation for very large 27. Burke R (2007) Hybrid web recommender systems. Springer scale binary rated datasets. In: Proceedings of the 7th ACM con- Berlin Heidelberg, Berlin, pp 377–408. https://doi.org/10.1007/ ference on recommender systems. ACM, pp. 273–280 978-3-540-72079-9_12 9. Alghoniemy M, Tewfik A (2001) A network flow model for 28. Cantador I, Cremonesi P (2014) Tutorial on cross-domain recom- playlist generation. In: Proceedings of the IEEE international con- mender systems. In: Proceedings of the 8th ACM conference on ference on multimedia and expo (ICME), Tokyo, Japan recommender systems, RecSys’14. ACM, New York, NY, USA, 10. Alghoniemy M, Tewfik AH (2000) User-defined music sequence pp 401–402. https://doi.org/10.1145/2645710.2645777 retrieval. In: Proceedings of the eighth ACM international confer- 29. Cantador I, Fernández-Tobías I, Berkovsky S, Cremonesi P (2015) ence on multimedia, pp 356–358. ACM Cross-domain recommender systems. Springer, Boston, pp 919– 11. Baeza-Yates R, Ribeiro-Neto B (2011) Modern information 959. https://doi.org/10.1007/978-1-4899-7637-6_27 retrieval—the concepts and technology behind search, 2nd edn. 30. Carenini G, Smith J, Poole D (2003) Towards more conversational Addison-Wesley, Pearson and collaborative recommender systems. In: Proceedings of the 12. Baltrunas L, Kaminskas M, Ludwig B, Moling O, Ricci F, Lüke 8th international conference on intelligent user interfaces, IUI’03. KH, Schwaiger R (2011) InCarMusic: Context-Aware Music Rec- ACM, New York, NY, USA, pp. 12–18. https://doi.org/10.1145/ 604045.604052 ommendations in a Car. In: International conference on electronic 31. Cebrián T, Planagumà M, Villegas P, Amatriain X (2010) Music commerce and web technologies (EC-Web), Toulouse, France recommendations with temporal context awareness. In: Pro- 13. Barrington L, Oda R, Lanckriet GRG. Smarter than genius? ceedings of the 4th ACM conference on recommender systems Human evaluation of music recommender systems. In: Proceed- (RecSys), Barcelona, Spain ings of the 10th international society for music information retrieval conference, ISMIR 2009, Kobe International Conference 32. Chen S, Moore JL, Turnbull D, Joachims T (2012) Playlist pre- Center, Kobe, Japan, 26–30 October 2009, pp 357–362 diction via metric embedding. In: Proceedings of the 18th ACM 123 112 International Journal of Multimedia Information Retrieval (2018) 7:95–116 SIGKDD international conference on knowledge discovery and 49. Dey L, Asad MU, Afroz N, Nath RPD (2014) Emotion extraction data mining, KDD’12. ACM, New York, NY, USA, pp 714–722. from real time chat messenger. In: 2014 International conference https://doi.org/10.1145/2339530.2339643 on informatics, electronics vision (ICIEV), pp 1–5. https://doi. 33. Chen S, Moore JL, Turnbull D, Joachims T (2012) Playlist pre- org/10.1109/ICIEV.2014.6850785 diction via metric embedding. In: Proceedings of the 18th ACM 50. Donaldson J (2007) A hybrid social-acoustic recommendation SIGKDD international conference on knowledge discovery and system for popular music. In: Proceedings of the ACM conference data mining. ACM, pp 714–722 on recommender systems (RecSys), Minneapolis, MN, USA 34. Chen S, Xu J, Joachims T (2013) Multi-space probabilistic 51. Dror G, Koenigstein N, Koren Y, Weimer M (2011) The yahoo! sequence modeling. In: Proceedings of the 19th ACM SIGKDD music dataset and kdd-cup’11. In: Proceedings of the 2011 international conference on knowledge discovery and data min- international conference on KDD Cup 2011, vol 18, pp 3–18. ing. ACM, pp 865–873 JMLR.org 35. Cheng Z, Shen J (2014) Just-for-me: an adaptive personalization 52. Dunn G, Wiersema J, Ham J, Aroyo L (2009) Evaluating interface system for location-aware social music recommendation. In: Pro- variants on personality acquisition for recommender systems. In: ceedings of the 4th ACM international conference on multimedia Proceedings of the 17th international conference on user mod- retrieval (ICMR), Glasgow, UK eling, adaptation, and Personalization: formerly UM and AH, 36. Cheng Z, Shen J (2016) On effective location-aware music rec- UMAP’09. Springer, Berlin, Heidelberg, pp 259–270 ommendation. ACM Trans Inf Syst 34(2):13 53. Dutta S, Murthy HA (2014) Discovering typical motifs of a raga 37. Cornelis O, Six J, Holzapfel A, Leman M (2013) Evaluation from one-liners of songs in carnatic music. In: Proceedings of the and recommendation of pulse and tempo annotation in ethnic 15th international society for music information retrieval confer- music. J New Music Res 42(2):131–149. https://doi.org/10.1080/ ence (ISMIR), Taipei, Taiwan, pp 397–402 09298215.2013.812123 54. Dzhambazov G, Srinivasamurthy A, Sentürk ¸ S, Serra X (2016) 38. Cremonesi P, Elahi M, Garzotto F (2017) User interface patterns in On the use of note onsets for improved lyrics-to-audio alignment recommendation-empowered content intensive multimedia appli- in turkish makam music. In: 17th International society for music cations. Multimed Tools Appl 76(4):5275–5309. https://doi.org/ information retrieval conference (ISMIR 2016), New York, USA 10.1007/s11042-016-3946-5 55. Ebrahimi Kahou S, Michalski V, Konda K, Memisevic R, Pal 39. Cremonesi P, Quadrana M (2014) Cross-domain recommenda- C (2015) Recurrent neural networks for emotion recognition in tions without overlapping data: Myth or reality? In: Proceedings video. In: Proceedings of the 2015 ACM on international confer- of the 8th ACM conference on recommender systems, RecSys’14. ence on multimodal interaction, ICMI’15. ACM, New York, NY, ACM, New York, NY, USA, pp. 297–300. https://doi.org/10. USA, pp 467–474. https://doi.org/10.1145/2818346.2830596 1145/2645710.2645769 56. Eghbal-zadeh H, Lehner B, Schedl M, Widmer G (2015) I-Vectors 40. Cremonesi P, Tripodi A, Turrin R (2011) Cross-domain rec- for timbre-based music similarity and music artist classification. ommender systems. In: IEEE 11th international conference on In: Proceedings of the 16th international society for music infor- data mining workshops, pp 496–503. https://doi.org/10.1109/ mation retrieval conference (ISMIR), Malaga, Spain ICDMW.2011.57 57. Elahi M (2011) Adaptive active learning in recommender systems. 41. Cunningham S, Caulder S, Grout V (2008) Saturday night or User Model Adapt Pers 414–417 fever? Context-aware music playlists. In: Proceedings of the 3rd 58. Elahi M, Braunhofer M, Ricci F, Tkalcic M (2013) Personality- international audio mostly conference: sound in motion, Piteå, based active learning for collaborative filtering recommender sys- Sweden tems. In: AI* IA 2013: advances in artificial intelligence. Springer, 42. Cunningham SJ, Bainbridge D, Falconer A (2006) ‘More of an art pp 360–371. https://doi.org/10.1007/978-3-319-03524-6_31 than a science’: supporting the creation of playlists and mixes. In: 59. Elahi M, Deldjoo Y, Bakhshandegan Moghaddam F, Cella L, Proceedings of the 7th international conference on music infor- Cereda S, Cremonesi P (2017) Exploring the semantic gap for mation retrieval (ISMIR), Victoria, BC, Canada movie recommendations. In: Proceedings of the eleventh ACM 43. Cunningham SJ, Bainbridge D, Mckay D (2007) Finding new conference on recommender systems. ACM, pp 326–330 music: a diary study of everyday encounters with novel songs. In: 60. Elahi M, Repsys V, Ricci F (2011) Rating elicitation strategies for Proceedings of the 8th international conference on music infor- collaborative filtering. In: Huemer C, Setzer T (eds) EC-Web, Lec- mation retrieval, Vienna, Austria, pp 83–88 ture Notes in Business Information Processing, vol 85. Springer, 44. Cunningham SJ, Downie JS, Bainbridge D (2005) “The Pain, The pp 160–171. https://doi.org/10.1007/978-3-642-23014-1_14 Pain”: modelling music information behavior and the songs we 61. Elahi M, Ricci F, Rubens N (2012) Adapting to natural rat- hate. In: Proceedings of the 6th international conference on music ing acquisition with combined active learning strategies. In: information retrieval (ISMIR 2005), London, UK, pp 474–477 ISMIS’12: Proceedings of the 20th international conference on 45. Cunningham SJ, Nichols DM (2009) Exploring social music foundations of intelligent systems. Springer, Berlin, Heidelberg, behaviour: an investigation of music selection at parties. In: Pro- pp 254–263 ceedings of the 10th international society for music information 62. Elahi M, Ricci F, Rubens N (2014) Active learning in collabora- retrieval conference (ISMIR 2009), Kobe, Japan tive filtering recommender systems. In: Hepp M, Hoffner Y (eds) 46. Deldjoo Y, Cremonesi P, Schedl M, Quadrana M (2017) The E-commerce and web technologies, Lecture Notes in Business effect of different video summarization models on the quality Information Processing, vol 188. Springer, pp 113–124. https:// of video recommendation based on low-level visual features. In: doi.org/10.1007/978-3-319-10491-1_12 Proceedings of the 15th international workshop on content-based 63. Elahi M, Ricci F, Rubens N (2014) Active learning strategies for multimedia indexing. ACM, p. 20 rating elicitation in collaborative filtering: a system-wide perspec- 47. Deldjoo Y, Elahi M, Cremonesi P, Garzotto F, Piazzolla P, Quad- tive. ACM Trans Intell Syst Technol 5(1):13:1–13:33. https://doi. rana M (2016) Content-based video recommendation system org/10.1145/2542182.2542195 based on stylistic visual features. J Data Semant. https://doi.org/ 64. Elahi M, Ricci F, Rubens N (2016) A survey of active learning 10.1007/s13740-016-0060-9 in collaborative filtering recommender systems. Comput Sci Rev 48. Dey AK (2001) Understanding and using context. Pers Ubiquitous 20:29–50 Comput 5(1):4–7. https://doi.org/10.1007/s007790170019 123 International Journal of Multimedia Information Retrieval (2018) 7:95–116 113 65. Elbadrawy A, Karypis G (2015) User-specific feature-based simi- ing. In: Proceedings of the ACM conference on recommender larity models for top-n recommendation of new items. ACM Trans systems: workshop on music recommendation and discovery Intell Syst Technol 6(3):33 (WOMRAD 2010), pp 7–10 66. Erdal M, Kächele M, Schwenker F (2016) Emotion recognition 84. Hevner K (1935) Expression in music: a discussion of experimen- in speech with deep learning architectures. Springer, Cham, pp tal studies and theories. Psychol Rev 42:186–204 298–311. https://doi.org/10.1007/978-3-319-46182-3_25 85. Hu R, Pu P (2009) A comparative user study on rating vs. person- 67. Fernandez Tobias I, Braunhofer M, Elahi M, Ricci F, Ivan C ality quiz based preference elicitation methods. In: Proceedings (2016) Alleviating the new user problem in collaborative filter- of the 14th international conference on Intelligent user interfaces, ing by exploiting personality information. User Model User-Adap IUI’09. ACM, New York, NY, USA, pp 367–372. https://doi.org/ Interact (Personality in Personalized Systems). https://doi.org/10. 10.1145/1502650.1502702 1007/s11257-016-9172-z 86. Hu R, Pu P (2010) A study on user perception of personality- 68. Fernández-Tobías I, Cantador I, Kaminskas M, Ricci F (2012) based recommender systems. In: Bra PD, Kobsa A, Chin DN (eds) Cross-domain recommender systems: a survey of the state of the UMAP, Lecture Notes in Computer Science, vol 6075. Springer, art. In: Spanish conference on information retrieval, p 24 pp 291–302 69. Ferwerda B, Graus M, Vall A, Tkalci ˇ cˇ M, Schedl M (2016) The 87. Hu R, Pu P (2011) Enhancing collaborative filtering systems with influence of users’ personality traits on satisfaction and attrac- personality information. In: Proceedings of the fifth ACM confer- tiveness of diversified recommendation lists. In: Proceedings of ence on recommender systems, RecSys’11. ACM, New York, NY, the 4th workshop on emotions and personality in personalized USA, pp 197–204. https://doi.org/10.1145/2043932.2043969 services (EMPIRE 2016), Boston, USA 88. Hu X, Lee JH (2012) A cross-cultural study of music mood per- 70. Ferwerda B, Schedl M (2016) Investigating the relationship ception between American and Chinese listeners. In: Proceedings between diversity in music consumption behavior and cultural of the ISMIR dimensions: a cross-country analysis. In: Workshop on surprise, 89. Hu Y, Koren Y, Volinsky C (2008) Collaborative filtering for opposition, and obstruction in adaptive and personalized systems implicit feedback datasets. In: Proceedings of the 8th IEEE inter- 71. Ferwerda B, Schedl M, Tkalci ˇ cˇ M (2015) Personality & emotional national conference on data mining. IEEE, pp. 263–272 states: understanding users music listening needs. In: Extended 90. Hu Y, Ogihara M (2011) NextOne player: a music recommenda- proceedings of the 23rd international conference on user model- tion system based on user behavior. In: Proceedings of the 12th ing, adaptation and personalization (UMAP), Dublin, Ireland international society for music information retrieval conference 72. Ferwerda B, Vall A, Tkalci ˇ cˇ M, Schedl M (2016) Exploring music (ISMIR 2011), Miami, FL, USA diversity needs across countries. In: Proceedings of the UMAP 91. Huq A, Bello J, Rowe R (2010) Automated music emotion recog- 73. Ferwerda B, Yang E, Schedl M, Tkalci ˇ cˇ M (2015) Personal- nition: a systematic evaluation. J New Music Res 39(3):227–244 ity traits predict music taxonomy preferences. In: ACM CHI’15 92. Iman Kamehkhosh Dietmar Jannach GB (2018) How automated extended abstracts on human factors in computing systems, Seoul, recommendations affect the playlist creation behavior of users. In: Republic of Korea Joint proceedings of the 23rd ACM conference on intelligent user 74. Flexer A, Schnitzer D, Gasser M, Widmer G (2008) Playlist gen- interfaces (ACM IUI 2018) workshops: intelligent music inter- eration using start and end songs. In: Proceedings of the 9th faces for listening and creation (MILC), Tokyo, Japan international conference on music information retrieval (ISMIR), 93. Järvelin K, Kekäläinen J (2002) Cumulated gain-based evaluation Philadelphia, PA, USA of ir techniques. ACM Trans Inf Syst 20(4):422–446. https://doi. 75. Gillhofer M, Schedl M (2015) Iron maiden while jogging, debussy org/10.1145/582415.582418 for dinner? An analysis of music listening behavior in context. In: 94. John O, Srivastava S (1999) The big five trait taxonomy: history, Proceedings of the 21st international conference on multimedia measurement, and theoretical perspectives. In: Pervin LA, John modeling (MMM), Sydney, Australia OP (eds) Handbook of personality: theory and research, 510, 2nd 76. Gosling SD, Rentfrow PJ, Swann WB Jr (2003) A very brief edn. Guilford Press, New York, pp 102–138 measure of the big-five personality domains. J Res Personal 95. John OP, Srivastava S (1999) The big five trait taxonomy: his- 37(6):504–528 tory, measurement, and theoretical perspectives. In: Handbook of 77. Gross J (2007) Emotion regulation: conceptual and empirical personality: theory and research, vol 2, pp. 102–138 foundations. In: Gross J (ed) Handbook of emotion regulation, 96. Juslin PN, Sloboda J (2011) Handbook of music and emotion: 2nd edn. The Guilford Press, New York, pp 1–19 theory, research, applications. OUP, Oxford 78. Gunawardana A, Shani G (2015) Evaluating recommender sys- 97. Kaggle Official Homepage. https://www.kaggle.com. Accessed tems. In: Ricci F, Rokach L, Shapira B, Kantor PB (eds) 11 March 2018 Recommender systems handbook, chap. 8, 2nd edn. Springer, 98. Kaminskas M, Bridge D (2016) Diversity, serendipity, novelty, Heidelberg, pp 256–308 and coverage: a survey and empirical analysis of beyond-accuracy 79. Hart J, Sutcliffe AG, di Angeli A (2012) Evaluating user engage- objectives in recommender systems. ACM Trans Interact Intell ment theory. In: CHI conference on human factors in computing Syst 7(1):2:1–2:42. https://doi.org/10.1145/2926720 systems. Paper presented in workshop ’Theories behind UX 99. Kaminskas M, Ricci F (2012) Contextual music information Research and How They Are Used in Practice’ 6 May 2012 retrieval and recommendation: state of the art and challenges. 80. Hassenzahl M (2005) The thing and I: understanding the relation- Comput Sci Rev 6(2):89–119 ship between user and product. Springer, Dordrecht, pp 31–42. 100. Kaminskas M, Ricci F, Schedl M (2013) Location-aware music https://doi.org/10.1007/1-4020-2967-5_4 recommendation using auto-tagging and hybrid matching. In: Pro- 81. Herlocker JL, Konstan JA, Terveen LG, Riedl JT (2004) Evaluat- ceedings of the 7th ACM conference on recommender systems ing collaborative filtering recommender systems. ACM Trans Inf (RecSys), Hong Kong, China Syst 22(1):5–53. https://doi.org/10.1145/963770.963772 101. Kelly JP, Bridge D (2006) Enhancing the diversity of conversa- 82. Herlocker JL, Konstan JA, Terveen LG, Riedl JT (2004) Evaluat- tional collaborative recommendations: a comparison. Artif Intell ing collaborative filtering recommender systems. ACM Trans Inf Rev 25(1):79–95. https://doi.org/10.1007/s10462-007-9023-8 Syst 22(1):5–53 102. Khan MM, Ibrahim R, Ghani I (2017) Cross domain recom- 83. Herrera P, Resa Z, Sordo M (2010) Rocking around the clock eight mender systems: a systematic literature review. ACM Comput days a week: an exploration of temporal patterns of music listen- Surv 50(3):36 123 114 International Journal of Multimedia Information Retrieval (2018) 7:95–116 103. Kim YE, Schmidt EM, Migneco R, Morton BG, Richardson P, 121. Logan B (2002) Content-based playlist generation: exploratory Scott J, Speck J, Turnbull D (2010) Music emotion recognition: a experiments. In: Proceedings of the 3rd international symposium state of the art review. In: Proceedings of the international society on music information retrieval (ISMIR), Paris, France for music information retrieval conference 122. Lonsdale AJ, North AC (2011) Why do we listen to music? A 104. Kluver D, Konstan JA (2014) Evaluating recommender behavior uses and gratifications analysis. Br J Psychol 102(1):108–134 for new users. In: Proceedings of the 8th ACM conference on rec- 123. Maillet F, Eck D, Desjardins G, Lamere P et al (2009) Steerable ommender systems. ACM, pp 121–128. https://doi.org/10.1145/ playlist generation by learning song similarity from radio station 2645710.2645742 playlists. In: ISMIR, pp 345–350 105. Knees P, Pohle T, Schedl M, Widmer G (2006) Combining 124. McFee B, Bertin-Mahieux T, Ellis DP, Lanckriet GR (2012) The audio-based similarity with web-based data to accelerate auto- million song dataset challenge. In: Proceedings of the 21st inter- matic music playlist generation. In: Proceedings of the 8th national conference on world wide web. ACM, pp 909–916 ACM SIGMM international workshop on multimedia informa- 125. McFee B, Lanckriet G (2011) The natural language of playlists. tion retrieval (MIR), Santa Barbara, CA, USA In: Proceedings of the 12th international society for music infor- 106. Knees P, Schedl M (2016) Music similarity and retrieval: an mation retrieval conference (ISMIR 2011), Miami, FL, USA introduction to audio- and web-based strategies. The information 126. McFee B, Lanckriet G (2012) Hypergraph models of playlist retrieval series. Springer Berlin Heidelberg. https://books.google. dialects. In: Proceedings of the 13th international society for it/books?id=MdRhjwEACAAJ music information retrieval conference (ISMIR), Porto, Portugal 107. Knijnenburg BP, Willemsen MC (2015) Evaluating recommender 127. McNee SM, Lam SK, Konstan JA, Riedl J (2003) Interfaces for systems with user experiments. In: Recommender systems hand- eliciting new user preferences in recommender systems. In: Pro- book. Springer, pp 309–352 ceedings of the 9th international conference on user modeling, 108. Knijnenburg BP, Willemsen MC, Gantner Z, Soncu H, Newell C UM’03. Springer, Berlin, Heidelberg, pp. 178–187. http://dl.acm. (2012) Explaining the user experience of recommender systems. org/citation.cfm?id=1759957.1759988 User Model User-Adapt Interact 22(4–5):441–504 128. Mei T, Yang B, Hua XS, Li S (2011) Contextual video recommen- 109. Konecni VJ (1982) Social interaction and musical preference. In: dation by multimodal relevance and user feedback. ACM Trans The psychology of music, pp 497–516 Inf Syst 29(2):10 110. Koole SL (2009) The psychology of emotion regulation: an inte- 129. North A, Hargreaves D (1996) Situational influences on reported grative review. Cogn Emot 23:4–41 musical preference. Psychomusicol Music Mind Brain 15(1– 111. Kosinski M, Stillwell D, Graepel T (2013) Private traits and 2):30–45 attributes are predictable from digital records of human behav- 130. North A, Hargreaves D (2008) The social and applied psychology ior. Proc Natl Acad Sci 110(15):5802–5805 of music. Oxford University Press, Oxford 112. Kuo FF, Chiang MF, Shan MK, Lee SY (2005) Emotion-based 131. North AC, Hargreaves DJ (1996) Situational influences on music recommendation by association discovery from film music. reported musical preference. Psychomusicology A J Res Music In: Proceedings of the 13th annual ACM international conference Cogn 15(1–2):30 on multimedia. ACM, pp 507–510 132. Novello A, McKinney MF, Kohlrausch A (2006) Perceptual Eval- 113. Laplante A (2014) Improving music recommender systems: What uation of Music Similarity. In: Proceedings of the 7th international we can learn from research on music tastes? In: 15th International conference on music information retrieval (ISMIR), Victoria, BC, society for music information retrieval conference, Taipei, Taiwan Canada 114. Laplante A, Downie JS (2006) Everyday life music information- 133. O’Brien HL, Toms EG (2010) The development and evaluation of seeking behaviour of young adults. In: Proceedings of the 7th a survey to measure user engagement. J Am Soc Inf Sci Technol international conference on music information retrieval, Victoria 61(1):50–69. https://doi.org/10.1002/asi.v61:1 (BC), Canada 134. O’Hara K, Brown B (eds) (2006) Consuming music together: 115. Lee JH (2011) How similar is too similar? Exploring users’ per- social and collaborative aspects of music consumption technolo- ceptions of similarity in playlist evaluation. In: Proceedings of the gies, computer supported cooperative work, vol 35. Springer, 12th international society for music information retrieval confer- Dordrecht ence (ISMIR 2011), Miami, FL, USA 135. Pachet F, Roy P, Cazaly D (1999) A combinatorial approach to 116. Lee JH, Cho H, Kim YS (2016) Users’ music information needs content-based music selection. In: IEEE international conference and behaviors: design implications for music information retrieval on multimedia computing and systems, 1999, vol 1. IEEE, pp systems. J Assoc Inf Sci Technol 67(6):1301–1330 457–462 117. Lee JH, Wishkoski R, Aase L, Meas P, Hubbles C (2017) 136. Pagano R, Quadrana M, Elahi M, Cremonesi P (2017) Toward Understanding users of cloud music services: selection factors, active learning in cross-domain recommender systems. CoRR. management and access behavior, and perceptions. J Assoc Inf arXiv:1701.02021 Sci Technol 68(5):1186–1200 137. Pan R, Zhou Y, Cao B, Liu NN, Lukose R, Scholz M, Yang Q 118. Lehmann J, Lalmas M, Yom-Tov E, Dupret G (2012) Models (2008) One-class collaborative filtering. In: Proceedings of the of user engagement. In: Proceedings of the 20th international 8th IEEE international conference on data mining. IEEE, pp 502– conference on user modeling, adaptation, and personalization, 511 UMAP’12. Springer, Berlin, Heidelberg, pp 164–175. https://doi. 138. Panteli M, Benetos E, Dixon S (2016) Learning a feature space for org/10.1007/978-3-642-31454-4_14 similarity in world music. In: Proceedings of the 17th international 119. Li Q, Myaeng SH, Guan DH, Kim BM (2005) A probabilistic society for music information retrieval conference (ISMIR 2016), model for music recommendation considering audio features. In: New York, NY, USA Asia information retrieval symposium. Springer, pp 72–83 139. Park ST, Chu W (2009) Pairwise preference regression for cold- 120. Liu NN, Yang Q (2008) Eigenrank: a ranking-oriented approach start recommendation. In: Proceedings of the third ACM confer- to collaborative filtering. In: SIGIR’08: proceedings of the 31st ence on recommender systems, RecSys’09. ACM, New York, NY, annual international ACM SIGIR conference on research and USA, pp 21–28. https://doi.org/10.1145/1639714.1639720 development in information retrieval. ACM, New York, NY, USA, 140. Pettijohn T, Williams G, Carter T (2010) Music for the sea- pp 83–90. https://doi.org/10.1145/1390334.1390351 sons: seasonal music preferences in college students. Curr Psychol 29(4):328–345 123 International Journal of Multimedia Information Retrieval (2018) 7:95–116 115 141. Pichl M, Zangerle E, Specht G (2015) Towards a context-aware tra music. IEEE Trans Affect Comput. https://doi.org/10.1109/ music recommendation approach: what is hidden in the playlist TAFFC.2017.2663421 name? In: 2015 IEEE international conference on data mining 160. Schedl M, Hauger D, Schnitzer D (2012) A model for serendip- workshop (ICDMW). IEEE, pp 1360–1365 itous music retrieval. In: Proceedings of the 2nd workshop on 142. Pohle T, Knees P, Schedl M, Pampalk E, Widmer G (2007) “Rein- context-awareness in retrieval and recommendation (CaRR), Lis- venting the Wheel”: a novel approach to music player interfaces. bon, Portugal IEEE Trans Multimed 9:567–575 161. Schedl M, Knees P, Gouyon F (2017) New paths in music rec- 143. Pu P, Chen L, Hu R (2012) Evaluating recommender systems ommender systems research. In: Proceedings of the 11th ACM from the user’s perspective: survey of the state of the art. User conference on recommender systems (RecSys 2017), Como, Italy Model User-Adapt Interact 22(4–5):317–355. https://doi.org/10. 162. Schedl M, Knees P, McFee B, Bogdanov D, Kaminskas M (2015) 1007/s11257-011-9115-7 Music recommender systems. In: Ricci F, Rokach L, Shapira B, 144. Punkanen M, Eerola T, Erkkilä J (2011) Biased emotional recog- Kantor PB (eds) Recommender systems handbook, chap. 13, 2nd nition in depression: perception of emotions in music by depressed edn. Springer, Berlin, pp 453–492 patients. J Affect Disord 130(1–2):118–126 163. Schedl M, Melenhorst M, Liem CC, Martorell A, Mayor O, 145. Quadrana M, Cremonesi P, Jannach D (2018) Sequence-aware Tkalci ˇ cˇ M (2016) A personality-based adaptive system for visu- recommender systems. arXiv preprint arXiv:1802.08452 alizing classical music performances. In: Proceedings of the 146. Rashid AM, Karypis G, Riedl J (2008) Learning preferences 7th ACM multimedia systems conference (MMSys), Klagenfurt, of new users in recommender systems: an information theoretic Austria approach. SIGKDD Explor Newsl 10:90–100. https://doi.org/10. 164. Schein AI, Popescul A, Ungar LH, Pennock DM (2002) Meth- 1145/1540276.1540302 ods and metrics for cold-start recommendations. In: SIGIR’02: 147. Rentfrow PJ, Gosling SD (2003) The do re mi’s of everyday life: Proceedings of the 25th annual international ACM SIGIR con- the structure and personality correlates of music preferences. J ference on research and development in information retrieval. Personal Soc Psychol 84(6):1236–1256 ACM, New York, NY, USA, pp 253–260. https://doi.org/10.1145/ 148. Repetto RC, Serra X (2014) Creating a corpus of Jingju (Bei- 564376.564421 jing opera) music and possibilities for melodic analysis. In: 15th 165. Serra X (2014) Computational approaches to the art music tradi- International society for music information retrieval conference, tions of India and Turkey. J New Music Res 43(1):1–2. https:// Taipei, Taiwan, pp 313–318 doi.org/10.1080/09298215.2014.894083 149. Reynolds G, Barry D, Burke T, Coyle E (2007) Towards a per- 166. Serra X (2014) Creating research corpora for the computational sonal automatic music playlist generation algorithm: the need for study of music: the case of the compmusic project. In: AES 53rd contextual information. In: Proceedings of the 2nd international international conference on semantic audio. AES, AES, London, audio mostly conference: interaction with sound, Ilmenau, Ger- UK, pp 1–9 many, pp 84–89 167. Seyerlehner K, Schedl M, Pohle T, Knees P (2010) Using 150. Ribeiro MT, Lacerda A, Veloso A, Ziviani N (2012) Pareto- block-level features for genre classification, tag classification and efficient hybridization for multi-objective recommender systems. music similarity estimation. In: Extended abstract to the music In: Proceedings of the Sixth ACM conference on recommender information retrieval evaluation eXchange (MIREX 2010)/11th systems, RecSys’12. ACM, New York, NY, USA, pp 19–26. international society for music information retrieval conference https://doi.org/10.1145/2365952.2365962 (ISMIR 2010), Utrecht, the Netherlands 151. Rubens N, Elahi M, Sugiyama M, Kaplan D (2015) Active 168. Seyerlehner K, Widmer G, Schedl M, Knees P (2010) Automatic learning in recommender systems. In: Recommender systems music tag classification based on block-level features. In: Proceed- handbook—chapter 24: recommending active learning. Springer ings of the 7th sound and music computing conference (SMC), US, pp 809–846 Barcelona, Spain 152. Russell JA (1980) A circumplex model of affect. J Personal Soc 169. Shao B, Wang D, Li T, Ogihara M (2009) Music recommendation Psychol 39(6):1161–1178 based on acoustic features and user access patterns. IEEE Trans 153. Schäfer T, Auerswald F, Bajorat IK, Ergemlidze N, Frille K, Audio Speech Lang Process 17(8):1602–1611 Gehrigk J, Gusakova A, Kaiser B, Pätzold RA, Sanahuja A, Sari S, 170. Skowron M, Ferwerda B, Tkalci ˇ cˇ M, Schedl M (2016) Fusing Schramm A, Walter C, Wilker T (2016) The effect of social feed- social media cues: personality prediction from Twitter and Insta- back on music preference. Musicae Sci 20(2):263–268. https:// gram. In: Proceedings of the 25th international world wide web doi.org/10.1177/1029864915622054 conference (WWW), Montreal, Canada 154. Schäfer T, Mehlhorn C (2017) Can personality traits predict musi- 171. Skowron M, Lemmerich F, Ferwerda B, Schedl M (2017) Predict- cal style preferences? A meta-analysis. Personal Individ Differ ing genre preferences from cultural and socio-economic factors 116:265–273. https://doi.org/10.1016/j.paid.2017.04.061 for music retrieval. In: Proceedings of the ECIR 155. Schäfer T, Sedlmeier P, Stdtler C, Huron D (2013) The psycho- 172. Slaney M, White W (2006) Measuring playlist diversity for rec- logical functions of music listening. Front Psychol 4(511):1–34 ommendation systems. In: Proceedings of the 1st ACM workshop 156. Schedl M (2017) Investigating country-specific music prefer- on Audio and music computing multimedia. ACM, pp 77–82 ences and music recommendation algorithms with the LFM-1b 173. Smyth B, McClave P (2001) Similarity vs. diversity. In: Proceed- dataset. Int J Multimed Inf Retr 6(1):71–84. https://doi.org/10. ings of the 4th international conference on case-based reasoning: 1007/s13735-017-0118-y case-based reasoning research and development, ICCBR’01. 157. Schedl M, Breitschopf G, Ionescu B (2014) Mobile music genius: Springer, London, UK, pp 347–361. http://dl.acm.org/citation. reggae at the beach, metal on a Friday night? In: Proceedings of cfm?id=646268.758890 the 4th ACM international conference on multimedia retrieval 174. Sordo M, Chaachoo A, Serra X (2014) Creating corpora for com- (ICMR), Glasgow, UK putational research in arab-andalusian music. In: 1st International 158. Schedl M, Flexer A, Urbano J (2013) The neglected user in music workshop on digital libraries for musicology, London, UK, pp. 1– information retrieval research. J Intell Inf Syst 41:523–539 3. https://doi.org/10.1145/2660168.2660182 159. Schedl M, Gómez E, Trent ES, Tkalci ˇ cˇ M, Eghbal-Zadeh H, 175. Swearingen K, Sinha R (2001) Beyond algorithms: an hci perspec- Martorell A (2017) On the Interrelation between listener char- tive on recommender systems. In: ACM SIGIR 2001 workshop acteristics and the perception of emotions in classical orches- on recommender systems, vol 13, pp 1–11 123 116 International Journal of Multimedia Information Retrieval (2018) 7:95–116 176. Tamir M (2011) The maturing field of emotion regulation. Emot 186. Yang YH, Chen HH (2011) Music emotion recognition. CRC Rev 3:3–7 Press, Boca Raton 177. Tintarev N, Lofi C, Liem CC (2017) Sequences of diverse song 187. Yang YH, Chen HH (2012) Machine recognition of music emo- recommendations: an exploratory study in a commercial system. tion: a review. ACM Trans Intell Syst Technol 3(4):40 In: Proceedings of the 25th conference on user modeling, adapta- 188. Yang YH, Chen HH (2013) Machine recognition of music emo- tion and personalization, UMAP’17. ACM, New York, NY, USA, tion: a review. Trans Intell Syst Technol 3(3):40:1–40:30 pp 391–392. https://doi.org/10.1145/3079628.3079633 189. Yoshii K, Goto M, Komatani K, Ogata T, Okuno HG (2006) 178. Tkalcic M, Kosir A, Tasic J (2013) The ldos-peraff-1 corpus of Hybrid collaborative and content-based music recommendation facial-expression video clips with affective, personality and user- using probabilistic model with latent user preferences. In: ISMIR, interaction metadata. J Multimodal User Interfaces 7(1–2):143– vol 6, p 7th 155. https://doi.org/10.1007/s12193-012-0107-7 190. Zamani H, Bendersky M, Wang X, Zhang M (2017) Situational 179. Tkalci ˇ cˇ M, Quercia D, Graf S (2016) Preface to the special context for ranking in personal search. In: Proceedings of the 26th issue on personality in personalized systems. User Model User- international conference on world wide web, WWW’17. Interna- Adapt Interact 26(2):103–107. https://doi.org/10.1007/s11257- tional world wide web conferences steering committee, Republic 016-9175-9 and Canton of Geneva, Switzerland, pp 1531–1540. https://doi. 180. Uitdenbogerd A, Schyndel R (2002) A review of factors affecting org/10.1145/3038912.3052648 music recommender success. In: 3rd International conference on 191. Zentner M, Grandjean D, Scherer KR (2008) Emotions evoked music information retrieval, ISMIR 2002. IRCAM-Centre Pom- by the sound of music: characterization, classification, and mea- pidou, pp 204–208 surement. Emotion 8(4):494 181. Vall A, Quadrana M, Schedl M, Widmer G, Cremonesi P (2017) 192. Zhang Z, Jin X, Li L, Ding G, Yang Q (2016) Multi-domain active The importance of song context in music playlists. In: Proceedings learning for recommendation. In: AAAI, pp 2358–2364 of the poster track of the 11th ACM conference on recommender 193. Zhang YC, O Seaghdha D, Quercia D, Jambor T (2012) Auralist: systems (RecSys), Como, Italy introducing serendipity into music recommendation. In: Proceed- 182. Vargas S, Baltrunas L, Karatzoglou A, Castells P (2014) Coverage, ings of the 5th ACM international conference on web search and redundancy and size-awareness in genre diversity for recom- data mining (WSDM), Seattle, WA, USA mender systems. In: Proceedings of the 8th ACM conference on 194. Zheleva E, Guiver J, Mendes Rodrigues E, Milic-Frayling ´ N recommender systems, RecSys’14. ACM, New York, NY, USA, (2010) Statistical models of music-listening sessions in social pp 209–216. https://doi.org/10.1145/2645710.2645743 media. In: Proceedings of the 19th international conference on 183. Vargas S, Castells P (2011) Rank and relevance in novelty and world wide web (WWW), Raleigh, NC, USA, pp 1019–1028 diversity metrics for recommender systems. In: Proceedings of 195. Zhou T, Kuscsik Z, Liu JG, Medo M, Wakeling JR, Zhang YC the 5th ACM conference on recommender systems (RecSys), (2010) Solving the apparent diversity-accuracy dilemma of rec- Chicago, IL, USA ommender systems. Proc Natl Acad Sci 107(10):4511–4515 184. Wang X, Rosenblum D, Wang Y (2012) Context-aware mobile 196. Ziegler CN, McNee SM, Konstan JA, Lausen G (2005) Improving music recommendation for daily activities. In: Proceedings of the recommendation lists through topic diversification. In: Proceed- 20th ACM international conference on multimedia. ACM, Nara, ings of the 14th international conference on the world wide web. Japan, pp 99–108 ACM, pp 22–32 185. Weimer M, Karatzoglou A, Smola A (2008) Adaptive collabo- rative filtering. In: RecSys’08: proceedings of the 2008 ACM conference on recommender systems. ACM, New York, NY, USA, pp. 275–282. https://doi.org/10.1145/1454008.1454050

Journal

International Journal of Multimedia Information RetrievalSpringer Journals

Published: Apr 5, 2018

References