Constrained Dual-Level Bandit for Personalized Impression Regulation in Online Ranking Systems

ACM Transactions on Knowledge Discovery from Data (TKDD), Association for Computing Machinery

Publisher: Association for Computing Machinery
Copyright: © 2021 Association for Computing Machinery.
ISSN: 1556-4681
eISSN: 1556-472X
DOI: 10.1145/3461340

Abstract

Impression regulation plays an important role in online ranking systems. For example, e-commerce ranking systems must meet local commercial demands on pre-labeled target items, such as cultivating fresh items and counteracting fraudulent ones, while maximizing global revenue. However, local impression regulation may cause “butterfly effects” on the global scale: in e-commerce, a fluctuation in price preference in the initial conditions (overpriced or underpriced items) may lead to a significantly different outcome, harming the shopping experience and causing economic losses for platforms. To prevent “butterfly effects”, some researchers define their regulation objectives with global constraints by using a contextual bandit at the page level, which requires all items on a page to share the same regulation action and therefore cannot regulate impressions for individual items. To address this problem, in this article we propose a personalized impression regulation method that directly makes regulation decisions for each user-item pair. Specifically, we model the regulation problem as a Constrained Dual-level Bandit (CDB) problem, in which the local regulation actions and reward signals are at the item level, while the global effect constraint on platform impressions can be calculated only at the page level. To handle these asynchronous signals, we first expand the page-level constraint to the item level and then derive the policy update as a second-order cone optimization problem. CDB approaches the optimal policy by iteratively solving this optimization problem. Experiments are performed on both offline and online datasets, and the results, both theoretical and empirical, demonstrate that CDB outperforms state-of-the-art algorithms.
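
To make the dual-level structure concrete, the following toy sketch shows one simple way a page-level budget could be expanded into item-level budgets so that per-item decisions still satisfy the page-level constraint. It is not the article's algorithm: the even split, the greedy per-item choice, and every name and number here (`expand_page_constraint`, `choose_actions`, `costs`, `est_rewards`) are hypothetical assumptions for illustration, and the second-order cone policy update described in the abstract is not implemented.

```python
from typing import List, Sequence


def expand_page_constraint(page_budget: float, n_items: int) -> List[float]:
    """Split a page-level budget evenly across items.

    Assumption: an even split. It is a sufficient (conservative) condition:
    if every item stays within its share, the page stays within the budget.
    """
    if n_items <= 0:
        raise ValueError("n_items must be positive")
    return [page_budget / n_items] * n_items


def choose_actions(
    est_rewards: Sequence[Sequence[float]],
    costs: Sequence[float],
    item_budgets: Sequence[float],
) -> List[int]:
    """Pick, per item, the feasible action with the highest estimated reward.

    est_rewards[i][a] is the estimated reward of action a on item i.
    costs[a] is the assumed impression cost of action a; action 0 is assumed
    to mean "no regulation" with zero cost, so it is feasible whenever the
    item budget is non-negative.
    """
    chosen = []
    for rewards, budget in zip(est_rewards, item_budgets):
        feasible = [a for a, cost in enumerate(costs) if cost <= budget]
        chosen.append(max(feasible, key=lambda a: rewards[a]))
    return chosen


# Hypothetical page with three items and three regulation actions.
costs = [0.0, 1.0, 3.0]
est_rewards = [
    [0.1, 0.4, 0.9],
    [0.2, 0.1, 0.3],
    [0.0, 0.5, 0.6],
]
item_budgets = expand_page_constraint(page_budget=6.0, n_items=3)
actions = choose_actions(est_rewards, costs, item_budgets)
page_cost = sum(costs[a] for a in actions)
print(actions, page_cost)
```

In this toy setting the most expensive action is excluded for every item because its cost exceeds the per-item share, which illustrates the trade-off of a conservative item-level expansion: the page-level constraint is guaranteed, at the price of ruling out some actions that the page budget as a whole could have afforded.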

Journal

ACM Transactions on Knowledge Discovery from Data (TKDD), Association for Computing Machinery

Published: Jul 21, 2021

Keywords: Online ranking systems
