CEO, Head of Data Science, Insight AI, Russia, Moscow
BUSINESS-CONSTRAINED PROBABILISTIC RECOMMENDER SYSTEMS: JOINT OPTIMIZATION OF ENGAGEMENT AND OPERATIONAL CONSTRAINTS IN HIGH-LOAD SERVICES
ABSTRACT
Recommender systems are a core component of high-load digital products, including media platforms, social networks, e-commerce catalogues, delivery applications and quick-service restaurant (QSR) apps. In practice, such systems are typically optimized for engagement and short-term value metrics (consumption probability, watch time, basket value), while operational constraints – production capacity, delivery SLAs, promotion budgets, inventory and catalog health – are handled by ad-hoc business rules. This paper introduces a framework for business-constrained probabilistic recommender systems in which each content item (product, menu offer, track, article) is described by a vector of probabilistic outcomes (engagement, value, resource consumption), and recommendation is formulated as a constrained optimization problem. We argue that the proposed formulation enables joint control of user engagement and operational constraints in high-load services and provides a principled foundation for building production-grade, business-aligned recommender systems in enterprise environments.
АННОТАЦИЯ
Рекомендательные системы стали ключевым компонентом высоконагруженных цифровых сервисов: медиаплатформ, социальных сетей, e-commerce, приложений доставки и сетей быстрого питания. На практике они часто оптимизируются по метрикам вовлечённости и краткосрочной ценности (вероятность потребления, время просмотра, чек), тогда как операционные ограничения – производственные мощности, SLA доставки, промо бюджеты, инвентарь, здоровье каталога контента обрабатываются набором эвристик и правил. В работе предлагается фреймворк бизнес-ограниченных вероятностных рекомендательных систем, в котором каждая единица контента (товар, блюдо, трек, статья) описывается вектором вероятностных исходов (вовлечённость, ценность, потребление ресурсов), а задача рекомендации формулируется как оптимизация с ограничениями. Показано, что предложенный подход позволяет совместно контролировать вовлечённость и операционные ограничения в высоконагруженных сервисах и служит основой для построения промышленного, ориентированного на бизнес поведения рекомендателя.
Keywords: recommender systems, probabilistic modelling, business constraints, operational load, high-load services, enterprise AI.
Ключевые слова: рекомендательные системы, вероятностное моделирование, бизнес-ограничения, операционная нагрузка, высоконагруженные сервисы, enterprise AI.
1. Introduction
Recommender systems have become a central component of high-load digital products: media streaming platforms, news feeds, social networks, e-commerce catalogues, food delivery and quick-service restaurant (QSR) applications. In all of these settings, the system repeatedly selects a small subset of content items – products, menu offers, tracks, videos, articles – from a large catalogue to present to a user in a given context. Industrial and academic work largely focuses on improving user engagement and value-oriented metrics such as consumption probability, watch time, session length, conversion rate or expected basket value, using increasingly sophisticated models and features.
In real services, however, recommendation does not operate in a vacuum. Each recommended item simultaneously consumes limited business resources and interacts with operational and policy constraints. In digital content platforms, recommendation decisions are coupled with promotion budgets, exposure quotas for creators, contractual obligations and catalogue-health requirements (for example, avoiding permanent over-exposure of a small set of hits). In transactional environments such as QSR or online retail, recommended offers affect kitchen or warehouse capacity, preparation time, delivery service-level agreements (SLAs) and inventory levels. A recommender that naively maximizes short-term engagement or conversion can therefore produce failure modes at the business level: overloaded kitchens and broken SLAs, exhausted stock of promoted items, skewed catalogue exposure or degraded long-term user experience, even when offline recommender metrics look strong.
Traditional recommender system research often treats these effects as exogenous or secondary. Mainstream ranking models optimize a single engagement-oriented score, sometimes followed by post-hoc re-ranking for diversity or simple hard business rules. Work on multi-objective recommendation and fairness has introduced additional criteria such as diversity, coverage and creator exposure, and constrained bandit formulations have been applied to certain resource-allocation problems. Yet operational constraints in large-scale systems are still frequently handled by opaque rule layers sitting on top of black-box recommenders. Such layers are hard to reason about, tune or evaluate systematically, especially under uncertainty and load variability.
From the perspective of an enterprise operating a high-load service, recommendation is inherently a problem of allocating limited resources under uncertainty. Each candidate content item has a stochastic impact on user engagement, monetary value and consumption of business resources, which may represent promotion slots, catalogue exposure, production capacity, delivery slots, inventory and other domain-specific limits. Decisions must trade off expected engagement and value against the risk of violating these constraints, but current recommender formulations rarely make this trade-off explicit.
In this paper we introduce the notion of a business-constrained probabilistic recommender system. Instead of predicting a single engagement score per user–item–context tuple, we model a vector of probabilistic outcomes that includes user engagement (e.g., consumption probability, time spent), expected monetary value and one or more dimensions of resource consumption defined by the business. On top of these predictions, we formulate the recommendation task as a constrained optimization problem: for each request, select a list of content items that maximizes expected utility while satisfying constraints on resource usage and acceptable risk of constraint violation.
Our contribution is to provide a unified, domain-agnostic formulation that applies to any type of content – media objects, articles, products, menu offers – through an abstract load function defined over items and context, and to outline practical decision layers that implement this formulation in production. This business-constrained, probabilistic view of recommendation aligns recommender behavior with the operational envelope of the service and can serve as a foundation for enterprise-grade recommender systems in resource-limited and mission-critical environments.
Materials and Methods
2.1 Unified content and load model
We model recommendation as repeated selection of a ranked list
for a user
under context
, from a catalogue
. The context
aggregates request-level signals (time, device, location, session state, channel, and operational state such as peak/off-peak). Each item
represents a generic unit of content: media objects, articles, images, products, menu offers, or promotional bundles.
2.1.1 Outcomes as random variables
A key property of high-load systems is uncertainty: user responses and business side-effects vary even for identical
. We therefore treat outcomes as random variables. For each
we define a vector-valued outcome
/Evdokimov.files/image008.png)
where:
-
denotes an engagement outcome (e.g., play probability, completion rate, watch/listen time);
-
denotes a business value outcome (e.g., expected margin, expected basket increment, a revenue proxy);
- and
denotes a vector of resource-related outcomes (e.g., a delivery-delay indicator, preparation time, promotion-slot consumption, exposure skew).
This representation supports both “digital content” and “transactional content” with a shared mathematical structure: each item affects engagement/value and consumes limited resources.
2.1.2 A domain-agnostic resource (load) function
We introduce a domain-agnostic load function
,
whose components represent the expected consumption of business resources if item
is served to user
in context
. The vector dimension
is chosen by the business to reflect its operational envelope.
Examples:
1. Digital content platforms: promotion-slot usage, exposure quotas for creators/labels, catalog-health budgets (limits on repeated exposure), moderation/review budget.
2. Retail/QSR: inventory units, kitchen capacity units or expected prep time, delivery-slot usage, warehouse picking/packing load.
We define the list-level load as an additive functional
/Evdokimov.files/image017.png)
where
captures position- or interface-dependent scaling (e.g., top slots have a higher probability of being chosen and thus induce higher expected load). Additivity is a standard and practical approximation in ranking systems: it holds exactly when resource-usage contributions are independent and additive, and it remains useful as a first-order model when interactions exist.
2.1.3. Capacity budgets and constraint sets
Let
be a context-dependent capacity vector describing allowable resource consumption within an operational window. This captures the fact that constraints vary with time and system state (peak vs off-peak, staffing level, delivery fleet availability, promotion campaign budgets). We define the feasible set:
/Evdokimov.files/image020.png)
In addition to expected-budget constraints, many real systems must control tail risk (e.g., avoid SLA breaches or overload events). For such cases we introduce probabilistic constraints (“chance constraints”) in Section 4.
2.2 Probabilistic modelling and business-constrained objective
2.2.1 From business goals to expected utility maximization
In production, the recommender’s goal is not to “maximize a score”, but to maximize expected business utility under constraints. We formalize this via a utility function
that maps outcomes to value:
/Evdokimov.files/image022.png)
where
is monotone in engagement and value (the exact form is product-specific; a common choice is a weighted combination). The expected item utility is:
/Evdokimov.files/image023.png)
The list utility is aggregated with rank discounting:
/Evdokimov.files/image024.png)
where
captures position bias (e.g., logarithmic discount). This is consistent with common offline ranking objectives, but makes explicit that we optimize expected utility rather than a proxy metric.
2.2.2 Why probabilistic outputs are needed (risk-aware decisions)
Expected utility alone is insufficient when costs are asymmetric or when violations are unacceptable. For example, in QSR, occasional overload can break SLAs; in media, exposure concentration can degrade catalog health or violate promotion contracts. These are inherently tail phenomena. We therefore consider constraint functions based on stochastic resource outcomes. Let
be a scalar risk/violation indicator computed from list-level outcomes (e.g., an SLA breach indicator). We impose:
/Evdokimov.files/image027.png)
where
is the acceptable violation rate. In practice, this requires probabilistic modelling (either direct probability prediction for violation events or distributional estimates that enable quantile-based approximations). This motivates the probabilistic layer: predicting not only means but also uncertainty or quantiles.
2.2.3 Business-constrained probabilistic recommendation problem
The overall per-request decision problem is:
/Evdokimov.files/image029.png)
This problem explicitly encodes the enterprise requirement: keep recommendation within the operational envelope.
2.2.4 Lagrangian relaxation and an implementable scoring rule
A common approach to constrained maximization in ranking systems is Lagrangian relaxation. Consider expected-load constraints only (for clarity). Define multipliers
and the Lagrangian:
/Evdokimov.files/image031.png)
Maximizing
over
yields a penalized objective in which each item receives a business-aware adjusted score:
/Evdokimov.files/image034.png)
Then a practical decision rule is to rank candidates by
and pick top-
, with
tuned to satisfy the constraints. This provides a direct path from the formal objective to a production implementable re-ranking layer.
Proposition 1 (Interpretation).
If list-level utility and load are additive over items with fixed rank weights, maximizing the Lagrangian over lists is equivalent to sorting items by the adjusted score
and selecting the top-
.
Sketch of proof. Under additivity,
/Evdokimov.files/image038.png)
The last term does not depend on
, so maximizing
reduces to selecting items with maximal per-position contributions, which is achieved by ranking candidates by their adjusted score (with rank weights absorbed into the score).
2.2.5 Incorporating probabilistic risk constraints
Chance constraints can be handled similarly by converting risk into a penalty. Let
be a predicted risk measure (e.g., probability of SLA breach contribution, or an upper quantile of preparation time). Then we extend the adjusted score:
/Evdokimov.files/image040.png)
where
controls risk aversion. In media-type domains,
may represent exposure concentration risk or contract-violation probability; in transactional domains,
may represent SLA violation probability. This shows how the same unified formulation applies across content types and constraint families.
2.3 Decision layer: constrained re-ranking and contextual policies
The formulation in Section 4 yields a clear separation between (i) outcome prediction and (ii) decision-making. In production systems this separation is practical: a strong baseline recommender produces candidates and relevance signals, while a decision layer enforces business constraints and risk preferences. We describe two implementable decision layers: (a) constrained re-ranking over a candidate set and (b) contextual policy selection (bandit-style) over a small family of ranking policies.
2.3.1. Constrained re-ranking over candidate sets
Let
be a candidate set produced by a baseline retrieval/ranking system (e.g., a two-stage recommender: retrieval → ranker). For each candidate item
, we assume the system provides an expected utility
, an expected resource vector
, and an optional risk measure
(e.g., predicted probability of violation or a high quantile of resource usage).
A constrained list selection problem can be written as:
/Evdokimov.files/image047.png)
(optionally with additional risk constraints). This is a multi-dimensional knapsack-like problem when
, and exact optimization may be expensive for online serving. We therefore focus on two practical solutions widely used in industrial ranking.
(A) Lagrangian re-ranking (penalty-based).
We introduce multipliers
and compute an adjusted score:
/Evdokimov.files/image049.png)
We then select the top-
items by
. The multipliers
and
act as shadow prices for resources and risk. They can be tuned globally, per segment (e.g., peak vs off-peak), or dynamically from real-time capacity signals.
(B) Greedy constrained selection (feasible-first).
Alternatively, we can explicitly enforce feasibility by iteratively constructing the list. First, candidates are sorted by a base score (e.g.,
or
). Next, the algorithm adds the next-best item only if constraints remain satisfied. Finally, if needed, remaining slots are filled with low-load items. This approach is robust and easy to implement, and is often sufficient when constraints are hard (e.g., must-not-show items) or when capacity is tight.
Operational interpretation.
In digital content platforms,
may include exposure budgets or catalog-health penalties; in transactional systems, it may include inventory units and expected preparation time. In both cases, the re-ranking layer converts a general recommender into a business-aligned system by making constraints explicit and optimizable.
2.3.2. Lemma: penalized re-ranking is a lower bound to constrained optimum
The penalty-based approach is not merely heuristic; it has a clear relationship to the constrained problem.
Lemma 1 (Lagrangian lower bound).
Let OPT be the optimal value of the constrained selection problem:
/Evdokimov.files/image054.png)
For any
, the maximum of the penalized objective
/Evdokimov.files/image055.png)
yields a Lagrangian relaxation that provides a lower bound on the constrained optimum after subtracting the constant term
.
Proof (sketch). For any feasible
,
, hence
/Evdokimov.files/image058.png)
Taking the maximum over feasible
on the left and over all
on the right yields the claim.
2.3.3. Lemma: adjusted score is an exchange rate between value and resources
Lemma 2 (Shadow price interpretation).
Assume the relaxed instance admits an approximately well-defined dual interpretation. Then each component
can be interpreted as the marginal utility gain per additional unit of resource
, i.e., an exchange rate between expected utility and resource consumption. In convex settings,
equals the partial derivative of the optimal value with respect to capacity
. In our discrete ranking setting, we use this as an approximation: increasing
relaxes the constraint and can only improve the optimum; the multiplier that balances this improvement corresponds to the marginal value of capacity. This makes tuning explainable: stakeholders can reason about whether one unit of operational load is worth a given increment of expected utility during peak hours and adjust
accordingly
2.3.4. Probabilistic risk control via quantiles
A key benefit of probabilistic modelling is the ability to control tail risks using quantiles. Let
denote the
-quantile of a resource outcome (e.g., preparation time or a delay-risk proxy). A risk-aware load can be defined as:
/Evdokimov.files/image065.png)
where
controls risk aversion. Penalizing
is a practical surrogate to chance constraints: it prefers items with lower upper-tail resource usage even when means are similar.
2.3.5. Contextual policy selection (bandit-style decision layer)
In some settings, the optimal trade-off between utility and constraints varies rapidly across contexts (peak/off-peak, region, campaign state, kitchen load, inventory availability). Instead of re-optimizing
continuously, we define a small set of policies
, where each policy corresponds to a particular trade-off configuration (e.g., different
, different
, different candidate filters). At each request
, a policy is selected,
, and then the policy produces the list
(e.g., via constrained re-ranking). Policy selection can be learned online with a contextual bandit objective, where reward is realized engagement/value (e.g., watch time, basket value) and constraints are observed violations (SLA breaches, overload events, exposure budget exceedance).
This yields a practical learning loop: the system adapts which policy to use for which context while maintaining constraint satisfaction.
2.3.6. Implementation summary
In production, the overall flow is as follows. A baseline recommender generates candidates and relevance features. A probabilistic outcome model predicts utility, value, and resource/risk outcomes. A decision layer applies either constrained re-ranking by adjusted scores or contextual policy selection over a small policy set. Monitoring then checks realized constraints and calibrates
,
, and quantile-risk parameters. This decision layer operationalizes the unified formulation: it works for any content type, because the only domain-specific component is the definition (and estimation) of
and the violation or risk functions.
2.6. Datasets and business scenarios
To demonstrate the domain-agnostic nature of the proposed formulation, we consider two families of high-load “content recommendation” scenarios that differ by the type of resources consumed, while sharing the same decision structure.
Scenario A: Digital content platform (media / feed / content)
Items represent digital content units (e.g., tracks, videos, articles, posts). The catalog is large, user feedback is implicit and sequential, and position bias is substantial. In addition to engagement and short-term value, the platform must respect non-trivial operational and policy constraints, including:
- promotion budgets (limited number of high-impact placements per time window),
- exposure quotas (creator/provider constraints, contractual obligations),
- catalog health constraints (avoid concentration and repeated over-exposure, control fatigue and popularity skew).
In this scenario,
includes exposure and promotion resource dimensions, and risk terms can capture undesirable states such as exposure concentration or policy violations.
Scenario B: Transactional content (retail / QSR / delivery)
Items represent transactional content units (products, menu offers, bundles). The system optimizes for incremental value while the business operates inside hard physical constraints:
- capacity (kitchen/fulfillment load, prep time, picking/packing),
- inventory (availability and depletion risk),
- service-level constraints (delivery SLAs, peak-hour overload).
Here,
includes operational resource dimensions (time units, capacity units, inventory units), and risk terms capture overload and SLA breach probabilities.
In both scenarios, the same constrained optimization view applies: recommendation is a resource allocation decision under uncertainty, with a shared decision layer and domain-specific definitions of resources and violations.
2.7. Evaluation protocol
We evaluate the system along two axes: user/product utility and constraint compliance and risk.
Offline evaluation (counterfactual / logged-data compatible)
We report standard ranking proxies (e.g., NDCG@K, Recall@K) alongside outcome-driven metrics aligned to the utility function, including engagement-oriented metrics, value-oriented proxies (e.g., expected margin or basket uplift model output), and constraint-oriented metrics such as expected resource usage
and predicted violation probability
.
Scenario-based evaluation under load variation
Because constraints are most relevant under stress (peak hours, tight budgets), we additionally evaluate policies in stress-test regimes by varying
and measuring the violation rate and its sensitivity to tighter budgets, the utility–constraint trade-off curve when sweeping
and
, and robustness to regime shifts in resource availability (e.g., sudden inventory drop or capacity decrease).
Online alignment
Where online testing is available, constrained policies can be compared as context-dependent “arms” (policy selection), consistent with constrained contextual bandit interpretations used in the literature.
Conclusion
This paper introduced a business-constrained probabilistic recommender system formulation for high-load services, motivated by the practical observation that recommendation decisions simultaneously affect user outcomes and consume limited business resources. We proposed a unified content-and-load abstraction in which any recommendable unit - media objects, articles, products, or menu offers is associated with probabilistic predictions for engagement and value, alongside a vector of resource consumption signals.
On top of this probabilistic layer, we framed recommendation as a constrained optimization problem and derived an implementable decision layer via Lagrangian relaxation, quantile-based risk control, and contextual policy selection. The resulting system makes trade-offs explicit and tunable: rather than relying on opaque rule stacks, constraints become first-class objects that can be monitored, priced, and governed.
A key outcome of this approach is domain generality. Digital content platforms primarily manage promotion and exposure budgets as well as catalog health constraints, while transactional domains (retail/QSR) manage capacity, inventory, and SLA risks. Yet the same mathematical structure – expected utility maximization subject to resource and risk constraints – applies across these settings, enabling consistent engineering, evaluation, and decision governance.
Future work includes (i) richer interaction models for non-additive loads (e.g., bundle or menu interactions), (ii) tighter links between probabilistic calibration and risk guarantees, and (iii) fully online constrained learning with safety guarantees in rapidly changing operational environments.
References:
- ACM article: Estimating and Evaluating the Uncertainty of Rating Predictions in Recommender Systems // ACM Transactions (uncertainty in RS).
- Agrawal S., Devanur N. R., Li L. An Efficient Algorithm for Contextual Bandits with Knapsacks // COLT (PMLR). 2016. P. 4–18.
- Covington P., Adams J., Sargin E. Deep Neural Networks for YouTube Recommendations // ACM RecSys. 2016.
- Gómez E. et al. Enhancing Recommender Systems with Provider Fairness // Information Processing & Management. 2025.
- Harsha P. et al. Contextual Bandits for Large-Scale Structured Discrete Constrained Optimization Problems. 2025.
- He X., Liao L., Zhang H., Nie L., Hu X., Chua T.-S. Neural Collaborative Filtering // WWW. 2017.
- Immorlica N., Sankararaman K., Schapira M., Slivkins A., Vaughan J. W. Adversarial Bandits with Knapsacks // Journal of the ACM. 2022.
- Koren Y., Bell R., Volinsky C. Matrix Factorization Techniques for Recommender Systems // Computer. 2009. Vol. 42(8). P. 30–37. DOI: 10.1109/MC.2009.263.
- Pacchiano A., Ghavamzadeh M., Bartlett P. Contextual Bandits with Stage-wise Constraints // JMLR. 2025.
- Rendle S., Freudenthaler C., Gantner Z., Schmidt-Thieme L. BPR: Bayesian Personalized Ranking from Implicit Feedback // UAI. 2009.
- Slivkins A. et al. Contextual Bandits with Packing and Covering Constraints // JMLR. 2024.
- Sluijterman L. et al. How to Evaluate Uncertainty Estimates in Machine Learning // Neural Networks. 2024.
- Vassøy B. et al. Consumer-side Fairness in Recommender Systems // Artificial Intelligence Review. 2024.
- Zhao Y. et al. Constrained Contextual Bandit Algorithm for Limited-Budget Recommendation System // Engineering Applications of Artificial Intelligence. 2024.
- Zhao Y. et al. Fairness and Diversity in Recommender Systems: A Survey // ACM Computing Surveys. 2025.