1 Introduction
2 Preference learning
2.1 Background on machine learning
2.1.1 Supervised learning and predictive modeling
2.1.2 From predictive to prescriptive modeling
2.1.3 Unsupervised and weakly supervised learning
2.1.4 Reinforcement learning
2.2 Ranking
2.3 Learning from object preferences
2.3.1 Learning utility functions
2.3.2 Learning preference relations
2.4 Learning from label preferences
2.4.1 Learning utility functions
2.4.2 Learning preference relations
2.5 Other settings
2.6 Unsupervised preference learning
2.7 Online preference learning and preference-based RL
2.8 Preference-based reinforcement learning
2.9 Preference-based search and optimization
3 PL and MCDA: a systematic comparison
Issue | PL | MCDA |
---|---|---|
Problem focus | Accurate prescriptions | Satisfactory recommendations for the DM |
User interaction | Typically not, yet possible in active learning | Constructive, with feedback from the DM in the loop |
Learning target | Population (generalize across individuals) | Single DM or a well-identified group of DMs |
Representation of alternatives | Feature-based, but also structured, often many (generic) features | Monotone, well-engineered criteria, decision space versus criteria space |
Representation of users/DMs | Feature-based | Consistent family of criteria, tailored for the DMs, is used in place of user features |
Preference information | Global/holistic, example-based | Local or global/holistic, example-based, rich specifications |
Nature of the data | Noisy/probabilistic | Inconsistencies possibly corrected or handled explicitly |
Models and model assumptions | Possibly weak assumptions (compensated by massive data) | Stronger assumptions, axiomatic foundation |
Model interpretation, usage, and expectations | Mainly predictive, accurate prediction of DM’s behavior | Mainly constructive or normative, convincing explanations of recommendations for DMs |
Observational data availability and volume | Data sets massively available (but not always accessible), possibly very large (“big data”) | Limited, user-generated data, no benchmark data, typically small set of decision examples |
Validation, success criteria | Accuracy metrics, internal validation on data | User satisfaction (difficult to measure), robustness analysis recommended |
Computational aspects | Scalability is critical | Less critical (but short computation time is required between interactions in multiobjective optimization) |
Application domains | Broad but typically not safety-critical (e-commerce, etc.), automated decisions | Broad, possibly safety-critical, one-shot decisions |
-
Problem focus As detailed in Sect. 2, preference learning, like machine learning in general, covers a broad spectrum of different tasks and types of learning problems, including different levels of supervision (from fully unsupervised to fully supervised). Yet, it is probably fair to say that, in general, PL puts a strong emphasis on learning predictive models, typically in a supervised manner, and with the main objective of producing accurate predictions or prescriptions. Accuracy is measured in terms of an underlying (target) loss function and may depend on the concrete application at hand. In spite of this focus on predictive accuracy, it should also be mentioned that other criteria have come to the fore in the recent past, especially due to the growing interest in social aspects and trustworthiness of AI. One example is fairness of predictive models, which could mean, for instance, that predicted rankings should not be biased in favor or disfavor of certain subpopulations (Zehlike et al. 2021).

The focus of MCDA is to recommend a satisfactory decision to the DM. This means that the recommendation should be consistent with the DM’s preferences represented by a model, usually built in the course of an interaction with the DM. The decision problems considered within MCDA concern the best choice, ordinal classification, or a preference ranking of alternatives. Although the focus in MCDA is on finding a single decision rather than inducing an entire model (as in ML), formulating the problem requires a lot of preliminary work related to the definition of potential alternatives and the construction of a family of criteria for their evaluation (Roy 1996).
-
User interaction Traditionally, user interaction has not been emphasized much in machine learning. Instead, the focus has been on the data, which, in the extreme case, was simply supposed to be given in the form of a static, pre-collected set of data stored in a database. At best, a human user was considered as a “labeler”, i.e., someone providing supervision for training instances, e.g., in the setting of active learning or in crowdsourcing (Chen et al. 2013). With an increasing number of ML applications and the use of AI systems in everyday life, this started to change more recently, and machine learning “with the human in the loop” is now gaining popularity.

MCDA is a process heavily involving the DM in the co-construction of their preferences by exploring, interpreting, and arguing, with the aim of recommending a course of action that increases the consistency between the evolution of the process and the DM’s objectives and value system (Roy 2000). MCDA tries to ensure that the DM, who is in the feedback loop of the decision support procedure, understands the impact of the provided preferential information on the shape of the preference model and, consequently, on the recommended decision (Corrente et al. 2024).
-
Learning target Akin to statistical inference, the key interest in machine learning is model induction, that is, the induction of a model that generalizes well beyond the training data and allows for making accurate predictions on a population level. The model itself is “individualized” in the sense that predictions pertain to individuals of the population (and are obtained as functions of a suitable formal representation of individuals, most commonly in the form of a feature vector). Nevertheless, ML mainly aims at maximizing expected accuracy. That is, instead of targeting an individual instance, performance is averaged over the entire population.

MCDA is more than just predictions based on examples. Its aim is to analyze the decision-making context by identifying the problem with its actors, potential alternatives and their consequences, and the stakes. The actors are stakeholders concerned by the decision. Their points of view are expressed through evaluation criteria. The DM is either an individual or a group that collectively makes a decision. Usually, in the decision-aiding process, there is an analyst who acts as an intermediary between the calculation procedure and the DM, organizing the interaction.
-
Representation of alternatives In preference learning, the representation of choice alternatives strongly depends on the learning task. In the label ranking problem, for example, alternatives are merely identified by a label, but not represented in terms of any properties. In other settings, such as object ranking, properties—or features in ML jargon—are used, sometimes in the form of semantically meaningful (high-level) features, but often also in the form of more generic low-level features such as pixels in an image. These low-level features are especially common in the realm of deep learning, where the construction of meaningful (higher-level) representations is considered as a part of the learning process. In addition to feature representations, more structured representations such as graphs, trees, or sequences are also common in ML/PL.

Alternatives considered in MCDA are potential actions with known or probabilistic consequences. Based on these consequences, a consistent family of evaluation criteria is built to characterize the alternatives. The set of considered alternatives may be either explicitly or implicitly known. In the former case, it is presented in the form of a finite list of alternatives with their performance matrix, where each alternative is represented by a vector of performances on the evaluation criteria. In the latter case, each alternative is characterized by a vector of decision variables subject to mathematical programming constraints or produced by a combinatorial generator. The decision variables are arguments of objective functions (criteria). The latter case of MCDA is called multiobjective optimization.
-
Representation of users/DMs As already mentioned, instances in machine learning are most commonly represented in terms of features, and so are users in preference learning. Thus, a user is formally represented in terms of a vector, where each entry corresponds to the value of a certain feature. The latter can be mixed in their scaling and underlying domains, which can be numerical, binary, or (ordered) categorical. More complex, structured or multimodal representations have also become more common in the recent past—for example, in a medical context, a patient could be represented by a combination of numerical measurements, images, and textual data.

In MCDA, the users are usually called decision-makers (DMs). They are the recipients of the decision-aiding service concerning a particular decision problem (best choice, ordinal classification, or preference ranking). They are not identified otherwise than through the family of criteria used to evaluate the alternatives. The construction of criteria is a pre-stage of preference modeling. Thus, instead of characterizing the DMs by some personal features, the family of evaluation criteria is tailored to a particular DM or a group of DMs. For example, when selecting the best holiday project, the family of criteria for parents with young children will be different from the one for a couple without children, even if the considered set of alternative projects is the same.
-
Preference information In PL, preference information is typically holistic in the sense of referring to choice alternatives in their entirety. For example, a user may rate an alternative as a whole, or express a preference for one alternative over another one, though without referring to specific parts or properties of these alternatives. At the same time, preferences are often contextualized, most commonly by the user, but possibly also by other context variables specifying the choice situation.

In MCDA, the preference information is necessary to build a DM’s preference model inducing, in the set of alternatives, a preference relation richer than the dominance relation. The type of preference information depends on the aggregation procedure (preference model) and on the preference elicitation mode. When the preference elicitation is direct, the global preference information given by the DM concerns parameters of a value function or an outranking relation. When the preference information is indirect, it is composed of holistic decision examples or past decisions, e.g., pairwise comparisons or classifications of some alternatives. In the case of multiobjective optimization, when the search of the solution space is combined with preference modeling, the preference information is local, as it concerns the current stage of the search.
-
Nature of the data In machine learning, training data is normally supposed to be produced by an underlying data-generating process, which is stochastic in nature. In other words, data is generated by an underlying, though unknown, (joint) probability distribution on properties of instances and outcomes. The stochastic nature of the data is important, because real data is always “noisy” in various ways, and a model fitting such data perfectly will normally not exist.

In MCDA, the data used to construct the DM’s preference model is available either on request (directly or indirectly, as explained earlier) or from the observation of the DM’s past decisions. When decision examples or past decisions are inconsistent with respect to the dominance principle, they are either corrected or the rough set concept is used to handle them explicitly.
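The dominance principle invoked here can be stated operationally: an alternative that is at least as good on every criterion must not receive a worse holistic judgment than the alternative it dominates. A minimal sketch of such a consistency check (function names, the toy examples, and the assumption that all criteria are of gain type are illustrative, not from the text):

```python
def dominates(a, b):
    """True if alternative a weakly dominates b on all (gain-type) criteria."""
    return all(x >= y for x, y in zip(a, b))

def inconsistent_pairs(examples):
    """Find pairs of decision examples violating the dominance principle.

    examples: list of (performance_vector, assigned_class), where a higher
    class index means a better holistic judgment.
    """
    violations = []
    for i, (a, ca) in enumerate(examples):
        for j, (b, cb) in enumerate(examples):
            if i != j and dominates(a, b) and ca < cb:
                violations.append((i, j))
    return violations

examples = [
    ((7, 5), 1),   # dominates the next alternative...
    ((6, 4), 2),   # ...which is nevertheless assigned to a better class
    ((3, 2), 0),
]
print(inconsistent_pairs(examples))  # [(0, 1)]
```

Rough set approaches keep such inconsistent pairs and reason about lower and upper class approximations instead of discarding them.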
-
Models and model assumptions Model assumptions in machine learning are normally weak in the sense that the learner can choose models (hypotheses) from a rich class of complex, nonlinear functions—deep neural networks can be mentioned as the most telling example. Obviously, to prevent the learner from overfitting the training data, such models need to be regularized. Since “black-box” function approximators such as neural networks are difficult to understand and lack interpretability, other types of models are sometimes preferred, notably symbolic models like rules or decision trees. But even these model classes are highly expressive, and models may become quite large (then again losing the advantage of intelligibility). That said, it should also be mentioned that more standard statistical methods such as logistic regression, which do make strong (linearity) assumptions and are well interpretable, are also used in machine learning.

In MCDA, the preference models are, generally, of three types: a real-valued value (utility) function, a system of binary relations (outranking), or a set of logical “if..., then...” statements (decision rules). They rely on different axiomatic foundations. The first model ranges from a simple weighted sum to integrals (Choquet, Sugeno) handling interactions among criteria. The second permits handling incomparability, and the third has the greatest capacity for preference representation of all three models. Moreover, decision rules identify values that drive the DM’s decisions—each rule is an intelligible scenario of a causal relationship between performances on a subset of criteria and a comprehensive judgment.
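To make the first of these model types concrete, the simplest value function is a weighted sum of marginal values. A minimal sketch, in which the criteria, marginal value functions, and weights are purely illustrative:

```python
def additive_value(a, marginal_values, weights):
    """Weighted additive value function: U(a) = sum_i w_i * u_i(a_i)."""
    return sum(w * u(x) for x, u, w in zip(a, marginal_values, weights))

# Illustrative marginal value functions, normalized to [0, 1].
u_price = lambda p: max(0.0, 1.0 - p / 1000.0)   # cheaper is better
u_quality = lambda q: q / 10.0                   # quality on a 0..10 scale
weights = (0.6, 0.4)

a, b = (400.0, 8.0), (250.0, 5.0)
ua = additive_value(a, (u_price, u_quality), weights)
ub = additive_value(b, (u_price, u_quality), weights)
print(ua > ub)  # alternative a is preferred despite its higher price
```

The Choquet and Sugeno integrals mentioned above generalize this weighted sum to capture interactions among criteria.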
-
Model interpretation, usage, and expectations As already mentioned, models in ML/PL are mostly of predictive or prescriptive nature, i.e., they are mainly used for making predictions of outcomes in new contexts, or recommending decisions to a user in a new situation. The expectation is that a model generalizes well beyond the training data on which it has been learned, i.e., that predictions or prescriptions are accurate and incur a small loss or regret.

In MCDA, the model interpretation and usage are either normative or constructive. It is normative when an ideal rationality of the DMs is assumed and the aim is to give an “objectively” best recommendation. This approach is typical for decision analysis based on expected utility theory. The “aiding” underlying the MCDA process assumes, however, that preferences of the DMs with respect to considered alternatives do not pre-exist in their minds. Thus, MCDA involves the DMs in the co-construction of their preferences. This implies that the concepts, models, and methods proposed by MCDA are seen as keys to doors giving access to elements of knowledge contributing to the acceptance of a final recommendation.
-
Observational data availability and volume In ML/PL, data is normally assumed to be readily available, typically in large volume. Indeed, many learning algorithms, such as deep neural networks, require large amounts of data to guarantee good generalization, and the most impressive successes of machine learning can be found in data-rich domains. Typically, these are domains in which (behavioral) data can be collected easily in a systematic way, e.g., in social media or e-commerce. This being said, learning from “small data” has also gained attention more recently, as there are also domains in which the availability of data is much more limited, or producing (labeled) data is costly. Benchmark data abounds in ML/PL and plays an important role, e.g., in comparing the performance of new algorithms with the state of the art.

In MCDA, the observational data concerning DMs’ preferences are not as massively available as in PL. When they are expressed by DMs in a direct dialogue, they are created during the decision-aiding process, and the volume of this data is limited by the fatigue of the DMs. The volume can be larger when the preference information serving to build a preference model comes from observation of DMs’ routine acts before the model is built. Benchmark data on which various MCDA methods are compared are rare, however, with the exception of multiobjective optimization, for which rich benchmark data are available (Deb et al. 2002; Zitzler and Laumanns 2018).
-
Validation, success criteria In ML/PL, the predictive accuracy of models learned on training data is commonly evaluated in terms of performance metrics and related loss functions. The main goal of the evaluation procedure is to get an idea of the model’s generalization performance. Yet, since the latter depends on the true but unknown data-generating process and hence cannot be computed, it is normally approximated by means of some internal evaluation procedure. To this end, only a part of the data is used for training, while the rest is used as test data. In so-called cross-validation, the data is divided into several folds, each fold is used for testing in turn, and the performances are averaged.

In MCDA, user satisfaction is rather difficult to measure. The decision-aiding process consists of loops of preference elicitation, preference modeling, and the DM’s analysis of the recommendation, which repeat until the DM, or a group of DMs, accepts the recommendation or decides to change the problem setting. It is recommended to perform a robustness analysis (Roy 2010; Aissi and Roy 2010), which consists of checking how the recommendation changes when the preference model parameters are changed within the margins of ignorance. Experience indicates, moreover, that DMs wish to understand how their preference information influences the recommendation. To raise the DMs’ confidence in the received recommendation, MCDA methods try to implement the postulate of transparency and explainability. This is particularly important for interactive multiobjective optimization, where the answers given by DMs in the preference elicitation phase are translated into guidelines for the search algorithm in the optimization phase. Laboratory validation of multiobjective optimization methods is performed on publicly available benchmark problems. Success is measured by the closeness of the obtained set of non-dominated solutions to the known Pareto front. In the case of interactive multiobjective optimization, an artificial DM with a known utility function is used, so that the outcome can be compared with that of a single-objective benchmark algorithm.
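The internal evaluation procedure described above can be sketched generically. In the sketch below, the "model" (the majority label of the training portion) and the accuracy measure are illustrative stand-ins, not a specific PL method:

```python
from statistics import mean

def k_fold_cv(data, k, train_fn, accuracy_fn):
    """Generic k-fold cross-validation: each fold serves as test data once,
    and the test accuracies are averaged."""
    folds = [data[i::k] for i in range(k)]
    scores = []
    for i in range(k):
        test = folds[i]
        train = [x for j, f in enumerate(folds) if j != i for x in f]
        model = train_fn(train)
        scores.append(accuracy_fn(model, test))
    return mean(scores)

# Toy example: the "model" is the majority label of the training folds,
# and accuracy is the hit rate on the test fold.
data = [0, 0, 1, 0, 1, 0, 0, 1, 0, 0]
majority = lambda train: max(set(train), key=train.count)
acc = lambda m, test: mean(1.0 if y == m else 0.0 for y in test)
print(round(k_fold_cv(data, 5, majority, acc), 2))  # 0.7
```

For preference data, the labels would be replaced by rankings or pairwise comparisons, and the accuracy function by a ranking loss such as Kendall's tau.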
-
Computational aspects Computational aspects play an important role in ML/PL, both in terms of time and space complexity, i.e., the time needed to train a model and the storage needed to store it (which may become an issue for large neural networks, for example). Scalability and sample efficiency of algorithms are especially important in domains where models must be trained on big data sets, but also in online settings, where models are trained, not in a batch mode, but incrementally on streaming data, perhaps even under real-time constraints. Moreover, efficiency may not only be required for training but also for prediction.

In MCDA, scalability is, usually, less critical because the computation of a preference model involves a much smaller volume of data. In the case of multiobjective optimization concerning complex (nonlinear or combinatorial) problems, a short computation time is required between successive sessions of interactions with the DM.
-
Application domains Preferences in general and preference learning in particular play an increasingly important role in various domains of application, ranging from computational advertising, recommender systems, electronic commerce, and computer games to adaptive user interfaces and personalized medicine. PL is also used in social media and platforms such as TikTok, YouTube, Spotify, LinkedIn, etc., again mainly for the purpose of personalization. More recently, PL has also been applied in the realm of generative AI, for example in ChatGPT. Although the scope is very broad, PL is less applied in safety-critical domains, unless predictions or recommendations can be checked and possibly corrected by a human expert.

The application area of MCDA methods is very broad. This is evidenced by the large number of methods adapted to various applications. The choice of one of these many methods for a particular decision-making problem must correspond to the context of the application, the DM’s expectations for the form of the recommendation, and the type of preference information that can be obtained from the DM (Roy and Słowiński 2013). In (Cinelli et al. 2022), a new taxonomy-based system recommending MCDA methods has been described and made available online at https://mcda.cs.put.poznan.pl/. It includes a large collection of more than 200 MCDA methods. Also worth mentioning are many safety-critical applications, as well as one-shot decision problems, i.e., situations where a decision is experienced only once (Guo 2011).
4 Combining PL and MCDA
4.1 Multi-criteria preference learning
-
Like in standard ML, training data could be collected from an underlying population with possibly heterogeneous preferences. In this case, data will typically be more extensive, but also afflicted with noise and inconsistencies. Moreover, to capture the heterogeneity of the population, preferences should be modeled as functions, not only of properties of the decision alternatives but also of properties of the individuals.
-
Like in MCDA, the data could still be assumed to come from the DMs for whom the decision aiding is performed, i.e., a single DM or a well-identified group of DMs. Even then, the data may be affected by inconsistencies, but will typically be so to a much lesser extent, especially if the group of DMs can be assumed to be somewhat homogeneous in the sense of sharing the same knowledge about decision alternatives. In this case, a single preference model can be expressed as a function of the properties of decision alternatives.
-
Especially prevalent in the field of MCDA are approaches based on the Choquet integral (Grabisch and Labreuche 2005; Grabisch et al. 2008). In Sect. 2.8 of Part I, we already presented methods for estimating the parameters of this model from holistic preference information (extended from a flat structure of the set of criteria to a hierarchical structure in Sect. 2.10 of Part I). In general, extracting a Choquet integral (or, more precisely, the non-additive measure on which it is defined) from data is considered as a parameter identification problem and commonly formalized as a constrained optimization problem (Beliakov 2008), for example using the sum of squared errors as an objective function (Torra and Narukawa 2007; Grabisch 2003). To this end, Mori and Murofushi (1989) propose an early approach based on the use of quadratic forms, while an alternative heuristic method is introduced by Grabisch (1995). Meanwhile, the Choquet integral has been used for various problems in the realm of machine learning, including binary classification (Tehrani et al. 2012a), ordinal classification (Beliakov and James 2010; Tehrani and Hüllermeier 2013), ranking (Tehrani et al. 2012b), metric learning (Beliakov et al. 2011), multiple instance learning (Du et al. 2019), ensemble learning (Grabisch and Nicolas 1994), and transfer learning (Murray et al. 2019). More recently, Bresson et al. (2020) developed a method for learning hierarchical Choquet integrals, which is inspired by (deep) neural networks, thereby combining machine learning with hierarchical decision modeling (cf. Sect. 2.10 of Part I). The Choquet integral has also been used as a preference model in interactive evolutionary multiobjective optimization (Branke et al. 2016). Last but not least, more specialized learning methods have been developed for aggregation models that can be seen as special cases of the Choquet integral, such as the OWA operator (Torra 2004; Melnikov and Hüllermeier 2019).
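As an illustration of the parameter-identification view, the following sketch fits the capacity of a two-criteria Choquet integral by minimizing the sum of squared errors over a grid satisfying the monotonicity constraints. This brute-force search stands in for the quadratic or heuristic optimization used by the methods cited above; the synthetic data and all names are illustrative:

```python
from itertools import product

def choquet_2(x, m1, m2):
    """Discrete Choquet integral for two criteria with capacity
    mu({}) = 0, mu({1}) = m1, mu({2}) = m2, mu({1, 2}) = 1."""
    lo, hi = sorted(x)
    top_weight = m1 if x[0] >= x[1] else m2  # capacity of the argmax singleton
    return lo + (hi - lo) * top_weight

def fit_capacity(data, step=0.01):
    """Grid search for (m1, m2) minimizing the sum of squared errors,
    subject to monotonicity: 0 <= m1, m2 <= 1 = mu({1, 2})."""
    grid = [i * step for i in range(int(1 / step) + 1)]
    return min(product(grid, grid),
               key=lambda m: sum((choquet_2(x, *m) - y) ** 2 for x, y in data))

# Synthetic preference data generated with m1 = 0.2, m2 = 0.7.
data = [((0.9, 0.1), choquet_2((0.9, 0.1), 0.2, 0.7)),
        ((0.1, 0.8), choquet_2((0.1, 0.8), 0.2, 0.7)),
        ((0.6, 0.3), choquet_2((0.6, 0.3), 0.2, 0.7))]
print(fit_capacity(data))  # recovers approximately (0.2, 0.7)
```

With more criteria, the number of capacity values grows exponentially, which is why practical methods restrict the model (e.g., to 2-additive capacities) and rely on mathematical programming.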
-
There is also some work on learning the Sugeno integral as a qualitative counterpart of the Choquet integral. An early approach is that of Prade et al. (2009), who propose a procedure for eliciting the capacity underlying a Sugeno integral. Anderson et al. (2010) consider Sugeno integrals where both the integrand and the measure assume fuzzy numbers as values and propose a genetic algorithm for learning the underlying measure. Focusing on regression tasks, Gagolewski et al. (2019) develop a branch-refine-and-bound-type algorithm for fitting Sugeno integrals with respect to symmetric capacities. This algorithm is used to minimize the mean absolute error on the training data. Beliakov et al. (2020) express the learning problem as a difference of convex functions, making it amenable to DC (difference of convex) optimization methods and (local) linear programming. An approach based on linear programming is also proposed by Abbaszadeh and Hüllermeier (2021). An optimization technique based on the marginal contribution representation is explored by Beliakov and Divakov (2020). Thanks to this representation, the number of variables can be reduced, and the constraints can be simplified.
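For reference, the discrete Sugeno integral replaces the weighted sums of the Choquet integral by min and max operations, which is what makes it qualitative. A minimal sketch of its evaluation for a given capacity (the capacity values below are illustrative):

```python
def sugeno(x, mu):
    """Discrete Sugeno integral of scores x (in [0, 1]) w.r.t. capacity mu,
    given as a dict mapping frozensets of criterion indices to [0, 1]."""
    order = sorted(range(len(x)), key=lambda i: x[i])  # ascending scores
    result = 0.0
    for k, i in enumerate(order):
        upper = frozenset(order[k:])          # criteria scoring >= x[i]
        result = max(result, min(x[i], mu[upper]))
    return result

# Capacity on two criteria (monotone: every subset has a value no larger
# than any of its supersets).
mu = {frozenset(): 0.0, frozenset({0}): 0.3,
      frozenset({1}): 0.8, frozenset({0, 1}): 1.0}
print(sugeno([0.6, 0.2], mu))  # 0.3
print(sugeno([0.2, 0.9], mu))  # 0.8
```

Because only min and max are involved, the integral is well defined on purely ordinal scales, which is the setting the elicitation and fitting methods above address.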
-
Another important class of MCDA models—outranking instead of utility-based—is the ELECTRE family. An evolutionary algorithm has been used for learning the parameters of ELECTRE TRI-B, which separates decision classes using a single boundary profile (Doumpos et al. 2009). Learning the parameters of MR-Sort, which is a simplified variant of ELECTRE TRI-B using a majority rule and boundary class profiles, has been tackled in various ways, e.g., using mixed-integer programming (Leroy et al. 2011), linear programming combined with simulated annealing (Olteanu and Meyer 2014), or a dedicated metaheuristic (META) combining evolutionary algorithms with mathematical programming (Sobrie et al. 2013, 2019). The learning of another non-compensatory sorting model was presented by Sobrie et al. (2015). Learning the parameters of ELECTRE TRI-rC, including a single characteristic profile to describe each decision class, has been accomplished by four methods based on different optimization techniques: an evolutionary algorithm, linear programming combined with a genetic approach, simulated annealing, and a dedicated heuristic (Kadziński and Szczepański 2022).
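The MR-Sort assignment rule itself is simple to state: an alternative is placed above a boundary profile whenever the criteria on which it is at least as good as the profile carry a total weight reaching the majority threshold. A minimal sketch without veto thresholds (the weights, profiles, and threshold below are illustrative, and profiles are assumed to dominate each other from lowest to highest):

```python
def mr_sort(a, profiles, weights, lam):
    """Assign alternative a to a class using the MR-Sort majority rule.

    profiles: boundary profiles ordered from the lowest to the highest
    class boundary; outranking a boundary means the criteria on which a
    is at least as good carry a total weight of at least lam.
    """
    assigned = 0
    for k, b in enumerate(profiles):
        support = sum(w for aj, bj, w in zip(a, b, weights) if aj >= bj)
        if support >= lam:
            assigned = k + 1   # outranks this boundary: at least class k+1
    return assigned

weights = [0.4, 0.3, 0.3]
profiles = [(5, 5, 5), (8, 7, 8)]   # lower boundaries of classes 1 and 2
lam = 0.6
print(mr_sort((6, 8, 4), profiles, weights, lam))  # 1
print(mr_sort((9, 7, 8), profiles, weights, lam))  # 2
```

Learning MR-Sort then means inferring the weights, profiles, and threshold from assignment examples, which is what the mixed-integer, metaheuristic, and hybrid methods cited above do.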
-
Value function preference models have been learned within the regularization framework for sorting problems with multiple, potentially non-monotonic criteria (Liu et al. 2019). Liu et al. (2020) presented algorithms for learning the parameters of a value-based sorting model with diverse types of marginal value functions (including linear, piecewise-linear, splined, and general monotone ones), admitting value assignment examples in which a reference alternative can be classified into multiple classes with respective credibility degrees. Liu et al. (2021) presented a convex quadratic programming model for learning a value-based model with potentially interacting criteria, including novel methods for classifying non-reference alternatives that enhance the method’s applicability to different data sets. MCPL has also been realized with value function models having a distance interpretation, as in TOPSIS. For example, Aggarwal et al. (2014) propose a method for learning TOPSIS-like decision models, in which preference is a decreasing function of the distance from an “ideal alternative”, and both the ideal alternative and the distance function are learned from data.
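The idea of preference as a decreasing function of the distance from an ideal alternative can be sketched as follows. In Aggarwal et al.'s setting both the ideal point and the distance function are learned from data; here they are fixed, illustrative values:

```python
import math

def topsis_like_score(a, ideal, weights):
    """Score decreasing with the weighted Euclidean distance from the
    ideal alternative (score 1 exactly at the ideal point)."""
    d = math.sqrt(sum(w * (x - z) ** 2 for x, z, w in zip(a, ideal, weights)))
    return 1.0 / (1.0 + d)

ideal = (1.0, 1.0)        # illustrative; learned from data in the cited work
weights = (0.7, 0.3)      # illustrative distance weights
alts = {"a1": (0.9, 0.8), "a2": (0.4, 0.9)}
ranking = sorted(alts,
                 key=lambda k: topsis_like_score(alts[k], ideal, weights),
                 reverse=True)
print(ranking)  # ['a1', 'a2']
```

Learning then amounts to choosing the ideal point and the distance weights so that the induced ranking reproduces the DM's example preferences.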
-
Preference modeling has also been combined with rule induction as one of the most classical machine learning methods. For example, the MORE method learns rule ensembles subject to monotonicity constraints by minimizing cross-entropy as a loss function and treating a single rule as a subsidiary base classifier (Dembczyński et al. 2009). Moreover, Dembczyński et al. (2010) present a method for learning rule ensembles for multiple criteria ranking problems, and Kadziński et al. (2004) consider multiple criteria ranking and choice with all compatible minimal cover sets of decision rules.
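A decision rule of the kind induced by these methods is an "if performance is at least ..., then class is at least ..." statement, which is monotone by construction. A minimal sketch of applying such rules (the rules themselves are hypothetical, not induced from data):

```python
def rule_covers(a, conditions):
    """A monotone 'at least' rule: conditions maps a criterion index
    to the minimum required performance on that criterion."""
    return all(a[i] >= t for i, t in conditions.items())

def classify(a, rules, default=0):
    """Assign the best lower class bound among the rules covering a."""
    matched = [cls for conditions, cls in rules if rule_covers(a, conditions)]
    return max(matched, default=default)

# Hypothetical rules: (conditions, at-least class).
rules = [({0: 7, 1: 5}, 2),    # "if g1 >= 7 and g2 >= 5, then class >= 2"
         ({0: 4}, 1)]          # "if g1 >= 4, then class >= 1"
print(classify((8, 6, 3), rules))  # 2
print(classify((5, 2, 9), rules))  # 1
print(classify((1, 1, 1), rules))  # 0
```

Each rule is readable on its own, which is the intelligibility advantage of rule-based preference models noted in Sect. 3.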
-
Inspired by the recent success of deep learning, MCDA has been combined with artificial neural networks. For example, the latter are used by Martyn and Kadziński (2023) to infer the parameters of threshold-based sorting procedures for various types of aggregation functions: the OWA aggregation operator, an additive value function, the Choquet integral, TOPSIS-based distances, and NetFlowScore procedures based on the principles of either PROMETHEE or ELECTRE.