# Combining propositional logic with maximum entropy reasoning on probability models


Manfred Schramm¹ and Stephan Schulz²
based on the probability calculus. This calculus incorporates this type of reasoning in two ways: non-monotonic decisions (which can be treated as decisions under incomplete knowledge as well) can be the result of reasoning in a single probability model (via conditionalization) or in a set of probability models (via additional principles of rational decisions). But probability theory is too fine-grained to model common sense reasoning in general (think about "paradoxes" due to the unexpected existence of certain P-Models ([2, 16])). The remaining degrees of freedom have to be filled (of course without introducing subjective biases). We therefore use additional (context-sensitive) constraints (resp. principles), which are able to support rational decisions based on incomplete knowledge. These principles have to be global (context-dependent on all assumptions) to avoid losing the sensitivity of the language to the assumptions. The central principle of rational decisions used by our system is the method of Maximum Entropy (MaxEnt), which is a well-founded extension of probability theory with global properties ([12, 10, 8, 17]). Propositional logic is a well-researched area of science. It allows the specification of many kinds of exact knowledge and has applications in fields as diverse as machine learning, integrated circuit design and mathematical group theory. The interest in these areas has led to the development of efficient decision procedures for propositional logic and mature implementations. However, propositional logic lacks the features necessary for modeling most real world situations and common sense reasoning. Consequently, such attempts are relegated to introductory textbooks. Two particular features are missing from propositional logic for modeling common sense reasoning: the ability to describe uncertainty and the ability to support plausible reasoning under incomplete knowledge.
In this paper we address these issues by combining propositional logic with reasoning on probability models. To achieve this goal we allow the propositional assumptions to be accompanied by probabilities (or probability intervals) to specify a range of certainty. We also use classical conditional probability statements to model the common sense conjunctive if-then. We call such a (conditional or unconditional)

¹ Institut für Informatik, Technische Universität München, Arcisstraße 20, D-80290 München, Germany, +49/89/521096, schramma@informatik.tu-muenchen.de
² ditto, schulz@informatik.tu-muenchen.de

Abstract. We present a system for non-monotonic reasoning

1 Introduction

probability statement a constraint, and a specification in our approach consists of a set of these constraints. For any set of constraints we can compute a unique probability model (or P-Model) over the interpretations of the propositional variables (or, as they are also called, possible worlds), that is, we can compute a probability for each possible assignment of truth values to the propositional variables. We achieve this by searching for the P-Model with maximum entropy that is consistent with the set of constraints. Given a complete P-Model, we can compute the probability of an arbitrary propositional sentence by summing the probabilities of the interpretations which evaluate the sentence to true. Similarly, we can calculate the probabilities of conditional queries. The search for the P-Model with maximum entropy (the MaxEnt-Model) is expensive, as the number of interpretations grows exponentially with the number of propositional variables. However, we can use known properties of the MaxEnt-Model to reduce the search space. In particular, we can eliminate interpretations that are inconsistent at a propositional level, and we can apply the principle of indifference to represent sets of equivalent interpretations by a single element. We have implemented these ideas in the PIT system for reasoning with uncertain and incomplete knowledge. This paper is organized as follows: After this overview, we will give a more formal introduction to probability models on propositional interpretations. We will then describe and justify the maximum entropy principle and touch on its connections to the principles of indifference and independence. The next section deals with the efficient calculation of the MaxEnt-Model for a given set of constraints. After that, we will describe our implemented system, PIT. We will conclude with an example of common sense reasoning with the system. This example also illustrates how default reasoning can be implemented in our approach.
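The summation over possible worlds described above can be sketched in a few lines of Python (a toy P-Model with invented probabilities, not output of the PIT system):

```python
import itertools

# A toy P-Model over the four interpretations (possible worlds) of two
# propositional variables a and b; the probabilities are invented for illustration.
worlds = [dict(zip(("a", "b"), bits)) for bits in itertools.product([False, True], repeat=2)]
P = {i: pr for i, pr in enumerate([0.1, 0.2, 0.3, 0.4])}

def prob(formula):
    # P(formula) = sum of the probabilities of the worlds that satisfy it.
    return sum(P[i] for i, w in enumerate(worlds) if formula(w))

def cond(a, b):
    # Conditional query P(a | b) = P(a and b) / P(b).
    return prob(lambda w: a(w) and b(w)) / prob(b)

p_a = prob(lambda w: w["a"])                              # P(a) = 0.3 + 0.4 = 0.7
p_a_given_b = cond(lambda w: w["a"], lambda w: w["b"])    # P(a|b) = 0.4 / 0.6
```

Once a unique P-Model is fixed, both kinds of queries reduce to this kind of bookkeeping; the hard part, addressed in the following sections, is obtaining the model itself.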

2 Propositional Logic and P-Models

In this section we will give a short introduction to the concepts of probability models on propositional interpretations. For a more formal description we refer to [20] or any introduction to reasoning with probabilities. Let V = {x₁, …, xₙ} be a finite set of propositional variables (atoms), and let Form(V) be the set of propositional formulas in these variables (using e.g. the operators {¬, ∨, ∧, →} for not, or, and and the material implication). A literal is either an atom or a negated atom. An interpretation of the variables

is a mapping I : V → {true, false}. An interpretation can be extended to a function I : Form(V) → {true, false} by the usual rules for the application of the operators. We also call an interpretation a possible world and denote it by the unique full conjunction³ of literals that evaluates to true under it. A propositional specification S is a set of formulas, and an interpretation that evaluates all formulas from S to true is called a propositional model of S. Now let Ω = {ω₁, …, ω_m} be a finite set of disjoint elementary events (i.e. only one of the ω_i can occur at any one time or situation). A probability measure on Ω is a function P : Ω → [0, 1] with

    Σ_{ω∈Ω} P(ω) = 1

The probability measure is extended to a function P : 2^Ω → [0, 1] by

    P(A) := Σ_{ω∈A} P(ω)  for A ⊆ Ω

A linear probabilistic constraint (or simply constraint from here on) is a statement of the form P(A) ∈ J or P(A|B) ∈ J, where A, B ⊆ Ω and J is a subinterval of [0, 1].⁴ A probability model (P-Model) for a set of constraints DB is a probability measure which fulfills the constraints from DB. We now choose the interpretations for a set of propositional variables as the elementary events of a probability measure. We can then identify a propositional formula with the set of its (propositional) models. Thus, probabilistic constraints can be written as (conditional) probabilities on propositional formulas. Probabilistic constraints with the probability 0 or 1 still act like purely propositional statements. We call them logical constraints. If we have a unique P-Model for a set of constraints, we can easily compute the probability of a proposition by summing the probabilities of its models. We can also calculate a conditional probability P(a|b) by comparing the total probability of the models of a ∧ b to the probability of the models of b (a, b ∈ Form(V)). Unfortunately, in most cases the constraints of a specification do not, without further principles, determine a unique P-Model, but rather an (infinite) set of P-Models. To select a single, representative model from this set we apply the principle of maximum entropy.
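The linearity of conditional constraints is worth spelling out: P(A|B) ∈ [lo, hi] is equivalent to lo·P(B) ≤ P(A ∧ B) ≤ hi·P(B), and each bound is a linear inequality over the world probabilities. A small helper can make this concrete (our own illustration, not part of PIT; the function name and the toy worlds are hypothetical):

```python
# Translate P(A|B) in [lo, hi] into two rows of linear coefficients c with
# sum_w c(w) * P(w) >= 0, using indicator functions of A-and-B and of B.
def conditional_to_linear(worlds, a, b, lo, hi):
    in_ab = [1 if a(w) and b(w) else 0 for w in worlds]
    in_b = [1 if b(w) else 0 for w in worlds]
    lower = [ab - lo * bb for ab, bb in zip(in_ab, in_b)]  # P(A and B) - lo*P(B) >= 0
    upper = [hi * bb - ab for ab, bb in zip(in_ab, in_b)]  # hi*P(B) - P(A and B) >= 0
    return lower, upper

# Toy usage over the four worlds of two variables (a, b):
worlds = [(0, 0), (0, 1), (1, 0), (1, 1)]
lower, upper = conditional_to_linear(worlds, lambda w: w[0] == 1, lambda w: w[1] == 1, 0.5, 1.0)
```

Note that the translation never divides by P(B), so it remains well-defined even when P(B) = 0.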

If the set of P-Models is generated by linear inequalities on Ω, as in our case, this choice is known to lead to a unique P-Model P (e.g. [14]). In addition to explaining the connection between H(P) and the amount of information in the distribution P, MaxEnt can be justified by stating a set of plausible axioms of inductive reasoning and deriving MaxEnt as the unique consequence of these axioms ([17, 13, 22]), or with more quantitative arguments like the Concentration Theorem or the Wallis Derivation of entropy (see [11, 12, 9] for detailed descriptions). These quantitative arguments observe that for discrete models of the probabilities (e.g. urns with a given number N of balls), where the probability mass of a distribution can be calculated by classical discrete (combinatorial) methods, the mass of the distributions concentrates around the MaxEnt distribution P as we refine the model (by increasing N). We now make use of MaxEnt in two different forms, whose relation can again be obtained from Fig. 1. The weaker form (fewer conclusions are supported) we call worst case analysis. Here we choose the values in the intervals first and apply MaxEnt afterwards. A conclusion is only supported if MaxEnt accepts the conclusion for every possible choice of the values in the intervals. The stronger form (more conclusions are supported) is also the simpler form: just take the P-Model with maximal entropy, where the constraints may range in the specified intervals.
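The worst case analysis can be contrasted with the stronger form in a toy computation (our own illustration, not the KOMPLEX/SQP implementation): the outer loop scans point values from the interval, the inner step computes the MaxEnt model for that point value (here solvable in closed form by indifference), and a conclusion survives only if it holds at the minimum over all choices:

```python
import itertools

# Worlds over two propositional variables (a, b).
worlds = list(itertools.product([False, True], repeat=2))

def maxent_given_pa(t):
    # With the single constraint P(a) = t, the MaxEnt-Model spreads mass
    # uniformly inside {a} and inside {not a} (principle of indifference).
    return [t / 2 if w[0] else (1 - t) / 2 for w in worlds]

goal = lambda w: w[0] and w[1]  # the queried conclusion P(a and b)

# Outer loop (worst case analysis): scan point values of P(a) over [0.3, 0.6];
# inner step: the MaxEnt model for that point value; keep the minimum.
worst = min(
    sum(p for p, w in zip(maxent_given_pa(t / 100), worlds) if goal(w))
    for t in range(30, 61)
)
# A conclusion of the form "P(a and b) >= 0.15" survives this worst case.
```

In this toy setting the interval choice and the query interact monotonically, so the worst case sits at an interval border; the paper's Quaker example below shows the same effect.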


3 The Maximum Entropy Principle

The maximum entropy principle is a well-known principle for choosing a certain (preferred) probability model from an (infinite) set of possible P-Models ([10, 14, 17, 5]). It solves this task without introducing arbitrary and unnecessary information or personal bias by the choice. Formally, the principle of maximum entropy (MaxEnt) selects the P-Model with maximal entropy. Following [21], the entropy for a given P-Model P over Ω is defined as

    H(P) = − Σ_{ω∈Ω} P(ω) ln(P(ω))
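For a single constraint the MaxEnt-Model can even be written down in closed form, which gives a small sanity check (a toy example of ours, with an invented event A and the constraint P(A) = 0.3): MaxEnt spreads the fixed mass uniformly inside A and uniformly inside its complement, and any other model satisfying the constraint has strictly lower entropy:

```python
import math

# Eight possible worlds; the only constraint is P(A) = 0.3 for A = {0, 1, 2}.
A = {0, 1, 2}
n = 8

# Closed-form MaxEnt-Model: spread 0.3 uniformly inside A, 0.7 uniformly outside.
p = [0.3 / len(A) if w in A else 0.7 / (n - len(A)) for w in range(n)]

def entropy(dist):
    # H(P) = -sum over worlds of P(w) * ln(P(w))
    return -sum(x * math.log(x) for x in dist if x > 0)

# A competing model that also satisfies P(A) = 0.3 but is biased inside A:
q = [0.2, 0.05, 0.05] + [0.7 / 5] * 5
```

Comparing `entropy(p)` with `entropy(q)` confirms that the uniform spreading wins, which is exactly the link between MaxEnt and indifference discussed in the next section.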

Solving the MaxEnt problem (i.e. calculating the MaxEnt-Model for an under-constrained probabilistic problem) is not a new idea (see e.g. [7]). It is also well known that the MaxEnt principle has connections to some basic form of the principle of indifference, and that it also provides connections to the notion of independence. Basically, aspects of these properties can already be recognized in some axiomatic derivations of MaxEnt. Informally, we can state that MaxEnt maximizes indifference and independence. By defining different logics on P-Models through adding different axioms (Logic on P-Models with indifference, Logic on P-Models with independence, Logic on P-Models with indifference and independence, Logic with worst case analysis of MaxEnt-Models and Logic with MaxEnt; see Fig. 1 for an example of these logics and their logical relations, and [20] for a more extended description), these connections can be clarified and used for reducing the search space of the optimization process. Both principles, indifference and independence, can be used for this purpose. The following lines will shortly state our use of the principle of indifference. The strong connections between MaxEnt and qualitative models of independence (as delivered by undirected graphs ([18])) are not currently implemented in our system, because our main application (in a medical domain, see section 8) contains highly connected concepts. Therefore, the use of indifference is more likely to lead to significant simplifications than the use of independence.

4 MaxEnt, Indi erence and Independence

³ A full conjunction is a conjunction of literals in which each atom from V appears either in atomic form or in negated form.
⁴ We write P(…) = p for the special case of J = [p, p].

The Principle of Indi erence

The history of this famous principle goes back to Laplace and Keynes. Let us quote [10] for a short and informal definition

Figure 1. Logics on P-Models. (The figure orders the logics from weakest to strongest — fewer P-Models, more conclusions: Propositional Logic; Logic on P-Models; Logic on P-Models with Indifference and Independence; Logic on P-Models with MaxEnt with Worst-Case Analysis on given Constraints; Logic on P-Models with Maximum Entropy.)

of this principle: "If the available evidence gives us no reason to consider proposition a₁ either more or less likely than a₂, then the only honest way we can describe that state of knowledge is to assign them equal probabilities: P(a₁) = P(a₂)." Certainly, this qualitative description leaves open many questions about its precise meaning, but it describes a more extended principle than the definition of indifference as assigning equal degrees of belief to all basic "situations" consistent with the knowledge base (compare e.g. [1]), which often is simplified to demanding that the equation P(ω) = 1/|Ω| holds if no constraint is present. The following definition sketches a method for deciding whether the available evidence (i.e. a set of constraints) gives us reason to consider one proposition more or less likely than another (see [20] for a more extended description). We define a possible world ω₁ to be indifferent from ω₂ if a renaming (permutation) of all possible worlds exists which does not change the set of P-Models, and if ω₁ and ω₂ are within the same cycle of this permutation. As a consequence of detecting two possible worlds ω₁, ω₂ as indifferent, we demand P(ω₁) = P(ω₂) as an additional constraint. It can be shown that this additional constraint does not change the resulting MaxEnt-Model.

We search for the MaxEnt-Model by applying a numerical optimization algorithm. It can be shown that the optimization problem for the entropy function with linear constraints is very simple: the optimization searches in a convex space, and the second-order derivatives of the optimization problem are defined and can be calculated efficiently. Therefore, efficient algorithms (developed by mathematicians) for the task already exist. We decided to use an implementation of the SQP (Sequential Quadratic Programming, see [23]) algorithm and an implementation of the KOMPLEX algorithm for the (more difficult) worst case analysis⁵ ([3]). Corresponding to the theoretical connections of the different logics in Fig. 1, we now explain how propositional constraints and the principle of indifference help with an efficient calculation of the MaxEnt-Model by reducing the search space for the optimization process.

1. The first step consists of reducing the set of propositional interpretations by checking for logical constraints. For the representation of this set we use Binary Decision Diagrams ([4]), which allow an efficient representation for domains of the size we are interested in (less than 64 Boolean variables). If the set of assumptions is already inconsistent at the propositional level, the process stops here and gives appropriate feedback. Also, the percentage of possible worlds in which some propositional expressions are true can be obtained at this stage.
2. To further reduce the number of variables for the optimization process, we generate a table of the linear propositional constraints in some normal form, perform consistency checks (especially in the use of the intervals), and apply the principle of indifference. To check for indifferent events algorithmically, we use sufficient conditions to efficiently detect some special permutations in the table of constraints that do not change the space of P-Models. If we find some possible worlds to be indifferent, these are collected into an equivalence class by joining the corresponding columns of the table of constraints. By that operation, some rows (constraints) of the table become redundant and can be omitted, which also leads to a more efficient optimization (this does not mean that the constraints can be omitted in the specification). See section 7 for an example of the effect.
3. At the next step, the simplified optimization problem is passed to our subtools SQP and KOMPLEX. They continue simplifying the problem (e.g. they detect linear dependencies between constraints) and finally they calculate the desired P-Model. To be more specific, the worst case analysis is implemented as a second optimization process using KOMPLEX, which tries to minimize the probability of a certain question by calling the MaxEnt optimization (with SQP) with varying values from the intervals.
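One sufficient condition of the kind mentioned in step 2 is easy to sketch (our simplification, with an invented constraint table): if two possible worlds have identical columns in the table of linear constraints, swapping them leaves the set of P-Models unchanged, so they are indifferent and can be joined into one equivalence class, shrinking the number of optimization variables:

```python
# Hypothetical constraint table: each row holds the coefficients of one linear
# constraint, each column belongs to one possible world.
table = [
    [1, 1, 0, 0, 1, 1],  # coefficients of constraint 1
    [0, 1, 0, 0, 1, 0],  # coefficients of constraint 2
]

# Group worlds by their column vector: identical columns => indifferent worlds.
groups = {}
for world in range(len(table[0])):
    column = tuple(row[world] for row in table)
    groups.setdefault(column, []).append(world)

equivalence_classes = list(groups.values())  # here: 6 worlds collapse to 3 classes
```

Identical columns are only a sufficient condition; the full test in the paper allows any column permutation that maps the constraint set onto itself.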

5 Efficient Calculation of the MaxEnt-Model

6 Description of the PIT System

⁵ Contrary to the MaxEnt problem, this is usually not a convex problem, and thus cannot be solved by SQP.

Fig. 2 shows the central components of our system PIT (for Probability Induction Tool):

[Figure 2 shows the data flow: a Modelling Team supplies Expert Knowledge (propositional rules with (interval) probabilities) to the Compiler and receives Feedback; the User sends Queries and an optional Goal and receives Answers; the Compiler passes Compiled Queries, a Compiled Goal and Constraints on P-Models to the Query Tool; depending on whether a Worst Case Analysis is requested, the constraints are handed to SQP (no) or to KOMPLEX (yes), each of which yields a unique (MaxEnt) P-Model.]

calculate answers for the user (compare the top entry in Fig. 1).
– Otherwise, the constraints are handed to KOMPLEX, and KOMPLEX searches for the unique MaxEnt P-Model that minimizes the probability of a single goal and is still consistent with the given constraints. In this case KOMPLEX uses some (user-defined) constraints and generates new point constraints from them. It then uses SQP to calculate the MaxEnt-Model for these point constraints and the remaining, fixed constraints. The model finally resulting from this search is then used to calculate answers as above. This realizes the logic of the second entry (Logic on P-Models with Worst-Case Analysis on given constraints) of Fig. 1.

The task of the Query Tool is to save the correlation between the events and variables (for the numerical optimization) and to use this information for calculating the answers to the queries after the optimization has found a model. Finally, the user specifies questions, receives the answers and gives feedback to the modeling team. Of course, during the modeling phase the team also assumes the role of the user and can use the feedback directly. We will explain reasoning with our system on an example from Lifschitz's collection of non-monotonic benchmarks (the example can be found as B4 in [15]). Consider the following colloquial specification:


7 An Example for Reasoning with PIT


Figure 2. The PIT System and its Environment

The task of the modeling team is to specify the knowledge of an expert as conditional probabilities ranging over intervals. Probabilities can be specified for arbitrary propositional expressions. The common-sense implication (if-then) is modeled as a conditional probability (and not as the material implication). The task of the compiler is to

– make consistency checks and prove propositional problems. If the problem is a purely propositional one (all probabilities are 0 or 1), the compiler already calculates the answers to the queries.
– translate the symbolic input into tables of linear constraints (for KOMPLEX and SQP) to be used as input for the optimization tool.
– simplify the table of linear constraints by detecting indifferent constraints and indifferent events. Our experiments show that indifference is indeed a very powerful principle for reducing the complexity of the representation.

The optimization control component governs the use of the subroutines KOMPLEX and SQP:

(a1) Quakers are normally pacifists.
(a2) Republicans are normally hawks.
(a3) Pacifists are normally politically active.
(a4) Hawks are normally politically active.
(a5) Pacifists are not hawks.

We will show that the following common sense conclusions hold under our system:

(c1) Quakers who are not Republicans are [normally] pacifists.
(c2) Republicans who are not Quakers are [normally] not pacifists.
(c3) Quakers, Republicans, Pacifists and Hawks are [normally] politically active.

We will first translate the assumptions into probabilistic constraints. Please note that the assumptions can be read as default statements. Statement (a1), for example, can be interpreted as: in the absence of other evidence, it is reasonable to assume that a Quaker is a pacifist. It can, of course, be formalized in many ways. However, at least to our understanding, it requires that for an arbitrary Quaker the probability that he is a pacifist is greater than 0.5. We will therefore choose the weakest possible formalization of the term normally and only require that the corresponding probability ranges in the interval (0.5, 1]⁶.

⁶ Regardless of the exact subinterval of (0.5, 1] we choose, we can

– If no worst-case analysis is desired, SQP generates the unique P-Model with maximum entropy that is consistent with the constraints. This P-Model is then used to

show that the conclusions hold within our approach. One (mathematical) argument for using this interval (especially the lower border of 0.5) is that this is the only possibility to

So we express defaults by very weak conditional probability constraints using the largest plausible intervals. This representation is strong enough to solve many benchmarks of non-monotonic reasoning (e.g. benchmarks from [15]) even if we perform a worst-case analysis on the intervals. We chose conditional probabilities to model the conditional statements in both assumptions and conclusions (note that statement (a1) is equivalent to the conditional statement: If somebody is a Quaker, then he normally is a pacifist). This is a well-known and widely used convention (compare e.g. [6]). To represent the five concepts Quaker, Pacifist, Republican, Hawk and Politically active we chose the set of variables V = {Qu, Pa, Re, Ha, Po}. With these decisions, we can restate the problem in our modeling language:

(a1) P(Pa|Qu) ∈ (0.5, 1]⁷
(a2) P(Ha|Re) ∈ (0.5, 1]
(a3) P(Po|Pa) ∈ (0.5, 1]
(a4) P(Po|Ha) ∈ (0.5, 1]
(a5) P(¬Ha|Pa) = 1
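These constraints are small enough to reproduce the MaxEnt computation outside PIT. The following pure-Python sketch is our own illustration, not the SQP implementation: it fixes each interval at its lower border (where, as reported below, the MaxEnt-Model ends up anyway), encodes each conditional constraint as a feature with required expectation zero, and runs plain gradient descent on the convex dual log Z(λ):

```python
import itertools
import math

# Worlds are tuples (Qu, Re, Pa, Ha, Po); (a5) removes every pacifist hawk.
worlds = [w for w in itertools.product([0, 1], repeat=5) if not (w[2] and w[3])]

q = 0.5  # lower border of the default interval (0.5, 1]
# E[(1_X - q) * 1_Y] = P(X and Y) - q * P(Y), so expectation 0 means P(X|Y) = q.
features = [
    lambda w: (w[2] - q) * w[0],  # (a1) P(Pa|Qu) = q
    lambda w: (w[3] - q) * w[1],  # (a2) P(Ha|Re) = q
    lambda w: (w[4] - q) * w[2],  # (a3) P(Po|Pa) = q
    lambda w: (w[4] - q) * w[3],  # (a4) P(Po|Ha) = q
]

def model(lam):
    # The MaxEnt-Model has Gibbs form p(w) proportional to exp(sum_i lam_i * f_i(w)).
    weights = [math.exp(sum(l * f(w) for l, f in zip(lam, features))) for w in worlds]
    z = sum(weights)
    return [x / z for x in weights]

# Gradient descent on the convex dual log Z; its gradient is E_p[f], target 0.
lam = [0.0] * len(features)
for _ in range(5000):
    p = model(lam)
    grad = [sum(pi * f(w) for pi, w in zip(p, worlds)) for f in features]
    lam = [l - 1.0 * g for l, g in zip(lam, grad)]

p = model(lam)
def prob(pred): return sum(pi for pi, w in zip(p, worlds) if pred(w))
def cond(a, b): return prob(lambda w: a(w) and b(w)) / prob(b)

c1 = cond(lambda w: w[2], lambda w: w[0] and not w[1])              # P(Pa | Qu, not Re)
c2 = cond(lambda w: not w[2], lambda w: w[1] and not w[0])          # P(not Pa | Re, not Qu)
c3 = cond(lambda w: w[4], lambda w: w[0] or w[1] or w[2] or w[3])   # P(Po | Qu or Re or Pa or Ha)
```

Under these assumptions the sketch recovers the qualitative picture of the paper: c1 and c2 clearly exceed 0.5 while c3 stays at the border, matching the conclusions discussed below.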

To show the desired conclusions, we have to demonstrate that the following constraints hold in the resulting P-Model:

P(ω) = 0.045087 for ω ∈ {¬Qu ∧ ¬Re ∧ Pa ∧ ¬Ha ∧ ¬Po, ¬Qu ∧ ¬Re ∧ ¬Pa ∧ Ha ∧ ¬Po}
P(ω) = 0.045081 for ω ∈ {¬Qu ∧ ¬Re ∧ Pa ∧ ¬Ha ∧ Po, ¬Qu ∧ ¬Re ∧ ¬Pa ∧ Ha ∧ Po}
P(ω) = 0.027859 for ω ∈ {Qu ∧ ¬Re ∧ ¬Pa ∧ ¬Ha ∧ Po, Qu ∧ ¬Re ∧ ¬Pa ∧ ¬Ha ∧ ¬Po, ¬Qu ∧ Re ∧ ¬Pa ∧ ¬Ha ∧ Po, ¬Qu ∧ Re ∧ ¬Pa ∧ ¬Ha ∧ ¬Po}
P(ω) = 0.072951 for ω ∈ {Qu ∧ ¬Re ∧ Pa ∧ ¬Ha ∧ ¬Po, ¬Qu ∧ Re ∧ ¬Pa ∧ Ha ∧ ¬Po}
P(ω) = 0.072943 for ω ∈ {Qu ∧ ¬Re ∧ Pa ∧ ¬Ha ∧ Po, ¬Qu ∧ Re ∧ ¬Pa ∧ Ha ∧ Po}
P(ω) = 0.027866 for ω ∈ {Qu ∧ ¬Re ∧ ¬Pa ∧ Ha ∧ ¬Po, ¬Qu ∧ Re ∧ Pa ∧ ¬Ha ∧ ¬Po}
P(ω) = 0.027876 for ω ∈ {Qu ∧ ¬Re ∧ ¬Pa ∧ Ha ∧ Po, ¬Qu ∧ Re ∧ Pa ∧ ¬Ha ∧ Po}
P(ω) = 0.017216 for ω ∈ {Qu ∧ Re ∧ ¬Pa ∧ ¬Ha ∧ Po, Qu ∧ Re ∧ ¬Pa ∧ ¬Ha ∧ ¬Po}
P(ω) = 0.045085 for ω ∈ {Qu ∧ Re ∧ Pa ∧ ¬Ha ∧ ¬Po, Qu ∧ Re ∧ ¬Pa ∧ Ha ∧ ¬Po}
P(ω) = 0.045090 for ω ∈ {Qu ∧ Re ∧ Pa ∧ ¬Ha ∧ Po, Qu ∧ Re ∧ ¬Pa ∧ Ha ∧ Po}

To show the desired conclusions, we have to demonstrate that the following constraints hold:

(c1) P(Pa|Qu ∧ ¬Re) ∈ (0.5, 1]
(c2) P(¬Pa|¬Qu ∧ Re) ∈ (0.5, 1]
(c3) P(Po|Qu ∨ Re ∨ Pa ∨ Ha) ∈ (0.5, 1]⁸

As we have 5 propositional variables, the set of elementary events contains 2⁵ = 32 possible worlds. The logical constraint (a5) eliminates 8 of these interpretations as impossible (i.e. their probability is 0). By applying the principle of indifference (compare section 5), we can restrict the number of equivalence classes to 11. (Roughly stated, the principle of indifference shows the symmetry between (Qu ∧ ¬Re) and (¬Qu ∧ Re) as well as the symmetry between (Pa ∧ ¬Ha) and (¬Pa ∧ Ha).) The optimization process on the resulting 11 variables yields the following approximation of the MaxEnt-Model. From this MaxEnt-Model we gain the following data about the assumptions:

(a1) P(Pa|Qu) = 0.500001 (= 0.5 + ε). The probability of assumption (a1) in the MaxEnt-Model is the lowest one compatible with the original constraint.
(a2) P(Ha|Re) = 0.500001 (= 0.5 + ε).
(a3) P(Po|Pa) = 0.500001 (= 0.5 + ε).
(a4) P(Po|Ha) = 0.500001 (= 0.5 + ε).
(a5) P(¬Ha|Pa) = 1. Remember that (a5) is a logical constraint and there is no freedom to choose another probability here.

As one can see, the MaxEnt-Model tries to be as uncommitted about the assumptions as the specification allows (i.e. it chooses a probability for the assumptions that provides the minimum amount of information). In this case, as is to be expected, the model is found at the (lower) border of the intervals. Nevertheless, all the desired conclusions are supported in it:

P(ω) = 0 for ω ∈ {Qu ∧ Re ∧ Pa ∧ Ha ∧ Po, Qu ∧ Re ∧ Pa ∧ Ha ∧ ¬Po, Qu ∧ ¬Re ∧ Pa ∧ Ha ∧ Po, Qu ∧ ¬Re ∧ Pa ∧ Ha ∧ ¬Po, ¬Qu ∧ Re ∧ Pa ∧ Ha ∧ Po, ¬Qu ∧ Re ∧ Pa ∧ Ha ∧ ¬Po, ¬Qu ∧ ¬Re ∧ Pa ∧ Ha ∧ Po, ¬Qu ∧ ¬Re ∧ Pa ∧ Ha ∧ ¬Po}

(Remark: These are the possible worlds eliminated by the logical constraint (a5), i.e. all worlds that allow a Pacifist to also be a Hawk. They have already been detected at the compiler stage, of course with a more efficient method than enumerating them.)

P(ω) = 0.045086 for ω ∈ {¬Qu ∧ ¬Re ∧ ¬Pa ∧ ¬Ha ∧ Po, ¬Qu ∧ ¬Re ∧ ¬Pa ∧ ¬Ha ∧ ¬Po}

(c1) P(Pa|Qu ∧ ¬Re) = 0.566900.
(c2) P(¬Pa|¬Qu ∧ Re) = 0.783404.
(c3) P(Po|Qu ∨ Re ∨ Pa ∨ Ha) = 0.500001.

The probability for all three conclusions falls into the interval (0.5, 1]; hence all three conclusions hold under our modeling assumptions. As we already noted, similar results are obtained for other choices of the intervals, and even the worst case analysis (choose the values in the intervals first and apply MaxEnt afterwards) does not change the result, because the MaxEnt-Model already represents the worst case for this example; that is, it chooses exactly the probabilities of the assumptions that minimize the probabilities of the three conclusions. We have briefly described our concept and implementation for reasoning with incomplete and uncertain (here: probabilistic) knowledge. Based on propositional logic, the specification

express assumptions and conclusions within the same interval. In most examples of this type of inductive reasoning, the conclusions show less certainty than the premises, so increasing the lower border of the interval of the assumptions does not increase the lower border of the conclusion by the same size.
⁷ In the implemented system a half-open interval (a, b] is realized as the closed interval [a + ε, b], where ε is of the same order of magnitude as the desired overall accuracy.
⁸ Note that the sentence (c3) is equivalent to the conditional: If somebody is a Quaker or a Republican or a Pacifist or a Hawk, then he (normally) is politically active. Thus the formalization uses the disjunction ∨ to represent the colloquial and of the original sentence.

8 Conclusion and future work

can contain logical constraints, constraints with fixed probabilities or constraints with probabilities ranging in intervals. These very general features can also be used for qualitative non-monotonic reasoning. Most non-monotonic benchmarks (e.g. those from [15] using unary predicates) can be modeled within the very weak formalization we described above.⁹ For better judgments we are still looking for more complex examples that are nevertheless well understood in their desired conclusions (contrary to statistical reasoning on complex domains). In our current work we are also trying to specify an application for a medical purpose within the ESPRIT Basic Research Project 9119 MIX, based on 24 propositional variables and about 150 probabilistic constraints. Our future work will also include improvements of the subtools. We will try to make use of the principle of independence in the compiler to reduce the complexity of the optimization problem. We will try to replace the SQP optimization tool with a more specialized version that makes use of the special properties of our problem and thus is better equipped to deal with our expected standard applications. Finally, we will replace the KOMPLEX algorithm for the worst-case analysis with an improved algorithm that is under development in an ongoing Ph.D. thesis.

⁹ An exception to this is the well-known proposal of the And-rule, but it is known that this rule causes trouble (consider e.g. the Lottery Paradox as described in [19]). Furthermore, as MaxEnt is based on combinatorial calculations, the result of MaxEnt has some meaning independent from proposed rules of non-monotonic reasoning. If it contradicts a proposed rule, it also calls this rule into question; or, to be more concrete, the rule will not be useful for reasoning on proportions.

REFERENCES

[1] F. Bacchus, A.J. Grove, J.Y. Halpern, and D. Koller, 'From Statistics to Beliefs', in Proc. of the 10th National Conference on Artificial Intelligence. AAAI, AAAI Press/MIT Press, (1992).
[2] C.R. Blyth, 'On Simpson's Paradox and the Sure-Thing Principle', Journal of the American Statistical Association, 67(338), 363–381, (June 1972).
[3] M.J. Box, 'A New Method of Constrained Optimization and a Comparison with Other Methods', The Computer Journal, 8, 42–52, (1965).
[4] R.Y. Bryant, 'Symbolic Boolean Manipulation with Ordered Binary Decision Diagrams', Technical Report CMU-CS-92-160, School of Computer Science, Carnegie Mellon University, (1992). Available at http://www.cs.cmu.edu/Web/Reports/1992.html.
[5] P. Cheeseman, 'A Method of Generalized Bayesian Probability Values for Expert Systems', in Proc. of the 8th International Joint Conference on Artificial Intelligence (IJCAI-83), Karlsruhe, volume 1, pp. 198–202, San Mateo, CA, (1983). Morgan Kaufman.
[6] E.W. Adams, 'Probability and the Logic of Conditionals', in Aspects of Inductive Logic, eds., J. Hintikka and P. Suppes, North Holland, Amsterdam, (1966).
[7] S.A. Goldman and R.L. Rivest, 'A Non-Iterative Maximum Entropy Algorithm', in Uncertainty in Artificial Intelligence, eds., J.F. Lemmer and L.N. Kanal, volume 2, 133–148, Elsevier Science, (1988).
[8] M. Goldszmidt, P. Morris, and J. Pearl, 'A Maximum Entropy Approach to Non-Monotonic Reasoning', in Proc. of the Eighth National Conference on Artificial Intelligence, Boston, pp. 646–652. AAAI, AAAI Press/MIT Press, (1990).
[9] A.J. Grove, J.Y. Halpern, and D. Koller, 'Random Worlds and Maximum Entropy', in Proc. of the 7th Annual IEEE Symposium on Logic in Computer Science (LICS). IEEE, (1992).
[10] E.T. Jaynes, 'Where do we stand on Maximum Entropy?', in Papers on Probability, Statistics and Statistical Physics, ed., R.D. Rosenkrantz, 210–314, Kluwer Academic Publishers, (1978).
[11] E.T. Jaynes, 'On the Rationale of Maximum Entropy Methods', Proc. of the IEEE, 70(9), 939–952, (1982).
[12] E.T. Jaynes, 'Probability Theory: The Logic of Science'. Continually updated, fragmentary edition of approximately 700 pages available from ftp://bayes.wustl.edu//pub/jaynes, 1995.
[13] J.E. Shore and R.W. Johnson, 'Axiomatic Derivation of the Principle of Maximum Entropy and the Principle of Minimum Cross-Entropy', IEEE Transactions on Information Theory, IT-26(1), 26–37, (1980).
[14] J.N. Kapur and H.K. Kesavan, Entropy Optimization Principles with Applications, Academic Press, 1992.
[15] V. Lifschitz, 'Benchmark Problems for Formal Non-Monotonic Reasoning', in Non-Monotonic Reasoning: 2nd International Workshop, ed., Reinfrank et al., volume 346 of LNAI, pp. 202–219. Springer, (1989).
[16] E. Neufeld and J.D. Horton, 'Conditioning on Disjunctive Knowledge: Simpson's Paradox in Default Logic', in Uncertainty in Artificial Intelligence, eds., M. Henrion, R.D. Shachter, L.N. Kanal, and J.F. Lemmer, volume 5, 117–125, Elsevier Science, (1990).
[17] J.B. Paris and A. Vencovska, 'A Note on the Inevitability of Maximum Entropy', International Journal of Approximate Reasoning, 3, 183–223, (1990).
[18] J. Pearl, Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference, Morgan Kaufmann, San Mateo, CA, 1988.
[19] D. Poole, 'What the Lottery Paradox Tells Us about Default Reasoning', in Proc. of the 1st International Conference on Principles of Knowledge Representation and Reasoning, eds., R.J. Brachman, H.J. Levesque, and R. Reiter, pp. 333–340, San Mateo, CA, (1989). Morgan Kaufmann.
[20] M. Schramm and M. Greiner, 'Foundations: Indifference, Independence & Maxent', in Maximum Entropy and Bayesian Methods in Science and Engineering (Proc. of MaxEnt'94), ed., J. Skilling. Kluwer Academic Publishers, (1995).
[21] C.E. Shannon and W. Weaver, A Mathematical Theory of Communication, University of Illinois Press, Urbana, 1949.
[22] J. Skilling, 'The Axioms of Maximum Entropy', in Maximum Entropy and Bayesian Methods in Science and Engineering, eds., G.J. Erickson and C.R. Smith, volume 1 – Foundations, Kluwer Academic Publishers, (1988).
[23] S. Wagenpfeil, Implementierung eines SQP-Verfahrens mit dem Algorithmus von Ritter und Best, Diplomarbeit, Institut für Angewandte Mathematik und Statistik, TU München, 1991. (In German.)