# Thread: Probability function of many variables

1. I have a problem in maths.

Suppose we have two magicians, Micky and Macky.
Micky tells the truth, when asked a question, with probability x and Macky tells the truth with probability y.
When Micky and Macky are asked about the state of a certain electrical switch, they both answer that the switch is currently in state "A".
What, then, is the probability P(A)?

A simple answer is they are either both telling the truth or both lying.
So:

P(A) = x.y / [ x.y + (1-x).(1-y) ]
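For anyone who wants to play with numbers, here is a minimal Python sketch of the formula above. It assumes the two answers are independent, that the switch has only two states (so a liar names the other state), and that both states were equally likely before asking. The function name `prob_state_a` is made up for illustration.

```python
def prob_state_a(x: float, y: float) -> float:
    """Probability that the switch is in state A, given both magicians say "A".

    Assumes independent answers, a two-state switch, and equal prior odds
    for the two states (assumptions for this sketch only).
    """
    both_truthful = x * y            # both say "A" and A is the real state
    both_lying = (1 - x) * (1 - y)   # both say "A" but the real state is the other one
    return both_truthful / (both_truthful + both_lying)


# Example: Micky is right 80% of the time, Macky 70% of the time.
print(prob_state_a(0.8, 0.7))
```

With x = y = 0.5 the answers carry no information and the result stays at 0.5.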

But what happens if there is some kind of unknown correlation between the two of them ?
Macky is sometimes influenced by Micky's decision to tell the truth or to lie, in a certain peculiar way.
We can observe the final outcome i.e. measure p(A) but we don't know the inner reasoning.
Or perhaps we can work it out but what if we have many magicians m1, m2, m3 ... and cannot hope to make as many measurements as may be needed ?

Can someone help me with this ?
I have been trying to google an answer, but I don't seem to be able to come up with the right search terms.

p.s. perhaps I will think of a more representative example of the statistical problem I'm trying to solve and come back, if the above doesn't help.

3. Surely you would need to define the 'influence' between the two more precisely before you could make use of it.

Either come up with a set probability for Macky being influenced by Micky, or state the circumstances in which the influence would take place, and try to define a pattern/relationship for the 'extent' of the influence in different circumstances... some sort of dynamic probability based on circumstances. An algorithm could be made for it?

That's my speculation anyway. If this were a real-life scenario, however, I wouldn't waste any more time trying to define human behavior with maths... as Einstein said, 'Gravitation is not responsible for people falling in love' lol

4. Let me state the problem otherwise.

Suppose we have a weather prediction system called "A" that predicts states of weather with a certain known accuracy.
The indication "sunny" has probability P1, the indication "rain" has probability P2, the indication "overcast" P3 and so on.
The ultimate reliability of our system can be expressed by an information sum:

I = SUM OF log(Pi)

where Pi are the probabilities that the system assigned, before the event, to what turned out to be the truth (the smaller the magnitude of I the better, since every log(Pi) is zero or negative).

Suppose now we have a second system "B", with characteristic probability coefficients Q1, Q2, Q3 ...
The sum of logs I = SUM OF log(Qi) has another value, perhaps worse than that from system "A", perhaps better.
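As a rough illustration of comparing two systems this way, here is a small Python sketch. The recorded probabilities are invented numbers, not real data, and `log_score` is just a name picked for this example. Since every log is zero or negative, the system whose sum is closer to zero gave more probability to what actually happened.

```python
import math


def log_score(probs_of_truth):
    """Sum of log(p) over the probabilities a system gave to what really happened."""
    return sum(math.log(p) for p in probs_of_truth)


# Hypothetical records: the probability each system had assigned to the
# weather that actually occurred on four different days.
system_a = [0.7, 0.6, 0.8, 0.5]
system_b = [0.6, 0.6, 0.6, 0.6]

score_a = log_score(system_a)
score_b = log_score(system_b)

# The sum closer to zero (i.e. the larger one) belongs to the better system.
better = "A" if score_a > score_b else "B"
print(score_a, score_b, better)
```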

We could have a third system "C" with its own properties and a fourth and so on.

The question is how can we use all of our systems to improve the final result ?
If we attempt cross measurements it may give us the answer.
So w.r. to state "1" of our property (the weather) we have to measure the statistics of system "A", system "B", system "C" ... simultaneously.
The number of measurements required is multiplied every time we add a system and it can get hopelessly big.
How then can we approximate a solution?

5. You're interested in the notions of independence and conditional probability here. As you stated, the first magician's answer may somehow influence the second magician's answer. We cannot know what this effect is if we are only given the probability that each says yes.

Now, in your simple answer, you're assuming that the events are independent--that is, the probability that event B occurs is the same whether A happens or not. Mathematically, we define this to be:

P(A and B) = P(A)P(B)

To illustrate this idea more, we should introduce the idea of conditional probability: the conditional probability of A given B is the probability that A happens assuming that B happens. This idea allows us to account for the fact that A may happen more often when B occurs. We denote the conditional probability of A given B by P(A|B), and it shouldn't be too hard to see that we have this equality:

P(A|B) = P(A and B)/P(B)

We could even take this as a definition. But the intuitive idea is that we're dividing the percent of the time that A and B occur by the percent of the time that B occurs to get the percent of the time that A occurs when B occurs.

So now independence takes on a new meaning: A and B are independent if and only if P(A and B) = P(A)P(B) if and only if P(A|B) = P(A)--i.e., that the probability of A occurring given that B occurs is just the probability of A occurring.
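To tie this back to the magicians, here is a minimal Python sketch of what a dependence does to the original formula. It is only one possible way to model the influence: the parameter `c` is a hypothetical covariance between "Micky tells the truth" and "Macky tells the truth", not something given in the question. It also assumes a two-state switch with equal prior odds. With c = 0 it reduces to the independent case.

```python
def posterior_with_dependence(x: float, y: float, c: float = 0.0) -> float:
    """P(state A | both say "A") when truth-telling has covariance c.

    x, y: marginal probabilities that Micky and Macky tell the truth.
    c:    hypothetical covariance; c = 0 means independence.
    """
    # Joint probabilities of the four truth/lie combinations.
    both_truthful = x * y + c
    both_lying = (1 - x) * (1 - y) + c
    only_micky_truthful = x * (1 - y) - c
    only_macky_truthful = (1 - x) * y - c

    cells = (both_truthful, both_lying, only_micky_truthful, only_macky_truthful)
    if min(cells) < 0:
        raise ValueError("c is not compatible with x and y")
    if both_truthful + both_lying == 0:
        raise ValueError("the magicians can never agree with these values")

    return both_truthful / (both_truthful + both_lying)


print(posterior_with_dependence(0.8, 0.7, 0.0))   # independent case
print(posterior_with_dependence(0.8, 0.7, 0.05))  # positively correlated case
```

The same x and y give different answers for different c, which is why the marginals alone are not enough.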

With regard to your question, if we're just given P(A) and P(B), we have no way of figuring out what P(A and B) is without some extra information. I can come up with events A, A', B, B' such that P(A) = P(A'), P(B) = P(B'), but P(A and B) ≠ P(A' and B'). For example, let A = A' = B, B' = not A, and assume P(A) = 1/2. Then P(not A) = 1/2, P(A and A) = 1/2, and P(A and not A) = 0.

6. So you have a number of systems and a number of outcomes, such that each system predicts the probability of each outcome?

Once each system has assigned its probabilities to each outcome, the event takes place, and one of the outcomes is known to be true.

Surely then, the most accurate system is the one that has the highest probability for that outcome? Repeated over time, the most consistent (reliable) system would be the one with the greatest number of 'most accurate' outcomes.

One way or another you would have to represent these 'consistencies' numerically. Perhaps a percentage of the total number of tests?

Next time each system makes its predictions, you will again have a set of probabilities for each outcome.

Now multiply each probability by the 'consistency' percentage for each system, add them together (corresponding outcomes from different systems), and divide by the total number of systems? Would that work?
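Here is a minimal Python sketch of that weighting idea. One caveat: dividing by the number of systems only gives probabilities that sum to 1 when the weights happen to add up to the number of systems, so this sketch divides by the sum of the weights instead. The function name `weighted_pool`, the forecasts and the 'consistency' weights are all made up for illustration.

```python
def weighted_pool(forecasts, weights):
    """Weighted average of several forecasts over the same outcomes.

    forecasts: one list of outcome probabilities per system.
    weights:   one 'consistency' weight per system (hypothetical values).
    Dividing by the sum of the weights keeps the result summing to 1.
    """
    total_weight = sum(weights)
    n_outcomes = len(forecasts[0])
    return [
        sum(w * f[k] for w, f in zip(weights, forecasts)) / total_weight
        for k in range(n_outcomes)
    ]


# Two hypothetical systems forecasting three outcomes.
forecasts = [[0.5, 0.3, 0.2], [0.4, 0.4, 0.2]]
weights = [0.6, 0.4]

pooled = weighted_pool(forecasts, weights)
print(pooled, sum(pooled))
```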

Ok, I'm not entirely sure about that last part, but if you want a combined result from the systems, surely you would have to have some dynamic (changes after each test) value for each system to represent its consistency, and use this to determine how much influence each system's result will have on the final 'combined' result?

Is that what you're getting at? I don't understand the term 'information sum'.

7. ...Ok I missed serpicojr's answer there... take his advice over mine though, he is the maths wizzz

8. Let me try a different angle to illuminate the situation.

Suppose we have three mutually exclusive events denoted by A, B, C.
Our statistician tells us A has probability 50%, B has probability 30% and C has probability 20%.
The theoretical information is

I1 = - 0.5 * log(0.5) - 0.3 * log(0.3) - 0.2 * log(0.2) = 1.029653

But another statistician tells us, about the same things, that A has probability 40%, B has probability 40% and C has probability 20% (statisticians never agree, like lawyers).

His theoretical information value is

I2 = - 0.4 * log(0.4) - 0.4 * log(0.4) - 0.2 * log(0.2) = 1.05492
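The two values can be reproduced with a few lines of Python, assuming log here means the natural logarithm (which is what the numbers match). The function name `entropy` is just a label for this sketch.

```python
import math


def entropy(probs):
    """-sum(p * ln p) over the outcomes; zero-probability outcomes contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0)


i1 = entropy([0.5, 0.3, 0.2])  # first statistician
i2 = entropy([0.4, 0.4, 0.2])  # second statistician
print(round(i1, 6), round(i2, 6))
```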

We don't quite know why the two guys tend to differ somewhat in their estimates. They just do.
Also it looks like the second one is going to fare worse (if the probability experiment is repeated infinite times), because theoretically he is closer to uncertainty (I2 > I1).
But that's only theory. In real life things can be reversed.

We are not allowed to count all the possible states of matter and the respective outcomes.
That might require in our example 101 x 101 x 101 x 101 points of observation (the numbers 0 to 100 four times), plus the huge number of readings (experiments) needed to obtain meaningful statistics.
We do nevertheless have a certain volume of experimental data to make use of (otherwise all this talk doesn't make sense).

There exist engineering methods to use both sets of readings to improve prediction quality.
The objective is to make the I value as small as we can (the "experimental" I that is - what we can measure).
A simple thing one might be tempted to do is to say

True probability = a x p1 + (1-a) x p2

where a is an experimental constant.

This formula conserves normalization of probabilities (sum of probabilities = 1) but something more elaborate is needed I reckon.
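A weighted average of forecasts like this is often called a linear opinion pool. As a rough sketch of how the constant a could be picked from data, the Python below tries a grid of values and keeps the one with the smallest negative log-likelihood on past records. The records are invented numbers, and the names `mix`, `neg_log_likelihood` and `fit_a` are hypothetical; a grid search is only one simple choice among many.

```python
import math


def mix(p1, p2, a):
    """Mixture a * p1 + (1 - a) * p2, outcome by outcome."""
    return [a * u + (1 - a) * v for u, v in zip(p1, p2)]


def neg_log_likelihood(a, records):
    """-sum of log(mixed probability of the outcome that actually occurred)."""
    return -sum(math.log(mix(p1, p2, a)[k]) for p1, p2, k in records)


def fit_a(records, steps=100):
    """Grid search over a in [0, 1] for the smallest negative log-likelihood."""
    candidates = [i / steps for i in range(steps + 1)]
    return min(candidates, key=lambda a: neg_log_likelihood(a, records))


# Hypothetical records: (forecast of system 1, forecast of system 2,
# index of the outcome that actually happened).
records = [
    ([0.5, 0.3, 0.2], [0.4, 0.4, 0.2], 0),
    ([0.6, 0.2, 0.2], [0.3, 0.5, 0.2], 1),
    ([0.2, 0.2, 0.6], [0.3, 0.3, 0.4], 2),
    ([0.7, 0.2, 0.1], [0.4, 0.5, 0.1], 1),
]

best_a = fit_a(records)
print(best_a, neg_log_likelihood(best_a, records))
```

By construction the fitted mixture scores at least as well on these records as either system alone, since a = 0 and a = 1 are among the candidates.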
I know it works, I've seen it, but I lack understanding of the theory.

9. I have to admit I didn't really read your second post carefully before posting my answer, and I apologize for that. Now that I've read it (and your latest post), I don't have much to say immediately... I'll think about it, though.

10. Originally Posted by serpicojr
I have to admit I didn't really read your second post carefully before posting my answer, and I apologize for that. Now that I've read it (and your latest post), I don't have much to say immediately... I'll think about it, though.
I know it's a little tricky.
I was rather looking forward to some key words suitable for google.
Some papers that turn up are Greek to me.