As Dr. Gautam Shroff said, the interesting relationship between SQL and Bayesian Network remains an good area for more research. Here's an example based on a NYU AI course material, Pls. find my note of SQL statements all in underlines.
Bayesian Net Example (with SQL calculation) 
Consider the following Bayesian network:
A->C->E
A->D
B->D

Thus, the independence expressed in this Bayesian net are that 
A and B are (absolutely) independent. 
C is independent of B given A. 
D is independent of C given A and B. 
E is independent of A, B, and D given C. 
Suppose that the net further records the following probabilities:

Prob(A=T) = 0.3 
Prob(B=T) = 0.6 
Prob(C=T|A=T) = 0.8 
Prob(C=T|A=F) = 0.4 
Prob(D=T|A=T,B=T) = 0.7 
Prob(D=T|A=T,B=F) = 0.8 
Prob(D=T|A=F,B=T) = 0.1 
Prob(D=T|A=F,B=F) = 0.2 
Prob(E=T|C=T) = 0.7 
Prob(E=T|C=F) = 0.2
Some sample computations:
First, create database tables a(p,a), b(p,b), c(p,c,a), d(p,d,a,b), e(p,e,c) with all probs filled in, including inferred ones like: Prob(A=F) = 0.7, Prob(D=F|A=F,B=F) = 0.8, etc.    Prob(D=T)P(D=T) =

P(D=T,A=T,B=T) + P(D=T,A=T,B=F) + P(D=T,A=F,B=T) + P(D=T,A=F,B=F) =

P(D=T|A=T,B=T) P(A=T,B=T) + P(D=T|A=T,B=F) P(A=T,B=F) + 
P(D=T|A=F,B=T) P(A=F,B=T) + P(D=T|A=F,B=F) P(A=F,B=F) = 
(since A and B are independent absolutely)

P(D=T|A=T,B=T) P(A=T) P(B=T) + P(D=T|A=T,B=F) P(A=T) P(B=F) + 
P(D=T|A=F,B=T) P(A=F) P(B=T) + P(D=T|A=F,B=F) P(A=F) P(B=F) =

0.7*0.3*0.6 + 0.8*0.3*0.4 + 0.1*0.7*0.6 + 0.2*0.7*0.4 = 0.32

Using SQL:

SELECT SUM(a.p*d.p*b.p) FROM a, d, b WHERE  d.a=a.a AND d.b=b.b AND d.d='t'

Prob(D=F,C=T)P(D=F,C=T) =P(D=F,C=T,A=T,B=T) + P(D=F,C=T,A=T,B=F) + 
P(D=F,C=T,A=F,B=T) + P(D=F,C=T,A=F,B=F) =

P(D=F,C=T|A=T,B=T) P(A=T,B=T) + P(D=F,C=T|A=T,B=F) P(A=T,B=F) + 
P(D=F,C=T|A=F,B=T) P(A=F,B=T) + P(D=F,C=T|A=F,B=F) P(A=F,B=F) = 
(since C and D are independent given A and B)

P(D=F|A=T,B=T) P(C=T|A=T,B=T) P(A=T,B=T) + P(D=F|A=T,B=F) P(C=T|A=T,B=F) P(A=T,B=F) + 
P(D=F|A=F,B=T) P(C=T|A=F,B=T) P(A=F,B=T) + P(D=F|A=F,B=F) P(C=T|A=F,B=F) P(A=F,B=F) = 
(since C is independent of B given A and A and B are independent absolutely)

P(D=F|A=T,B=T) P(C=T|A=T) P(A=T) P(B=T) + P(D=F|A=T,B=F) P(C=T|A=T) P(A=T) P(B=F) + P(D=F|A=F,B=T) P(C=T|A=F) P(A=F) P(B=T) + P(D=F|A=F,B=F) P(C=T|A=F) P(A=F) P(B=F) =

0.3*0.8*0.3*0.6 + 0.2*0.8*0.3*0.4 + 0.9*0.4*0.7*0.6 + 0.8*0.4*0.7*0.4 = 0.3032

Using SQL:

SELECT SUM(a.p*c.p*d.p*b.p) FROM a, c, d, b WHERE c.a=a.a AND d.a=a.a AND d.b=b.b AND c.c='t' AND d.d='f'

Prob(A=T|C=T)P(A=T|C=T) = P(C=T|A=T)P(A=T) / P(C=T). 


Now P(C=T) = P(C=T,A=T) + P(C=T,A=F) = 
P(C=T|A=T)P(A=T) + P(C=T|A=F)P(A=F) = 
0.8*0.3+ 0.4*0.7 = 0.52

So P(C=T|A=T)P(A=T) / P(C=T) = 0.8*0.3/0.52= 0.46.

Using SQL:

SELECT a.a, SUM(a.p*c.p) FROM a, c WHERE c.a=a.a AND c.c='t' GROUP BY a.a

Remember to normalize P(A=T)

Prob(A=T|D=F)P(A=T|D=F) =P(D=F|A=T) P(A=T)/P(D=F).

Now P(D=F) = 1-P(D=T) = 0.68 from the first question above. 


P(D=F|A=T) = P(D=T,B=T|A=T) + P(D=F,B=F|A=T) =

P(D=F|B=T,A=T) P(B=T|A=T) + P(D=F|B=F,A=T) P(B=F|A=T) = 
(since B is independent of A)

P(D=F|B=T,A=T) P(B=T) + P(D=F|B=F,A=T) P(B=F) = 0.3*0.6 + 0.2*0.4 = 0.26. 
So P(A=T|D=F) = P(D=F|A=T) P(A=T)/P(D=F) = 0.26 * 0.3 / 0.68 = 0.115

SELECT a.a, SUM(a.p*d.p*b.p) FROM a, d, b WHERE d.a=a.a AND d.b=b.b AND d.d='f' GROUP BY a.a

Remember to normalize P(A=T)

Prob(A=T,D=T|B=F).P(A=T,D=T|B=F) =P(D=T|A=T,B=F) P(A=T|B=F) = 
(since A and B are independent)

P(D=T|A=T,B=F) P(A=T) = 0.8*0.3 = 0.24.

SELECT a.a, d.d, SUM(a.p*d.p*b.p) FROM a, d, b WHERE d.a=a.a AND d.b=b.b AND b.b='f' GROUP BY a.a, d.d

Remember to normalize P(A=T,D=T)

Prob(C=T | A=F, E=T)Prob(C=T | A=F, E=T) = (By Bayes' law)Prob(E=T|C=T,A=F) * Prob(C=T|A=F) / Prob(E=T|A=F) = (since E is independent of A given C)

Prob(E=T|C=T) * Prob(C=T|A=F) / Prob(E=T|A=F).

Now Prob(E=T|A=F) = Prob(E=T,C=T|A=F) + Prob(E=T,C=F|A=F) =

Prob(E=T|C=T,A=F) Prob(C=T|A=F) + Prob(E=T|C=F,A=F) Prob(C=F|A=F) = (since E is independent of A given C)

Prob(E=T|C=T) * Prob(C=T|A=F) + Prob(E=T|C=F) * Prob(C=F|A=F).

So we have 
Prob(C=T | A=F, E=T) = 
Prob(E=T|C=T) * Prob(C=T|A=F) / (Prob(E=T|C=T) * Prob(C=T|A=F) + Prob(E=T|C=F) * Prob(C=F|A=F)) =

0.7*0.4 / (0.7 * 0.4 + 0.2 * 0.6) = 0.7

SELECT c.c, SUM(a.p*c.p*e.p) FROM a, c, e WHERE c.a=a.a AND e.c=c.c AND a.a='f' AND e.e='t' GROUP BY c.c

Remember to normalize P(C=T)


In general, just use the whole structure:
for Prob(%Apo | %Obs) 

SELECT (%Apo), SUM(a.p*d.p*b.p*c.p*e.p) FROM a, d, b, c, e WHERE  d.a=a.a AND d.b=b.b and c.a=a.a and e.c=c.c (%Obs) group by (%Apo)




Leave a Reply.