DOC: Learning parameters for Discrete Bayesian Networks or Hybrid Bayesian Network #805

@NianzuMa

Issue with current documentation:

Dear PyMC Community,

My task

I have a real-world project with the following scenario:

I have a predefined network skeleton in DAG format with 1654 nodes and 2965 edges. I also have a dataset with shape (3000, 1654). Of the 1654 nodes, 605 are continuous variables and 1049 are categorical variables.

My goal is to learn the parameters of the predefined network skeleton from this dataset. I then want to perform manipulations (hard interventions, i.e. the do-operation) on one or several variables by setting them to different values, and check how the posterior probability of the target variable changes. In our task, this process is called causal effect estimation.
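To make the intended manipulation concrete, here is a minimal sketch of a hard intervention on a hypothetical three-node discrete network B -> A, B -> T, A -> T. The probability tables are made-up illustrative numbers, not values from my dataset, and the function names are my own. It uses the truncated factorization for do(A=a) and contrasts it with ordinary conditioning on A=a.

```python
import numpy as np

# Illustrative (made-up) conditional probability tables for binary variables.
p_b = np.array([0.6, 0.4])                  # P(B)
p_a_given_b = np.array([[0.7, 0.3],         # P(A | B=0)
                        [0.2, 0.8]])        # P(A | B=1)
p_t_given_ab = np.array([[[0.9, 0.1],       # P(T | A=0, B=0)
                          [0.5, 0.5]],      # P(T | A=0, B=1)
                         [[0.4, 0.6],       # P(T | A=1, B=0)
                          [0.1, 0.9]]])     # P(T | A=1, B=1)


def p_t_do_a(a):
    """P(T | do(A=a)): drop the factor P(A | B), fix A=a, and sum out B."""
    return np.einsum("b,bt->t", p_b, p_t_given_ab[a])


def p_t_given_a(a):
    """P(T | A=a): ordinary conditioning, which reweights B by P(B | A=a)."""
    weights = p_b * p_a_given_b[:, a]       # proportional to P(B | A=a)
    weights = weights / weights.sum()
    return weights @ p_t_given_ab[a]


interventional = p_t_do_a(1)
observational = p_t_given_a(1)
print("P(T | do(A=1)) =", interventional)
print("P(T | A=1)     =", observational)
```

The two results differ because B is a common cause of A and T: intervening on A leaves P(B) unchanged, whereas observing A changes the belief about B.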

My progress

My investigation and a successful experiment using pgmpy
I have successfully used the SMILE and pgmpy packages to achieve this goal. As a first step, I discretize the continuous values into 5 bins, so that the continuous variables are converted to discrete variables. This makes the network fully discrete/categorical.
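The discretization step can be sketched as follows. This assumes equal-width bins over the observed range of each column, which is only one possible choice (quantile-based bins are another); the helper name `discretize_equal_width` and the synthetic data are hypothetical.

```python
import numpy as np


def discretize_equal_width(values, n_bins=5):
    """Map a 1-D array of continuous values to integer labels 0..n_bins-1."""
    values = np.asarray(values, dtype=float)
    # n_bins + 1 equally spaced edges spanning the observed range.
    edges = np.linspace(values.min(), values.max(), n_bins + 1)
    # Only the interior edges are passed, so the maximum lands in the last bin.
    return np.digitize(values, edges[1:-1])


rng = np.random.default_rng(0)
x = rng.normal(size=3000)        # stand-in for one continuous column
x_binned = discretize_equal_width(x)
print(np.bincount(x_binned, minlength=5))
```

Applying the same function to each of the continuous columns gives a dataset in which every variable is categorical.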

See the example of using the EM algorithm in pgmpy to learn parameters for a discrete network:

I have included my example as a Python notebook using pgmpy for your reference.
pgmpy_discrete_network_do_manipulation.ipynb

I feel stuck using PyMC
discrete_network_example___using_PyMC.py

With PyMC I can obtain the probabilities P(B), P(A|B), and P(T|A, B). However, I cannot find a general way to print the marginal probability of each variable, i.e. P(A), P(B), and P(T).
pgmpy has a function that prints P(A), P(B), and P(T) directly, but I cannot find an equivalent in PyMC.
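For reference, this is the computation I am after, written in plain NumPy rather than PyMC. It is a sketch under the assumption that the three tables P(B), P(A|B), and P(T|A, B) are available as arrays; the numbers below are made up for illustration. The marginals follow from building the joint distribution and summing out the other variables.

```python
import numpy as np

# Illustrative (made-up) tables for binary variables.
p_b = np.array([0.6, 0.4])                  # P(B), indexed [b]
p_a_given_b = np.array([[0.7, 0.3],         # P(A | B), indexed [b, a]
                        [0.2, 0.8]])
p_t_given_ab = np.array([[[0.9, 0.1],       # P(T | A, B), indexed [a, b, t]
                          [0.5, 0.5]],
                         [[0.4, 0.6],
                          [0.1, 0.9]]])

# Joint P(B, A, T) = P(B) * P(A | B) * P(T | A, B), indexed [b, a, t].
joint = np.einsum("b,ba,abt->bat", p_b, p_a_given_b, p_t_given_ab)

# Each marginal sums the joint over the other two axes.
marg_b = joint.sum(axis=(1, 2))
marg_a = joint.sum(axis=(0, 2))
marg_t = joint.sum(axis=(0, 1))

print("P(B) =", marg_b)
print("P(A) =", marg_a)
print("P(T) =", marg_t)
```

Building the full joint only works for a handful of variables. For a network with 1654 nodes, a dedicated inference algorithm (or sampling from the fitted model) would be needed instead.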

My questions

(1)
I want to explore PyMC and use it to achieve the same goal as with pgmpy, i.e. given a predefined discrete network skeleton, learn the parameters of this network from a dataset.

However, I did not see any official Python notebook example for this task.
Is PyMC capable of achieving this task? Thanks.
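To clarify what I mean by "learning the parameters", here is a minimal sketch for a single edge B -> A with fully observed data. It is neither the pgmpy EM procedure nor a PyMC model: it assumes complete data and a symmetric Dirichlet prior, in which case the posterior mean of each row of the conditional probability table is just the smoothed relative frequency. The helper name `estimate_cpt`, the pseudo-count `alpha`, and the synthetic data are all hypothetical.

```python
import numpy as np


def estimate_cpt(child, parent, n_child, n_parent, alpha=1.0):
    """Posterior-mean estimate of P(child | parent) under a Dirichlet(alpha) prior."""
    # Start every cell at the prior pseudo-count alpha.
    counts = np.full((n_parent, n_child), alpha, dtype=float)
    # Add one observed count per (parent, child) pair.
    np.add.at(counts, (parent, child), 1.0)
    # Normalize each parent row into a probability distribution.
    return counts / counts.sum(axis=1, keepdims=True)


# Synthetic data drawn from a known table, so the estimate can be checked.
rng = np.random.default_rng(0)
true_a_given_b = np.array([[0.7, 0.3],
                           [0.2, 0.8]])
b = rng.choice(2, size=3000, p=[0.6, 0.4])
a = (rng.random(3000) < true_a_given_b[b, 1]).astype(int)

cpt = estimate_cpt(a, b, n_child=2, n_parent=2)
print(cpt)
```

For a node with several parents, the same counting applies with one row per joint parent configuration.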

(2)
Is PyMC able to learn the parameters from a dataset for a predefined hybrid network structure (a network with both discrete and continuous variables)?

Thank you very much for your answer.

Idea or request for content:

If PyMC supports the task I described above (for either a discrete or a hybrid network), I would appreciate it if the community could write a simple tutorial for beginners. It would be very helpful. Thank you very much.
