... | ... | @@ -7,7 +7,7 @@ The task of training a Bayesian network is thus split into two subtasks: |
|
|
* Finding the structure of the Bayesian network.
|
|
|
* Parametric learning of the Bayesian network, in other words, selecting marginal and conditional distributions that describe the data accurately enough.
|
|
|
|
|
|
# Structural learning algorithms for a Bayesian network
|
|
|
# Structural learning algorithms
|
|
|
Constructing the network structure is often reduced to an optimization problem: score functions are defined on the space of directed acyclic graphs (DAGs) to evaluate how well a graph describes the dependencies between features. Web BAMT uses the Hill-Climbing algorithm to search this space.
|
|
|
Steps of Hill-Climbing algorithm:
|
|
|
1. Start from a graph without edges;
|
... | ... | @@ -19,7 +19,7 @@ Steps of Hill-Climbing algorithm: |
|
|
The following score functions from [BAMT package](https://github.com/ITMO-NSS-team/BAMT) are included in Web BAMT: K2, BIC (Bayesian Information Criterion), MI (Mutual Information).
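
To make the search concrete, here is a minimal sketch of score-based Hill-Climbing with a BIC score for discrete data. It is not the BAMT implementation: the function names (`bic_node`, `has_cycle`, `score`, `hill_climb`) are hypothetical, the move set (add, delete, or reverse one edge) is the textbook formulation assumed here because the full step list is not shown in this excerpt, and the toy data is made up for illustration.

```python
import math
from collections import Counter
from itertools import permutations

def bic_node(data, node, parents):
    """BIC contribution of one discrete node given its parents."""
    n = len(data)
    joint = Counter((tuple(row[p] for p in parents), row[node]) for row in data)
    parent_counts = Counter(tuple(row[p] for p in parents) for row in data)
    # Maximized log-likelihood of the conditional frequency table.
    loglik = sum(c * math.log(c / parent_counts[pa]) for (pa, _), c in joint.items())
    r = len({row[node] for row in data})      # number of states of the node
    q = 1                                     # number of parent configurations
    for p in parents:
        q *= len({row[p] for row in data})
    return loglik - 0.5 * math.log(n) * (r - 1) * q

def has_cycle(nodes, edges):
    """Detect a directed cycle with depth-first search."""
    children = {v: [b for a, b in edges if a == v] for v in nodes}
    state = {v: 0 for v in nodes}             # 0 = unvisited, 1 = in progress, 2 = done
    def visit(v):
        state[v] = 1
        for w in children[v]:
            if state[w] == 1 or (state[w] == 0 and visit(w)):
                return True
        state[v] = 2
        return False
    return any(state[v] == 0 and visit(v) for v in nodes)

def score(data, nodes, edges):
    """Total BIC of a graph: the sum of the per-node contributions."""
    return sum(bic_node(data, v, sorted(a for a, b in edges if b == v)) for v in nodes)

def hill_climb(data, nodes):
    edges = set()                             # start from a graph without edges
    best = score(data, nodes, edges)
    while True:
        candidates = []
        for a, b in permutations(nodes, 2):
            if (a, b) in edges:
                candidates.append(edges - {(a, b)})               # delete the edge
                candidates.append((edges - {(a, b)}) | {(b, a)})  # reverse the edge
            elif (b, a) not in edges:
                candidates.append(edges | {(a, b)})               # add the edge
        scored = [(score(data, nodes, c), c) for c in candidates if not has_cycle(nodes, c)]
        if not scored:
            return edges, best
        top_score, top = max(scored, key=lambda t: t[0])
        if top_score <= best + 1e-9:          # no neighbour improves the score
            return edges, best
        edges, best = top, top_score

# Toy data: B copies A, C is independent of both.
data = [{"A": a, "B": a, "C": c} for a in (0, 1) for c in (0, 1)] * 10
edges, best = hill_climb(data, ["A", "B", "C"])
print(edges, round(best, 2))
```

On this toy data the search connects A and B and leaves C isolated. The direction of the A-B edge is not identified by BIC, since both orientations receive the same score.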
|
|
|
This approach to structure search makes it possible to introduce elements of expert control by narrowing the search space to structures that include expert-specified edges or fixed root nodes describing key and basic features.
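
One way such constraints could be enforced is by filtering the moves the search is allowed to make. The sketch below is an illustration under assumptions, not the Web BAMT API: `allowed_additions`, `allowed_deletions`, `required_edges`, and `root_nodes` are hypothetical names, and the feature names are invented.

```python
from itertools import permutations

def allowed_additions(nodes, current_edges, required_edges=(), root_nodes=()):
    """Edge additions that respect the expert constraints."""
    roots = set(root_nodes)
    present = set(current_edges) | set(required_edges)
    return [
        (a, b)
        for a, b in permutations(nodes, 2)
        if b not in roots              # a fixed root never receives a parent
        and (a, b) not in present      # the edge is not already in the graph
        and (b, a) not in present      # never oppose an existing or required edge
    ]

def allowed_deletions(current_edges, required_edges=()):
    """Edge deletions that keep every expert-specified edge in place."""
    required = set(required_edges)
    return [e for e in current_edges if e not in required]

nodes = ["age", "income", "risk"]
required = [("age", "income")]         # edge fixed by the expert
start = set(required)                  # the search starts from the required edges
print(allowed_additions(nodes, start, required, root_nodes=["age"]))
print(allowed_deletions(start, required))
```

With these filters, a search that starts from the required edges can never remove them or attach a parent to a fixed root, so every visited structure stays inside the narrowed search space.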
|
|
|
|
|
|
# Parametric learning algorithms for the Bayesian network
|
|
|
# Parametric learning algorithms
|
|
|
Distribution parameters are learned by maximum likelihood estimation within a fixed class of distributions. In classical conditional Gaussian Bayesian networks, multinomial distributions describe discrete features and Gaussian distributions approximate continuous ones.
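
For these two families the maximum likelihood estimates have closed forms: relative frequencies for a multinomial distribution, and the sample mean and (biased) sample variance for a Gaussian. The sketch below illustrates this with hypothetical helper names. As a simplifying assumption it only handles a continuous node with a single discrete parent, fitting one Gaussian per parent state.

```python
from collections import Counter, defaultdict

def fit_multinomial(values):
    """MLE of a multinomial distribution: relative frequencies."""
    n = len(values)
    return {state: count / n for state, count in Counter(values).items()}

def fit_gaussian(values):
    """MLE of a Gaussian: sample mean and biased (divide-by-n) variance."""
    n = len(values)
    mean = sum(values) / n
    var = sum((x - mean) ** 2 for x in values) / n
    return mean, var

def fit_conditional_gaussian(rows, node, parent):
    """One Gaussian per state of a discrete parent (simplified case)."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[parent]].append(row[node])
    return {state: fit_gaussian(vals) for state, vals in groups.items()}

print(fit_multinomial(["a", "a", "b", "a"]))
print(fit_gaussian([1.0, 2.0, 3.0]))
```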
|
|
|
One of the extensions to this basic model available in Web BAMT is the application of a multinomial mixture of Gaussian distributions.
|
|
|
The number of mixture components is determined with an approach based on the BIC and AIC criteria. The parameters of such a model are also learned by maximum likelihood. Because of the large number of unknowns, the EM algorithm is used to find the model parameters. It alternates two steps: an expectation step, which estimates the posterior probability that each observation belongs to each mixture component, and a maximization step, which recalculates the mixture parameters to maximize the expected log-likelihood under those posterior probabilities.
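
The two EM steps and the criterion-based choice of the number of components can be sketched for a one-dimensional Gaussian mixture. This is an illustration under assumptions rather than the Web BAMT code: the names `fit_gmm`, `information_criteria`, and `select_components` are hypothetical, the initialization (means at evenly spaced quantiles) is one simple choice among many, and the data is synthetic.

```python
import math

def normal_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def fit_gmm(xs, k, iters=200, min_var=1e-6):
    """Fit a 1D Gaussian mixture with k components by EM."""
    n = len(xs)
    srt = sorted(xs)
    # Deterministic start: means at evenly spaced quantiles, shared variance.
    means = [srt[int((j + 0.5) * n / k)] for j in range(k)]
    overall = sum(xs) / n
    variances = [max(sum((x - overall) ** 2 for x in xs) / n, min_var)] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: posterior probability of each component for each observation.
        resp = []
        for x in xs:
            p = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(p) or 1e-300
            resp.append([pj / total for pj in p])
        # M-step: re-estimate weights, means, and variances from the posteriors.
        for j in range(k):
            nj = sum(r[j] for r in resp) or 1e-300
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(var, min_var)   # guard against collapsing components
    loglik = sum(
        math.log(max(sum(w * normal_pdf(x, m, v)
                         for w, m, v in zip(weights, means, variances)), 1e-300))
        for x in xs
    )
    return weights, means, variances, loglik

def information_criteria(loglik, k, n):
    """BIC and AIC for a 1D mixture with k components (3k - 1 free parameters)."""
    n_params = 3 * k - 1
    return n_params * math.log(n) - 2 * loglik, 2 * n_params - 2 * loglik

def select_components(xs, candidates):
    """Return the candidate number of components with the lowest BIC."""
    def bic(k):
        return information_criteria(fit_gmm(xs, k)[3], k, len(xs))[0]
    return min(candidates, key=bic)

# Synthetic data: two well-separated clusters around -5 and 5.
xs = [-5 + 0.1 * i for i in range(-10, 11)] + [5 + 0.1 * i for i in range(-10, 11)]
print(select_components(xs, candidates=(1, 2)))
```

Both criteria trade the fitted log-likelihood against the number of free parameters. BIC penalizes extra components more strongly than AIC once the sample has more than a handful of observations.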
|
... | ... | |