SAMPLE_SIZE=442 (the maximum, limited by the size of the train dataset);<br>
CASE_ID=0 stands for the geological dataset, CASE_ID=1 for the social dataset.<br>
PREPROCESSING (in this version it is fixed and cannot be tuned by the user)<br>
First, we encode discrete data, then discretize continuous data.<br>
_The purpose of this action and more details can be found in section "About BAMT algorithms"._<br>
Here `pp` is `sklearn.preprocessing`:<br>
`encoder = pp.LabelEncoder()`<br>
`discretizer = pp.KBinsDiscretizer(n_bins=5, encode='ordinal', strategy='quantile')`<br>
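A minimal sketch of the two preprocessing steps, assuming `pp` is `sklearn.preprocessing`. The two toy columns (`rock_type`, `porosity`) are hypothetical and are not taken from the project's datasets.

```python
import numpy as np
from sklearn import preprocessing as pp

# Hypothetical toy data: one discrete column and one continuous column.
rock_type = np.array(["sand", "clay", "sand", "shale", "clay", "sand"])
porosity = np.array([[0.1], [2.5], [3.7], [1.2], [5.0], [4.4]])

# Step 1: encode discrete labels as integers (classes are sorted: clay=0, sand=1, shale=2).
encoder = pp.LabelEncoder()
rock_codes = encoder.fit_transform(rock_type)

# Step 2: split continuous values into 5 quantile bins, returned as ordinal codes 0..4.
discretizer = pp.KBinsDiscretizer(n_bins=5, encode='ordinal', strategy='quantile')
porosity_bins = discretizer.fit_transform(porosity).ravel().astype(int)

print(rock_codes.tolist())  # [1, 0, 1, 2, 0, 1]
print(porosity_bins.tolist())
```

Quantile bins put roughly the same number of train values into each bin, so every discretized state of a continuous node is reasonably populated.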
Geological dataset:<br> has_logit=0,<br> use_mixture=0,<br> score_function="K2"<br><br>
Social dataset:<br> has_logit=1,<br> use_mixture=1,<br> score_function="MI" (these parameters differ from the geological ones to avoid isolated vertices)
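The per-dataset settings above could be kept in a lookup keyed by `CASE_ID`. This is a hypothetical sketch: `BN_PARAMS_BY_CASE`, `get_bn_params` and `check_sample_size` are illustrative names, not functions from the project code.

```python
SAMPLE_SIZE = 442  # upper bound: limited by the size of the train dataset

# Hypothetical lookup: CASE_ID -> network parameters listed above.
BN_PARAMS_BY_CASE = {
    0: {"has_logit": 0, "use_mixture": 0, "score_function": "K2"},  # geological
    1: {"has_logit": 1, "use_mixture": 1, "score_function": "MI"},  # social
}


def get_bn_params(case_id: int) -> dict:
    """Return a copy of the parameters for the given CASE_ID, or raise ValueError."""
    try:
        return dict(BN_PARAMS_BY_CASE[case_id])
    except KeyError:
        raise ValueError(f"unknown CASE_ID: {case_id}") from None


def check_sample_size(size: int) -> int:
    """Reject sample sizes that are not in 1..SAMPLE_SIZE."""
    if not 0 < size <= SAMPLE_SIZE:
        raise ValueError(f"sample size must be in 1..{SAMPLE_SIZE}, got {size}")
    return size
```

Returning a copy keeps callers from accidentally mutating the shared defaults.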
There are 4 main modules and 1 additional module implemented. The additional module contains tests; we use it only during development.<br>
Each module follows this pattern:<br>
1. Controller. File with queries.
2. Service. File with core functions.
3. Models. File with declarations of database tables.
4. Schema. File with docs.<br><br>
We have ordered the modules according to their usage in the main scenario.
## AuthMod
This module handles communication between the user and the auth system.
### Controller
get_token(email, password): get a token by email and password;<br>
signup(email, password): register the user in the database.<br>
### Models
Declare the tables related to the auth system.
### Service
Here we define functions to work with the auth system.
## Example
Demonstrate an instance of a pretrained network.
### Controller
get_example(case_id): return a network pretrained with
## Experiment
One of the most important modules in the application. It is responsible for training a Bayesian network and sampling from it.
### Controller
get(owner, name, case_id, bn_params): validate the input, train the network, sample from it and save the data.<br>
IMPORTANT NOTE: in this version we have to unify the format of the sample (_there are actually 2 types: discrete and continuous_).
We discretize continuous data using bins equal to the output of `np.histogram_bin_edges()` on the train dataset, for better comparison.<br>
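The discretization step can be sketched with NumPy: bin edges are computed once on the train column with `np.histogram_bin_edges()`, and both the train column and the sample are mapped onto those same edges, so their per-bin counts are directly comparable. The `bins="auto"` choice, putting sample values outside the train range into the outermost bins, and the random toy data are assumptions, not taken from the project code.

```python
import numpy as np


def discretize_like_train(train, sample, bins="auto"):
    """Map train and sample values onto bins whose edges come from the train data only."""
    edges = np.histogram_bin_edges(train, bins=bins)
    inner = edges[1:-1]
    # side="right" reproduces np.histogram's half-open bins [e_i, e_{i+1});
    # values below the first edge fall into bin 0, values at or above the last
    # inner edge (including the train maximum) fall into the last bin.
    train_bins = np.searchsorted(inner, train, side="right")
    sample_bins = np.searchsorted(inner, sample, side="right")
    return train_bins, sample_bins, edges


rng = np.random.default_rng(0)
train = rng.normal(size=442)             # hypothetical continuous train column
sample = rng.normal(0.2, 1.1, size=442)  # hypothetical sampled column
train_bins, sample_bins, edges = discretize_like_train(train, sample)

n_bins = len(edges) - 1
# Per-bin counts are comparable because both arrays share the same edges.
print(np.bincount(train_bins, minlength=n_bins))
print(np.bincount(sample_bins, minlength=n_bins))
```

Unlike `np.histogram`, which drops values outside the edges, this sketch keeps every sampled value, so each discretized sample still has 442 entries.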
get_root_nodes(case_id): return the initial nodes of the dataset.<br>
### Models
Declare tables with networks and samples.
### Service
Core functions to fit Bayesian networks and save them.
## bn_manager
This module provides operations on Bayesian networks in the database, such as: find BN(-s) if they exist, delete, put and train.
### Controller
get_BN(owner): return a list of bn(-s) (**with data about them**);<br>
get_BN_names(owner): return a list of names of the bn(-s) the user owns;<br>
get_sample(owner, name, node): return a sample data array with size=SAMPLE_SIZE;<br>
remove(owner, name): remove bn (and its sample) from database.<br>
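A hypothetical in-memory sketch of the bn_manager operations, with networks keyed by `(owner, name)`. The real module works against the database; the `BNStore` class and its storage layout are illustrative only, and slicing the stored sample to `SAMPLE_SIZE` is an assumption about how the size limit is enforced.

```python
SAMPLE_SIZE = 442


class BNStore:
    """In-memory stand-in for the networks and samples tables (hypothetical)."""

    def __init__(self) -> None:
        # (owner, name) -> {"info": dict, "sample": {node: list of values}}
        self._rows: dict[tuple[str, str], dict] = {}

    def put(self, owner: str, name: str, info: dict, sample: dict[str, list]) -> None:
        self._rows[(owner, name)] = {"info": info, "sample": sample}

    def get_BN(self, owner: str) -> list[dict]:
        # Each entry carries the network name plus its stored metadata.
        return [{"name": n, **row["info"]} for (o, n), row in self._rows.items() if o == owner]

    def get_BN_names(self, owner: str) -> list[str]:
        return [n for (o, n) in self._rows if o == owner]

    def get_sample(self, owner: str, name: str, node: str) -> list:
        # Return at most SAMPLE_SIZE values for the requested node.
        return self._rows[(owner, name)]["sample"][node][:SAMPLE_SIZE]

    def remove(self, owner: str, name: str) -> None:
        # Removing the network also drops its stored sample.
        self._rows.pop((owner, name), None)
```

Usage: `store.put("alice", "geo", {"case_id": 0}, {"X": values})` followed by `store.get_sample("alice", "geo", "X")`.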
Scheme of the project structure
![image](https://user-images.githubusercontent.com/68499591/187077726-b1874472-59d7-4287-9085-3b221f266f6b.png)