Commit 6ca5bc43 authored by Илья Соколов

Merge branch 'master' into DRMPN-better-caching

parents 55dc4827 e15c0bfe
Showing with 323 additions and 228 deletions
@@ -211,6 +211,14 @@ Jupyter notebooks with examples are located in the repos
We are grateful to the contributors for their important contribution, and to the participants of numerous conferences and workshops for their valuable advice and suggestions.

Funding
=======
Implemented with the financial support of the Foundation for National Technology Initiative Projects Support, as part of the implementation of the roadmap for the development of the high-tech field "Artificial Intelligence" for the period up to 2030 (Agreement No. 70-2021-00187)

Side Projects
=============
- The optimization core, extracted into the GOLEM library.
@@ -210,6 +210,14 @@ Acknowledgments
We acknowledge the contributors for their important impact and the participants of numerous scientific conferences and workshops for their valuable advice and suggestions.

Funding
=======
This research is financially supported by the Foundation for
National Technology Initiative's Projects Support as a part of the roadmap
implementation for the development of the high-tech field of
Artificial Intelligence for the period up to 2030 (agreement No. 70-2021-00187).

Side Projects
=============
- The optimisation core, implemented in the GOLEM library.
@@ -16,7 +16,7 @@ FEDOT and Docker

Jupyter
=======
- **Check that docker (docker-compose) is available**: docker (docker-compose) must be installed
- `git clone https://github.com/aimclub/FEDOT.git` fetches the files from git
- `cd FEDOT` changes into the project folder
- `cd docker/jupiter` changes into the folder with the Docker files for the Jupyter notebook
docs/source/basics/comp_table.png

209 KB

@@ -10,3 +10,10 @@ The main framework concepts are as follows:
- **Versatility.** FEDOT is :doc:`not limited to specific modeling tasks </advanced/architecture>`; for example, it can be used for ODE or PDE problems;
- **Reproducibility.** Resulting pipelines can be :doc:`exported separately as JSON </advanced/pipeline_import_export>` or :doc:`together with your input data as a ZIP archive </advanced/project_import_export>` for experiment reproducibility;
- **Customizability.** FEDOT allows `managing models complexity <https://fedot.readthedocs.io/en/master/introduction/fedot_features/automation_features.html#models-used>`_ and thereby achieving the desired quality.

A comparison of FEDOT with the main existing AutoML tools is provided below:

|automl_features|

.. |automl_features| image:: ./comp_table.png
   :width: 80%
\ No newline at end of file
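To make the concepts above concrete, here is a hedged sketch of a minimal run through the high-level ``Fedot`` facade that this commit touches elsewhere; the CSV paths, the target column name, and the ``save`` destination are illustrative assumptions, not project artifacts:

    from fedot.api.main import Fedot

    # Hypothetical end-to-end run: compose a pipeline, export it for
    # reproducibility, and predict on unseen data. Paths are placeholders.
    model = Fedot(problem='classification', timeout=5)
    pipeline = model.fit(features='train.csv', target='target')
    pipeline.save('fitted_pipeline')  # assumed JSON export entry point
    predictions = model.predict(features='test.csv')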
docs/source/benchmarks/img_benchmarks/fedot_amlb.png

80.7 KB

docs/source/benchmarks/img_benchmarks/metrics.png

56 KB

docs/source/benchmarks/img_benchmarks/ranks.png

47.3 KB

Tabular data
------------
Here are the overall classification results across state-of-the-art AutoML frameworks
on self-run tasks from the OpenML test suite (a 10-fold run), scored with F1:
.. csv-table::
   :header: Dataset,FEDOT,AutoGluon,H2O,TPOT

   adult,0.874,0.874,0.875,0.874
   airlines,0.669,0.669,0.675,0.617
   airlinescodrnaadult,0.812,-,0.818,0.809
   albert,0.670,0.669,0.697,0.667
   amazon_employee_access,0.949,0.947,0.951,0.953
   apsfailure,0.994,0.994,0.995,0.995
   australian,0.871,0.870,0.865,0.860
   bank-marketing,0.910,0.910,0.910,0.899
   blood-transfusion,0.747,0.697,0.797,0.746
   car,1.000,1.000,0.998,0.998
   christine,0.746,0.746,0.748,0.737
   click_prediction_small,0.835,0.835,0.777,0.777
   cnae-9,0.957,0.954,0.957,0.954
   connect-4,0.792,0.788,0.865,0.867
   covertype,0.964,0.966,0.976,0.952
   credit-g,0.753,0.759,0.766,0.727
   dilbert,0.985,0.982,0.996,0.984
   fabert,0.688,0.685,0.726,0.534
   fashion-mnist,0.885,-,0.734,0.718
   guillermo,0.821,-,0.915,0.897
   helena,0.332,0.333,-,0.318
   higgs,0.731,0.732,0.369,0.336
   jannis,0.718,0.718,0.743,0.719
   jasmine,0.817,0.821,0.734,0.727
   jungle_chess_2pcs_raw_endgame_complete,0.953,0.939,0.817,0.817
   kc1,0.866,0.867,0.996,0.947
   kddcup09_appetency,0.982,0.982,0.866,0.818
   kr-vs-kp,0.995,0.996,0.982,0.962
   mfeat-factors,0.980,0.979,0.980,0.980
   miniboone,0.948,0.948,0.952,0.949
   nomao,0.969,0.970,0.975,0.974
   numerai28_6,0.523,0.522,0.522,0.505
   phoneme,0.915,0.916,0.916,0.910
   riccardo,0.997,-,0.998,0.997
   robert,0.405,-,0.559,0.487
   segment,0.982,0.982,0.982,0.980
   shuttle,1.000,1.000,1.000,1.000
   sylvine,0.952,0.951,0.952,0.948
   vehicle,0.851,0.849,0.846,0.835
   volkert,0.694,0.694,0.758,0.697
   Mean F1,0.838,0.837,0.833,0.812
-Also, we tested FEDOT on the results of the `AMLB <https://github.com/openml/automlbenchmark>`_ benchmark.
-The visualization of FEDOT (v.0.7.3) results against H2O (3.46.0.4), AutoGluon (v.1.1.0), TPOT (v.0.12.1) and LightAutoML (v.0.3.7.3),
-obtained using the built-in critical difference plot visualizations of AutoMLBenchmark, is provided below:
-All datasets (ROC AUC and negative log loss):
+We tested FEDOT on the results of the `AMLB <https://github.com/openml/automlbenchmark>`_ benchmark.
+We used the setup of each framework obtained from 'frameworks.yaml' on the date the experiments started.
+Thus, the following stable versions were used: AutoGluon 0.7.0, TPOT 0.11.7, LightAutoML 0.3.7.3, H2O 3.40.0.2, FEDOT 0.7.2.
+Some runs for AutoGluon failed due to errors (also described in Appendix D of the AMLB paper [1]).
+The visualizations were obtained using the built-in critical difference (CD) plot of AutoMLBenchmark [1].
+In a CD (critical difference) diagram,
+we display each framework's average rank and highlight which ranks are
+statistically significantly different from one another.
+To determine the average rank per task,
+we first replace any missing values with a constant predictor,
+calculate ranks for the represented AutoML solutions and the constant predictor
+for each dataset, and then take the average of those ranks across all datasets for each solution.
+We assess the statistical significance of the rank differences using a non-parametric Friedman test with a
+threshold of p < 0.05 (resulting in p ≈ 0 for all diagrams)
+and apply a Nemenyi post-hoc test to identify which framework pairs differ significantly.
+The time budget for all experiments is 1 hour and 10 folds are used (the 1h8c setup of AMLB). The results were
+obtained on a server based on Xeon Cascade Lake (2900 MHz) with 12 cores and 16 GB of memory.
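For illustration, a minimal sketch of this ranking procedure under stated assumptions: the per-dataset score table below is hypothetical, SciPy provides the Friedman test, and the Nemenyi post-hoc step would need an additional package such as scikit-posthocs:

    import numpy as np
    from scipy.stats import friedmanchisquare, rankdata

    # Hypothetical per-dataset scores: rows are datasets, columns are frameworks;
    # missing runs are assumed to be already replaced by the constant predictor.
    frameworks = ['FEDOT', 'AutoGluon', 'H2O', 'TPOT', 'constant']
    scores = np.array([[0.874, 0.874, 0.875, 0.874, 0.500],
                       [0.669, 0.669, 0.675, 0.617, 0.500],
                       [0.949, 0.947, 0.951, 0.953, 0.500]])

    # Rank frameworks per dataset (rank 1 = best score), then average the ranks.
    ranks = np.vstack([rankdata(-row) for row in scores])
    print(dict(zip(frameworks, ranks.mean(axis=0))))

    # Non-parametric Friedman test on the per-dataset scores (threshold p < 0.05).
    stat, p_value = friedmanchisquare(*scores.T)
    print(f'Friedman statistic = {stat:.3f}, p = {p_value:.4f}')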
+CD for all datasets (ROC AUC and negative log loss):

.. image:: ./img_benchmarks/cd-all-1h8c-constantpredictor.png

-Binary classification (ROC AUC):
+The CD diagram for all datasets (ROC AUC and negative log loss) shows that all AutoML frameworks
+(LightAutoML, H2OAutoML, TPOT, AutoGluon, FEDOT) perform statistically better than the constant predictor:
+CD for binary classification (ROC AUC):

.. image:: ./img_benchmarks/cd-binary-classification-1h8c-constantpredictor.png

-Multiclass classification (negative logloss):
+The CD diagram for binary classification (ROC AUC) shows that all AutoML frameworks
+(LightAutoML, H2OAutoML, TPOT, AutoGluon, FEDOT) perform similarly,
+falling within the same CD interval, and significantly outperform the constant predictor:
+CD for multiclass classification (negative logloss):

.. image:: ./img_benchmarks/cd-multiclass-classification-1h8c-constantpredictor.png

-We can claim that the results are statistically better than those of TPOT and indistinguishable from H2O and AutoGluon.
+The CD diagram for multiclass classification (negative log loss) shows that
+TPOT and FEDOT demonstrate intermediate performance, lying on the border of the
+CD interval with the constant predictor and of the CD interval with H2OAutoML:
We can conclude that FEDOT achieves performance comparable with competitors for tabular tasks.
The ranks for frameworks are provided below:
.. image:: ./img_benchmarks/ranks.png
The raw metrics (ROC AUC for binary and logloss for multiclass) for frameworks are provided below:
.. image:: ./img_benchmarks/metrics.png
The comparison with [1] shows that AutoGluon is underperforming in our hardware setup,
while TPOT and H2O are quite close in both setups.
To avoid any confusion, we provide below an additional comparison of the FEDOT metrics with the metrics from [1].
However, it should be noted that the conditions are different, as are the exact versions of the frameworks.
.. image:: ./img_benchmarks/fedot_amlb.png
[1] Gijsbers P. et al. AMLB: an AutoML Benchmark // Journal of Machine Learning Research. 2024. Vol. 25, No. 101. P. 1-65.
@@ -41,6 +41,10 @@ def prepare_multi_modal_data(files_path: str, task: Task, images_size: tuple = (
    """
    path = os.path.join(str(fedot_project_root()), files_path)
+    if not os.path.exists(path):
+        raise FileNotFoundError(path)
+
    # unpacking of data archive
    unpack_archived_data(path)

    # import of table data
@@ -68,7 +72,7 @@ def prepare_multi_modal_data(files_path: str, task: Task, images_size: tuple = (
    return data


-def run_multi_modal_pipeline(files_path: str, visualization=False) -> float:
+def run_multi_modal_pipeline(files_path: str, timeout=15, visualization=False) -> float:
    task = Task(TaskTypesEnum.classification)
    images_size = (224, 224)

@@ -76,7 +80,7 @@ def run_multi_modal_pipeline(files_path: str, visualization=False) -> float:
    fit_data, predict_data = train_test_data_setup(data, shuffle=True, split_ratio=0.6)

-    automl_model = Fedot(problem='classification', timeout=15)
+    automl_model = Fedot(problem='classification', timeout=timeout)
    pipeline = automl_model.fit(features=fit_data,
                                target=fit_data.target)
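A hedged usage sketch of the updated example: the hard-coded 15-minute budget is now an argument, so a quick smoke run becomes possible; the ``files_path`` value below is a placeholder for the example's dataset folder:

    # Hypothetical quick run with a reduced optimization budget (in minutes).
    if __name__ == '__main__':
        f1 = run_multi_modal_pipeline(files_path='examples/data/multimodal',
                                      timeout=2, visualization=False)
        print(f'F1 on the holdout set: {f1:.3f}')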
import datetime
from typing import Sequence

from golem.core.optimisers.genetic.operators.inheritance import GeneticSchemeTypesEnum
from golem.core.optimisers.genetic.operators.mutation import MutationTypesEnum

from fedot.core.composer.gp_composer.specific_operators import parameter_change_mutation, add_resample_mutation
from fedot.core.constants import AUTO_PRESET_NAME
from fedot.core.repository.tasks import TaskTypesEnum
from fedot.core.utils import default_fedot_data_dir


class ApiParamsRepository:
    """Repository storing possible Api parameters and their default values. Also returns parameters required
    for the data classes (``PipelineComposerRequirements``, ``GPAlgorithmParameters``, ``GraphGenerationParams``)
    used during model composition.
    """

    COMPOSER_REQUIREMENTS_KEYS = {'max_arity', 'max_depth', 'num_of_generations',
                                  'early_stopping_iterations', 'early_stopping_timeout',
                                  'parallelization_mode', 'use_input_preprocessing',
                                  'show_progress', 'collect_intermediate_metric', 'keep_n_best',
                                  'keep_history', 'history_dir', 'cv_folds'}

    STATIC_INDIVIDUAL_METADATA_KEYS = {'use_input_preprocessing'}

    def __init__(self, task_type: TaskTypesEnum):
        self.task_type = task_type
        self.default_params = ApiParamsRepository.default_params_for_task(self.task_type)

    @staticmethod
    def default_params_for_task(task_type: TaskTypesEnum) -> dict:
        """Returns a dict with default parameters"""
        if task_type in [TaskTypesEnum.classification, TaskTypesEnum.regression]:
            cv_folds = 5
        elif task_type == TaskTypesEnum.ts_forecasting:
            cv_folds = 3

        # Dict with the allowed keyword attributes for the Api and their default values. If a value is None,
        # the default set in the dataclasses ``PipelineComposerRequirements``, ``GPAlgorithmParameters``,
        # ``GraphGenerationParams`` will be used.
        default_param_values_dict = dict(
            parallelization_mode='populational',
            show_progress=True,
            max_depth=6,
            max_arity=3,
            pop_size=20,
            num_of_generations=None,
            keep_n_best=1,
            available_operations=None,
            metric=None,
            cv_folds=cv_folds,
            genetic_scheme=None,
            early_stopping_iterations=None,
            early_stopping_timeout=10,
            optimizer=None,
            collect_intermediate_metric=False,
            max_pipeline_fit_time=None,
            initial_assumption=None,
            preset=AUTO_PRESET_NAME,
            use_operations_cache=True,
            use_preprocessing_cache=True,
            use_predictions_cache=False,
            use_input_preprocessing=True,
            use_auto_preprocessing=False,
            use_meta_rules=False,
            cache_dir=default_fedot_data_dir(),
            keep_history=True,
            history_dir=default_fedot_data_dir(),
            with_tuning=True
        )
        return default_param_values_dict

    def check_and_set_default_params(self, params: dict) -> dict:
        """Sets default values for parameters which were not set by the user
        and raises KeyError for invalid parameter keys"""
        allowed_keys = self.default_params.keys()
        invalid_keys = params.keys() - allowed_keys
        if invalid_keys:
            raise KeyError(f"Invalid key parameters {invalid_keys}")
        else:
            missing_params = self.default_params.keys() - params.keys()
            for k in missing_params:
                if (v := self.default_params[k]) is not None:
                    params[k] = v
        return params

    @staticmethod
    def get_params_for_composer_requirements(params: dict) -> dict:
        """Returns a dict with parameters suitable for ``PipelineComposerRequirements``"""
        composer_requirements_params = {k: v for k, v in params.items()
                                        if k in ApiParamsRepository.COMPOSER_REQUIREMENTS_KEYS}

        max_pipeline_fit_time = params.get('max_pipeline_fit_time')
        if max_pipeline_fit_time:
            composer_requirements_params['max_graph_fit_time'] = datetime.timedelta(minutes=max_pipeline_fit_time)

        composer_requirements_params = ApiParamsRepository.set_static_individual_metadata(composer_requirements_params)
        return composer_requirements_params

    @staticmethod
    def set_static_individual_metadata(composer_requirements_params: dict) -> dict:
        """Returns a dict representing ``static_individual_metadata`` for ``PipelineComposerRequirements``"""
        static_individual_metadata = {k: v for k, v in composer_requirements_params.items()
                                      if k in ApiParamsRepository.STATIC_INDIVIDUAL_METADATA_KEYS}
        for k in ApiParamsRepository.STATIC_INDIVIDUAL_METADATA_KEYS:
            composer_requirements_params.pop(k)

        composer_requirements_params['static_individual_metadata'] = static_individual_metadata
        return composer_requirements_params

    def get_params_for_gp_algorithm_params(self, params: dict) -> dict:
        """Returns a dict with parameters suitable for ``GPAlgorithmParameters``"""
        gp_algorithm_params = {'pop_size': params.get('pop_size'),
                               'genetic_scheme_type': GeneticSchemeTypesEnum.parameter_free}
        if params.get('genetic_scheme') == 'steady_state':
            gp_algorithm_params['genetic_scheme_type'] = GeneticSchemeTypesEnum.steady_state

        gp_algorithm_params['mutation_types'] = ApiParamsRepository._get_default_mutations(self.task_type, params)
        return gp_algorithm_params

    @staticmethod
    def _get_default_mutations(task_type: TaskTypesEnum, params) -> Sequence[MutationTypesEnum]:
        mutations = [parameter_change_mutation,
                     MutationTypesEnum.single_change,
                     MutationTypesEnum.single_drop,
                     MutationTypesEnum.single_add,
                     MutationTypesEnum.single_edge]

        # TODO: remove this workaround after the boosting mutation is fixed.
        # Boosting mutation does not work due to a problem with ``__eq__`` on its copy;
        # refactoring the ``partial`` into a ``def`` does not help, and the boosting
        # mutation also does not work on its own.
        if task_type == TaskTypesEnum.ts_forecasting:
            # mutations.append(partial(boosting_mutation, params=params))
            pass
        else:
            mutations.append(add_resample_mutation)
        return mutations
import datetime
from typing import Sequence

from golem.core.optimisers.genetic.operators.inheritance import GeneticSchemeTypesEnum
from golem.core.optimisers.genetic.operators.mutation import MutationTypesEnum

from fedot.core.composer.gp_composer.specific_operators import parameter_change_mutation, add_resample_mutation
from fedot.core.constants import AUTO_PRESET_NAME
from fedot.core.repository.tasks import TaskTypesEnum
from fedot.core.utils import default_fedot_data_dir


class ApiParamsRepository:
    """Repository storing possible Api parameters and their default values. Also returns parameters required
    for the data classes (``PipelineComposerRequirements``, ``GPAlgorithmParameters``, ``GraphGenerationParams``)
    used during model composition.
    """

    COMPOSER_REQUIREMENTS_KEYS = {'max_arity', 'max_depth', 'num_of_generations',
                                  'early_stopping_iterations', 'early_stopping_timeout',
                                  'parallelization_mode', 'use_input_preprocessing',
                                  'show_progress', 'collect_intermediate_metric', 'keep_n_best',
                                  'keep_history', 'history_dir', 'cv_folds'}

    STATIC_INDIVIDUAL_METADATA_KEYS = {'use_input_preprocessing'}

    def __init__(self, task_type: TaskTypesEnum):
        self.task_type = task_type
        self.default_params = ApiParamsRepository.default_params_for_task(self.task_type)

    @staticmethod
    def default_params_for_task(task_type: TaskTypesEnum) -> dict:
        """Returns a dict with default parameters"""
        if task_type in [TaskTypesEnum.classification, TaskTypesEnum.regression]:
            cv_folds = 5
        elif task_type == TaskTypesEnum.ts_forecasting:
            cv_folds = 3

        # Dict with the allowed keyword attributes for the Api and their default values. If a value is None,
        # the default set in the dataclasses ``PipelineComposerRequirements``, ``GPAlgorithmParameters``,
        # ``GraphGenerationParams`` will be used.
        default_param_values_dict = dict(
            parallelization_mode='populational',
            show_progress=True,
            max_depth=6,
            max_arity=3,
            pop_size=20,
            num_of_generations=None,
            keep_n_best=1,
            available_operations=None,
            metric=None,
            cv_folds=cv_folds,
            genetic_scheme=None,
            early_stopping_iterations=None,
            early_stopping_timeout=10,
            optimizer=None,
            collect_intermediate_metric=False,
            max_pipeline_fit_time=None,
            initial_assumption=None,
            preset=AUTO_PRESET_NAME,
            use_pipelines_cache=True,
            use_preprocessing_cache=True,
            use_predictions_cache=False,
            use_input_preprocessing=True,
            use_auto_preprocessing=False,
            use_meta_rules=False,
            cache_dir=default_fedot_data_dir(),
            keep_history=True,
            history_dir=default_fedot_data_dir(),
            with_tuning=True,
            seed=None
        )
        return default_param_values_dict

    def check_and_set_default_params(self, params: dict) -> dict:
        """Sets default values for parameters which were not set by the user
        and raises KeyError for invalid parameter keys"""
        allowed_keys = self.default_params.keys()
        invalid_keys = params.keys() - allowed_keys
        if invalid_keys:
            raise KeyError(f"Invalid key parameters {invalid_keys}")
        else:
            missing_params = self.default_params.keys() - params.keys()
            for k in missing_params:
                if (v := self.default_params[k]) is not None:
                    params[k] = v
        return params

    @staticmethod
    def get_params_for_composer_requirements(params: dict) -> dict:
        """Returns a dict with parameters suitable for ``PipelineComposerRequirements``"""
        composer_requirements_params = {k: v for k, v in params.items()
                                        if k in ApiParamsRepository.COMPOSER_REQUIREMENTS_KEYS}

        max_pipeline_fit_time = params.get('max_pipeline_fit_time')
        if max_pipeline_fit_time:
            composer_requirements_params['max_graph_fit_time'] = datetime.timedelta(minutes=max_pipeline_fit_time)

        composer_requirements_params = ApiParamsRepository.set_static_individual_metadata(composer_requirements_params)
        return composer_requirements_params

    @staticmethod
    def set_static_individual_metadata(composer_requirements_params: dict) -> dict:
        """Returns a dict representing ``static_individual_metadata`` for ``PipelineComposerRequirements``"""
        static_individual_metadata = {k: v for k, v in composer_requirements_params.items()
                                      if k in ApiParamsRepository.STATIC_INDIVIDUAL_METADATA_KEYS}
        for k in ApiParamsRepository.STATIC_INDIVIDUAL_METADATA_KEYS:
            composer_requirements_params.pop(k)

        composer_requirements_params['static_individual_metadata'] = static_individual_metadata
        return composer_requirements_params

    def get_params_for_gp_algorithm_params(self, params: dict) -> dict:
        """Returns a dict with parameters suitable for ``GPAlgorithmParameters``"""
        gp_algorithm_params = {'pop_size': params.get('pop_size'),
                               'genetic_scheme_type': GeneticSchemeTypesEnum.parameter_free}
        if params.get('genetic_scheme') == 'steady_state':
            gp_algorithm_params['genetic_scheme_type'] = GeneticSchemeTypesEnum.steady_state

        gp_algorithm_params['mutation_types'] = ApiParamsRepository._get_default_mutations(self.task_type, params)
        gp_algorithm_params['seed'] = params['seed']
        return gp_algorithm_params

    @staticmethod
    def _get_default_mutations(task_type: TaskTypesEnum, params) -> Sequence[MutationTypesEnum]:
        mutations = [parameter_change_mutation,
                     MutationTypesEnum.single_change,
                     MutationTypesEnum.single_drop,
                     MutationTypesEnum.single_add,
                     MutationTypesEnum.single_edge]

        # TODO: remove this workaround after the boosting mutation is fixed.
        # Boosting mutation does not work due to a problem with ``__eq__`` on its copy;
        # refactoring the ``partial`` into a ``def`` does not help, and the boosting
        # mutation also does not work on its own.
        if task_type == TaskTypesEnum.ts_forecasting:
            # mutations.append(partial(boosting_mutation, params=params))
            pass
        else:
            mutations.append(add_resample_mutation)
        return mutations
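A short, hedged sketch of how this repository is meant to be driven (mirroring what ``ApiParams.__init__`` does below): note that ``seed`` defaults to None, so ``check_and_set_default_params`` does not copy it into the dict, and the caller has to set it explicitly before ``get_params_for_gp_algorithm_params`` reads ``params['seed']``:

    from fedot.core.repository.tasks import TaskTypesEnum

    repo = ApiParamsRepository(TaskTypesEnum.classification)

    # Unknown keys fail fast with KeyError; valid keys are merged with non-None defaults.
    params = repo.check_and_set_default_params({'pop_size': 50})
    assert params['cv_folds'] == 5      # classification default
    assert 'seed' not in params         # None defaults are not copied

    params['seed'] = 42                 # ApiParams.__init__ sets this explicitly
    gp_params = repo.get_params_for_gp_algorithm_params(params)
    assert gp_params['seed'] == 42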
@@ -127,6 +127,12 @@ class MultiModalAssumptionsBuilder(AssumptionsBuilder):
            data_pipeline_alternatives = subbuilder.build(first_node, use_input_preprocessing=use_input_preprocessing)
            subpipelines.append(data_pipeline_alternatives)

+        # TODO: fix this workaround during the improvement of multi-modality
+        for i, subpipeline in enumerate(subpipelines):
+            if (len(subpipeline) == 1 and len(subpipeline[0].nodes) == 1 and
+                    str(subpipeline[0].nodes[0]) in ['cnn', 'data_source_img']):
+                subpipelines[i] = [Pipeline(PipelineNode('cnn', nodes_from=[PipelineNode('data_source_img')]))]
+
        # Then zip these alternatives together and add final node to get ensembles.
        ensemble_builders: List[PipelineBuilder] = []
        for pre_ensemble in zip(*subpipelines):
@@ -93,6 +93,7 @@ class RegressionAssumptions(TaskAssumptions):
        return {
            'rfr': PipelineBuilder().add_node('rfr'),
            'ridge': PipelineBuilder().add_node('ridge'),
+            'lgbmreg': PipelineBuilder().add_node('lgbmreg'),
        }

    def ensemble_operation(self) -> str:
@@ -112,9 +113,13 @@ class ClassificationAssumptions(TaskAssumptions):
    @property
    def builders(self):
        return {
+            'gbm_linear': PipelineBuilder().
+                add_branch('catboost', 'xgboost', 'lgbm').join_branches('logit'),
+            'catboost': PipelineBuilder().add_node('catboost'),
+            'xgboost': PipelineBuilder().add_node('xgboost'),
+            'lgbm': PipelineBuilder().add_node('lgbm'),
            'rf': PipelineBuilder().add_node('rf'),
            'logit': PipelineBuilder().add_node('logit'),
-            'catboost': PipelineBuilder().add_node('catboost'),
        }

    def ensemble_operation(self) -> str:
@@ -26,7 +26,7 @@ from fedot.core.repository.tasks import Task, TaskTypesEnum, TaskParams, TsForec
class ApiParams(UserDict):
    def __init__(self, input_params: Dict[str, Any], problem: str, task_params: Optional[TaskParams] = None,
-                 n_jobs: int = -1, timeout: float = 5):
+                 n_jobs: int = -1, timeout: float = 5, seed=None):
        self.log: LoggerAdapter = default_log(self)
        self.task: Task = self._get_task_with_params(problem, task_params)
        self.n_jobs: int = determine_n_jobs(n_jobs)

@@ -34,6 +34,7 @@ class ApiParams(UserDict):
        self._params_repository = ApiParamsRepository(self.task.task_type)
        parameters: dict = self._params_repository.check_and_set_default_params(input_params)
+        parameters['seed'] = seed
        super().__init__(parameters)

        self._check_timeout_vs_generations()
@@ -139,9 +140,14 @@ class ApiParams(UserDict):
        """Method to initialize ``GPAlgorithmParameters``"""
        gp_algorithm_parameters = self._params_repository.get_params_for_gp_algorithm_params(self.data)

+        # workaround for "{TypeError}__init__() got an unexpected keyword argument 'seed'"
+        seed = gp_algorithm_parameters['seed']
+        del gp_algorithm_parameters['seed']
+
        self.optimizer_params = GPAlgorithmParameters(
            multi_objective=multi_objective, **gp_algorithm_parameters
        )
+        self.optimizer_params.seed = seed
        return self.optimizer_params
    def init_graph_generation_params(self, requirements: PipelineComposerRequirements) -> GraphGenerationParams:
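The ``seed`` detour above exists because ``GPAlgorithmParameters.__init__`` rejects the keyword; here is a minimal stand-alone sketch of the pop-then-assign pattern, using a stand-in dataclass since the real class lives in GOLEM:

    from dataclasses import dataclass

    @dataclass
    class _AlgoParams:               # hypothetical stand-in for GPAlgorithmParameters
        pop_size: int = 20

    kwargs = {'pop_size': 50, 'seed': 42}
    seed = kwargs.pop('seed')        # passing it to __init__ would raise TypeError
    params = _AlgoParams(**kwargs)
    params.seed = seed               # attach it as a plain attribute afterwards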
@@ -33,9 +33,9 @@ from fedot.explainability.explainer_template import Explainer
from fedot.explainability.explainers import explain_pipeline
from fedot.preprocessing.base_preprocessing import BasePreprocessor
from fedot.remote.remote_evaluator import RemoteEvaluator
+from fedot.utilities.composer_timer import fedot_composer_timer
from fedot.utilities.define_metric_by_task import MetricByTask
from fedot.utilities.memory import MemoryAnalytics
-from fedot.utilities.composer_timer import fedot_composer_timer
from fedot.utilities.project_import_export import export_project_to_zip, import_project_from_zip

NOT_FITTED_ERR_MSG = 'Model not fitted yet'
@@ -95,7 +95,7 @@ class Fedot:
        self.log = self._init_logger(logging_level)

        # Attributes for dealing with metrics, data sources and hyperparameters
-        self.params = ApiParams(composer_tuner_params, problem, task_params, n_jobs, timeout)
+        self.params = ApiParams(composer_tuner_params, problem, task_params, n_jobs, timeout, seed)
        default_metrics = MetricByTask.get_default_quality_metrics(self.params.task.task_type)

        passed_metrics = self.params.get('metric')

@@ -256,7 +256,7 @@ class Fedot:
                         .with_timeout(timeout)
                         .build(input_data))

-        self.current_pipeline = pipeline_tuner.tune(self.current_pipeline, show_progress)
+        self.current_pipeline = pipeline_tuner.tune(self.current_pipeline, show_progress=show_progress)
        self.api_composer.was_tuned = pipeline_tuner.was_tuned

        # The tuner returns an unfitted pipeline, so it must be fitted on the train dataset
import pickle
import sqlite3
import zlib
from sys import getsizeof
from contextlib import closing
from os import getpid
from typing import List, Optional, Tuple, TypeVar

from golem.core.log import default_log

from fedot.core.caching.base_cache_db import BaseCacheDB
from fedot.core.operations.operation import Operation

IOperation = TypeVar('IOperation', bound=Operation)

MAX_BLOB_SIZE = 2 ** 31 - 1


class OperationsCacheDB(BaseCacheDB):
@@ -78,10 +83,17 @@ class OperationsCacheDB(BaseCacheDB):
        with closing(sqlite3.connect(self.db_path)) as conn:
            with conn:
                cur = conn.cursor()
-                pickled = [
-                    (uid, sqlite3.Binary(pickle.dumps(val, pickle.HIGHEST_PROTOCOL)))
-                    for uid, val in uid_val_lst
-                ]
+                pickled = []
+                for uid, val in uid_val_lst:
+                    serialized = pickle.dumps(val, pickle.HIGHEST_PROTOCOL)
+                    serialized_size = getsizeof(serialized)
+                    if serialized_size > MAX_BLOB_SIZE:
+                        serialized = zlib.compress(serialized)
+                        default_log('Cache').warning(
+                            f'Pipeline serialization was compressed because the size limit was exceeded. '
+                            f'Size: {serialized_size} bytes (limit: {MAX_BLOB_SIZE} bytes)'
+                        )
+                    pickled.append((uid, sqlite3.Binary(serialized)))
                cur.executemany(f'INSERT OR IGNORE INTO {self._main_table} VALUES (?, ?);', pickled)

    def _init_db(self):
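A self-contained sketch of the compress-on-overflow idea above; the helper names are illustrative, and a matching decompression branch would be needed on the read path:

    import pickle
    import zlib

    MAX_BLOB_SIZE = 2 ** 31 - 1  # SQLite limits a single BLOB to 2**31 - 1 bytes

    def to_blob(value) -> bytes:
        """Serialize a cache value, compressing it only when it would not fit."""
        raw = pickle.dumps(value, pickle.HIGHEST_PROTOCOL)
        return zlib.compress(raw) if len(raw) > MAX_BLOB_SIZE else raw

    def from_blob(blob: bytes):
        # zlib streams start with 0x78 under the default settings, while pickle
        # protocol 2+ starts with b'\x80', so the two formats are distinguishable.
        return pickle.loads(zlib.decompress(blob) if blob[:1] == b'\x78' else blob)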
@@ -57,11 +57,11 @@ class Data:
    def from_numpy(cls,
                   features_array: np.ndarray,
                   target_array: np.ndarray,
-                  features_names: np.ndarray[str] = None,
-                  categorical_idx: Union[list[int, str], np.ndarray[int, str]] = None,
                   idx: Optional[np.ndarray] = None,
                   task: Union[Task, str] = 'classification',
-                  data_type: Optional[DataTypesEnum] = DataTypesEnum.table) -> InputData:
+                  data_type: Optional[DataTypesEnum] = DataTypesEnum.table,
+                  features_names: np.ndarray[str] = None,
+                  categorical_idx: Union[list[int, str], np.ndarray[int, str]] = None) -> InputData:
        """Import data from numpy array.

        Args:
@@ -79,7 +79,13 @@
        """
        if isinstance(task, str):
            task = Task(TaskTypesEnum(task))
-        return array_to_input_data(features_array, target_array, features_names, categorical_idx, idx, task, data_type)
+        return array_to_input_data(features_array=features_array,
+                                   target_array=target_array,
+                                   features_names=features_names,
+                                   categorical_idx=categorical_idx,
+                                   idx=idx,
+                                   task=task,
+                                   data_type=data_type)
    @classmethod
    def from_numpy_time_series(cls,

@@ -104,7 +110,11 @@
            task = Task(TaskTypesEnum(task))
        if target_array is None:
            target_array = features_array
-        return array_to_input_data(features_array, target_array, idx, task, data_type)
+        return array_to_input_data(features_array=features_array,
+                                   target_array=target_array,
+                                   idx=idx,
+                                   task=task,
+                                   data_type=data_type)
    @classmethod
    def from_dataframe(cls,

@@ -848,11 +858,11 @@ def np_datetime_to_numeric(data: np.ndarray) -> np.ndarray:
def array_to_input_data(features_array: np.ndarray,
                        target_array: np.ndarray,
-                        features_names: np.ndarray[str] = None,
-                        categorical_idx: Union[list[int, str], np.ndarray[int, str]] = None,
                        idx: Optional[np.ndarray] = None,
                        task: Task = Task(TaskTypesEnum.classification),
-                        data_type: Optional[DataTypesEnum] = None) -> InputData:
+                        data_type: Optional[DataTypesEnum] = None,
+                        features_names: np.ndarray[str] = None,
+                        categorical_idx: Union[list[int, str], np.ndarray[int, str]] = None) -> InputData:
    if idx is None:
        idx = np.arange(len(features_array))
    if data_type is None:
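The switch to keyword arguments matters here: under the old signature, the positional call in ``from_numpy_time_series`` would have bound ``idx`` to ``features_names`` and ``task`` to ``categorical_idx``. A hedged usage sketch with synthetic arrays (the feature names and the categorical index are illustrative):

    import numpy as np

    features = np.random.rand(100, 3)
    target = np.random.randint(0, 2, size=100)

    data = InputData.from_numpy(features_array=features,
                                target_array=target,
                                task='classification',
                                features_names=np.array(['a', 'b', 'c']),
                                categorical_idx=[2])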
from typing import Optional

import numpy as np

from fedot.core.data.data import InputData, OutputData
from fedot.core.operations.evaluation.evaluation_interfaces import EvaluationStrategy
from fedot.core.operations.evaluation.operation_implementations.models.boostings_implementations import \

@@ -7,6 +9,7 @@ from fedot.core.operations.evaluation.operation_implementations.models.boostings
    FedotXGBoostClassificationImplementation, FedotXGBoostRegressionImplementation, \
    FedotLightGBMClassificationImplementation, FedotLightGBMRegressionImplementation
from fedot.core.operations.operation_parameters import OperationParameters
+from fedot.core.operations.evaluation.evaluation_interfaces import is_multi_output_task
from fedot.core.repository.tasks import TaskTypesEnum
from fedot.utilities.random import ImplementationRandomStateHandler
@@ -33,6 +36,15 @@ class BoostingStrategy(EvaluationStrategy):
            raise ValueError(f'Impossible to obtain Boosting Strategy for {operation_type}')

    def fit(self, train_data: InputData):
+        if train_data.task.task_type == TaskTypesEnum.ts_forecasting:
+            raise ValueError('Time series forecasting is not supported for boosting models')
+
+        if is_multi_output_task(train_data):
+            if self.operation_type == 'catboost':
+                self.params_for_fit.update(loss_function='MultiLogloss')
+            elif self.operation_type == 'catboostreg':
+                self.params_for_fit.update(loss_function='MultiRMSE')
+
        operation_implementation = self.operation_impl(self.params_for_fit)
        with ImplementationRandomStateHandler(implementation=operation_implementation):
@@ -49,21 +61,35 @@ class BoostingClassificationStrategy(BoostingStrategy):
        super().__init__(operation_type, params)

    def predict(self, trained_operation, predict_data: InputData) -> OutputData:
-        n_classes = len(trained_operation.classes_)
        if self.output_mode in ['labels']:
            prediction = trained_operation.predict(predict_data)
        elif (self.output_mode in ['probs', 'full_probs', 'default'] and
              predict_data.task.task_type is TaskTypesEnum.classification):
+            n_classes = len(trained_operation.classes_)
+            is_multi_output_target = is_multi_output_task(predict_data)
            prediction = trained_operation.predict_proba(predict_data)
+            is_prediction_correct = self._check_prediction_correctness(prediction)
            if n_classes < 2:
                raise ValueError('Data set contains only 1 target class. Please reformat your data.')
-            elif n_classes == 2 and self.output_mode != 'full_probs' and len(prediction.shape) > 1:
-                prediction = prediction[:, 1]
+            elif n_classes == 2 and self.output_mode != 'full_probs' and is_prediction_correct:
+                if is_multi_output_target and isinstance(prediction, list):
+                    prediction = np.stack([pred[:, 1] for pred in prediction]).T
+                else:
+                    prediction = prediction[:, 1]
        else:
            raise ValueError(f'Output mode {self.output_mode} is not supported')

        return self._convert_to_output(prediction, predict_data)

+    @staticmethod
+    def _check_prediction_correctness(prediction) -> bool:
+        if isinstance(prediction, list):
+            return len(prediction[0].shape) > 1
+        else:
+            return len(prediction.shape) > 1
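A small synthetic sketch of the new binary-probability handling: a CatBoost-style multi-output ``predict_proba`` returns one ``(n_samples, 2)`` array per target column, and stacking the positive-class columns reproduces the single-output ``[:, 1]`` slice for every target:

    import numpy as np

    # One (n_samples, 2) probability array per target column (made-up values).
    preds = [np.array([[0.9, 0.1], [0.2, 0.8]]),
             np.array([[0.4, 0.6], [0.7, 0.3]])]

    stacked = np.stack([pred[:, 1] for pred in preds]).T
    print(stacked.shape)  # (2, 2): n_samples x n_targets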
class BoostingRegressionStrategy(BoostingStrategy):
    def __init__(self, operation_type: str, params: Optional[OperationParameters] = None):
@@ -302,6 +302,7 @@ def convert_to_multivariate_model(sklearn_model, train_data: InputData):
def is_multi_output_task(train_data):
-    target_shape = train_data.target.shape
-    is_multi_target = len(target_shape) > 1 and target_shape[1] > 1
-    return is_multi_target
+    if train_data.target is not None:
+        target_shape = train_data.target.shape
+        is_multi_target = len(target_shape) > 1 and target_shape[1] > 1
+        return is_multi_target
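A sketch of what the added guard changes, with a hypothetical stand-in for ``InputData``; note that for a ``None`` target the function now implicitly returns ``None``, which is falsy, instead of raising ``AttributeError``:

    import numpy as np

    class _Data:  # hypothetical stand-in for InputData
        def __init__(self, target):
            self.target = target

    print(is_multi_output_task(_Data(np.zeros((10, 3)))))  # True: 2-D, >1 column
    print(is_multi_output_task(_Data(np.zeros(10))))       # False: 1-D target
    print(is_multi_output_task(_Data(None)))               # None (falsy), no error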
@@ -99,11 +99,14 @@ class OneHotEncodingImplementation(DataOperationImplementation):
        if isinstance(features, np.ndarray):
            transformed_categorical = self.encoder.transform(features[:, self.categorical_ids]).toarray()
            # Stack transformed categorical and non-categorical data, ignore if none
-            non_categorical_features = features[:, self.non_categorical_ids.astype(int)]
+            non_categorical_features = np.array(features[:, self.non_categorical_ids.astype(int)])
        else:
            transformed_categorical = self.encoder.transform(features.iloc[:, self.categorical_ids]).toarray()
-            non_categorical_features = features.iloc[:, self.non_categorical_ids.astype(int)].to_numpy()
+            non_categorical_features = np.array(features.iloc[:, self.non_categorical_ids.astype(int)])

+        transformed_categorical = transformed_categorical.astype(np.float32)
+        non_categorical_features = non_categorical_features.astype(np.float32)
        frames = (non_categorical_features, transformed_categorical)
        transformed_features = np.hstack(frames)
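A sketch of why the explicit ``float32`` casts help (an assumed rationale): stacking an object-dtype block with the float encoder output would otherwise keep the whole matrix as dtype=object:

    import numpy as np

    nums_obj = np.array([[3], [4]], dtype=object)
    cats = np.array([[1.0, 0.0], [0.0, 1.0]], dtype=np.float32)

    print(np.hstack((nums_obj, cats)).dtype)                     # object
    print(np.hstack((nums_obj.astype(np.float32), cats)).dtype)  # float32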