ITMO-NSS-team / FEDOT · Merge requests · !484

Merged
Created Nov 03, 2021 by Elizaveta Lutsenko (@LizLutsenko, Owner)

Modifying table preprocessing

  • Overview 23
  • Commits 42
  • Changes 45

Created by: Dreamlone

The preprocessing has been significantly refactored. Previously, all preprocessing functions lived in pipeline.py; they have now been moved into a dedicated preprocessing module. Its central class is DataPreprocessor, which handles both "obligatory" and "optional" preprocessing. Obligatory preprocessing covers things such as stripping stray whitespace from cells (turning "x " or " x " into "x"), converting one-dimensional targets into column vectors for classification and regression problems, excluding features with more than 90% missing values, etc.
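To make the obligatory steps concrete, here is a minimal sketch of what they could look like. The function name and signature are illustrative, not FEDOT's actual API:

```python
import numpy as np
import pandas as pd

def obligatory_preprocess(features: pd.DataFrame, target: np.ndarray):
    """Hypothetical sketch of the 'obligatory' preprocessing steps."""
    # 1. Strip stray whitespace from string cells ("x " / " x " -> "x").
    features = features.apply(
        lambda col: col.str.strip() if col.dtype == object else col
    )
    # 2. Exclude features with more than 90% missing values.
    features = features.loc[:, features.isna().mean() <= 0.9]
    # 3. Convert a one-dimensional target into a column vector.
    if target.ndim == 1:
        target = target.reshape(-1, 1)
    return features, target
```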

There is also "optional" preprocessing, which consists of imputation and categorical feature encoding. These operations are applied only if the pipeline being preprocessed does not already contain them; if it does, optional preprocessing is skipped. How do we know whether suitable operations are present in the pipeline? For this purpose, the PipelineStructureExplorer class is implemented in the preprocessing module. It inspects the pipeline structure, and if it detects that the pipeline would crash unless the gaps are filled at least somehow, it signals DataPreprocessor to fill them before the data is fed to the pipeline. This way the composer remains free to find a better way to encode features or fill gaps (e.g. LabelEncoding or anything other than OneHotEncoding), while the pipeline will not crash even if its structure contains no preprocessing operations.
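The decision logic can be sketched roughly as follows. The operation names and the flat-list representation of the pipeline are assumptions for illustration, not the real PipelineStructureExplorer interface:

```python
# Hypothetical operation tags; real FEDOT operation names may differ.
IMPUTATION_OPS = {"simple_imputation", "knn_imputation"}
ENCODING_OPS = {"one_hot_encoding", "label_encoding"}

def needs_default_imputation(pipeline_ops, data_has_gaps: bool) -> bool:
    """Signal the preprocessor to fill gaps only when the data has gaps
    and no node in the pipeline can fill them itself."""
    return data_has_gaps and not (set(pipeline_ops) & IMPUTATION_OPS)

def needs_default_encoding(pipeline_ops, data_has_categories: bool) -> bool:
    """Same idea for categorical features and encoding operations."""
    return data_has_categories and not (set(pipeline_ops) & ENCODING_OPS)
```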

Preprocessing at the different levels (API and Pipeline) has also changed. Preprocessing is now always performed at the API level first. Once done, the preprocessed InputData block is marked as such via the was_preprocessed flag in SupplementaryData. Then, in the Pipeline fit and predict methods, if a data block has not been preprocessed yet, obligatory preprocessing starts at the Pipeline level, followed by optional preprocessing.
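A minimal sketch of the flag handoff between the two levels, with class and method names approximating the description rather than FEDOT's exact API:

```python
from dataclasses import dataclass, field

@dataclass
class SupplementaryData:
    was_preprocessed: bool = False  # set by the API level after preprocessing

@dataclass
class InputData:
    features: list
    supplementary: SupplementaryData = field(default_factory=SupplementaryData)

class Pipeline:
    def fit(self, data: InputData) -> "Pipeline":
        # If the API level has not already preprocessed this data block,
        # run obligatory preprocessing here, then optional.
        if not data.supplementary.was_preprocessed:
            self._obligatory_preprocess(data)
            self._optional_preprocess(data)
            data.supplementary.was_preprocessed = True
        # ... actual fitting would follow ...
        return self

    def _obligatory_preprocess(self, data: InputData) -> None:
        pass  # placeholder for the obligatory steps

    def _optional_preprocess(self, data: InputData) -> None:
        pass  # placeholder for imputation / encoding
```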

Did a little refactoring of the API as well. Removed the (imho) unnecessary ApiFacade, which simply duplicated the functionality of the Fedot class. Renamed some classes for clarity (again, imho). Also moved some repeated variables into state variables and got rid of multiple inheritance.

Source branch: penn-check