Refactor existing DataMerge functionality (closes #610)
Created by: gkirgizov
Collapse most of the code and branches into a single direct 'merge' method with stages. The merge process is almost the same for most data types: essentially, it concatenates data along the last dimension. Stages can be customized per data type instead of branching into separate merge implementations.
Replace many manual routines with their numpy counterparts. Move merge-related logic from SupplementaryData into a separate SupplementaryDataMerger.
Drop InputData.from_predictions in favor of DataMerge.merge, which now directly returns InputData.
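A minimal sketch of the staged merge idea described above, using plain NumPy arrays instead of FEDOT's data objects. The names `to_2d` and `merge_features` are hypothetical and not taken from this PR; the sketch only illustrates "prepare each input, then concatenate along the last dimension".

```python
import numpy as np


def to_2d(features: np.ndarray) -> np.ndarray:
    """Preparation stage (hypothetical): give 1-D arrays a trailing feature axis."""
    return features.reshape(-1, 1) if features.ndim == 1 else features


def merge_features(*arrays: np.ndarray) -> np.ndarray:
    """Merge stage (hypothetical): concatenate prepared arrays along the last axis."""
    prepared = [to_2d(np.asarray(a)) for a in arrays]
    return np.concatenate(prepared, axis=-1)


first = np.arange(6).reshape(3, 2)    # 3 samples, 2 columns
second = np.arange(9).reshape(3, 3)   # 3 samples, 3 columns
merged = merge_features(first, second)
print(merged.shape)
```

A data type that needs different handling would, under this assumption, swap out only the preparation stage while the concatenation step stays shared.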
Created by: pep8speaks
Hello @gkirgizov! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:
- In the file fedot/core/data/data.py:
Line 418:121: E501 line too long (123 > 120 characters)
Comment last updated at 2022-04-05 09:17:16 UTC
Created by: gkirgizov
@Dreamlone could you add someone else as a reviewer? I don't know who else is connected to this merge subsystem. Or, in general, whatever the code, anyone from the core team can be added (Kolya, you, Ilya?)
- Last updated by Elizaveta Lutsenko
Created by: codecov[bot]
Codecov Report
Merging #621 (97e605ae) into master (a271006f) will decrease coverage by 0.32%. The diff coverage is 99.06%.
@@            Coverage Diff             @@
##           master     #621      +/-   ##
==========================================
- Coverage   86.80%   86.47%   -0.33%
==========================================
  Files         151      153       +2
  Lines       11250    11191      -59
==========================================
- Hits         9765     9677      -88
- Misses       1485     1514      +29
Impacted Files / Coverage Δ
fedot/api/api_utils/data_definition.py 84.00% <ø> (+0.45%)
fedot/core/data/merge/data_merger.py 97.87% <97.87%> (ø)
fedot/core/data/array_utilities.py 100.00% <100.00%> (ø)
fedot/core/data/data.py 83.47% <100.00%> (-1.24%)
fedot/core/data/merge/supplementary_data_merger.py 100.00% <100.00%> (ø)
fedot/core/data/supplementary_data.py 92.85% <100.00%> (-2.31%)
...ation_implementations/data_operations/decompose.py 93.47% <100.00%> (ø)
fedot/core/pipelines/node.py 95.52% <100.00%> (+0.02%)
fedot/core/pipelines/pipeline_builder.py 97.72% <100.00%> (+0.46%)
fedot/core/utilities/data_structures.py 100.00% <100.00%> (ø)
... and 6 more
Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a271006...97e605a.
from functools import reduce
from typing import Optional

import numpy as np


def find_common_elements(*indices: np.array) -> np.array:
    """ Returns array with unique elements common to *all* indices
    or the first index if it's the only one. """
    common_elements = reduce(np.intersect1d, indices[1:], indices[0])
Created by: Dreamlone
Just a (probably) useless observation. A case came to mind where the OutputData indices are [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13] and [11, 5, 2, 13]; the function should then presumably return [2, 5, 11, 13]. In other words, it implicitly performs a "sort". Won't this lead to any side effects?
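The case from the comment can be reproduced directly. This is a small sketch that applies the same `reduce(np.intersect1d, ...)` expression to the two indices mentioned above; the variable names are illustrative only.

```python
from functools import reduce

import numpy as np

full_idx = np.arange(1, 14)              # [1, 2, ..., 13]
partial_idx = np.array([11, 5, 2, 13])   # unordered index

# Same expression as in find_common_elements; np.intersect1d returns
# the sorted unique values present in both arrays.
common = reduce(np.intersect1d, [partial_idx], full_idx)
print(common)  # [ 2  5 11 13]
```

The result is sorted regardless of the order in which the two indices are passed, because `np.intersect1d` always returns sorted unique values.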
    assert merged_ts.data_type == DataTypesEnum.ts
    expected_shape = (len(ts1.idx), 2)
    assert merged_ts.features.shape == expected_shape

    ts1 = get_output_timeseries(num_variables=2, for_predict=False)
    ts2 = get_output_timeseries(num_variables=3, for_predict=False)
    merged_ts = DataMerger.get([ts1, ts2]).merge()

    assert merged_ts.data_type == DataTypesEnum.ts
    expected_shape = (len(ts1.idx), 5)
    assert merged_ts.features.shape == expected_shape


def test_data_merge_ts_different_forecast_lengths():
    output_short = get_output_timeseries(len_forecast=5, for_predict=True)
Created by: gkirgizov
I see where the question comes from. This is not the common index; these are the common elements of the indices. Yes, sorting may happen here, but it doesn't matter. The final selection of the common index is done in these lines:
def merge(self) -> 'InputData':
    common_idx = self.select_common(self.main_output.idx)
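To illustrate why the sorting may be harmless, here is a sketch under the assumption that the final selection filters the main index by membership in the common elements. `select_common_sketch` is a hypothetical stand-in written for this example; it is not the PR's `select_common` implementation, which is not shown in this thread.

```python
import numpy as np


def select_common_sketch(idx: np.ndarray, common_elements: np.ndarray) -> np.ndarray:
    """Hypothetical: keep entries of idx that occur in common_elements, in idx order."""
    return idx[np.isin(idx, common_elements)]


main_idx = np.array([11, 5, 2, 13])
common_elements = np.array([2, 5, 11, 13])  # sorted, as np.intersect1d returns them
common_idx = select_common_sketch(main_idx, common_elements)
print(common_idx)  # [11  5  2 13]
```

With a membership mask like this, the order of the main index is preserved even though the set of common elements is sorted.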
    target = np.array(target)
    if len(target.shape) < 2:
        target = target.reshape((-1, 1))

    return features, target


def process_multiple_columns(target_columns, data_frame):
    """ Function for processing target """
    features = np.array(data_frame.drop(columns=target_columns))

    # Remove index column
    targets = np.array(data_frame[target_columns])

    target = None
    features_df = data_frame
    if target_column:
from functools import reduce
from typing import Optional

import numpy as np


def find_common_elements(*indices: np.array) -> np.array:
    """ Returns array with unique elements common to *all* indices
    or the first index if it's the only one. """
    common_elements = reduce(np.intersect1d, indices[1:], indices[0])
    return common_elements


def flatten_extra_dim(data: Optional[np.array]) -> Optional[np.array]:
Created by: Dreamlone
Greetings! It's great that you noticed our open-source project! This PR is already closed, but if you'd like to make changes, we'd be glad: suggest or make changes, open a PR, and enjoy contributing to the repository :)
We are always happy to welcome new contributors.
Created by: gkirgizov
Yeah, I caught this little error with the ndarray types later and opened #625.
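The comment does not spell out the error. Assuming it refers to the `np.array` annotations in the snippet above (a guess, not confirmed in this thread), the distinction can be sketched as follows: `np.array` is a factory function, while `np.ndarray` is the array class that type hints normally refer to. `first_row` is a hypothetical helper used only for illustration.

```python
from typing import Optional

import numpy as np


def first_row(data: Optional[np.ndarray]) -> Optional[np.ndarray]:
    """Hypothetical helper annotated with np.ndarray, the actual array class."""
    return None if data is None else data[0]


print(isinstance(np.array, type))    # False: np.array is a function
print(isinstance(np.ndarray, type))  # True: np.ndarray is a class
print(isinstance(np.array([1, 2]), np.ndarray))  # True
```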