
Conversation

@tveasey (Contributor) commented on Feb 17, 2021

Following on from #1733, we can get further speedups by line searching for the best feature bag fraction on data sets where we only need a fraction of the features per tree. For example, training time on Higgs 1M drops from 2585s to 1742s, and we actually get a small improvement in accuracy because our hyperparameter search region is better initialised.

This makes three changes:

  1. Adds a line search for the best initial feature bag fraction to use.
  2. Adds a small linear penalty, at most 1% of the minimum loss, to encourage larger downsample factors and smaller feature bag fractions (a rough sketch follows this list).
  3. Better handles the case where we have many features and relatively few training examples.
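To make the penalty in (2) concrete, here is a minimal C++ sketch, not the actual ml-cpp implementation: `estimateLoss`, the candidate grid, and the penalty scaling are hypothetical stand-ins. It evaluates the loss at a set of candidate feature bag fractions and adds a linear penalty, capped at 1% of the minimum observed loss, so that when two fractions give nearly the same loss the smaller one wins.

```cpp
#include <algorithm>
#include <functional>
#include <iostream>
#include <limits>
#include <vector>

// Hypothetical stand-in for the cross-validated loss at a given feature bag
// fraction; in ml-cpp this would come from actually training on a subsample.
double estimateLoss(double featureBagFraction) {
    // Toy loss surface with a shallow minimum near 0.5.
    double d{featureBagFraction - 0.5};
    return 1.0 + 0.2 * d * d;
}

// Grid-based stand-in for the line search: pick the candidate fraction which
// minimises loss plus a linear penalty capped at 1% of the minimum observed
// loss. The penalty only breaks near-ties, in favour of smaller fractions.
double lineSearchFeatureBagFraction(const std::vector<double>& candidates,
                                    const std::function<double(double)>& loss) {
    std::vector<double> losses;
    losses.reserve(candidates.size());
    for (double fraction : candidates) {
        losses.push_back(loss(fraction));
    }
    double minLoss{*std::min_element(losses.begin(), losses.end())};
    double maxFraction{*std::max_element(candidates.begin(), candidates.end())};

    double bestFraction{candidates[0]};
    double bestPenalisedLoss{std::numeric_limits<double>::max()};
    for (std::size_t i = 0; i < candidates.size(); ++i) {
        // Linear in the fraction and at most 0.01 * minLoss overall.
        double penalty{0.01 * minLoss * candidates[i] / maxFraction};
        if (losses[i] + penalty < bestPenalisedLoss) {
            bestPenalisedLoss = losses[i] + penalty;
            bestFraction = candidates[i];
        }
    }
    return bestFraction;
}

int main() {
    std::vector<double> candidates{0.2, 0.35, 0.5, 0.65, 0.8};
    std::cout << "chosen feature bag fraction = "
              << lineSearchFeatureBagFraction(candidates, estimateLoss) << '\n';
}
```

In the same spirit, the penalty on the downsample factor runs in the opposite direction, nudging the search towards larger values when the loss surface is nearly flat.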

Some of the variable naming in CBoostedTreeFactory is also now misleading; I've split the renaming out into a separate non-functional commit. The functional commit is f9eacc2.

@valeriy42 (Contributor) left a comment:

LGTM

@tveasey (Contributor, Author) commented on Feb 19, 2021

CI on macOS is expected to fail at the moment, so I'll go ahead and merge.

@tveasey merged commit ebadbb5 into elastic:master on Feb 19, 2021
@tveasey deleted the line-search-feature-bag-fraction branch on February 19, 2021 at 10:48
tveasey added a commit to tveasey/ml-cpp-1 that referenced this pull request Feb 19, 2021
…on model training (elastic#1746)
