GSoC 2020 has almost come to an end. This blog post summarizes what I have done before GSoC's final evaluation. There are three things that I want to introduce:
- Composite class to combine multiple machine learning algorithms
- Suggest a class name when the given name is not found
- CrossValidation wrapper
Composite class to combine multiple machine learning algorithms
Ensemble methods use multiple learning algorithms to obtain better predictive performance. Shogun has many machine learning algorithms, but it did not have a good way to combine them, so I wrote an EnsembleMachine class to do this. Its basic usage is as follows:

```cpp
auto ensemble = std::make_shared<EnsembleMachine>(lists);
ensemble->add_machine(std::make_shared<MulticlassLibLinear>());
ensemble->add_machine(std::make_shared<MulticlassOCAS>());
ensemble->set_combination_rule(std::make_shared<MeanRule>());
ensemble->train(train_feats, train_labels);
```

The EnsembleMachine class can now combine multiple learning algorithms, but it does not have a nice interface to use, so I wrote a Composite class with a good interface to wrap it.
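The idea behind a combination rule such as MeanRule is simple: average the per-class scores produced by the individual machines and predict the class with the highest mean. Here is a minimal self-contained sketch of that idea; the function name and score layout are illustrative, not Shogun's actual implementation:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Each inner vector holds one machine's score for every class; the mean
// rule averages the scores across machines and returns the index of the
// class with the highest mean score.
std::size_t mean_rule_predict(const std::vector<std::vector<double>>& machine_scores) {
    const std::size_t num_classes = machine_scores.front().size();
    std::vector<double> mean(num_classes, 0.0);
    for (const auto& scores : machine_scores)
        for (std::size_t c = 0; c < num_classes; ++c)
            mean[c] += scores[c] / machine_scores.size();
    return std::max_element(mean.begin(), mean.end()) - mean.begin();
}
```

For example, if one machine scores the classes {0.2, 0.8} and another scores them {0.6, 0.4}, the mean scores are {0.4, 0.6} and class 1 is predicted.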
The basic usage of the Composite class is as follows:

```cpp
auto composite = std::make_shared<Composite>();
auto pred = composite->over(std::make_shared<MulticlassLibLinear>())
                ->over(std::make_shared<MulticlassOCAS>())
                ->then(std::make_shared<MeanRule>())
                ->train(train_feats, train_labels)
                ->apply_multiclass(test_feats);
```
We can make Composite even nicer: since Transformer and Pipeline have been added to Shogun, we can combine them all. First we use a transformer to transform the data (for example, to normalize it), then we combine multiple algorithms to do the prediction/regression task:

```cpp
auto pipeline = std::make_shared<PipelineBuilder>();
auto pred = pipeline->over(std::make_shared<NormOne>())
                ->composite()
                ->over(std::make_shared<MulticlassLibLinear>())
                ->over(std::make_shared<MulticlassOCAS>())
                ->then(std::make_shared<MeanRule>())
                ->train(train_feats, train_labels)
                ->apply_multiclass(test_feats);
```

Suggest class name if class name not found
Currently, Shogun encourages using the create factory to create machine objects. It is annoying when we type a wrong class name: for example, when we type `sg.create_machine("libsvm")`, an exception "Class libsvm with primitive type SGOBJECT does not exist." is thrown. This message should be more meaningful; the correct name should be suggested. Since Shogun already has a Levenshtein distance implementation, it is convenient to implement this feature: we simply compare the input name with all registered class names and choose the class name with the shortest distance. After adding this feature, when we type `sg.create_machine("liblinear")`, a more meaningful exception is thrown:

```
SystemError: Class liblinear does not exist. Did you mean LibLinear ?
```
CrossValidation wrapper
When training a machine learning algorithm, cross-validation is a good way to avoid overfitting. Shogun already implements cross-validation, but it still does not have a good way to tune the hyper-parameters of an estimator, so I wrote this cross-validation wrapper to find the best hyper-parameters. Its basic usage is as follows:

```cpp
// create the cross-validation splitting strategy
auto strategy = std::make_shared<CrossValidationSplitting>(labels_train, 2);
// create the machine object
auto machine = std::make_shared<LinearRidgeRegression>();
// create the evaluation criterion
auto evaluation_criterion = std::make_shared<MeanSquaredError>();
auto cv = std::make_shared<CrossValidation>(machine, strategy, evaluation_criterion);
std::vector<std::pair<std::string_view, std::vector<double>>> params{{"tau", {0.1, 0.2, 0.5, 0.8, 2}}};
// create the cross-validation wrapper
auto cv_wrapper = std::make_shared<CrossValidationWrapper<LinearRidgeRegression>>(params, cv);
// after fitting, the best hyper-parameter has been chosen
cv_wrapper->fit(train_feats, labels_train);
auto pred = machine->apply(test_feats);
```
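Conceptually, this kind of wrapper is a grid search: for each candidate parameter value it runs cross-validation and keeps the value with the best score. A rough self-contained sketch of that selection loop, assuming a `cv_error` callback that stands in for a full k-fold cross-validation run (the names here are illustrative, not Shogun's API):

```cpp
#include <functional>
#include <limits>
#include <vector>

// Pick the tau value that minimizes a cross-validation error estimate.
// `cv_error(tau)` would train and evaluate the machine on each fold with
// that tau and return the average error across folds.
double select_best_tau(const std::vector<double>& taus,
                       const std::function<double(double)>& cv_error) {
    double best_tau = taus.front();
    double best_err = std::numeric_limits<double>::infinity();
    for (double tau : taus) {
        double err = cv_error(tau);
        if (err < best_err) {
            best_err = err;
            best_tau = tau;
        }
    }
    return best_tau;
}
```

The real wrapper does more bookkeeping (cloning the machine, setting the named parameter, refitting with the winner), but the core loop is this simple exhaustive search over the candidate values.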
My thoughts about GSoC 2020
This is my first time being involved in a big project. Before GSoC started, I thought I just needed to write the code that I had proposed, but once GSoC started, things turned out differently and many unexpected situations came up. My mentors Heiko, Viktor and Gil gave me a lot of help; they are all nice people, and I learned many cool things from them.