
GSoC Final Evaluations

GSoC 2020 has almost come to an end. This blog post summarizes what I did before GSoC’s final evaluation. There are three things I want to introduce:

  • Composite class to combine multiple machine learning algorithms
  • Suggest a class name when the class name is not found
  • CrossValidation wrapper

    Composite class to combine multiple machine learning algorithms

    Ensemble methods use multiple learning algorithms to obtain better predictive performance. Shogun has many machine
    learning algorithms, but it did not have a good way to combine them, so I wrote an EnsembleMachine class to do this.
    Its basic usage is as follows:
    auto ensemble = std::make_shared<EnsembleMachine>(lists);
    ensemble->add_machine(std::make_shared<MulticlassLibLinear>());
    ensemble->add_machine(std::make_shared<MulticlassOCAS>());
    ensemble->set_combination_rule(std::make_shared<MeanRule>());
    ensemble->train(train_feats, train_labels);
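    Under the hood, MeanRule averages the outputs of the individual machines. As a stand-alone illustration of that idea (not Shogun’s actual implementation; mean_combine is a made-up name):

    ```cpp
    #include <vector>

    // Illustrative sketch of a mean combination rule: average the
    // per-sample outputs of several machines into one prediction vector.
    std::vector<double> mean_combine(const std::vector<std::vector<double>>& outputs) {
        std::vector<double> result(outputs.front().size(), 0.0);
        for (const auto& machine_output : outputs)
            for (size_t i = 0; i < machine_output.size(); ++i)
                result[i] += machine_output[i];
        for (auto& v : result)
            v /= outputs.size();
        return result;
    }
    ```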
    The EnsembleMachine class can now combine multiple learning algorithms, but it lacks a convenient interface,
    so I wrote a Composite class with a fluent interface to wrap it.
    The basic usage of the Composite class is as follows:
    auto composite = std::make_shared<Composite>();
    auto pred = composite->over(std::make_shared<MulticlassLibLinear>())
                    ->over(std::make_shared<MulticlassOCAS>())
                    ->then(std::make_shared<MeanRule>())
                    ->train(train_feats, train_labels)
                    ->apply_multiclass(test_feats);
    We can make Composite even nicer: since Transformer and Pipeline have already been added to Shogun,
    we can combine them with it. First we use a transformer to transform the data (for example, to normalize it),
    then we combine multiple algorithms to perform the prediction/regression task.
    auto pipeline = std::make_shared<PipelineBuilder>();
    auto pred = pipeline->over(std::make_shared<NormOne>())
                    ->composite()
                    ->over(std::make_shared<MulticlassLibLinear>())
                    ->over(std::make_shared<MulticlassOCAS>())
                    ->then(std::make_shared<MeanRule>())
                    ->train(train_feats, train_labels)
                    ->apply_multiclass(test_feats);

    Suggest a class name when the class name is not found

    Currently, Shogun encourages using the create factory to construct machine objects, which makes mistyping a class name annoying: typing “sg.create_machine(“libsvm”)” throws the exception “Class libsvm with primitive type SGOBJECT does not exist.” This message should be more meaningful and suggest the correct name. Since Shogun already has a Levenshtein distance implementation, the feature is straightforward to implement: compare the input name against all registered class names and suggest the one with the shortest distance. After adding this feature, typing “sg.create_machine(“liblinear”)” throws a more meaningful exception:
    SystemError: Class liblinear does not exist. Did you mean LibLinear ?
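    The lookup itself can be sketched in a few lines of stand-alone C++ (an illustration of the idea, not Shogun’s actual code; the function names are made up):

    ```cpp
    #include <algorithm>
    #include <string>
    #include <vector>

    // Classic dynamic-programming Levenshtein (edit) distance.
    size_t levenshtein(const std::string& a, const std::string& b) {
        std::vector<size_t> prev(b.size() + 1), curr(b.size() + 1);
        for (size_t j = 0; j <= b.size(); ++j) prev[j] = j;
        for (size_t i = 1; i <= a.size(); ++i) {
            curr[0] = i;
            for (size_t j = 1; j <= b.size(); ++j) {
                size_t cost = (a[i - 1] == b[j - 1]) ? 0 : 1;
                curr[j] = std::min({prev[j] + 1,        // deletion
                                    curr[j - 1] + 1,    // insertion
                                    prev[j - 1] + cost}); // substitution
            }
            std::swap(prev, curr);
        }
        return prev[b.size()];
    }

    // Suggest the registered class name closest to the (mistyped) input.
    std::string suggest(const std::string& input, const std::vector<std::string>& names) {
        return *std::min_element(names.begin(), names.end(),
            [&](const std::string& x, const std::string& y) {
                return levenshtein(input, x) < levenshtein(input, y);
            });
    }
    ```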

    CrossValidation wrapper

    When training a machine learning model, cross-validation is a good way to avoid overfitting. Shogun already implements cross-validation, but it still lacks a good way to tune the hyper-parameters of an estimator, so I wrote this cross-validation wrapper to find the best hyper-parameters. The basic usage of the cross-validation wrapper is as follows:
    //create cross-validation splitting
    auto strategy = std::make_shared<CrossValidationSplitting>(labels_train, 2);
    //create machine object
    auto machine = std::make_shared<LinearRidgeRegression>();
    //create evaluation_criterion
    auto evaluation_criterion = std::make_shared<MeanSquaredError>();
    auto cv = std::make_shared<CrossValidation>(machine, strategy, evaluation_criterion);
    std::vector<std::pair<std::string_view, std::vector<double>>> params{{"tau", {0.1, 0.2, 0.5, 0.8, 2}}};
    //create cross-validation wrapper
    auto cv_wrapper = std::make_shared<CrossValidationWrapper<LinearRidgeRegression>>(params, cv);
    // the best hyper-parameter has been chosen
    cv_wrapper->fit(train_feats, labels_train);
    auto pred = machine->apply(test_feats);
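    Conceptually, the wrapper performs a simple search over the candidate values: train and evaluate the machine with each one, and keep the value with the lowest cross-validation error. A minimal stand-alone sketch of that idea (not Shogun’s actual implementation; pick_best and cv_error are made-up names):

    ```cpp
    #include <functional>
    #include <vector>

    // Sketch of hyper-parameter selection: evaluate each candidate value
    // with a cross-validation error function and return the best one.
    double pick_best(const std::vector<double>& candidates,
                     const std::function<double(double)>& cv_error) {
        double best = candidates.front();
        for (double tau : candidates)
            if (cv_error(tau) < cv_error(best))
                best = tau;
        return best;
    }
    ```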

My thoughts about GSoC 2020

This was my first time being involved in a big project. Before GSoC started, I thought I would just need to write the code I had proposed, but once GSoC started things turned out differently and many unexpected situations came up. My mentors Heiko, Viktor and Gil gave me a lot of help; they are all nice people, and I learned many cool things from them.