This is my first blog post about GSoC.
Currently, the Shogun Machine base class has two main member functions: train and apply. The current usage is:
shared_ptr<Features> train_data;
The Machine class is stateful, which makes it confusing which features and labels a given machine was trained on. Features and Labels should not be stored in the object, so I started by refactoring this class to make it stateless.
The usage of the new API should look like this:
auto test_labels = create<LeastSquaresRegression>()
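To make the stateless idea concrete, here is a minimal toy sketch. It is not Shogun code: ToyLeastSquares, Features and Labels are hypothetical stand-ins I made up for illustration, and the model is just a one-parameter least-squares fit through the origin. The point is only that train receives the data as arguments and the object keeps the learned weight, never the features or labels.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for Shogun's Features and Labels (not the real classes).
using Features = std::vector<double>;
using Labels = std::vector<double>;

// Toy machine: fits y = w * x by least squares.
// It stores only the learned weight, never the training data.
class ToyLeastSquares {
public:
    // Features and labels are passed in, used, and then forgotten.
    void train(const Features& x, const Labels& y) {
        assert(x.size() == y.size() && !x.empty());
        double xy = 0.0;
        double xx = 0.0;
        for (std::size_t i = 0; i < x.size(); ++i) {
            xy += x[i] * y[i];
            xx += x[i] * x[i];
        }
        assert(xx > 0.0);
        weight_ = xy / xx;
    }

    // Prediction depends only on the learned weight and the given features.
    Labels apply(const Features& x) const {
        Labels out;
        out.reserve(x.size());
        for (double v : x) {
            out.push_back(weight_ * v);
        }
        return out;
    }

private:
    double weight_ = 0.0;  // the only state kept after training
};
```

With this shape, it is always clear from the call site which data a machine was trained on, because there is no hidden m_labels-like member that could have been set somewhere else.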
First, we need to refactor all classes that are derived from Machine. The first step is to find all the lines of code where a class accesses m_labels.
Shogun has a large code base, so it is hard to find by hand every line of code that makes Machine stateful with respect to m_labels. My mentors therefore suggested that I use LibTooling to find all these instances. LibTooling is a really powerful library, with a steep learning curve, for writing standalone tools on top of Clang's C++ parser and AST. After about one week, I finally finished this script.
The main matcher is memberExpr(member(hasName("m_labels"))).bind("func"), which gives us the AST node of each matching MemberExpr. However, just printing the lines of every MemberExpr is not very readable. We should add more context, such as the class name and the method name. The Clang AST has built-in functionality that helped me find both. We can walk up with getParents until we reach the enclosing CXXMethodDecl, and once we have the CXXMethodDecl, it is easy to call getParent to get its CXXRecordDecl. But the story does not end here: we only want the CXXRecordDecl nodes that are derived from Machine, so we have to restrict the matches to records that have Machine as a base.
auto record = method->getParent();
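The filtering logic described above can be sketched without Clang at all. The following is a standalone toy model, not the script itself and not the Clang API: Record, Method, MemberAccess, is_derived_from and report are hypothetical names I introduce for illustration. It mirrors the three steps: find the access to m_labels, go from the access to its method and then to the method's class, and keep the hit only if that class has Machine somewhere among its bases. In a real AST-matcher tool, I assume the same restriction could be expressed with a matcher such as isDerivedFrom.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-ins for AST nodes; hypothetical types, not Clang's classes.
struct Record {
    std::string name;
    std::vector<const Record*> bases;  // direct base classes
};

struct Method {
    std::string name;
    const Record* parent;  // class that declares the method
};

struct MemberAccess {
    std::string member;    // accessed member, e.g. "m_labels"
    const Method* method;  // enclosing method, or nullptr if there is none
    int line;              // source line of the access
};

// True if `record` has a base named `base_name` anywhere in its hierarchy.
// The record itself does not count as derived from itself.
bool is_derived_from(const Record& record, const std::string& base_name) {
    for (const Record* base : record.bases) {
        if (base->name == base_name || is_derived_from(*base, base_name)) {
            return true;
        }
    }
    return false;
}

// Returns "Class::method:line" for every access to `member` that happens
// inside a method of a class derived from `base_name`.
std::vector<std::string> report(const std::vector<MemberAccess>& accesses,
                                const std::string& member,
                                const std::string& base_name) {
    std::vector<std::string> out;
    for (const MemberAccess& access : accesses) {
        if (access.member != member || access.method == nullptr) {
            continue;  // wrong member, or not inside a method
        }
        const Record& record = *access.method->parent;
        if (!is_derived_from(record, base_name)) {
            continue;  // class is unrelated to the base we care about
        }
        out.push_back(record.name + "::" + access.method->name + ":" +
                      std::to_string(access.line));
    }
    return out;
}
```

Printing the class and method next to each hit is what makes the output usable as a to-do list for the refactoring, since each entry points at one method that still depends on the stored labels.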
Related PR:
Thanks:
gf712 helped me fix a lot of grammar errors in this post.