-
Notifications
You must be signed in to change notification settings - Fork 123
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bandicoot adaption #352
base: master
Are you sure you want to change the base?
Bandicoot adaption #352
Conversation
.appveyor.yml
Outdated
@@ -1,5 +1,5 @@ | |||
environment: | |||
ARMADILLO_DOWNLOAD: "https://sourceforge.net/projects/arma/files/armadillo-9.800.6.tar.xz" | |||
ARMADILLO_DOWNLOAD: "http://ftp.fau.de/macports/distfiles/armadillo/armadillo-8.400.0.tar.xz" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@zoq is that version downgrade deliberate?
recent releases of ensmallen require armadillo 9.800+
include/ensmallen_bits/eve/eve.hpp
Outdated
typename std::enable_if<IsArmaType<GradType>::value, | ||
typename MatType::elem_type>::type | ||
typename std::enable_if<IsArmaType<GradType>::value || | ||
coot::is_coot_type<GradType>::value, typename MatType::elem_type>::type |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One alternate idea here would be to make a combined IsArmaOrCootType
traits class and use that instead.
parent.iteration); | ||
const ElemType biasCorrection2 = 1.0 - std::pow(parent.beta2, | ||
const ElemType biasCorrection2 = 1.0 - pow(parent.beta2, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any particular reason to un-qualify std::pow()
? For arma::
functions like arma::sqrt()
, it's clearly necessary so that ADL will work, but it doesn't seem necessary here. (I don't have a problem with it, I'm just curious if I overlooked something.)
|
||
template<typename ElemType> | ||
typename std::enable_if<coot::is_coot_type<ElemType>::value, ElemType>::type | ||
randn(const size_t rows, const size_t cols) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that the RNGs I just merged to bandicoot should match the Armadillo API, so I think you should be able to remove these functions and just call randn<MatType>(...)
directly, etc.
This is super awesome! I'm still working through the related Bandicoot changes and opening PRs, but little by little I'll get everything opened in MRs there so that this will work. I only took a quick glance for now, but once I can properly test with Bandicoot I'll do a more detailed review. 🚀 |
@zoq I tried to review it, but there are massive amount of things that needs to be completed, this looks to me more of a draft at this stage, let me know if you need any help |
I think all of the SGD based optimizers are working (let me know if you miss one), which I think is what you will need, to move forward with the mlpack part? Some of the optimizers will have to wait for downstream implementations of e.g. So not sure what you mean by "massive amount of things that needs to be completed"? |
Signed-off-by: Omar Shrit <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome to finally see this come together. I reviewed most of the optimizer implementations, but still have to look at the CMake configuration and the tests.
For the tests, I think that the only function really worth testing with Bandicoot is the LogisticRegressionTestFunction
; almost all of the other ones compute the objective via individual element access, or operate on matrices so small that there is no hope of a GPU ever being fast enough (just the communication costs alone will be way too high).
Ideally I would hope that for at least this initial PR, we can get to a state where:
-
SGD and L-BFGS are reasonably fast enough that we can run the logistic regression tests. (I may need to do a little Bandicoot tuning and implementation behind-the-scenes.)
-
Other optimizers will at least compile with Bandicoot matrices, but we don't actually need to run the tests if they are slow. It should be straightforward enough to run with timing reports to see the optimizers where Bandicoot is obscenely slow (probably due to element access). In those cases, we can just add a test that creates the optimizer and either does 1 iteration only, or otherwise terminates immediately---then, the test is more of a check that that optimizer will at least compile with Bandicoot matrices.
I left some comments throughout, but often times the comment could be applied in many places but I only left it once. Let me know what you think, happy to help out with implementation or if there are other missing parts of Bandicoot that would make life easier. 😄
typename std::enable_if<IsArmaType<GradType>::value, | ||
typename MatType::elem_type>::type | ||
typename std::enable_if<IsArmaType<GradType>::value || | ||
IsCootType<GradType>::value, typename MatType::elem_type>::type |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can probably make a combined traits class that is something like this:
template<typename MatType>
struct IsMatType
{
constexpr static bool value = IsArmaType<MatType>::value || IsCootType<MatType>::value;
};
Maybe that would help clean things up a little bit?
@@ -84,7 +85,8 @@ CD<DescentPolicyType>::Optimize( | |||
break; | |||
|
|||
// Update the decision variable with the partial gradient. | |||
iterate.col(featureIdx) -= stepSize * gradient.col(featureIdx); | |||
/* iterate.col(featureIdx) -= stepSize * gradient.col(featureIdx); */ | |||
iterate.col(featureIdx) -= gradient.col(featureIdx); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we add stepSize
back in?
return arma::as_scalar(arma::randi<arma::uvec>( | ||
1, arma::distr_param(0, function.NumFeatures() - 1))); | ||
return randi<size_t>( | ||
arma::distr_param(0, function.NumFeatures() - 1)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Making this generic for Bandicoot might require something like GetFillType
(but adapted for distr_param
) from mlpack in src/mlpack/core/util/using.hpp
.
@@ -381,4 +381,4 @@ typename MatType::elem_type CMAES<SelectionPolicyType, | |||
|
|||
} // namespace ens | |||
|
|||
#endif | |||
#endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we re-add the trailing newline? :)
parent.b += pow(stepSize, 2.0) / parent.b * | ||
pow(norm(gradient), 2); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
parent.b += pow(stepSize, 2.0) / parent.b * | |
pow(norm(gradient), 2); | |
parent.b += pow(stepSize, 2.0) / parent.b * pow(norm(gradient), 2); |
Just a little cleanup 😄
{ return std::min(v, (typename MatType::elem_type) stepSize); } ); | ||
|
||
iterate -= gradient % x / (arma::sqrt(g2) + parent.epsilon); | ||
MatType x = min((g % g) / (g2 + parent.epsilon), lr); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you avoid lr
by using clamp()
here to ensure that no element is greater than stepSize
?
if (arma::any(arma::vectorise(learningRates) <= 1e-15)) | ||
//if (any(vectorise(learningRates) <= 1e-15)) | ||
/* if (min(vectorise(learningRates)) <= 1e-15) */ | ||
if (learningRates.min() <= 1e-15) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm pretty sure this is a better implementation in Armadillo too.
@@ -43,17 +43,18 @@ typename MatType::elem_type SPSA::Optimize(ArbitraryFunctionType& function, | |||
MatType& iterate, | |||
CallbackTypes&&... callbacks) | |||
{ | |||
// Convenience typedefs. | |||
// Convenience typedefs. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// Convenience typedefs. | |
// Convenience typedefs. |
Oops, a space went missing.
@@ -94,8 +95,7 @@ typename MatType::elem_type SPSA::Optimize(ArbitraryFunctionType& function, | |||
const double ck = evaluationStepSize / std::pow(k + 1, gamma); | |||
|
|||
// Choose stochastic directions. | |||
spVector = arma::conv_to<arma::Mat<ElemType>>::from( | |||
arma::randi(iterate.n_rows, iterate.n_cols, | |||
spVector = conv_to<ProxyMatType>::from(randi(iterate.n_rows, iterate.n_cols, | |||
arma::distr_param(0, 1))) * 2 - 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the distr_param
would need to be adapted here to be generic.
randu(const size_t rows, const size_t cols) | ||
{ | ||
#ifdef USE_COOT | ||
return coot::randu<ElemType>(rows, cols); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that for many of these proxy functions, we can use the using
trick that we do in mlpack.
I took a look into what to do about the Basically the idea would be to put |
Initial Bandicoot adaption.