Personal Notes

MXNet Build Notes

no 'return_type' compile error

This may be caused by mismatched submodule versions. Fix: git pull && git submodule update && make clean && make -j

Summary of Using the rabit Framework

  1. Allreduce performs a reduction across different computation nodes and returns the result to every node.
  2. Broadcast allows one node to send its local data to all other nodes.
#include <rabit.h>

int main(int argc, char *argv[]) {
  rabit::Init(argc, argv);
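  // load the latest checkpointed model
  // (model, data, n and max_iter are assumed to be defined elsewhere;
  //  model must implement rabit's serializable interface)
  int version = rabit::LoadCheckPoint(&model);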

  // initialize the model if it is the first version
  if (version == 0) model.InitModel();
  // the version number marks the iteration to resume
  for (int iter = version; iter < max_iter; ++iter) {
    // at this point, the model object should allow us to recover the program state

    // each iteration can contain multiple calls of allreduce/broadcast
    rabit::Allreduce<rabit::op::Max>(&data[0], n);

    // checkpoint model after one iteration finishes
    rabit::CheckPoint(&model);
  }
  rabit::Finalize();
  return 0;
}
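
To make the semantics of the two primitives concrete, here is a minimal single-process sketch that does not use rabit at all. Each inner vector stands in for one node's local buffer. The names NodeBuffers, SimulatedAllreduceMax and SimulatedBroadcast are made up for this illustration and are not part of the rabit API; real rabit performs these steps over the network.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in: one inner vector per node's local buffer.
using NodeBuffers = std::vector<std::vector<int>>;

// Simulated Allreduce with a max reduction: compute the element-wise maximum
// across all nodes, then write that result back into every node's buffer.
void SimulatedAllreduceMax(NodeBuffers *nodes) {
  if (nodes->empty()) return;
  const std::size_t n = (*nodes)[0].size();
  std::vector<int> reduced = (*nodes)[0];
  for (const std::vector<int> &buf : *nodes) {
    assert(buf.size() == n);  // every node must contribute the same length
    for (std::size_t i = 0; i < n; ++i) {
      reduced[i] = std::max(reduced[i], buf[i]);
    }
  }
  for (std::vector<int> &buf : *nodes) buf = reduced;
}

// Simulated Broadcast: copy the root node's buffer to every node.
void SimulatedBroadcast(NodeBuffers *nodes, std::size_t root) {
  assert(root < nodes->size());
  const std::vector<int> data = (*nodes)[root];  // copy first, then overwrite
  for (std::vector<int> &buf : *nodes) buf = data;
}
```

With three nodes holding {1, 5, 2}, {4, 0, 3} and {2, 2, 9}, the simulated Allreduce leaves {4, 5, 9} on every node; broadcasting from node 1 simply replaces every buffer with node 1's data.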

Fixing java.lang.UnsatisfiedLinkError: XXXclass.XXXmethod() when calling a native method through JNI

The package name, class name, and method name used by the project that calls JNI must match those compiled into the .so library.
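
When the native function is not registered explicitly, the JVM looks it up by a symbol name built from the package, class, and method names, which is why a renamed package breaks the link. The sketch below builds that expected symbol name so it can be compared with what the .so actually exports. MangleJniComponent and ExpectedJniSymbol are hypothetical helper names, and the sketch handles only '.', '/' and '_' (not Unicode escapes or the signature suffix used for overloaded native methods).

```cpp
#include <string>

// Mangle one name component using the basic JNI rules:
// '.' or '/' (package separators) become '_', and a literal '_' becomes "_1".
std::string MangleJniComponent(const std::string &name) {
  std::string out;
  for (char c : name) {
    if (c == '_') {
      out += "_1";
    } else if (c == '.' || c == '/') {
      out += '_';
    } else {
      out += c;
    }
  }
  return out;
}

// Build the symbol the JVM resolves for a non-overloaded native method:
// "Java_" + mangled fully qualified class name + "_" + mangled method name.
std::string ExpectedJniSymbol(const std::string &qualified_class,
                              const std::string &method) {
  return "Java_" + MangleJniComponent(qualified_class) + "_" +
         MangleJniComponent(method);
}
```

For a hypothetical class com.example.NativeLib with a native method doWork, the expected symbol is Java_com_example_NativeLib_doWork. If the library exports the function under a different package or class prefix, the call fails with UnsatisfiedLinkError.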