How to load and read the .params file on a big-endian system?


#1

Hi,

Following the cross-compile guidance, I've got the .so (big endian), .params, and .json files, and an ARM executable (also big endian). But after running the executable on my target machine (64-bit ARM, big endian), I got the following error (I followed the convention of loading the params as a TVMByteArray):

terminate called after throwing an instance of 'dmlc::Error'
what(): [09:48:02] src/runtime/graph/graph_runtime.cc:441: Check failed: header == kTVMNDArrayListMagic Invalid parameters file format

I noticed a similar issue was reported before, but the root cause there was different: that one turned out to be a typo, while I suspect mine is byte order.

So could someone tell me whether there is any way to load and read the .params file (and perhaps also the .json file) on a big-endian target machine?


#2

Ref: https://github.com/dmlc/tvm/blob/11318966571f654f4e8bc550bfd9a293303e3000/src/runtime/graph/graph_runtime.h#L17

Try changing these values to big endian and recompile the net.

If it works, we can look into changing these automatically.
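To see what the swapped constants look like, here is a quick sketch. The magic values below are the ones from graph_runtime.h (double-check them against your source tree); the swap helper is just illustrative:

```python
import struct

# Magic constants as defined in graph_runtime.h (written by little-endian hosts)
kTVMNDArrayMagic = 0xDD5E40F096B4A13F
kTVMNDArrayListMagic = 0xF7E58D4F05049CB7

def byteswap64(value):
    # Reverse the byte order of a 64-bit unsigned integer.
    return struct.unpack("<Q", struct.pack(">Q", value))[0]

# The value a big-endian host would read from an unswapped little-endian file
print(hex(byteswap64(kTVMNDArrayListMagic)))
```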


#3

Hi,

I changed kTVMNDArrayMagic and kTVMNDArrayListMagic to their big-endian equivalents and the executable ran, but the output was [-nan -nan -nan]. On x86, the same model outputs a group of 3 probabilities such as [0.03 0.04 0.93] (my network is a 3-class model). Perhaps the byte order of the parameter data needs to be changed while reading it, too.
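That garbage output is consistent with the float32 weight bytes being interpreted in the wrong order: depending on the bit pattern, a misread float comes out as a huge number, a denormal, or a NaN. A small illustration (the 0.93 is just a made-up probability):

```python
import struct

# Pack a value as little-endian float32, then misread it as big-endian --
# effectively what a big-endian host does with unswapped weights.
raw = struct.pack("<f", 0.93)
(misread,) = struct.unpack(">f", raw)
print(misread)  # garbage, nothing like 0.93
```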


#4

Great!

I think the data also needs endian conversion; I suspected as much.

It can be done in two ways now:

We could come up with a tool to reconstruct the params for big endian, which is a good start.
Or
We could handle it later in the compilation process.
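For the first option, the core of such a tool is re-emitting every multi-byte field with its bytes swapped. As a minimal sketch of just the weight-data step (assuming the model is float32 throughout, as discussed below; the file's magics, counts, and shape fields need the same treatment):

```python
import numpy as np

def swap_float32_payload(raw: bytes) -> bytes:
    # Interpret the buffer as packed little-endian float32 values
    # and re-emit the same values in big-endian byte order.
    return np.frombuffer(raw, dtype="<f4").astype(">f4").tobytes()
```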


#5

Good idea.

So could you tell me whether a general conversion tool like that already exists? I'd like to try it first.

Another question: could you tell me how to convert the data during the compilation process? I've tried several cross-compile toolchains and changed compiler flags, but that didn't achieve my goal. Any suggestions?

Thanks.


#6

:smile:

We need to build the tool; I could have one ready, maybe by tomorrow.
I don't have a big-endian environment to test on right now, so I will share it and you can help verify.

I hope the model you are using is 'float32' throughout; please confirm.

Just changing compiler flags doesn't do the job for us. We need to add the data conversion logic while saving the params.


#7

Many thanks for your assistance!

The model was generated following the generation guidance and deployed following the deploy steps with the default options; I didn't change them. So I think it stays 'float32' throughout.

Back to the second approach, you said we

need to add the data conversion logic while saving the params

Does that mean I should add some conversion functions (I guess something like htol or ltoh) to convert the data in my code and then compile it?

Thanks a lot.


#8

I wonder how execution passed (even with a wrong result) just by changing the macros above.

Parsing the params byte stream involves many other values that need endian conversion (such as the number of params, the DLTensor internal fields, etc.).
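To make that concrete, here is a rough sketch of reading the head of a .params stream with every field's endianness handled explicitly. The field order (magic, a reserved word, a list of names, then the array count) is my reading of the load path in graph_runtime.cc; treat it as an approximation, not the authoritative layout:

```python
import struct

def read_params_header(f):
    # Every multi-byte field is written little-endian by an x86 host,
    # so each read below is a point where a big-endian port must swap.
    magic, _reserved = struct.unpack("<QQ", f.read(16))
    (num_names,) = struct.unpack("<Q", f.read(8))
    names = []
    for _ in range(num_names):
        (length,) = struct.unpack("<Q", f.read(8))
        names.append(f.read(length).decode())
    (num_arrays,) = struct.unpack("<Q", f.read(8))
    return magic, names, num_arrays
```

Each NDArray record that follows has the same problem: its own magic, the device/context fields, ndim, dtype, shape, and data size are all multi-byte and all need swapping.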


#9

I understand your confusion; I also thought it should not have passed.

But I'm sure the executable ran (it printed the logs I added) and produced the abnormal result [-nan -nan -nan], as I said. So the byte order of the weights or other parameters must be wrong. Due to a privacy policy, I cannot post screenshots here; sorry about that.

To get the executable for the ARM machine, I followed the cross-compile guidance. As for graph_runtime.cc, whether I disabled the check macros or left them enabled, the executable could be generated and ran without crashing (on both the x86 and the ARM machine). Sounds weird, but that's what happens.

My build environment is Ubuntu, x86_64, little-endian. The cross compiler is armeb-linux-gnueabi-g++.


#10

Are both the target host and the target on ARM llvm?

I hope there's no GPU involved?


#11

Definitely no GPU here.

Back to the first question: I just followed the target option to get the .params, .json, and .so. To be honest, I'm not sure about the target host. What I can tell you is that the target on ARM is not llvm.


#12

No problem; it looks like it's llvm all over.

https://github.com/srkreddy1238/nnvm/commit/edf917865a79ca823f5bcdcc7f318524cd84304c

Check if this patch works.


#13

Many thanks!

Training takes about 50 minutes. I will post the result here; check it when you are free. I just don't want to waste your time. :)


#14

Any luck with this patch?


#15

Sorry for the late reply. A busy day today. :(

Sadly, it didn't work; the result was still [-nan -nan -nan]. And I'm a little confused about two points:

The first is the position of the patch. The code was added to graph_runtime.cc in nnvm instead of tvm, so I guess your idea is to convert the byte order while writing the parameters, not in the loading part.

The second is that I don't use graph_runtime.cc (in nnvm) at all, either when generating the model (the params and json files, using Python) or when loading it (only tvm, using C++). Maybe we should move the patch somewhere into tvm? Or port it to Python?

Thanks.

(I tried to reply to you several times, but my network kept crashing and I could not send messages. Apologies again for the late reply.)


#16

No worries, I was just excited to see the results!

You could try the same byte swap there, recompile the tvm runtime, and give it a try.


#17

Ref. https://github.com/dmlc/dmlc-core/commit/9b3f9753ae81d657743c555e0cacc4e43f0bed2d

Patch to support endianness, from @tqchen.


#18

Thanks for bringing this issue up. Model saving in the existing tvm and runtime is not endian-aware; I am working on a patch to enable big-endian support. https://github.com/dmlc/tvm/issues/1202


#19

Try out https://github.com/dmlc/tvm/pull/1206 to see if it works. This change requires a careful rework of all parts of the serialization, so I took a stab at it.

As a bonus, you can now use the RPC server directly on your ARM side and cross-compile from your x86 host.


#20

Got it! Thanks a lot!