How to load and read the .params file on a big-endian system?


#1

Hi,

Following the cross-compile guidance, I've got the .so (big endian), .params, and .json files, and an ARM executable (also big endian). But after running the executable on my target machine (64-bit ARM, big endian), I got the following error (I followed the convention of loading the params as a TVMByteArray):

terminate called after throwing an instance of 'dmlc::Error'
what(): [09:48:02] src/runtime/graph/graph_runtime.cc:441: Check failed: header == kTVMNDArrayListMagic Invalid parameters file format

I noticed a similar issue was reported before, but the root cause there was different: that one turned out to be a typo, while I suspect mine is byte order.

So could someone tell me whether there is any way to load and read the .params file (and perhaps also the .json file) on a big-endian target machine?


#2

Ref: https://github.com/dmlc/tvm/blob/11318966571f654f4e8bc550bfd9a293303e3000/src/runtime/graph/graph_runtime.h#L17

Try changing these values to big endian and recompile the net.

If it works, we can look into changing these automatically.
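To see what the swapped constants look like, here is a quick sketch. The magic values below are the ones from graph_runtime.h (double-check them against your source tree); the swap helper is just illustrative:

```python
import struct

# Magic constants as defined in graph_runtime.h (written by little-endian hosts)
kTVMNDArrayMagic = 0xDD5E40F096B4A13F
kTVMNDArrayListMagic = 0xF7E58D4F05049CB7

def byteswap64(value):
    # Reverse the byte order of a 64-bit unsigned integer.
    return struct.unpack("<Q", struct.pack(">Q", value))[0]

# The value a big-endian host would read from an unswapped little-endian file
print(hex(byteswap64(kTVMNDArrayListMagic)))
```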


#3

Hi,

I changed kTVMNDArrayMagic and kTVMNDArrayListMagic to their big-endian equivalents and the executable ran, but the output was [-nan -nan -nan]. On x86, the same model outputs a group of 3 probabilities such as [0.03 0.04 0.93] (my network is a 3-class model). Perhaps the byte order of the parameter data needs to be changed while reading it, too.
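That garbage output is consistent with the float32 weight bytes being interpreted in the wrong order: depending on the bit pattern, a misread float comes out as a huge number, a denormal, or a NaN. A small illustration (the 0.93 is just a made-up probability):

```python
import struct

# Pack a value as little-endian float32, then misread it as big-endian --
# effectively what a big-endian host does with unswapped weights.
raw = struct.pack("<f", 0.93)
(misread,) = struct.unpack(">f", raw)
print(misread)  # garbage, nothing like 0.93
```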


#4

Great!

I think the data also needs endian conversion; I suspected as much.

It can be done in two ways now:

We could come up with a tool to reconstruct the params for big endian, which is a good start.
Or
We could handle it later in the compilation process.
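For the first option, the core of such a tool is re-emitting every multi-byte field with its bytes swapped. As a minimal sketch of just the weight-data step (assuming the model is float32 throughout, as discussed below; the file's magics, counts, and shape fields need the same treatment):

```python
import numpy as np

def swap_float32_payload(raw: bytes) -> bytes:
    # Interpret the buffer as packed little-endian float32 values
    # and re-emit the same values in big-endian byte order.
    return np.frombuffer(raw, dtype="<f4").astype(">f4").tobytes()
```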


#5

Good idea.

So could you tell me whether a general conversion tool like that already exists? I'd like to try it first.

Another question: could you tell me how to convert the data during the compilation process? I've tried several cross-compile toolchains and changed compiler flags, but that didn't achieve my goal. Any suggestions?

Thanks.


#6

:smile:

We need to build the tool; I could have one ready, maybe by tomorrow.
I don't have a big-endian environment to test on right now, so I will share it and you can help verify.

I hope the model you are using is 'float32' throughout; please confirm.

Just changing compiler flags doesn't do the job for us. We need to add the data conversion logic while saving the params.


#7

Many thanks for your assistance!

The model was generated following the generation guidance and deployed following the deploy steps with the default options; I didn't change them. So I think it stays 'float32' throughout.

Back to the second approach, you said we

need to add the data conversion logic while saving the params

Does that mean I should add some conversion functions (I guess something like htol or ltoh) to convert the data in my code and then compile it?

Thanks a lot.


#8

I wonder how execution passed (even with a wrong result) just by changing the macros above.

Parsing the params byte stream involves many other values that need endian conversion (such as the number of params, the DLTensor internal fields, etc.).
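To make that concrete, here is a rough sketch of reading the head of a .params stream with every field's endianness handled explicitly. The field order (magic, a reserved word, a list of names, then the array count) is my reading of the load path in graph_runtime.cc; treat it as an approximation, not the authoritative layout:

```python
import struct

def read_params_header(f):
    # Every multi-byte field is written little-endian by an x86 host,
    # so each read below is a point where a big-endian port must swap.
    magic, _reserved = struct.unpack("<QQ", f.read(16))
    (num_names,) = struct.unpack("<Q", f.read(8))
    names = []
    for _ in range(num_names):
        (length,) = struct.unpack("<Q", f.read(8))
        names.append(f.read(length).decode())
    (num_arrays,) = struct.unpack("<Q", f.read(8))
    return magic, names, num_arrays
```

Each NDArray record that follows has the same problem: its own magic, the device/context fields, ndim, dtype, shape, and data size are all multi-byte and all need swapping.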


#9

I understand your confusion; I also thought it should not have passed.

But I'm sure the executable ran (it printed the logs I added) and produced the abnormal result [-nan -nan -nan], as I said. So the byte order of the weights or other parameters must be wrong. Due to a privacy policy, I cannot post screenshots here; sorry about that.

To get the executable for the ARM machine, I followed the cross-compile guidance. As for graph_runtime.cc, whether I disabled the check macros or left them enabled, the executable could be generated and ran without crashing (on both the x86 and the ARM machine). Sounds weird, but that's what happens.

My build environment is Ubuntu, x86_64, little-endian. The cross compiler is armeb-linux-gnueabi-g++.


#10

Are both the target host and the target on ARM llvm?

I hope there's no GPU involved?


#11

Definitely no GPU here.

Back to the first question: I just followed the target option to get the .params, .json, and .so. To be honest, I'm not sure about the target host. What I can tell you is that the target on ARM is not llvm.


#12

No problem; it looks like it's llvm all over.

https://github.com/srkreddy1238/nnvm/commit/edf917865a79ca823f5bcdcc7f318524cd84304c

Check if this patch works.


#13

Many thanks!

Training takes about 50 minutes. I will post the result here; check it when you are free. I just don't want to waste your time. :)


#14

Any luck with this patch?


#15

Sorry for the late reply. A busy day today. :(

Sadly, it didn't work; the result was still [-nan -nan -nan]. And I'm a little confused about two points:

The first is the position of the patch. The code was added to graph_runtime.cc in nnvm instead of tvm, so I guess your idea is to convert the byte order while writing the parameters, not in the loading part.

The second is that I don't use graph_runtime.cc (in nnvm) at all, either when generating the model (the params and json files, using Python) or when loading it (only tvm, using C++). Maybe we should move the patch somewhere into tvm? Or port it to Python?

Thanks.

(I tried to reply to you several times, but my network kept crashing and I could not send messages. Apologies again for the late reply.)


#16

No worries, I was just excited to see the results!

You could try the same byte swap there, recompile the tvm runtime, and give it a try.


#17

Ref. https://github.com/dmlc/dmlc-core/commit/9b3f9753ae81d657743c555e0cacc4e43f0bed2d

Patch to support endianness, from @tqchen.


#18

Thanks for bringing this issue up. Model saving in the existing tvm and runtime is not endian-aware; I am working on a patch to enable big-endian support. https://github.com/dmlc/tvm/issues/1202


#19

Try out https://github.com/dmlc/tvm/pull/1206 to see if it works. This change requires a careful rework of all parts of the serialization, so I took a stab at it.

As a bonus, you can now use the RPC server directly on your ARM side and cross-compile from your x86 host.


#20

Got it! Thanks a lot!