I’m having an interesting bug. In the custom datatypes framework, I’ve made it possible to implement your custom datatype operators in Python by passing TVM some Python PackedFuncs, which are then called in place of the operators. However, I’ve found that programs using this functionality end up waiting on synchronization primitives when there is more than one thread:
(lldb) thread info all
thread #2: tid = 0x86e10, 0x00007fff6b9fd26e libsystem_kernel.dylib`swtch_pri + 10, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
thread #3: tid = 0x86e29, 0x00007fff6ba00916 libsystem_kernel.dylib`__psynch_cvwait + 10
thread #4: tid = 0x86e2a, 0x00007fff6ba00916 libsystem_kernel.dylib`__psynch_cvwait + 10
thread #5: tid = 0x86e2b, 0x00007fff6ba00916 libsystem_kernel.dylib`__psynch_cvwait + 10
thread #6: tid = 0x86e2c, 0x00007fff6ba00916 libsystem_kernel.dylib`__psynch_cvwait + 10
thread #7: tid = 0x86e2d, 0x00007fff6ba00916 libsystem_kernel.dylib`__psynch_cvwait + 10
thread #8: tid = 0x86e84, 0x00007fff6ba00916 libsystem_kernel.dylib`__psynch_cvwait + 10
thread #9: tid = 0x86e85, 0x00007fff6ba00916 libsystem_kernel.dylib`__psynch_cvwait + 10
thread #10: tid = 0x86e86, 0x00007fff6ba00916 libsystem_kernel.dylib`__psynch_cvwait + 10
thread #11: tid = 0x86e87, 0x00007fff6ba00916 libsystem_kernel.dylib`__psynch_cvwait + 10
thread #12: tid = 0x86e88, 0x00007fff6ba00916 libsystem_kernel.dylib`__psynch_cvwait + 10
thread #13: tid = 0x86ecb, 0x00007fff6ba00916 libsystem_kernel.dylib`__psynch_cvwait + 10
The interesting thing is that it gets through most of the work before hanging — for example, if I’m casting a vector of size 8 to a custom datatype, it will get through 7 of the casts before locking up.
I’m trying to debug this. Because it locks up inside synchronization primitives, my assumption is that the problem is not in the generated code itself, but in the surrounding runtime system that waits for the code to complete. Any guidance or suggestions for debugging would be much appreciated!
For background: the custom datatype framework I’m working on allows users to register new datatypes and then register operator implementations over those datatypes. The functionality above specifically allows users to implement their operators in Python. For example, I might create a mytype datatype and implement its add operator in Python:
def mytypeAdd(a, b):
    return a + b
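For concreteness, here is a minimal plain-Python sketch of the kind of registry this relies on. The names here (`register_op_impl`, `lookup_op_impl`) are hypothetical stand-ins for illustration only — the real framework wraps the function into a TVM PackedFunc rather than storing a bare Python callable:

```python
# Hypothetical sketch of an operator registry for custom datatypes.
# In the real framework, TVM wraps the Python function into a PackedFunc;
# these names are illustrative, not TVM's actual API.
_OP_IMPLS = {}

def register_op_impl(op, dtype, func):
    """Register a Python implementation of `op` for the custom `dtype`."""
    _OP_IMPLS[(op, dtype)] = func

def lookup_op_impl(op, dtype):
    """Look up the implementation the compiled code should call into."""
    return _OP_IMPLS[(op, dtype)]

def mytypeAdd(a, b):
    return a + b

register_op_impl("Add", "mytype", mytypeAdd)
```

At compile time, the lowering pass would consult a registry like this to decide which PackedFunc a given operator call should target.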
This function will get wrapped into a PackedFunc. When TVM compiles the code and sees an add of two mytypes, it will compile this into a call_packed_lowered intrinsic which calls the Python PackedFunc implementing the add.
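Conceptually, the generated code for something like the 8-element cast above then behaves like this sketch: a loop that calls back into the registered Python function once per element. The names and the bit-pattern representation are hypothetical — the real path goes through TVM's FFI via call_packed_lowered — but it shows why there are many cross-language calls per operation:

```python
import struct

def lowered_cast_loop(src, cast_impl):
    # Sketch of what the lowered code effectively does for a vector cast:
    # one call into the Python PackedFunc per element. With an 8-element
    # vector, 8 such cross-language calls happen back to back.
    out = []
    for x in src:
        out.append(cast_impl(x))  # crosses the C++/Python boundary each time
    return out

# Hypothetical float -> "mytype" cast, represented here as the raw
# 32-bit pattern of the float (purely for illustration).
def float_to_mytype(x):
    return struct.unpack("<I", struct.pack("<f", x))[0]

result = lowered_cast_loop([0.0, 1.0], float_to_mytype)
```

In the hang I’m seeing, it is as if one of these per-element callbacks never returns control to the runtime, leaving the worker threads parked in __psynch_cvwait.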