What is the way to limit shared memory and total threads?

In the context of CUDA, how can I limit the shared memory and the total number of threads a kernel uses?

My requirement for tuning a kernel is that it use at most N bytes of shared memory per block, and at most M threads in total. The second constraint means that every valid tuning configuration must satisfy A <= M, where A = gridDim.x * gridDim.y * gridDim.z * blockDim.x * blockDim.y * blockDim.z.