[RELAY] custom arithmetic types (bfloat8, posit<nbits,es>, etc.)


#1

The desire is to make [RELAY] parameterized in its arithmetic type. New number systems such as bfloat8, posit<nbits,es>, and Elias gamma, delta, and omega encodings, improve on the representation of the reals for specific applications.

This discussion is seeking clarity on what, and how to implement these new number systems as arithmetic types in [RELAY] and the tensor operators in general.


#2

In the universal library (github.com/stillwater-sc/universal) and the HPR (High-Performance Reproducible) linear algebra libraries, we follow a simple rule: all algorithms need to be parameterized in their ‘Scalar’ representation, and any ‘Scalar’ type needs to implement the full arithmetic operation set (+,-,*,/) and all its implicit casts to native types. The latter is required to make literal assignments to these Scalar types syntactically intuitive.

The machinery we use is to require a simple type assignment to an abstract ‘Scalar’ type, which is then specialized to the desired arithmetic, as in:

using Scalar = posit<8,1>;
LinearOperator LinearWidget() {… }

Here is a full example of what that looks like for the stabalized BiCG:

template<typename Scalar>
void regular_BiCGStab() {
	using namespace std;
	const size_t size = 40, N = size * size;

	using Matrix = mtl::mat::compressed2D< Scalar >;
	using Vector = mtl::vec::dense_vector< Scalar >;

	// Create a 1,600 x 1,600 matrix using a 5-point Laplacian stencil
	Matrix A(N, N);
	mtl::mat::laplacian_setup(A, size, size);

	// Create an ILU(0) preconditioner
	itl::pc::ilu_0< Matrix >        P(A);

	// Set b such that x == 1 is solution; start with x == 0
	mtl::vec::dense_vector<Scalar>       x(N, 1.0), b(N);
	b = A * x; x = 0;

	// Termination criterion: r < 1e-6 * b or N iterations
	itl::noisy_iteration< Scalar >  iter(b, 5, (Scalar)1.e-6);

	// Solve Ax == b with left preconditioner P
	itl::bicgstab(A, x, b, P, iter);
}

template<typename Scalar>
void fdp_BiCGStab() {
	using namespace std;
	const size_t size = 40, N = size * size;

	using Matrix = mtl::mat::compressed2D< Scalar >;
	using Vector = mtl::vec::dense_vector< Scalar >;

	// Create a 1,600 x 1,600 matrix using a 5-point Laplacian stencil
	Matrix A(N, N);
	mtl::mat::laplacian_setup(A, size, size);

	// Create an ILU(0) preconditioner
	itl::pc::ilu_0< Matrix >        P(A);

	// Set b such that x == 1 is solution; start with x == 0
	mtl::vec::dense_vector<Scalar>       x(N, 1.0), b(N);
	b = A * x; x = 0;

	// Termination criterion: r < 1e-6 * b or N iterations
	itl::noisy_iteration< Scalar >  iter(b, 5, 1.e-6);

	// Solve Ax == b with left preconditioner P
	hpr::bicgstab(A, x, b, P, iter);
}

int main(int, char**)
{
	using namespace std;
	using namespace mtl;

	constexpr size_t nbits = 32;
	constexpr size_t es = 2;

	using Scalar = sw::unum::posit<nbits, es>;

	regular_BiCGStab<float>();
	fdp_BiCGStab< sw::unum::posit<32, 2> >();
	fdp_BiCGStab< sw::unum::posit<24, 3> >();

    return 0;
}

NOTE: the fdp_BiCGStab() refers to the fused dot product (FDP) functionality of posits that enable deferred rounding control for reproducible linear algebra. The FDP alters the linear algebra algorithms by passing the BLAS L1 dot product operator through a deferred rounding path and schedule. These details are not important for this discussion, but maybe helpful for the careful observer.

The using mechanism provided by C++ is a clean interface to support arithmetic types that themselves may be parameterized by a complex set of dimensions. Variable length floats and posits both have two parameters, and some of the Elias encodings have three parameters. The Scalar type definition unifies these variables, and the C++ arithmetic overloading mechanisms enable arithmetic algorithms.