Pointless Hacks

Always Prefer a Subtype

When choosing how to implement functionality in code, a rule of thumb I find helpful is always prefer a subtype. That is, given two implementations, choose the one that is a subtype of the other most of the time.

Definition of a Subtype

What is a subtype? Type systems vary slightly in which types are considered to be subtypes, but generally follow a similar pattern. For example, in the following C++ code, B is a subtype of A:

struct A {}; struct B : A {}

B may now be used anywhere an A is required:

void uses_a(A a) { /* do something with a */ } void takes_b(B b) { return uses_a(b); }

In fact, this property, that a subtype may be used anywhere the supertype was required, is the defining characteristic of a subtype, seen as a common thread through all type systems. That is of course why it is generally good to prefer a subtype - it can be used in at least the same number of situations as the supertype, and potentially more.

Subtype of an Implementation

A subtype of an implementation is one that may be used anywhere the original implementation could be used. As a simple example, using the types A and B from above, consider two functions returning each of the types.

A generate_a() { return A{}; } B generate_b() { return B{}; }

Since B is a subtype of A, the generate_b function may be used anywhere the generate_a function could have been used. This makes it strictly more general. note

In the next example, two functions taking the types as arguments make is seem as though there is a problem:

void uses_a(A a) { /* do something with a */ } void uses_b(B b) { /* do something with b */ }

Covariance and Contravariance

Naïvely using uses_b just because B is a subtype of A doesn't work. The following function could not replace the use of uses_a with uses_b, and so uses_b is obviously not more general. What is going on?

void creates_a_and_uses_it() { A a{}; uses_a(a); }

The solution is that uses_b is in fact not a subtype of uses_a. In fact, it is a supertype (the relationship is reversed). This property, that when B is a subtype of A, uses_b is a supertype of uses_a, is called contravariance, in contrast to the covariance of function return types. The rules are summarized as follows:

(covariance in return type) if B <= A, then f(X) -> B <= f(X) -> A (contravariance in argument type) if B <= A, then f(A) -> R <= f(B) -> R

This explains then, why when preferring a subtype, one should choose return types to be subtypes, but argument types to be supertypes - since then the resulting function will become a subtype and therefore be more general.

Generics

Generics are certainly more general than explicitly instantiated functions. What does the rule always prefer a subtype say about them? The following is an example of such a generic function:

double solve_quadratic (double a, double b, double c) { return (-b + sqrt(b * b - 4 * a * c)) / (2 * a); } template<class T> T solve_quadratic(T a, T b, T c) { return (-b + sqrt(b * b - 4 * a * c)) / (2 * a); }

Clearly the templated version of solve_quadratic is more general, since when instantiated with T = double the first function is recovered. T may also be instantiated with float, or even complex number types, making the generic version a strict subtype. The rule here is:

F<T> <= template<T> -> F<T> F<T> < template<T> -> F<T> when ∃T, U s.t. F<T>, F<U> are well-formed

Where the function template<T> -> F<T> is understood to be a type-level function. This also suggests that, since T is an argument to this type-level function, any restrictions on T turn it into a subtype, and therefore the overall template into a supertype. Always prefer a subtype therefore suggests to make as few assumptions as possible about T. One possible method of doing this here is to not assume that T may be multiplied by integers (since that might not be implemented). The following code makes fewer assumptions, and so is a subtype of the previous template:

template<class T> T solve_quadratic(T a, T b, T c) { return (-b + sqrt(b*b - a*c - a*c - a*c - a*c)) / (a + a); }

Efficient Example With Lambdas

A nice example of this principle in action is with C++'s std::function. std::function is a heap-allocated function object, which means it incurs new and delete each time it is created.

int sum_over_vector( const std::vector<int>& vec, std::function<int(int)> f) { int sum = 0; for (const auto& elem : vec) { sum += f(elem); } return sum; } int sum_vec_sq(const std::vector<int>& vec) { return sum_over_vector( vec, [] (int x) { return x * x; } ); }

This function works well, but to use it incurs a heap allocation which is very difficult to optimize away. The generic version is a strict subtype, but crucially also works with the anonymous lambda type directly, instead of being forced to use the constructor of std::function.

template<class F> int sum_over_vector( const std::vector<int>& vec, F f) { int sum = 0; for (const auto& elem : vec) { sum += f(elem); } return sum; } template int sum_over_vector<std::function<int(int)>>( const std::vector<int>&, std::function<int(int)>);

The old function is here, explicitly instantiated - but calling the new sum_over_vector with a lambda will now not incur any of the overhead from std::function.

Libraries

When writing a library is is sometimes better to not expose implementation details to the user of the library, as then if those implementation details change, the user will not be able to upgrade to the new version. This seems to fly in the face of always prefer a subtype.

struct ImplType : InterfaceType {}; ImplType api_function();

The reason to prefer InterfaceType in the above snippet is to force users of the library to not rely on behaviour specific to ImplType; that is, to force the user to prefer the subtype.

Viewing the interface to the library as a function can help conceptually to extend the idea of always prefer a subtype, giving the rule:

interface(A) -> lib <= interface(B) -> lib iff B <= A, where A, B are the types of all possible implementations

The choice of an interface for a library, therefore, should accept as many different possible implementations as possible - meaning that the interface functions themselves should be supertypes.