Always Prefer a Subtype
When choosing how to implement functionality in code, a rule of thumb I find
helpful is always prefer a subtype
. That is, given two implementations,
choose the one that is a subtype of the other most of the time.
Definition of a Subtype
What is a subtype? Type systems vary slightly in which types are considered
to be subtypes, but generally follow a similar pattern. For example, in the
following C++ code, B
is a subtype of A
:
struct A {};
struct B : A {}
B
may now be used anywhere an A
is required:
void uses_a(A a) {
/* do something with a */
}
void takes_b(B b) {
return uses_a(b);
}
In fact, this property, that a subtype may be used anywhere the supertype was required, is the defining characteristic of a subtype, seen as a common thread through all type systems. That is of course why it is generally good to prefer a subtype - it can be used in at least the same number of situations as the supertype, and potentially more.
Subtype of an Implementation
A subtype of an implementation is one that may be used anywhere the original
implementation could be used. As a simple example, using the types A
and B
from above, consider two functions returning each of the
types.
A generate_a() {
return A{};
}
B generate_b() {
return B{};
}
Since B
is a subtype of A
, the generate_b
function may be used anywhere the generate_a
function could have
been used. This makes it strictly more general.
note
In the next example, two functions taking the types as arguments make is seem as though there is a problem:
void uses_a(A a) {
/* do something with a */
}
void uses_b(B b) {
/* do something with b */
}
Covariance and Contravariance
Naïvely using uses_b
just because B
is a subtype of
A
doesn't work. The following function could not replace the use
of uses_a
with uses_b
, and so uses_b
is
obviously not more general. What is going on?
void creates_a_and_uses_it() {
A a{};
uses_a(a);
}
The solution is that uses_b
is in fact uses_a
. In fact, it is a supertype (the relationship is reversed).
This property, that when B
is a subtype of A
,
uses_b
is a supertype of uses_a
, is called
contravariance, in contrast to the covariance of function return types.
The rules are summarized as follows:
(covariance in return type)
if B <= A, then f(X) -> B <= f(X) -> A
(contravariance in argument type)
if B <= A, then f(A) -> R <= f(B) -> R
This explains then, why when preferring a subtype
, one should choose
return types to be subtypes, but argument types to be
Generics
Generics are certainly more general than explicitly instantiated functions.
What does the rule always prefer a subtype
say about them? The following
is an example of such a generic function:
double solve_quadratic
(double a, double b, double c)
{
return (-b + sqrt(b * b - 4 * a * c))
/ (2 * a);
}
template<class T>
T solve_quadratic(T a, T b, T c) {
return (-b + sqrt(b * b - 4 * a * c))
/ (2 * a);
}
Clearly the templated version of solve_quadratic
is more general,
since when instantiated with T = double
the first function is
recovered. T
may also be instantiated with float
, or
even complex number types, making the generic version a strict subtype. The rule
here is:
F<T> <= template<T> -> F<T>
F<T> < template<T> -> F<T>
when ∃T, U s.t. F<T>, F<U> are well-formed
Where the function
template<T> -> F<T>
is understood to be
a type-level function. This also suggests that, since T
is an
argument to this type-level function, any restrictions on T
turn
it into a subtype, and therefore the overall template into a supertype.
Always prefer a subtype
therefore suggests to make as few assumptions
as possible about T
. One possible method of doing this here is
to not assume that T
may be multiplied by integers (since that
might not be implemented). The following code makes fewer assumptions, and
so is a subtype of the previous template:
template<class T>
T solve_quadratic(T a, T b, T c) {
return (-b + sqrt(b*b
- a*c - a*c - a*c - a*c))
/ (a + a);
}
Efficient Example With Lambdas
A nice example of this principle in action is with C++'s std::function
.
std::function
is a heap-allocated function object, which means
it incurs new
and delete
each time it is created.
int sum_over_vector(
const std::vector<int>& vec,
std::function<int(int)> f)
{
int sum = 0;
for (const auto& elem : vec) {
sum += f(elem);
}
return sum;
}
int sum_vec_sq(const std::vector<int>& vec) {
return sum_over_vector(
vec,
[] (int x) { return x * x; }
);
}
This function works well, but to use it incurs a heap allocation which is very
difficult to optimize away. The generic version is a strict subtype, but crucially
also works with the anonymous lambda type directly, instead of being forced to
use the constructor of std::function
.
template<class F>
int sum_over_vector(
const std::vector<int>& vec,
F f)
{
int sum = 0;
for (const auto& elem : vec) {
sum += f(elem);
}
return sum;
}
template
int sum_over_vector<std::function<int(int)>>(
const std::vector<int>&,
std::function<int(int)>);
The old function is here, explicitly instantiated - but calling the new
sum_over_vector
with a lambda will now not incur any of the
overhead from std::function
.
Libraries
When writing a library is is sometimes better to not expose implementation
details to the user of the library, as then if those implementation
details change, the user will not be able to upgrade to the new version.
This seems to fly in the face of always prefer a subtype
.
struct ImplType : InterfaceType {};
ImplType api_function();
The reason to prefer InterfaceType
in the above snippet is to
force users of the library to not rely on behaviour specific to
ImplType
; that is, to force the user to prefer the subtype.
Viewing the interface to the library as a function can help conceptually
to extend the idea of always prefer a subtype
, giving the rule:
interface(A) -> lib <= interface(B) -> lib
iff B <= A,
where A, B are the types of all
possible implementations
The choice of an interface for a library, therefore, should accept as many different possible implementations as possible - meaning that the interface functions themselves should be supertypes.