The mathematics of Structure
While working on programming language design—especially in the dependently typed space—I kept bumping into references to category theory. At first the diagrams and jargon felt distant and nonsensical. Then, while studying IO implementations in functional programming languages, I read the seminal paper that introduces the IO monad, and some things finally started to make sense, mainly because identifying the structure of things allowed the author to obtain proofs and properties about what he had built. Once some piece of code satisfied the categorical rules, an entire suite of facts suddenly became available "for free".
This article collects my current view of and studies in category theory. It keeps a programming-language lens and leans on type-theory intuition. Basically, it is my way of spotting categorical structure in things I use daily and seeing how they connect through an abstract unifying language.
What is Category Theory about?
At its core, category theory studies structure and composition. Where algebra studies operations on numbers and logic studies proofs, category theory focuses on morphisms between things and the rules for composing them. It is a language that shows why so many areas (types, functions, logic, sets, proofs) share the same patterns.
A category consists of:
- Objects — things (types, propositions, sets, spaces, ...).
- Morphisms (or arrows) — directed relationships between objects (for example, in $\mathbf{Hask}$, morphisms are functions between Haskell types).
- A composition rule: if $f : A \to B$ and $g : B \to C$, there must be a composed arrow $g \circ f : A \to C$.
- An identity arrow $\mathrm{id}_A : A \to A$ for each object $A$, and two laws:
- Associativity: $h \circ (g \circ f) = (h \circ g) \circ f$.
- Identity: $f \circ \mathrm{id}_A = f = \mathrm{id}_B \circ f$.
And this is all that it takes to define a category.
So, making a connection with type theory, the objects are types, morphisms are functions, composition is function composition, and $\mathrm{id}_A$ is the identity function. This is why we can say that Haskell types and functions form a category ($\mathbf{Hask}$). In logic, objects are propositions, and morphisms are proofs.
In programming language semantics, a language's denotational semantics often lives in a category, and things like monads and products correspond to categorical constructions.
Basically, category theory tells you to forget about what things are, and start thinking about how they relate and compose. Proofs, types, sets — it does not matter. For this, we have to think in arrows and not worry about the elements of $A$ or $B$; we only care about the arrows between them.
Now that we know what a category is, let's go to the next step: mappings between categories.
Functors: preserving structure
A functor $F : \mathcal{C} \to \mathcal{D}$ is an arrow between categories. It:
- Assigns to each object $A$ in $\mathcal{C}$ an object $F(A)$ in $\mathcal{D}$.
- Assigns to each morphism $f : A \to B$ a morphism $F(f) : F(A) \to F(B)$.
- Preserves identity and composition: $F(\mathrm{id}_A) = \mathrm{id}_{F(A)}$ and $F(g \circ f) = F(g) \circ F(f)$.
In other words, functors preserve the entire compositional structure.
Endofunctors in Hask
Most PL-facing functors are endofunctors: $F : \mathbf{Hask} \to \mathbf{Hask}$. The canonical action on morphisms is the familiar `fmap`:
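```haskell
-- The Functor class from base (eliding the (<$) method); fmap is the action on morphisms.
class Functor f where
  fmap :: (a -> b) -> f a -> f b
```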
For the `Maybe` endofunctor:
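```haskell
-- The standard instance from base, spelled out.
instance Functor Maybe where
  fmap _ Nothing  = Nothing
  fmap f (Just x) = Just (f x)
```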
The ability to lift a plain function `a -> b` to `Maybe a -> Maybe b` (more generally, to `f a -> f b`) is precisely the categorical action on arrows. That viewpoint explains why so many familiar structures (`Maybe`, `[]`, `IO`, `Parser`, ...) automatically obey the functor laws—they all instantiate the same categorical definition.
Endofunctor is just a fancy way of saying that you're staying in the same category.
Beyond endofunctors
Not all functors stay within the same category. A forgetful functor drops the group structure down to its underlying set. A free functor freely adds group structure. Contravariant functors reverse arrow direction. There are also profunctors, bifunctors, and indexed functors, all sharing the same DNA: structure-preserving maps between categories. Keeping the functorial pattern in mind makes it much easier to recognize these relationships when they show up in PL seminars or denotational semantics papers.
Two meta-constructions are worth remembering:
- Identity functor: $\mathrm{Id}_{\mathcal{C}}(A) = A$ and $\mathrm{Id}_{\mathcal{C}}(f) = f$. It does nothing, yet it satisfies the definition. This is the neutral element for functor composition.
- Functor composition: if $F : \mathcal{C} \to \mathcal{D}$ and $G : \mathcal{D} \to \mathcal{E}$, then $G \circ F : \mathcal{C} \to \mathcal{E}$ is again a functor. This is the categorical reason we can compose `fmap`s or stack functorial layers in code (think `newtype` wrappers); see the small example below.
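As a quick illustration of stacked functorial layers (the function name is mine), composing two `fmap`s lets a plain function act two layers deep:

```haskell
-- Two stacked endofunctors: Maybe on the outside, [] on the inside.
-- The outer fmap walks the Maybe layer, the inner fmap walks the list.
bumpAll :: Maybe [Int] -> Maybe [Int]
bumpAll = fmap (fmap (+ 1))
```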
When debugging or designing an API, asking "what is the functor here?" forces you to check whether your transformations truly respect composition and identities. If they do, you inherit a large body of theory; if they do not, category theory tells you exactly which laws you are breaking.
Isomorphisms: sameness up to structure
This "up to something" notation was nothing a bit confusing to me, so my short explanation here since this appears everywhere, is basically that you're not insisting in equality literally, but instead saying that things are equivalent up to some point.
In this case saying they are equal up to isomorphism means that they're not necessarily the same object, but isomorphic ones. A simple example:
If we consider triangles, and if we ignore size and position and care only about their shape, then we can say that two triangles are the same up to congruence if one can be moved/rotated/reflected to overlap the other.
We can think of "up to" as meaning "after quotienting by an equivalence relation". Anyway, let's get back to isomorphisms.
Two objects $A$ and $B$ in $\mathcal{C}$ are isomorphic (written $A \cong B$) if there exist arrows $f : A \to B$ and $g : B \to A$ such that $g \circ f = \mathrm{id}_A$ and $f \circ g = \mathrm{id}_B$.
In programming terms, two types are isomorphic when we can convert back and forth without loss. The actual implementations of $f$ and $g$ do not matter—only their existence and the fact that they mutually undo each other. This is why category theory is called structural: objects are the same if their arrows behave the same.
The diagram for an isomorphism is the simplest commuting square imaginable:
The interpretation here is that travelling from $A$ to $B$ along $f$ and then coming back along $g$ lands you exactly where you started, and symmetrically for $g$ followed by $f$. The two triangular loops shrink to identity paths. Whenever you can draw such a reversible loop, you have shown that $A$ and $B$ encode the same information.
Concrete manifestations abound:
- In set theory, $\mathbb{N} \times \mathbb{N}$ is isomorphic to $\mathbb{N}$ (via Cantor pairing); the sets are not equal, but they carry the same cardinality structure.
- In Haskell, `newtype UserId = UserId Int` comes with `wrap :: Int -> UserId` and `unwrap :: UserId -> Int` that satisfy the isomorphism laws. The compiler can treat them interchangeably because every `UserId` is exactly one `Int`.
- In logic, two propositions are isomorphic when they can be proved from each other—$A \land B$ is isomorphic to $B \land A$ through exchange of witnesses.
Identifying isomorphisms lets you replace complicated objects with simpler surrogates without losing structure. It also guides API design: expose both directions of an isomorphism and you allow users to pick whichever representation is most ergonomic while staying within the same categorical class.
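A minimal sketch of that round trip in code (`wrap`/`unwrap` follow the names used above):

```haskell
newtype UserId = UserId Int

wrap :: Int -> UserId
wrap = UserId

unwrap :: UserId -> Int
unwrap (UserId n) = n

-- unwrap . wrap = id  and  wrap . unwrap = id: the two isomorphism equations.
```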
Universal properties: define things by how arrows touch them
Universal properties describe objects indirectly, purely via the way arrows interact with them. They characterise objects as "the most general thing" with a given mapping behaviour. Once a construction satisfies the universal property, any implementation of it will be isomorphic to any other.
Products
The product of $A$ and $B$ is an object $A \times B$ equipped with morphisms $\pi_1 : A \times B \to A$ and $\pi_2 : A \times B \to B$, such that for any $f : X \to A$ and $g : X \to B$ there exists a unique $\langle f, g \rangle : X \to A \times B$ satisfying $\pi_1 \circ \langle f, g \rangle = f$ and $\pi_2 \circ \langle f, g \rangle = g$.
As a commuting diagram:
In $\mathbf{Hask}$, $A \times B$ is just the pair type `(a, b)`, the projections are `fst` and `snd`, and $\langle f, g \rangle$ is the map `\x -> (f x, g x)`.
Read the diagram as a commuting traffic system. Starting at $X$, you can either drive directly to $A$ via $f$, or you can first detour through the product: go down to $A \times B$ using the pairing arrow $\langle f, g \rangle$ and then project via $\pi_1$. The diagram asserts that both routes deliver the same passenger to $A$. Likewise for the right-hand leg into $B$. The product is the junction that makes both detours consistent for every possible pair of incoming maps.
Universal properties package two claims simultaneously:
- Existence: the arrow $\langle f, g \rangle$ exists.
- Uniqueness: any other arrow with the same projection behaviour must equal $\langle f, g \rangle$.
That uniqueness clause is gold. It means we never need to commit to a specific implementation of pairs. Any structure with projections that satisfy the commuting triangle is the product. When refactoring code, it also tells us exactly how to factor a pair-building function: write down the two component arrows, take their universal pairing, and done.
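In $\mathbf{Hask}$ that universal pairing is a one-liner (a sketch; `pairing` is a name I'm choosing here, and `Control.Arrow`'s `(&&&)` does the same job):

```haskell
-- Given the two component maps out of x, the unique arrow into the pair.
pairing :: (x -> a) -> (x -> b) -> (x -> (a, b))
pairing f g = \x -> (f x, g x)

-- The commuting triangles: fst . pairing f g = f  and  snd . pairing f g = g.
```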
Coproducts (sum types)
The coproduct $A + B$ (categorical sum) comes with injections $\iota_1 : A \to A + B$ and $\iota_2 : B \to A + B$. For any $f : A \to X$ and $g : B \to X$ there exists a unique $[f, g] : A + B \to X$ with $[f, g] \circ \iota_1 = f$ and $[f, g] \circ \iota_2 = g$.
Diagrammatically:
In Haskell, $A + B$ is `Either a b`, the injections are `Left` and `Right`, and $[f, g]$ acts by `either f g`.
Here the diagram tells the dual story. To reach $X$ from $A$, you can either map directly using $f$, or inject into the coproduct via $\iota_1$ and then descend through $[f, g]$. Commutativity guarantees these journeys coincide, and likewise for the $B$ leg. The coproduct is the universal junction through which every pair of outbound arrows factors.
The logical reflection (via the Curry–Howard isomorphism) is clear: products correspond to conjunctions, coproducts to disjunctions. Category theory records this once and for all: any implementation of pairs or sum types that satisfies these equations is the product or coproduct in its category.
Once you internalise universal properties, you stop memorising APIs and start deriving them. Need `eitherToMaybe :: Either a b -> Maybe b`? Spot the arrows `Left :: a -> Either a b` and `Right :: b -> Either a b`, choose how each branch should land in `Maybe b`, and the universal arrow drops out immediately, as the sketch below shows. This is nice for proof assistants: by showing that a candidate satisfies the universal property, you earn the right to treat it as the product, coproduct, limit, or whatever construction you defined.
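A minimal sketch of that derivation:

```haskell
-- Choose where each injection lands: Left values become Nothing, Right values
-- become Just. `either` plays the role of the universal arrow [f, g].
eitherToMaybe :: Either a b -> Maybe b
eitherToMaybe = either (const Nothing) Just
```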
Natural transformations: polymorphism as structure
Suppose $F, G : \mathcal{C} \to \mathcal{D}$ are functors. A natural transformation $\eta : F \Rightarrow G$ assigns to every object $A$ a morphism $\eta_A : F(A) \to G(A)$ such that for every $f : A \to B$ the square
commutes, i.e. $\eta_B \circ F(f) = G(f) \circ \eta_A$.
In $\mathbf{Hask}$, a natural transformation is exactly a polymorphic function of type `forall a. f a -> g a` that respects `fmap`. For example, a function like `safeHead :: [a] -> Maybe a` (returning the first element, if there is one) satisfies `fmap f . safeHead = safeHead . fmap f` for every `f :: a -> b`; that is precisely the naturality condition.
The naturality square asserts that travelling across the top (apply $F(f)$, then convert with $\eta_B$) yields the same destination as travelling down first (convert with $\eta_A$) and then across the bottom (apply $G(f)$). The functor action and the transformation commute; the two routes from $F(A)$ to $G(B)$ collapse to the same arrow. Whenever you see a commuting square, imagine yourself walking it in both directions—naturality says the walk is path-independent.
Naturality is more than a technicality. It captures the promise that our polymorphic helper does not "inspect" the values of type `a`—it only re-arranges structure. For contrast, consider a definition along these lines (a sketch of a value-inspecting helper):
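```haskell
-- Illustrative reconstruction: to break naturality, the helper has to look at
-- the element values themselves, so it is pinned to a concrete type instead of
-- being fully polymorphic.
takeHeadThree :: [Int] -> [Int]
takeHeadThree xs = take 3 (filter (> 0) xs)

-- Naturality fails: fmap negate (takeHeadThree [1,2,3,4]) = [-1,-2,-3],
-- but takeHeadThree (fmap negate [1,2,3,4]) = [].
```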
This function breaks naturality: mapping a function over the list and then calling `takeHeadThree` is not the same as calling `takeHeadThree` first and then mapping. The fact that natural transformations rule out such value-inspecting hacks explains why they align with the "free theorems" we get from parametric polymorphism.
Once we see naturality, polymorphic helper functions suddenly look organised: they are the 2-morphisms (arrows between functors) in the 2-category $\mathbf{Cat}$, where objects are categories, morphisms are functors, and 2-morphisms are natural transformations.
Monads: monoids in the category of endofunctors
A monad on $\mathcal{C}$ is an endofunctor $T : \mathcal{C} \to \mathcal{C}$ together with two natural transformations: a unit $\eta : \mathrm{Id}_{\mathcal{C}} \Rightarrow T$ and a multiplication $\mu : T \circ T \Rightarrow T$.
They must satisfy the monoid-style equations: $\mu \circ T\mu = \mu \circ \mu T$ (associativity) and $\mu \circ T\eta = \mu \circ \eta T = \mathrm{id}_T$ (unit laws).
Diagrammatically (mirroring the monoid unit and associativity):
This is why people summarise monads as "monoids in the category of endofunctors": the objects are endofunctors on $\mathcal{C}$, the morphisms are natural transformations, and $(T, \eta, \mu)$ satisfies the monoid laws inside that category.
Read the diagram as two alternative "assembly" lines for flattening three layers of structure. The top path says: first inject a pure layer with $\eta$, then flatten once with $\mu$. The lower path says: assemble the inner layers using $T\mu$ or $\mu T$, then flatten. Associativity states that no matter which order you bundle and collapse, the resulting composite arrow from $T^3$ to $T$ is the same. The identity laws arise from the smaller triangles, where inserting $\eta$ and immediately flattening with $\mu$ cancels out to the identity.
Haskell perspective
In $\mathbf{Hask}$, the unit is `return` (or `pure`). The multiplication is `join :: m (m a) -> m a`. The familiar bind is derived:
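```haskell
import Control.Monad (join)

-- The standard derivation of bind from the multiplication (a sketch;
-- in the Monad class this operation is (>>=)).
bind :: Monad m => m a -> (a -> m b) -> m b
bind m f = join (fmap f m)
```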
The monad laws in textbooks are just the component-wise expressions of the diagrammatic laws above.
Now we have an intuition:
- Unit behaves like `mempty`: it embeds plain values into the structured world (`return`, `pure`).
- Multiplication behaves like `mappend`: it flattens nested layers of structure (`join`).
- Associativity guarantees that the order in which we regroup nested structure does not matter, so pipelines built with `>>=` behave predictably.
With this, we can read a new monad instance and check the laws:
- `[]` (lists) form a monad where $\eta$ wraps a value in a singleton list and $\mu$ concatenates a list of lists.
- `Maybe` uses $\eta =$ `Just` and a $\mu$ that discards `Nothing` while collapsing nested `Just`s.
- `IO` has $\eta =$ `return` and a `join` that sequences effects as you would expect.
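To make those multiplications concrete, here is a minimal sketch for the first two instances (`joinMaybe` and `joinList` are names I'm introducing; the library versions are `join` and `concat`):

```haskell
-- Flattening two layers of Maybe: nested Just collapses, Nothing propagates.
joinMaybe :: Maybe (Maybe a) -> Maybe a
joinMaybe (Just inner) = inner
joinMaybe Nothing      = Nothing

-- Flattening a list of lists is plain concatenation.
joinList :: [[a]] -> [a]
joinList = concat
```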
Every time you spot an endofunctor equipped with a unit and multiplication that satisfy the diagrams, you get a monad, and with it a ready-made Kleisli category, do-notation, and a mountain of theorems about composition.
Kleisli categories: sequencing computations
Given a monad $(T, \eta, \mu)$, the Kleisli category $\mathcal{C}_T$ keeps the same objects as $\mathcal{C}$, but a morphism from $A$ to $B$ is an arrow $A \to T(B)$ in the base category. Identities are $\eta_A : A \to T(A)$. Composition of $f : A \to T(B)$ and $g : B \to T(C)$ is $g \circ_T f = \mu_C \circ T(g) \circ f$,
exactly the categorical form of the Kleisli fish operator `(>=>)`.
You can picture the composition as a three-step process:
- Start at $A$ and run $f$ to produce a $T(B)$-shaped computation.
- Lift $g$ so it can act on that structured value: $T(g) : T(B) \to T(T(C))$.
- Flatten with $\mu_C$ to return to a single layer $T(C)$. Both $f$ and $g$ promise to produce effects; the Kleisli composite pipes the result of the first into the second while keeping the bookkeeping consistent.
In $\mathbf{Hask}$:
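```haskell
-- The Kleisli fish from Control.Monad, written out as a sketch.
(>=>) :: Monad m => (a -> m b) -> (b -> m c) -> (a -> m c)
f >=> g = \a -> f a >>= g
```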
which is how we sequence effectful computations.
There is a canonical functor $\mathcal{C} \to \mathcal{C}_T$ that embeds pure arrows into the monadic world by wrapping outputs with $\eta$. That functorial lift is precisely what `return` (or `pure`) does operationally.
Thinking categorically clarifies a lot of things you take for granted when writing Haskell:
- `do`-notation is just syntactic sugar for composing in $\mathcal{C}_T$.
- Laws about short-circuiting (`Nothing` propagates) follow from the associativity of Kleisli composition.
- The "Kleisli fish" `(>=>)` is literally just composition in the Kleisli category.
Moreover, the Kleisli category lets you compare effect systems. Two different monads $T$ and $S$ might induce the same Kleisli arrows for a subset of objects; if so, they agree on those computations even if their internal data representations differ.
A concrete example: let `parseInt :: String -> Maybe Int` parse an integer and `checkEven :: Int -> Maybe Int` check whether it is even (the names here are illustrative). Their Kleisli composite
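```haskell
import Control.Monad ((>=>))
import Text.Read (readMaybe)

-- Illustrative names for the two Kleisli arrows described above.
parseInt :: String -> Maybe Int
parseInt = readMaybe

checkEven :: Int -> Maybe Int
checkEven n = if even n then Just n else Nothing

-- The Kleisli composite: parse first, then validate.
parseEven :: String -> Maybe Int
parseEven = parseInt >=> checkEven
```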
first parses, then validates. The associativity law guarantees that adding a third step—say, a logging or extra validation stage—can be regrouped arbitrarily without changing behaviour. When you refactor `do`-blocks, you are literally relying on the categorical fact that $\mathcal{C}_T$ is a category.
Adjunctions: the origin of monads
An adjunction between categories $\mathcal{C}$ and $\mathcal{D}$ consists of functors $F : \mathcal{C} \to \mathcal{D}$ and $G : \mathcal{D} \to \mathcal{C}$, together with a family of bijections $\mathrm{Hom}_{\mathcal{D}}(F(A), B) \cong \mathrm{Hom}_{\mathcal{C}}(A, G(B))$, natural in $A$ and $B$. We write $F \dashv G$ and say "$F$ is left adjoint to $G$".
Think of the bijection as a translator between two kinds of arrows. A morphism $F(A) \to B$ on the left is just a map from $F(A)$ to $B$ inside the category $\mathcal{D}$; a morphism $A \to G(B)$ on the right is a map from $A$ to $G(B)$ inside $\mathcal{C}$. The statement says that every arrow of the first kind corresponds uniquely to an arrow of the second kind, and vice versa, with no information lost. So giving a map out of $F(A)$ is exactly the same data as giving a map into $G(B)$—the Hom notation simply records which category each map lives in.
Intuitively:
- $F$ freely adds structure (builds something like lists, monoids, pointed sets, products with a fixed object).
- $G$ forgets that structure (strips back down to the underlying set, type, or context).
- The bijection is a perfect translator between morphisms out of $F(A)$ and morphisms into $G(B)$.
The translator is natural: it respects arrows in $\mathcal{C}$ and $\mathcal{D}$. The triangle identities $(\varepsilon F) \circ (F\eta) = \mathrm{id}_F$ and $(G\varepsilon) \circ (\eta G) = \mathrm{id}_G$ ensure that translating an arrow forth and back gives you the same arrow you started with. When you see these triangles in a paper, read them as "the free-and-forget operations cancel each other up to isomorphism".
Currying as an adjunction
In $\mathbf{Set}$ (the same story works in $\mathbf{Hask}$), define $F(X) = X \times B$ and $G(Y) = Y^B$, where $Y^B$ is the set of functions $B \to Y$. Then $\mathrm{Hom}(X \times B, C) \cong \mathrm{Hom}(X, C^B)$ via currying and uncurrying. The unit sends $x$ to the function $b \mapsto (x, b)$. The counit is function application $(f, b) \mapsto f(b)$.
Every time you curry or uncurry in code, you are walking across this adjunction.
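In Haskell terms (a small sketch; `unitC` and `counitC` are names I'm introducing):

```haskell
-- The two halves of the Hom-set bijection are curry and uncurry:
--   curry   :: ((x, b) -> c) -> (x -> b -> c)
--   uncurry :: (x -> b -> c) -> ((x, b) -> c)

-- The unit pairs a value with whatever argument comes later.
unitC :: x -> (b -> (x, b))
unitC x = \b -> (x, b)

-- The counit is function application.
counitC :: (b -> c, b) -> c
counitC (f, b) = f b
```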
Maybe as an adjunction
Let $\mathbf{Set}_\ast$ be the category of pointed sets. The functor $F : \mathbf{Set} \to \mathbf{Set}_\ast$ sends $X$ to $X + 1$ (add a distinguished "failure" point). The right adjoint $G$ forgets which element is distinguished. The resulting monad $G \circ F$ is $X \mapsto X + 1$, i.e. the `Maybe` functor. The unit is `Just`; the multiplication flattens by collapsing nested `Just`s and propagating `Nothing`.
Monads from adjunctions
Every adjunction $F \dashv G$ produces a monad $T = G \circ F$ on $\mathcal{C}$, with unit $\eta$ (the unit of the adjunction) and multiplication $\mu = G\varepsilon F$, where $\varepsilon$ is the counit. Dually, the composite $F \circ G$ is a comonad on $\mathcal{D}$.
This is why "monads are the shadows of adjunctions": they arise inevitably whenever a free/forgetful pair exists.
Conversely, given any monad $T$, you can recover an adjunction whose composite is $T$ by looking at its Kleisli category. Adjunctions are therefore the organising principle behind monads, not an optional add-on: they explain why `Maybe`, `[]`, `IO`, `Parser`, and your custom effect handlers can be studied with the same vocabulary.
Comonads: dualising the story
Every time you hear about "duals" or "dualising" things, what we're doing is basically inverting the arrows.
Flip every arrow in the monad definition and you get a comonad. Formally, a comonad on $\mathcal{C}$ is an endofunctor $W : \mathcal{C} \to \mathcal{C}$ with natural transformations $\varepsilon : W \Rightarrow \mathrm{Id}_{\mathcal{C}}$ (extract) and $\delta : W \Rightarrow W \circ W$ (duplicate), such that $(\varepsilon W) \circ \delta = (W\varepsilon) \circ \delta = \mathrm{id}_W$ and $(\delta W) \circ \delta = (W\delta) \circ \delta$.
Where monads model sequencing of effects, comonads model observation of contexts.
The laws are the mirror image of the monad diagram. They say: duplicate a context and immediately extract—on the left or on the right—and nothing changes; duplicating twice is independent of the order in which you expand. If you draw the corresponding diagram, the two paths from $W$ to $W \circ W \circ W$ (via $\delta W$ or $W\delta$) coincide, and the counit triangles collapse to the identity. Think of $\delta$ as "zooming out" to see more surroundings, and $\varepsilon$ as "reading off" the focused value. Whether you zoom out before or after peeking should not matter.
Store comonad
Define the Store comonad with `data Store s a = Store (s -> a) s` (a function describing the entire space plus a current focus). The structure maps are:
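```haskell
-- The Store comonad, defined directly (without going through a Comonad class).
data Store s a = Store (s -> a) s

-- Read off the value at the current focus (the counit).
extract :: Store s a -> a
extract (Store f s) = f s

-- Refocus: every position s' gets its own copy of the whole store.
duplicate :: Store s a -> Store s (Store s a)
duplicate (Store f s) = Store (Store f) s
```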
Intuitively, `duplicate` expands a context so that every position gets its own focused store; `extract` reads off the value at the present focus. The derived extension operator `extend :: (Store s a -> b) -> Store s a -> Store s b` lets us compute new data by peeking at neighbourhoods—useful for cellular automata, streaming computations, or UI updates.
In a two-dimensional grid, for instance, `extend` lets every cell apply a rule that depends on its neighbours—exactly the operation required for blurring an image. `duplicate` first supplies, for every coordinate $(x, y)$, a focused copy of the whole grid; mapping a rule over those copies produces the next state.
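As a one-dimensional analogue of that grid example (the names `average3` and `smooth` are mine), `extend` can be written directly for `Store` and used to smooth a signal:

```haskell
-- Continues the Store sketch above.
instance Functor (Store s) where
  fmap h (Store f s) = Store (h . f) s

-- The extension operator: duplicate, then apply the rule at every focus.
extend :: (Store s a -> b) -> Store s a -> Store s b
extend rule = fmap rule . duplicate

-- A rule that looks at a small neighbourhood around the focus...
average3 :: Store Int Double -> Double
average3 (Store f i) = (f (i - 1) + f i + f (i + 1)) / 3

-- ...and the smoothed signal obtained by extending it everywhere.
smooth :: Store Int Double -> Store Int Double
smooth = extend average3
```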
Other classic comonads mirror familiar monads:
- Environment comonad `Env e a = (e, a)` is dual to the Reader monad. You can extract a value, but you also have access to the entire shared environment when computing derived observations.
- Stream comonad `Stream a` (infinite lists) lets you observe the current head while `duplicate` gives you the tail-focused substreams—perfect for time-varying systems or signal processing.
Whenever you have a notion of "value with surrounding context" and you want to compute something that depends on that context, look for a comonad.
From a semantic angle, we capture observations of an evolving system. Where monadic algebras describe how to build effectful computations, comonadic coalgebras describe how to respond to contextual information.
Co-Kleisli categories: composing observations
Given a comonad $(W, \varepsilon, \delta)$, the co-Kleisli category keeps the same objects but uses morphisms $W(A) \to B$. Identities are $\varepsilon_A : W(A) \to A$. Composition of $f : W(A) \to B$ and $g : W(B) \to C$ is $g \circ_W f = g \circ W(f) \circ \delta_A$.
Picture the dataflow: start with a context $W(A)$, duplicate it so that each possible focus yields its own contextual view, map $f$ across those views (obtaining a $W(B)$ filled with intermediate results), and finally extract a $C$ via $g$. The diagram that encodes this story mirrors the Kleisli one, with arrows reversed.
In $\mathbf{Hask}$, the co-Kleisli composition is the comonadic counterpart of the Kleisli fish (`Control.Comonad` exports it as `(=>=)`), defined by
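```haskell
import Control.Comonad (Comonad (..))

-- Co-Kleisli composition, written via duplicate; Control.Comonad exports an
-- equivalent operator under the same name.
(=>=) :: Comonad w => (w a -> b) -> (w b -> c) -> (w a -> c)
f =>= g = g . fmap f . duplicate
```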
Where the Kleisli category chains effectful functions $A \to T(B)$, the co-Kleisli category chains context-sensitive observers $W(A) \to B$. The two constructions are perfectly dual.
Operationally, a co-Kleisli arrow is the shape of a "widget renderer" or a "sensor interpreter": it takes the whole context (state, environment, neighbourhood) and returns a result. Composing two such observers first extends the context so that the downstream observer can see the enriched information, then applies the final extraction. This is a pattern where you successively refine a global state to produce outputs.
Example: suppose `makeSnapshot :: Store World Model -> ViewState` derives a UI projection, and `render :: Store World ViewState -> Html` turns that projection into DOM. Their co-Kleisli composite, `makeSnapshot =>= render :: Store World Model -> Html`, duplicates the world so that every potential focus can build its own snapshot, maps `makeSnapshot`, and finally extracts HTML. Because co-Kleisli composition is associative, you can restructure rendering pipelines (add instrumentation, cache layers, theming) without invalidating behaviour—as long as each stage is a comonadic observer.
Landscape
At this point, the categorical landscape lines up neatly:
- Level 0 (objects & arrows): Types, sets, propositions; functions/proofs as morphisms.
- Level 1 (functors): Type constructors and structure-preserving mappings.
- Level 2 (natural transformations): Polymorphic conversions between functors.
- Monads & comonads: Algebraic structure on endofunctors; sequencing versus observation.
- Kleisli / co-Kleisli categories: Categories of effectful or contextual morphisms.
- Adjunctions: The origin of monads/comonads; free vs. forgetful structure.
Each layer reuses the same patterns—identities, composition, universal properties—but in a progressively richer setting. That is where the power (and the economy) of category theory lies: once a concept is understood abstractly, it applies everywhere you can find the same diagrams.
When you read a paper or library code, it helps to figure out at which layer you are. Are we defining new objects and arrows (level 0)? Are we mapping entire worlds of objects around with functors (level 1)? Are we moving between parallel worlds with natural transformations (level 2)? And so on.
Practising the pattern
To internalise these ideas, some cool exercises are:
- Curry/uncurry a function `(a, b) -> c` into `a -> b -> c` and back; verify the adjunction bijection explicitly.
- For the `Maybe` adjunction, write down the unit $\eta$ and the counit $\varepsilon$; check the resulting monad laws.
- In the Store comonad, compute `duplicate` for a finite grid to observe how it propagates context.
- Take a familiar API (e.g. `Reader r a = r -> a`) and identify the functor, unit, and multiplication. Draw the Kleisli composition explicitly; show how `ask` and `local` fit the picture.
- Pick a pair of functors you use frequently (say `[]` and `Maybe`) and write a candidate polymorphic function between them. Test its naturality by checking the commuting square—does it truly ignore the elements and only re-arrange structure?
All the exercises basically sum up to chasing arrows through diagrams. The notation may feel abstract, but well... it is abstract.
Closing thoughts
Category theory does not replace operational reasoning; it complements it. By focusing on arrows and the way they compose, we can see that familiar constructs are manifestations of the same abstract pattern. Once you see a functor, you check for natural transformations. Once you have a free/forgetful pair, you know a monad (and comonad) is hiding there.
Most importantly, this language lets us transport intuition across domains: proofs look like programs, effect handlers look like adjunctions, stream processors look like comonads. And this is not exclusive to the type-theoretic / programmatic view I used in this article.
From my personal experience, though, category theory is mostly useful when you already know or use some "implementation", some less abstract instance of it. It is very hard, unintuitive, and in my opinion not very useful to study it in isolation. But once you have the concepts and examples in your mind, it is easy to detect the patterns in the diagrams. And it is certainly good if you're designing new things: fit them into a pattern and suddenly you have a bunch of proofs and properties about what you are doing.
Quoting my friend @lane:
There is a deep correspondence between type theory and category theory, and some people characterize it by saying that type theory gives a "calculus" to category theory.
From my experience, learning the "calculus" first was easier. I am still learning a lot and discovering new things about it.
If you're still here, thanks for reading! Feel free to make suggestions or corrections if you see any mistake.