The Librarian Without Clients
Unison, Borges, and the Ergonomics of Forgetting
The programming language Unison stores every function definition not by name but by a cryptographic hash of its abstract syntax tree. Rename a function, move it between codebases, distribute it across machines: the hash is stable because it derives from the structure of the computation itself, not from any human-imposed label. Identity is content.
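A minimal sketch may make the idea concrete. It is not Unison's actual hashing scheme: the tuple-based syntax tree, the `normalize` and `content_hash` helpers, and the choice of SHA3-512 over a `repr` string are all assumptions made for illustration. The one thing it shows is that once bound names are replaced by positions, renaming leaves the hash untouched.

```python
import hashlib

# Toy syntax tree (hypothetical): ("lam", param, body), ("var", name), ("app", f, x)

def normalize(term, env=()):
    """Replace bound variable names with positional (de Bruijn) indices."""
    kind = term[0]
    if kind == "lam":
        _, param, body = term
        # The innermost binder sits at index 0 of the environment.
        return ("lam", normalize(body, (param,) + env))
    if kind == "var":
        name = term[1]
        if name in env:
            return ("bound", env.index(name))
        return ("free", name)
    if kind == "app":
        return ("app", normalize(term[1], env), normalize(term[2], env))
    return term  # anything else is kept as-is

def content_hash(term):
    """Hash the name-free structure, not the labels a human chose."""
    canonical = repr(normalize(term)).encode("utf-8")
    return hashlib.sha3_512(canonical).hexdigest()

# \x -> \y -> x   and the same thing with different variable names
k1 = ("lam", "x", ("lam", "y", ("var", "x")))
k2 = ("lam", "a", ("lam", "b", ("var", "a")))
# \x -> \y -> y   is a structurally different definition
k3 = ("lam", "x", ("lam", "y", ("var", "y")))

same_under_renaming = content_hash(k1) == content_hash(k2)
different_structure = content_hash(k1) != content_hash(k3)
```

Here `k1` and `k2` differ only in their labels and land on one hash, while `k3` has a different shape and lands elsewhere.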
This is structurally identical to Jorge Luis Borges’s “The Library of Babel,” but with one crucial inversion. Borges imagines a universe consisting entirely of a library — an infinite honeycomb of hexagonal rooms, each containing a fixed number of books, each book exactly 410 pages of text drawn from twenty-five orthographic symbols. The Library contains every possible arrangement of those symbols. Somewhere on its shelves is every book that has ever been written, every book that could be written, every faithful catalogue of the Library itself, every false catalogue, every refutation of every false catalogue. But the vast, crushing majority of its volumes are nonsense — random permutations of letters that signify nothing. The problem is not that the Library is incomplete. The problem is that it is total, and totality is indistinguishable from noise. The librarians wander the hexagons, searching for meaning among the shelves, but any system they might devise for finding a particular book is itself a book lost somewhere in the stacks.
Unison starts from the same premise, that identity is content, but solves the retrieval problem by making the hash the address. A definition with the same structure will always produce the same hash, whatever it is called and wherever it is stored. (The guarantee is syntactic, not semantic: two different implementations of the same function hash differently.) In Borges’s terms, Unison is a Library of Babel where every hexagonal room is numbered by the exact content of the book it contains, and you can walk directly to it. Duplicates that differ only in naming or layout, the bane of the Library, cannot exist: if two definitions are structurally identical, they collapse into the same hash. There is exactly one room for each distinct definition, and no two rooms contain the same book.
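The room-numbering can be sketched as two small tables, one from hash to definition and one from human name to hash. The `Codebase` class, its method names, and the use of SHA3-256 over a `repr` string are hypothetical choices for this illustration, not a description of Unison's storage format.

```python
import hashlib

class Codebase:
    """Hypothetical content-addressed store: names are metadata over hashes."""

    def __init__(self):
        self.definitions = {}  # hash -> definition body (the rooms)
        self.names = {}        # human name -> hash (the signposts)

    def add(self, name, body):
        digest = hashlib.sha3_256(repr(body).encode("utf-8")).hexdigest()
        # An identical body maps to the same key, so it is stored only once.
        self.definitions.setdefault(digest, body)
        self.names[name] = digest
        return digest

    def rename(self, old, new):
        # Renaming touches the signpost only; the room and its number stay put.
        self.names[new] = self.names.pop(old)

cb = Codebase()
# Bodies are assumed to be already name-free (bound variables as indices).
h1 = cb.add("double", ("lam", ("add", ("bound", 0), ("bound", 0))))
h2 = cb.add("twice", ("lam", ("add", ("bound", 0), ("bound", 0))))
cb.rename("double", "dbl")
```

After these calls the store holds a single definition reachable under two names, and the rename has changed nothing that any hash depends on.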
§
But the hash is shorter than the content it addresses. By a simple counting argument, not every 410-page book can be uniquely identified by a string shorter than 410 pages: there are more books than there are short strings. The hash is a lossy compression. It maps a vast space into a smaller one, which means collisions (two different books producing the same hash) are possible in principle. In practice the probability is negligible, but the structure of the situation is that you have replaced the Library of Babel with a smaller Library of Babel. You haven’t escaped Borges’s architecture; you’ve built a more compact wing.
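The size of the gap can be estimated with a little arithmetic. The page layout below (forty lines of roughly eighty characters) comes from Borges's story rather than from this essay, and the 512-bit digest width is an assumption chosen for illustration.

```python
import math

SYMBOLS = 25                      # orthographic symbols in the Library
CHARS_PER_BOOK = 410 * 40 * 80    # pages * lines * characters (approximate, per Borges)
HASH_BITS = 512                   # assumed digest width

# Bits needed to single out one book among all possible books.
bits_per_book = CHARS_PER_BOOK * math.log2(SYMBOLS)

# log2 of the average number of distinct books sharing each hash value.
log2_books_per_hash = bits_per_book - HASH_BITS
```

Under these assumptions a book needs roughly six million bits to be named without loss, so a 512-bit address necessarily leaves an astronomically large number of possible books behind each door.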
The reason the compact wing is useful is precisely that it discards information. The hash doesn’t encode what the function does in its full particularity. It encodes enough to distinguish it from every other function anyone has actually written, which is a much weaker requirement. It works because the occupied subset of the hash space is vanishingly small relative to the total space. The hash is less a coordinate system isomorphic to the content and more like a sparse postal code — short enough to be useful, long enough to be unique given current occupancy, but not a lossless representation of what lives at the address.
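How sparse is sparse enough can be sketched with the standard birthday bound, which caps the chance of any collision among n items at n(n-1)/2 divided by the number of possible hash values. The figure of a trillion definitions and the 512-bit width are illustrative assumptions, and `log2_collision_bound` is a helper invented for this sketch.

```python
import math

def log2_collision_bound(n_items, hash_bits):
    """log2 of the birthday upper bound: (n choose 2) / 2**hash_bits."""
    pairs = n_items * (n_items - 1) // 2
    return math.log2(pairs) - hash_bits

# A trillion definitions in a 512-bit space: the bound is about 2**-433.
sparse = log2_collision_bound(10**12, 512)

# Fill the space to 2**256 items and the bound climbs to about one half.
crowded = log2_collision_bound(2**256, 512)
```

The postal code stays unique only while occupancy stays far below the square root of the address space.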
§
Now imagine the opposite. Suppose we are programming in Borges’s world, where all addresses must be as long as the content they refer to — where no compression is permitted. Every function call would be a complete inline expansion of the function being called. There are no names, no references, no indirection — just the full text of every dependency, nested to whatever depth is required. A program that calls three functions is three books laid open end to end, and if those functions call other functions, those books are physically embedded in the pages of the first. The source code of any nontrivial program would be a vast recursive inlay of complete texts.
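A toy calculation shows how quickly the inlay grows. The dependency graph below is hypothetical: each definition has ten units of text of its own and calls the one beneath it twice. The `inlined_size` and `referenced_size` helpers are likewise invented for this sketch.

```python
from functools import lru_cache

# Hypothetical definitions: name -> (size of own text, list of calls made)
DEFS = {
    "leaf": (10, []),
    "a": (10, ["leaf", "leaf"]),
    "b": (10, ["a", "a"]),
    "c": (10, ["b", "b"]),
}

@lru_cache(maxsize=None)
def inlined_size(name):
    """Size when every call is replaced by the full text of its target."""
    own, calls = DEFS[name]
    return own + sum(inlined_size(callee) for callee in calls)

def referenced_size(name, ref_len=1):
    """Size when every call is a short fixed-length reference."""
    own, calls = DEFS[name]
    return own + ref_len * len(calls)

full = inlined_size("c")       # 150: own text plus two complete copies of "b"
short = referenced_size("c")   # 12: own text plus two one-unit references
```

Each extra layer of calls roughly doubles the inlined text, while the referenced form grows only with the number of calls made directly.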
This is what Unison’s hash abbreviates. And the remarkable thing is that the uncompressed version would actually work. It would be correct, unambiguous, immune to dependency conflicts, and fully self-describing. Every program would carry its complete provenance. There would be no drift between a reference and its referent because there would be no reference — only the referent itself. The gap between map and territory would be closed by eliminating the map entirely.
The cost is that it would be humanly unusable. Which reveals that the hash — and naming, and package management, and all the other machinery of indirection — is not solving a logical problem. The logic was already fine. It is solving an ergonomic problem: human beings cannot manipulate 410-page inline expansions. Every abstraction layer in programming is a concession to the fact that we are not the Library’s imagined readers, the ones who could hold the totality in view.
§
Selective forgetting, then, is what makes representation possible. A hash forgets almost everything about the content and retains just enough to tell things apart. A name forgets even more — it retains nothing about the content and works purely by convention. Both are useful exactly to the degree that they discard.
This runs against the grain of how programmers usually think about the problem. The aspiration is always toward more complete specification, more faithful representation, tighter correspondence between the map and the territory. But the Library of Babel is the limit case of that aspiration — total fidelity, zero compression, everything preserved — and it is completely unusable. The problem is not insufficient information. It is that uncompressed reality is operationally identical to noise.
The hash forgets the right things. The name forgets even more of the right things. The org chart, the API boundary, the module system — each one is a structured act of forgetting that makes a particular scale of operation possible. What we call organizational dysfunction might then be redescribed as forgetting the wrong things, or forgetting that you have forgotten — the moment when a lossy compression starts being treated as lossless, and the map is mistaken for the territory it selectively discarded.
§
But what happens when forgetting is no longer possible? If generation is free and storage is free and retrieval is free, the Library of Babel stops being a thought experiment and becomes the actual condition. Every possible document exists or can be produced on demand. The distinction between “written” and “could be written” collapses, and you are back in the hexagonal rooms.
In that situation the hash is worthless — not because it is technically broken but because the sparsity condition that made it useful no longer holds. When every possible function can be generated as cheaply as it can be retrieved, content-addressing loses its power to distinguish, because there is nothing left to distinguish from. The address works only as long as most addresses are uninhabited.
Authenticity is the claim that a particular artifact has a traceable origin in a particular intention, that someone meant this and not that. But intention is exactly the kind of thing that cannot survive the loss of sparsity. When the same text can be produced by a person, a machine, a stochastic process, or a lookup from the complete Library, the provenance becomes underdetermined. It is not that authenticity is disproved; it is that it becomes empirically unrecoverable. You cannot hash your way back to it.
Representation depends on the same structure. A sign represents something because it could have been otherwise, because it was selected from a space of alternatives. But if every alternative is equally present, nothing has been selected, and the sign stops carrying information in the Shannon sense: observing it rules nothing out. A symbol that could equally be any symbol means nothing.
§
What survives, perhaps, is curation — the act of forgetting on behalf of someone else. Not producing the right text but suppressing all the others. The librarian becomes more important than the author, because authorship presupposes a scarcity that no longer obtains.
But curation without an audience is gardening without visitors. If every node in the network is selecting and suppressing, no node is in a position to receive a selection as authoritative, because each can perform the same operation itself. The librarian’s value depended on an asymmetry — I have access or judgment that you lack — and that asymmetry is what dissolves.
This is Stanislaw Lem’s Solaris transplanted to culture. In Lem’s novel, a planet-wide ocean generates elaborate structures — towering formations that mimic mathematical objects, architectural forms, even what appear to be replicas of human shapes — but originates in no intention and serves no communicative purpose. Generations of scientists build an entire academic discipline around interpreting the ocean’s productions, cataloguing them into taxonomies, disputing each other’s frameworks in journals and monographs. The interpretations never converge, because there is nothing to converge on. The ocean is not sending messages. It is just what a sufficiently productive medium does when left alone. A world of librarians without clients is a world of Lem’s scientists — everyone catalogues, classifies, and debates the output of a generative process that is indifferent to being catalogued. The apparatus of description runs, but it describes only itself.
The less bleak version of this might be that the practice of curation becomes autotelic — something done for its own sake, the way people garden. Not because the garden communicates something to an audience, but because the act of selecting and tending is itself the point. The librarian without clients is a gardener without visitors.
The term for this already exists: digital gardening. Which is either confirmation of the diagnosis or another instance of the problem. Someone curated that metaphor, and now it circulates as a description of the activity it is an instance of, and no one can tell whether it was coined or generated or recovered from the Library.