The Tools That Retrieve, The Tools That Generate

The practice of distinguishing, one card at a time.

Most conversations about AI in the cultural sector treat all AI tools as a single category. They aren't. The most important distinction isn't between good AI and bad AI. It's between tools that work with material that already exists and tools that produce new material from it.

That line isn't always clean. A few tools sit firmly on one side. A few sit firmly on the other. Several sit on the line itself, and where they end up depends on how they're used.

Over the past year I have used five tools regularly enough to have honest opinions about them. None of them are good or bad in the abstract. Each one carries a particular relationship to the material it touches, and that relationship is the thing that matters for cultural heritage work.

Tools that retrieve

Whisper, by OpenAI

Whisper transcribes audio into text. It does not invent. It does not interpret. Within the limits of what it can hear accurately, it tries to render what was said.

For oral history work, this is among the most useful AI tools available right now. A 1972 interview that has sat on tape for decades, never transcribed because nobody had the time or budget, can be turned into searchable text in minutes. For small heritage organisations carrying large collections, this changes what's possible.

The caveats are practical, not ethical. Whisper handles English well, including most accented English. It handles other major languages reasonably well. It struggles with code-switching between languages within a single sentence, which is exactly what happens in a lot of diaspora oral history. It misnames people, places, and culturally specific terms. A speaker who says "Altab Ali" might appear in the transcript as "Altar Bali" if the model hasn't encountered the name often enough.

What this means in practice: the transcript is a draft, not a finished record. It needs human review by someone who knows the material, the language, and the community. That review is where the integrity of the transcription lives.
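One way to support that review is to check the draft against a list of names the reviewer already trusts. The sketch below is a minimal illustration in Python using only the standard library. It does not call Whisper, and the function name, the 0.75 threshold, and the sample sentence are all assumptions made for the example. It flags phrases that look like a near miss of a known name so a person can check them against the audio.

```python
import difflib
import re


def flag_possible_misnamings(transcript, glossary, threshold=0.75):
    """Return (heard, expected, score) tuples for phrases in the transcript
    that closely resemble, but do not exactly match, a glossary name.

    The glossary and threshold are hypothetical; tune both per collection.
    """
    words = re.findall(r"[A-Za-z']+", transcript)
    flags = []
    for name in glossary:
        width = len(name.split())
        # Slide a window the same number of words wide as the glossary name.
        for i in range(len(words) - width + 1):
            candidate = " ".join(words[i:i + width])
            if candidate.lower() == name.lower():
                continue  # exact match, nothing to review
            score = difflib.SequenceMatcher(
                None, candidate.lower(), name.lower()
            ).ratio()
            if score >= threshold:
                flags.append((candidate, name, round(score, 2)))
    return flags


# Hypothetical draft line of the kind a transcription tool might return.
draft = "We marched from Brick Lane to Altar Bali Park that afternoon."

for heard, expected, score in flag_possible_misnamings(draft, ["Altab Ali"]):
    print(f"Review: heard '{heard}', glossary has '{expected}' ({score})")
```

A flag here is a prompt for the human reviewer, not a correction. The script never rewrites the transcript, because deciding what was actually said stays with someone who knows the community.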

The tool retrieves. It does not generate. Used with care, it is one of the most useful AI tools available to small cultural organisations.

Chatbase

Chatbase is a platform for building chat interfaces that work strictly from documents you provide. It does not draw on general knowledge. It does not invent answers. The system prompt can be configured so that, when the answer is not in the documents, the bot says so.

I built the MNEME Archive Guide prototype on Chatbase. It runs on six approved documents that make up a fictional archive I designed to demonstrate the methodology. When the bot is asked a question outside the collection's scope, it refuses. When it's asked to speculate about a living person, it refuses. When it's asked to invent a story from someone's testimony, it refuses and cites the policy it has been given.

This kind of retrieval-augmented chat is the most promising direction for AI in heritage work. The bot is not a brain. It is an index with a conversational interface. Its value lies in helping people find what is already there, not in producing what isn't.
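The "index with a conversational interface" idea can be shown with a toy example. The sketch below is not Chatbase's API and not how Chatbase retrieves. It is a deliberately simple keyword matcher in Python, with hypothetical document names, texts, and threshold, meant only to show the shape of the behaviour: answer from approved documents with a source, or refuse.

```python
import re

# Hypothetical approved documents; a real deployment would load the
# collection's own texts.
APPROVED_DOCS = {
    "deposit-policy": "Recordings are deposited with written consent from each interviewee.",
    "access-guide": "Researchers can request access to transcripts by appointment.",
}

REFUSAL = "I can't answer that from the approved documents."

STOPWORDS = {
    "a", "an", "the", "to", "of", "do", "how", "what", "did", "is", "are",
    "can", "with", "from", "by", "about", "her", "his", "in", "on", "for",
}


def tokens(text):
    """Lowercase word set, minus common function words."""
    return set(re.findall(r"[a-z]+", text.lower())) - STOPWORDS


def answer(question, docs=APPROVED_DOCS, min_overlap=2):
    """Return the best-matching approved passage with its source,
    or a refusal when nothing in the collection matches well enough."""
    question_terms = tokens(question)
    best_id, best_score = None, 0
    for doc_id, text in docs.items():
        score = len(question_terms & tokens(text))
        if score > best_score:
            best_id, best_score = doc_id, score
    if best_score < min_overlap:
        return REFUSAL
    return f"{docs[best_id]} [source: {best_id}]"


print(answer("How do researchers request access to transcripts?"))
print(answer("What did the interviewee really think about her neighbour?"))
```

The design choice worth noticing is that refusal is the default path. The function only answers when the collection gives it something to answer with, which is the opposite of a general chatbot's instinct to fill the gap.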

The limits are worth naming. Chatbase runs on third-party infrastructure. Your messages are sent to its servers. The model it uses to generate responses is hosted elsewhere. For sensitive collections, those are real considerations. Production deployments for clients usually need a different architecture: dedicated infrastructure, tighter data controls, custom branding. But for a prototype, or for less sensitive material, Chatbase is genuinely useful.

The tool retrieves. Generation, where it happens, is constrained to the act of phrasing an answer that draws on supplied documents. That is a meaningful boundary.

Tools that sit on the line

ChatGPT

ChatGPT is the tool everyone has used and most people have an opinion about. The opinion that matters for cultural work is this: it can be either a retrieval tool or a generation tool depending entirely on how you use it.

Used as a thinking partner, with prompts that ask it to organise material you have provided, summarise a document you have given it, or critique your own writing, it is closer to a retrieval tool. You bring the substance. It helps you see your own substance more clearly.

Used as a writing tool, with prompts that ask it to draft text for you about a topic it has only general knowledge of, it becomes a generation tool. The output sounds plausible. It often is not accurate. For cultural material specifically, it tends to produce confident text that flattens specificity into the average of what it has read.

The risk in heritage work is using ChatGPT to write about a community or a collection rather than to think with you about material you already hold. A funding application that uses ChatGPT to "tell our story" produces a story that sounds like every other organisation's story. A funding application that uses ChatGPT to refine prose you have already written about your actual work is a different exercise.

The tool itself is neutral. The relationship you have with it is not. Treat it as a research assistant who has read a lot but has no memory of your specific context, and you stay in retrieval territory. Ask it to fill gaps in cultural knowledge it does not have, and you have crossed into generation.

Runway

Runway is harder to place because the platform contains several different tools under one name. Some of them retrieve. Some of them generate.

The restoration tools, particularly upscaling and stabilisation, work on existing footage. They make it clearer. They do not change what it shows. A degraded 1970s community video that nobody can quite see anymore can be made watchable again. The material is the same material. The tool has retrieved a clearer version of what was already there.

The generative tools, particularly text-to-video, are a different proposition. They produce footage that did not exist. For speculative or counter-archival work, this can be powerful. An artist exploring a history that was never filmed can generate footage of what might have happened. But for heritage work in an institutional sense, the question becomes harder. A museum that uses generated footage of a historical event runs the risk of producing material future researchers will mistake for actual evidence. Provenance becomes complicated. The line between interpretation and invention gets blurry.
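One modest way to keep provenance from getting complicated is to record, at the moment a file is made, which side of the line it came from. The sketch below is an assumption about what such a record could look like, written in Python. The class, field names, identifiers, and label wording are all hypothetical and not part of Runway or any cataloguing standard.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional

ORIGINS = ("restored", "generated")


@dataclass(frozen=True)
class FootageProvenance:
    """Hypothetical sidecar record stored alongside a video file."""
    item_id: str
    origin: str                        # "restored" or "generated"
    tool: str                          # free-text description of the tool used
    source_item: Optional[str] = None  # archive ID of the original, if any

    def __post_init__(self):
        if self.origin not in ORIGINS:
            raise ValueError(f"origin must be one of {ORIGINS}")
        # Restored footage only makes sense in relation to an original.
        if self.origin == "restored" and not self.source_item:
            raise ValueError("restored footage must name the item it was made from")

    def label(self) -> str:
        """Caption text to display wherever the footage is shown."""
        if self.origin == "restored":
            return f"Restored from {self.source_item} using {self.tool}"
        return f"AI-generated using {self.tool}; not a historical recording"


restored = FootageProvenance(
    "vid-0042-r1", "restored", "upscaling tool", source_item="vid-0042"
)
generated = FootageProvenance("vid-9001", "generated", "text-to-video tool")

print(restored.label())
print(json.dumps(asdict(generated), indent=2))
```

The record does not settle whether generated footage belongs in a given display. It only makes sure that a future researcher who finds the file also finds the statement of what it is.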

Runway is a useful platform if you know which side of the line you're working on at any given moment. The risk is using it without thinking about which side that is.

Tools that generate

ElevenLabs

ElevenLabs synthesises human voices. With a short audio sample, the tool can produce new speech in that person's voice, saying things the person did not say. The quality is good enough that distinguishing real from synthetic by ear alone is increasingly difficult.

For cultural heritage work, my position is clear. I would not use this tool on any recording of a real person, living or dead, without their explicit and ongoing consent. In practice, that means I would not use it at all on most archival oral history material, because the consent given for the original recording does not extend to having your voice reused to say things you never said.

The temptation is going to be real. A museum will want to bring an oral history "to life" by having the interviewee read their own testimony aloud, or speak a fragment of a missing recording, or narrate an exhibition. The technology will make this seem easy. The ethical questions it raises are not easy.

A person's voice is part of their selfhood. The institutions that hold oral history recordings are stewards of someone else's testimony, not owners of it. Using a person's voice to say things they did not say is a category of action that previously had no name because it was not possible. Now that it is possible, the heritage sector needs to be careful about how it draws the lines.

I am not arguing the tool should not exist. I am arguing that its use in heritage contexts requires a kind of caution the sector has not yet developed. Until that caution exists, I would not recommend any organisation use it on their collections.

The line and what sits on it

I started by saying the most important distinction in AI tools is between those that work with existing material and those that produce new material from it. After five tools, the distinction holds, but it needs a small refinement.

The line is sometimes in the tool itself. ElevenLabs sits clearly on one side. Whisper sits clearly on the other.

The line is sometimes in the practice. ChatGPT can be either, depending on what you ask it to do. Runway can be either, depending on which of its tools you reach for.

The line is sometimes in the institutional posture. Chatbase is a retrieval tool, but whether a deployment is responsible depends on how the organisation uses it: what documents it loads and what governance it puts around the bot's responses.

What does not work is treating these tools as if they all sit in the same category. Most cultural sector caution about AI is caution about generation. Most cultural sector opportunity in AI is in retrieval. The two get conflated in conversations all the time, and the result is either blanket scepticism or blanket enthusiasm, neither of which is useful.

The right question is not whether to use AI in cultural heritage work. It is which tools, in which configurations, with which guardrails, for which material.

The answers are not the same in every case. They never will be. That is part of what makes the work worth doing carefully.
