LLMs: Some Practitioner Perspective

Roland RE-501s; a $9 OpenAI API call; the wisdom of the practitioner combined with the audacity of the strategist.

I've spent a lot of time over the last week in Google Colab with lines of code like from langchain.chat_models import ChatOpenAI at the top of the file. Having one's hands wrist-deep in code, with a specific business-value objective in mind, is a wonderful feeling, and it generates a type of perspective that I think is uniquely valuable. It's not the best marketing label, but I refer to it as "the wisdom of the practitioner combined with the audacity of the strategist". These are two hopefully complementary perspectives on the same matter, yoked together to produce change that is informed by experience, pragmatism, and vision.

That's the idea, anyway. I want to share my recent experimentation with using LLMs and ML models to poke at private corpora of text in case 1) that specific topic is interesting to you, 2) the topic of hastily bolting ChatGPT onto a SaaS is interesting/relevant to you (lots of SaaSes will be doing such bolt-ons very soon), or 3) a sort of case study of the wisdom of the practitioner combined with the audacity of the strategist is interesting to you.

I'm in the early stages of several research projects where being able to complement my perspective on a bunch of text with a purposefully different perspective would be valuable. Additionally, having a "computer research assistant" help me poke at those bunches of text is also valuable.

This overlaps quite a bit with my usage of ChatGPT, which breaks down into 3 categories:

  1. Coding intern that's way better than me at coding

  2. Internet research assistant that provides tailored summaries that I find more useful than the lower 80% of Wikipedia articles

  3. Ideation partner

The last one, in particular, is where ChatGPT's constraints become interesting tools. Maybe some of you are familiar with the Oblique Strategies cards by Brian Eno and Peter Schmidt. Each card has a brief provocation on it, and you're meant to pull a card when facing a creative challenge and be unblocked by the resulting constraint. Here are a few of these provocations:

  • Accept advice.

  • Ask people to work against their better judgment.

  • Be less critical more often.

  • Children’s voices — speaking — singing.

  • Discover the recipes you are using and abandon them.

  • Give way to your worst impulse.

  • Look closely at the most embarrassing details and amplify them.

That's enough to give you the feel of these provocations. If you've spent much time using ChatGPT, you might sense the similarity. ChatGPT doesn't intentionally provoke in the way Oblique Strategies does, but it unintentionally provokes. It unintentionally provokes by:

  • Representing a sort of "Internet consensus" on lots of things. This is tweakable with changes in prompting, but the gravity well of Internet consensus is a strong default with ChatGPT.

  • Being at times subtly, grossly, or hilariously wrong

  • Being chipper and helpful in pursuing a subtle or gross misunderstanding of the prompter's intent

  • Being widely-informed about a vast number of topics

So many absolutely precious, beautiful things have come from those who recognized the artistic or functional use-value of mis-applied, limited, or "broken" tools:

  • The lead instrument in this song was a somewhat failed attempt to replace pipe organs with something cheaper.

  • The artist presses on the fragile tape playback mechanism of their Roland RE-501 Chorus Echo in this song to create a subtle, interesting pitch-shifting effect.

  • (It's been a while and my memory of it is imperfect but I remember:) A farmer in Africa sells his goat by writing the price and his contact details on a piece of paper, placing the paper next to the goat, photographing both with his mobile phone, and posting the photograph to Instagram. (This was before we commonly thought of Instagram as an ecommerce tool.)

  • I think most of us know the Post-it notes story.

  • I'm pretty sure the manufacturer of the guitar amplifier used by Willie Kizart on Rocket 88 worked to reduce distortion in the amplifier system, not increase it.

  • I'd never thought of compressing a research recruitment message into 400 characters, using that text as a custom LinkedIn connection message, and leveraging LinkedIn's fine-grained searching ability to connect with and try to recruit research participants until my friend Ari Zelmanow told me about this approach, which is an exceedingly clever, scrappy "mis-use" of a tool.

When I intentionally leverage ChatGPT's unintentional provocations, I find it a useful ideation tool. I was already aware of tools like https://www.humata.ai/ (their headline: "Ask AI anything about your files"), and so I was aware there were some black-box ways to combine a private corpus of files with ChatGPT. Given ChatGPT's impressive ability to summarize, poke at, and create useful provocations around an on-Internet topic, I became intensely curious about applying that same ability to a private corpus of files.

Then somewhere on Twitter I saw some tweets about Langchain.

Langchain is a library that makes it super easy to ingest a corpus of documents and use ChatGPT to summarize or ask questions of that corpus. Langchain does not train a private LLM on those documents. Instead, it uses a variety of techniques to feed a subset of those documents into ChatGPT alongside your prompt. The "variety of techniques" is actually the critical piece here, and the place where the wisdom of the practitioner compensates for the way the audacity of the strategist will gloss over, or be ignorant of, hugely consequential details.

ChatGPT has a limitation on how long your prompts can be. So if you are me last week, using your entire website as the corpus of documents that you're experimenting with (211,600 words in this corpus), the first thing you learn is that you're not gonna just concat all those documents and paste them into the ChatGPT prompt. You're going to pre-process the documents in basically one of two ways:

  • Method 1: Ingest the documents into what's known as a vector database

  • Method 2: Reduce the size of the corpus with a chunking-and-progressive-summarization technique known as map-reduce

Most of us would intuit that you can't just concat 211,600 words of content into one file, copy/paste it into ChatGPT, and use the usual prompt engineering techniques to interrogate those docs. But I'm not sure the audacity-of-the-strategist perspective knows, and updates its models with, the fact that running a Langchain map-reduce to summarize 211,600 words of content costs $9 of OpenAI credits, takes around 30 minutes, and runs into at least one OpenAI API rate limit. :) The practitioner who has been coding up multiple ways to answer questions from those 211,600 words of content knows this very well! I know this because I tried it.
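For the curious, here's roughly what that map-reduce run looks like in code. This is a minimal sketch using Langchain's summarize chain; the folder path, chunk sizes, and model name are illustrative stand-ins rather than my exact setup, and it assumes your OpenAI API key is set in the environment.

    # Minimal sketch: map-reduce summarization of a folder of text files
    # with Langchain. Paths, chunk sizes, and model are illustrative.
    from langchain.chat_models import ChatOpenAI
    from langchain.document_loaders import DirectoryLoader, TextLoader
    from langchain.text_splitter import RecursiveCharacterTextSplitter
    from langchain.chains.summarize import load_summarize_chain

    # Load every .txt file in the corpus folder
    loader = DirectoryLoader("corpus/", glob="**/*.txt", loader_cls=TextLoader)
    docs = loader.load()

    # Split into chunks small enough to fit ChatGPT's context window
    splitter = RecursiveCharacterTextSplitter(chunk_size=2000, chunk_overlap=200)
    chunks = splitter.split_documents(docs)

    llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)

    # map_reduce: summarize each chunk ("map"), then summarize those
    # summaries ("reduce"). Every chunk is a separate paid API call,
    # which is where the ~$9, ~30 minutes, and the rate-limit hit come
    # from on a 211,600-word corpus.
    chain = load_summarize_chain(llm, chain_type="map_reduce")
    print(chain.run(chunks))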

This doesn't mean anything good or bad, per se, about the overall approach. It just means this particular method is costly, slow, and therefore limited in usefulness. It's not economically viable at SaaS scale. But there's that other method, involving a vector database. What does the practitioner know about that?

It's way cheaper: around 40 cents to ingest that same 211,600 words of content (if you use the OpenAI API, which is not the only option for this task) and maybe 5 cents per query after it's ingested.

To understand this approach, you need to roughly understand what a vector database is. Multiple open-source and SaaS vector databases exist. Pinecone is a commonly used SaaS option with this excellent explainer: https://www.pinecone.io/learn/vector-database/. Here's a good 1-paragraph excerpt:

Vector databases excel at similarity search, or “vector search.” Vector search enables users to describe what they want to find without having to know which keywords or metadata classifications are ascribed to the stored objects. Vector search can also return results that are similar or near-neighbor matches, providing a more comprehensive list of results that otherwise may have remained hidden.
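Populating a vector database with Langchain looks roughly like the sketch below. I'm using Chroma here purely as an illustration (it's one open-source option; Pinecone and the others work similarly), and the folder path and chunk sizes are made up:

    # Sketch: ingest a corpus into a local vector database (Chroma is an
    # illustrative choice). The embedding step is the "40 cents" part:
    # each chunk is sent to OpenAI's embedding API once and stored as a
    # vector for later similarity search.
    from langchain.document_loaders import DirectoryLoader, TextLoader
    from langchain.text_splitter import RecursiveCharacterTextSplitter
    from langchain.embeddings import OpenAIEmbeddings
    from langchain.vectorstores import Chroma

    loader = DirectoryLoader("corpus/", glob="**/*.txt", loader_cls=TextLoader)
    chunks = RecursiveCharacterTextSplitter(
        chunk_size=1000, chunk_overlap=100
    ).split_documents(loader.load())

    # One-time embedding + storage; later queries only embed the question
    vectordb = Chroma.from_documents(
        chunks, OpenAIEmbeddings(), persist_directory="db/"
    )
    vectordb.persist()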

So, when you use Langchain (remember, that's one of about 3 libraries that make it easy to glue a corpus of docs and ChatGPT together) with the only economically-viable-at-scale method to query a corpus of documents, what actually happens is this:

  1. Langchain takes your query and first runs it against a vector database that you have already populated with your corpus of documents.

  2. The vector database returns 4 or 5 (by default; I've gotten it to go up to 24 with GPT-4) of what it determines to be the best-match documents for your question.

  3. Langchain constructs a prompt that includes your original query and those additional documents that the vector database provided and submits all this to the ChatGPT API.

  4. ChatGPT does what it will with your query and returns its answer.
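In code, that whole loop is only a few lines. This sketch reuses the illustrative Chroma database from the earlier sketch; the k parameter is the knob I mentioned for how many documents the vector database hands to ChatGPT, and the query itself is just a placeholder:

    # Sketch of the query loop: embed the question, retrieve the
    # best-matching chunks from the vector database, stuff them into a
    # prompt alongside the question, and send it all to ChatGPT.
    from langchain.chat_models import ChatOpenAI
    from langchain.chains import RetrievalQA
    from langchain.embeddings import OpenAIEmbeddings
    from langchain.vectorstores import Chroma

    vectordb = Chroma(persist_directory="db/",
                      embedding_function=OpenAIEmbeddings())

    # k controls how many documents the vector database hands back
    # (Langchain defaults to 4; larger-context models tolerate more)
    retriever = vectordb.as_retriever(search_kwargs={"k": 4})

    qa = RetrievalQA.from_chain_type(
        llm=ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0),
        chain_type="stuff",  # concatenate retrieved docs into one prompt
        retriever=retriever,
    )

    print(qa.run("What are the main themes in these documents?"))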

You can see that there are three things that affect the quality of the response you get from this system: your prompt, how that prompt interacts with the vector database, and how the vector-database-chosen subset of documents plus your prompt interacts with ChatGPT.

I would bet money that every SaaS that adds LLM/ChatGPT functionality in the next 6 months does so using Langchain or LlamaIndex. Full stop. And I'd bet they're not going to go much beyond the default way of using those libraries.

What this means is that the ability of this bolted-on feature to create task-specific value will really be defined by how intelligently the SaaS's developers can tweak, prod, and cajole the vector database part of the system. This doesn't mean that these new SaaS+ChatGPT integrations have no value. It does mean that unlocking ChatGPT's value inside a SaaS (or other non-SaaS workflow) is going to require yoking together the audacity of the strategist and the wisdom of the practitioner, with a view to the unique features of LLMs and the need to creatively exploit their constraints.

This work, I suspect, has a lot more in common with Nils Frahm pressing just the right way on the playback mechanism of his Roland RE-501 Chorus Echo than it does with copy/pasting from the Langchain-provided Colab notebook and updating the marketing website to talk about ChatGPT integration.

Will my week of experimentation with all this stuff create the task-specific value that I was hoping for? I'll update you on that in part 2 of this, because as of today I haven't gotten there yet. I've just gotten to the point where I can use Langchain (and Haystack; more on that in part 2) to poke at my corpora of documents in what promises to be a useful way. But I wanted to do a write-to-disk checkpoint on what I've learned thus far, hopefully in a way that's interesting/useful to y'all.

More soon, -P