In our first deep dive, we leverage (local) Vector Stores and LangChain for cost-efficient, context-aware conversations with GPT-4 in Node.js.

In our previous post, we discussed the importance of providing context when using large language models like ChatGPT to generate content. We explored the “garbage in, garbage out” concept and emphasized the need for precise, clear, and adequate context to receive accurate, relevant, and coherent information. We also shared two practical strategies for providing context: including context directly in the prompt and using embeddings.
In that post, I promised to provide a step-by-step guide on creating a script for generating and using embeddings and provide an example repository for reference. In this follow-up post, we will dive into the process of building a powerful, context-aware chatbot using Node.js, which supports not only unlimited context but also unlimited chat history in under 100 lines of code.
In this section, I will walk you through the step-by-step process of building a GPT-4 powered chatbot using Node.js. The chatbot leverages the HNSWLib vector store for unlimited context and chat history, allowing for more cost-efficient, context-aware conversations. By following along with this guide, you will gain a deeper understanding of the chatbot’s inner workings and be able to customize it to suit your needs.
Before we dive into building the chatbot, there are a few prerequisites you need to have in place. Ensuring these are set up correctly will make the process smoother and more efficient:

- A recent version of Node.js (v18+ is recommended, since we use node:readline/promises and top-level await)
- An OpenAI API key
- Basic familiarity with TypeScript and the command line
With these prerequisites in place, we are ready to start building our chatbot.
We are building a command-line application using Node.js and TypeScript, so let's set up the project and the basic readline interface for interacting with the bot.
In a new directory of your choosing, create a package.json by running:
npm init
(The default values are fine)
We want to use TypeScript for this project, so we're going to install the required packages and configure it.
npm add --save-dev typescript tsx @types/node @tsconfig/recommended
npm add dotenv
npm install
mkdir src
To configure TypeScript for our project, download the following tsconfig.json and place it in the root of the project as tsconfig.json.
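If you prefer to create it by hand, a minimal tsconfig.json along these lines should work; the exact compiler options here are assumptions based on the @tsconfig/recommended base we just installed and the ES modules setup we use below:
{
  "extends": "@tsconfig/recommended/tsconfig.json",
  "compilerOptions": {
    "module": "ES2022",
    "target": "ES2022",
    "moduleResolution": "node",
    "outDir": "dist"
  },
  "include": ["src/**/*"]
}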
As the final step in our basic setup, we will change the main entry point in the project’s package.json and add some scripts to build and start the bot. We will also be using ES modules, so we will add the module type to the file.
{
...
"main": "./dist/index.js",
"type": "module",
"scripts": {
"build": "tsc --declaration --outDir dist/",
"dev": "tsx -r dotenv/config src/index.ts --inspect",
"start": "npm run build && node -r dotenv/config dist/index.js"
}
...
}
Create the main file src/index.ts in your favourite IDE and add the following to it:
// src/index.ts
import * as readline from 'node:readline/promises';
import { stdin as input, stdout as output } from 'node:process';
const rl = readline.createInterface({ input, output });
while (true) {
  const userInput = await rl.question('How can I help? ');
  // For now, simply echo the question back to the user.
  const response = `You asked: ${userInput}`;
  output.write(`${response}\n`);
}
Because we are building a chatbot, we want a simple way to converse with the bot using stdin and stdout. This loop works perfectly for our purposes, and as a bonus it also handles streamed output, so we can enable OpenAI's streaming mode later.
Run the bot using npm run dev or npm start, and you will see that the program currently does nothing more than parrot what you input. In the following sections, we will build upon this foundation to create a fully functional chatbot.
In this step, we will integrate the powerful OpenAI API into our chatbot using LangChain. LangChain is a convenient library that simplifies interactions with the API, streamlines the construction of prompts, and efficiently manages Vector Stores. By leveraging LangChain, we can focus on building a robust chatbot without getting bogged down in the complexities of creating embeddings and performing similarity searches.
It is perfectly possible to implement this bot without using LangChain; however, the title of this post would probably not be "in Under 100 Lines of Code" and would be less catchy!
To start, add LangChain to our dependencies:
npm add --save langchain
And make sure to add the OpenAI API key you created earlier to a .env file.
# .env
OPENAI_API_KEY=sk-***
Let's update our bot to use the OpenAI API for the first time!
// src/index.ts
import * as readline from 'node:readline/promises';
import { stdin as input, stdout as output } from 'node:process';
import { OpenAIChat } from 'langchain/llms/openai';
const rl = readline.createInterface({ input, output });
const chat = new OpenAIChat({
  streaming: true,
  modelName: 'gpt-3.5-turbo',
});
while (true) {
  const userInput = await rl.question('How can I help? ');
  const response = await chat.call(userInput); // send the question to the OpenAI API
  output.write(`${response}\n`);
}
Note that I've set the modelName to gpt-3.5-turbo for testing. We can set this to gpt-4 later on to make use of OpenAI's most powerful model, but for now I would recommend using gpt-3.5-turbo as it is more than 10x cheaper.
Run the bot again and ask it a question and you will get an answer from OpenAI.

While the communication with the API is working, you will notice that the API is stateless and has no recollection of our previous messages. Unlike with the ChatGPT UI, it is the developer's responsibility to implement any form of conversation history.
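You can see this for yourself with two consecutive calls; a quick sketch (the questions are arbitrary):
// Each call is completely independent; nothing from the first
// message is sent along with the second one.
await chat.call('My name is Gordon.');
const reply = await chat.call('What is my name?'); // the model cannot know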
The OpenAI API supports sending a stream response, which enables us to start outputting its response immediately instead of waiting for the entire response. This feature allows for a more interactive and responsive chatbot experience, as the user receives feedback faster, creating a more engaging conversation.
To add streaming to our bot we can make use of LangChain’s callback system. Adapt index.ts as follows:
// src/index.ts
import { CallbackManager } from 'langchain/callbacks';
...
const callbackManager = CallbackManager.fromHandlers({
async handleLLMNewToken(token: string) {
output.write(token);
},
});
...
const chat = new OpenAIChat({
...
callbackManager,
...
});
while (true) {
  const userInput = await rl.question('How can I help? ');
  // The handleLLMNewToken callback above prints tokens as they arrive,
  // so we only add a newline once the full response is finished.
  await chat.call(userInput);
  output.write('\n');
}
If we were building a clone of ChatGPT, the next step would be to add transient chat history and send the entire conversation history to the API with each new request. While this approach works, as discussed in the previous post, it is limited by the maximum number of tokens allowed in a conversation and can be potentially expensive. If you are interested in learning how to implement such a memory using LangChain, you can read more about some options here.
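For reference, that transient approach would look roughly like this using LangChain's BufferMemory; this is a sketch of the route we are not taking:
import { BufferMemory } from 'langchain/memory';
import { ConversationChain } from 'langchain/chains';

// BufferMemory re-sends the entire accumulated conversation with every
// request, so token usage and cost grow with the length of the chat.
const transientChain = new ConversationChain({
  llm: chat,
  memory: new BufferMemory(),
});
const res = await transientChain.call({ input: 'Hello!' });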
We are not going down that route. Instead, we will add Vector Stores for both our context and conversation history, allowing them to grow infinitely. This approach enables us to send only the relevant context to the OpenAI API, thus saving money and ensuring more efficient context-aware conversations.
For those unfamiliar with Vector Databases/Stores, you can learn more about them here.
First, let’s create a more intelligent prompt for our purposes, informing the model about how it should act and that we will be providing context and chat history.
We will use a LangChain chain (ConversationChain) and prompt templates to help us deal with the boilerplate of assembling our prompt and sending it to the API.
The LangChain documentation states:
Chains is an incredibly generic concept which refers to a sequence of modular components (or other chains) combined in a particular way to accomplish a common use case.
The most commonly used type of chain is an LLMChain, which combines a PromptTemplate, a Model, and Guardrails to take user input, format it accordingly, pass it to the model and get a response, and then validate and fix (if necessary) the model output.
First, install common-tags, which provides the template literal helpers (such as oneLine) we will use in our prompt:
npm add --save common-tags
npm add --save-dev @types/common-tags
Now adapt index.ts to make use of the ConversationChain:
// src/index.ts
...
import {
ChatPromptTemplate,
HumanMessagePromptTemplate,
MessagesPlaceholder,
SystemMessagePromptTemplate,
} from 'langchain/prompts';
import { ConversationChain } from 'langchain/chains';
import { BufferMemory } from 'langchain/memory';
import { oneLine } from 'common-tags';
...
const chatPrompt = ChatPromptTemplate.fromPromptMessages([
  SystemMessagePromptTemplate.fromTemplate(oneLine`You are an AI assistant called DocuBot, a world class copywriter and marketing professional, you will craft high-quality, professional, relevant and engaging marketing materials and copy, such as website copy, slogans, keywords, newsletters and social media posts tailored to the user's target audience. You have access to the chat history with the user and to context provided by the user. Some of the chat history is provided as JSON, don't output this JSON.`),
  new MessagesPlaceholder('history'),
  HumanMessagePromptTemplate.fromTemplate('{input}'),
]);

const chain = new ConversationChain({
  llm: chat,
  prompt: chatPrompt,
  // returnMessages is needed because we use a MessagesPlaceholder above.
  memory: new BufferMemory({ returnMessages: true, memoryKey: 'history' }),
});

As mentioned, we will be using HNSWLib as our Vector Store. HNSWLib has the advantage of being an in-memory Vector Store that can be saved to a local file, so we don't need to spin up any other services or use a cloud provider. There are many other alternatives if you are building a production-ready application. At DocIQ, we use a mixture of pgvector and redis.
Let's get started by installing hnswlib-node:
npm add --save hnswlib-node
Now we are going to add a directory for the content we want to index and for saving the Vector Store index to disk.
mkdir docs db
Create a file called example.md in the newly created docs folder and fill it with some content that you want our bot to know about. Alternatively, you can download and use my example file.
It’s time to finally add context to our bot:
// src/index.ts
...
import path from 'path';
import { fileURLToPath } from 'node:url';
import { HNSWLib } from 'langchain/vectorstores/hnswlib';
import { OpenAIEmbeddings } from 'langchain/embeddings/openai';
import { RecursiveCharacterTextSplitter } from 'langchain/text_splitter';
import { DirectoryLoader } from 'langchain/document_loaders/fs/directory';
import { JSONLoader } from 'langchain/document_loaders/fs/json';
import { TextLoader } from 'langchain/document_loaders/fs/text';
...
// __dirname does not exist in ES modules, so we reconstruct it.
const __dirname = path.dirname(fileURLToPath(import.meta.url));

// Build (or load) the context index; the directory layout is assumed here.
const vectorStore = await loadOrCreateVectorStore(
  path.join(__dirname, 'db'),
  path.join(__dirname, 'docs'),
);
Whew! That was a lot, but it was the largest step, so bear with me and we will go through it step-by-step:
We import HNSWLib and some helper classes from LangChain to help us create Embeddings, but the main chunk of code is the new loadOrCreateVectorStore and getRelevantContext functions. Let's take a closer look at each of them:
loadOrCreateVectorStore
This is where LangChain shines. This relatively simple function reads files, splits them into pieces, sends the pieces to the OpenAI Embedding endpoint, and creates a new Vector Store index for us.
getRelevantContext
Given a Vector Store and a question (prompt), this function returns the specified number of documents that best match the question semantically. To achieve this, LangChain automatically creates an Embedding of the question to perform an approximate nearest neighbor search against the records in our Vector Store.
The parameter numDocuments lets us tweak our context-aware bot by seeing the effect of returning more or fewer relevant pieces of context. Of course, the more we return, the larger the prompt that will be sent to the API.
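For reference, here is a minimal sketch of what these two helpers can look like; the chunk sizes, supported file types, and return format are assumptions, not the repository's verbatim code:
async function loadOrCreateVectorStore(dbDirectory: string, docsDirectory: string): Promise<HNSWLib> {
  const embeddings = new OpenAIEmbeddings();
  try {
    // Reuse the index that was previously saved to disk.
    return await HNSWLib.load(dbDirectory, embeddings);
  } catch {
    // First run: load the documents, split them into chunks and embed them.
    const loader = new DirectoryLoader(docsDirectory, {
      '.json': (filePath) => new JSONLoader(filePath),
      '.txt': (filePath) => new TextLoader(filePath),
      '.md': (filePath) => new TextLoader(filePath),
    });
    const docs = await loader.load();
    const splitter = new RecursiveCharacterTextSplitter({ chunkSize: 1000, chunkOverlap: 100 });
    const chunks = await splitter.splitDocuments(docs);
    const vectorStore = await HNSWLib.fromDocuments(chunks, embeddings);
    await vectorStore.save(dbDirectory);
    return vectorStore;
  }
}

async function getRelevantContext(vectorStore: HNSWLib, question: string, numDocuments: number): Promise<string> {
  // similaritySearch embeds the question and runs an approximate
  // nearest-neighbour search against the stored chunks.
  const documents = await vectorStore.similaritySearch(question, numDocuments);
  return documents.map((doc) => doc.pageContent).join('\n');
}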

Go ahead and run our bot again and see if it can already answer questions specific to the content you provided!
Incorporating an “unlimited size chat history” into our bot works similarly to adding context. We will use another HNSWLib Vector Store to keep track of our conversation and inject relevant parts of it into the prompt as needed. This approach allows our chatbot to maintain a comprehensive conversation history while efficiently managing its size and relevance.
mkdir memory
// src/index.ts
...
import { Document } from 'langchain/document';
...
const memoryDirectory = path.join(__dirname, 'memory');
const memoryVectorStore = await loadOrCreateMemoryVectorStore(memoryDirectory);
...
while (true) {
  const userInput = await rl.question('\x1b[92mHow can I help?\x1b[0m ');
  // Fetch more documents from the context store than from the chat
  // history (the exact counts are illustrative and worth tweaking).
  const context = await getRelevantContext(vectorStore, userInput, 10);
  const history = await getRelevantContext(memoryVectorStore, userInput, 4);
  ...
}
I won't go through these functions step-by-step, as they are very similar to the ones we have already implemented. The only differences are that we return fewer results from the chat history than from our context documents, and we need to manually save the chat history Vector Store after each question/response. We also add metadata to the store while saving, in case we want to implement a hybrid search at a later date.
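A minimal sketch of that save step (the pageContent shape and metadata fields are assumptions):
// After each question/response, persist the exchange so future questions
// can retrieve it via similarity search. `response` is assumed to hold
// the answer returned by the chain call above.
await memoryVectorStore.addDocuments([
  new Document({
    pageContent: JSON.stringify({ question: userInput, answer: response }),
    // Metadata makes a hybrid (filtered) search possible later on.
    metadata: { createdAt: Date.now() },
  }),
]);
await memoryVectorStore.save(memoryDirectory);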

The bot is now complete and can remember our prior conversations, as well as utilize any context we provide for content generation. This powerful combination of unlimited chat history and context allows our GPT-4 powered chatbot to deliver more accurate, relevant, and engaging responses at a fraction of the cost, providing a better user experience.
You can find the complete repository for our GPT-4 powered chatbot at https://github.com/gmickel/memorybot. The repository version includes some additional features that enhance the chatbot's functionality.
I encourage you to explore the repository, experiment with the chatbot, and try out the additional features. By doing so, you’ll gain a better understanding of the chatbot’s capabilities and see how it can be customized to meet your specific needs. Don’t hesitate to share your feedback and suggestions, as they can help improve the chatbot even further.
I hope you find this guide and the accompanying repository useful in building your own context-aware, GPT-4 powered chatbot. As always, and as mentioned in the previous post, consider the implications of sharing sensitive content. While providing context is crucial for generating accurate and relevant content, it’s important to be mindful of the information you share with AI systems (or any system for that matter). If you’re working with proprietary code, sensitive data, or confidential information, be cautious about including such details in your prompts or embeddings. Always ensure that you’re adhering to your organization’s security policies and best practices to protect your valuable assets.
If you enjoyed this post and found it valuable, please share it with others who might be interested in AI and chatbot development. Don’t forget to subscribe to my blog to stay updated on the latest trends, technologies, and best practices in the AI and software development industry. Your support and feedback are much appreciated, and they help me continue to create informative and insightful content for you. Happy coding!
Please support my work on Medium and get unlimited access by becoming a member using my referral link here. Have a nice day!
Subscribe to my weekly newsletter.
Connect on Twitter here.