Repo for Q&A bot with chat interface and custom data ingestion with vector database. Built using LangChain and Streamlit.
Make sure you have at least python 3.8 and install dependencies by running pip(3) install -r requirements.txt
- Ingest your data in several formats and create a local vector database.
- Spin up a front-end UI using
Streamlitto ask questions against the built vector database. - LLMs are used to finalize the answer based on retrieved database documents.
- LLMs are also used to determine whether a topic of questioning has changed, which affects optimizations on chat history and memory.
You can check out the configuration file in src/cfg/default.cfg. It currently has one sample site vero.fi which scrapes the Finnish tax office website for Finnish tax regulation, guidelines and other related information.
There are several methods:
- Sitemap Ingestion: See
vero.fi.cfg. - Site Excel (individual pages listed in spreadsheet): See
chunshi.cfg. - PDF: See
sony_camera.cfg. There might be some code started on other methods but they are not mature yet.
- Run
python(3) src/ingest_data.py --site vero.fi [--debug](debug switch will only scrape a tiny portion of the site so testing can be rapid)
- Export your OpenAI or Azure OpenAI API related environment variables as follows:
| Variable Name | Description |
|---|---|
OPENAI_API_KEY |
Your OpenAI API key, or the Azure OpenAI resource's Key1 or Key2 (both are okay) if Azure |
OPENAI_API_BASE |
https://api.openai.com/v1 if OpenAI, or the Azure OpenAI resource's Endpoint value if Azure |
OPENAI_API_TYPE |
"open_ai" or "azure" |
- Run
streamlit run src/app.py - Open your browser at
localhost:8501?site=vero.fi