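PrivateGPT is a production-ready AI project that lets you ask questions about your documents using the power of large language models (LLMs), even without an Internet connection. This post walks through how to build an AI application with PrivateGPT.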

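PrivateGPT is a service that wraps a set of AI RAG primitives in a comprehensive set of APIs, providing a private, secure, customizable, and easy-to-use GenAI development framework.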

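It uses FastAPI and LlamaIndex as its core frameworks; customizing those means changing the codebase itself. It also supports a variety of local and remote LLM providers, embedding providers, and vector stores, and these can be swapped easily without touching the codebase.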

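PrivateGPT exposes an API with all the primitives needed to build private, context-aware AI applications. It follows and extends the OpenAI API standard and supports both normal and streaming responses. This means that if one of your tools can use the OpenAI API, it can use your own PrivateGPT API instead with no code changes, and at no cost if you run PrivateGPT locally.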
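
Because the API follows the OpenAI conventions, a request can be sent with nothing but the standard library. The sketch below is only an illustration: it assumes the server listens on http://localhost:8001 (the address used later in this post) and that the chat endpoint lives at /v1/chat/completions, mirroring the OpenAI path. The helper names `build_chat_request` and `ask` are made up for this example.

```python
import json
import urllib.request

# Assumed address of a locally running PrivateGPT server.
BASE_URL = "http://localhost:8001"


def build_chat_request(question: str, stream: bool = False) -> dict:
    """Build an OpenAI-style chat completion payload."""
    return {
        "messages": [{"role": "user", "content": question}],
        "stream": stream,
    }


def ask(question: str, base_url: str = BASE_URL) -> str:
    """POST the payload and return the reply text (needs a running server)."""
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",  # assumed path, mirrors the OpenAI API
        data=json.dumps(build_chat_request(question)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=120) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With the service running, `print(ask("Who are you?"))` should print the model's reply. Keeping the payload builder separate from the network call makes it easy to inspect what is sent.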

To build our own service with PrivateGPT, we need to configure the following three components:

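1. LLM: the large language model provider used for inference. It can be local, remote, or even OpenAI.
2. Embeddings: the embeddings provider used to encode inputs, documents, and user queries. Like the LLM, it can be local, remote, or even OpenAI.
3. Vector Store: the store used to index and retrieve documents.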

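There is also one extra component that can be enabled or disabled: the UI. It is a Gradio UI that offers a more user-friendly way to interact with the API. Personally, I don't find it especially good-looking.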

The rest of this post shows how to use PrivateGPT. My environment is macOS on an M2 chip.

Installing project dependencies

First, make sure the Python version is 3.11 or later. You can install Python with pyenv:

pyenv install 3.11 # install Python 3.11

Then run the following commands to clone the project locally and pin the Python version:

git clone https://github.com/imartinez/privateGPT
cd privateGPT
pyenv local 3.11      # use Python 3.11 in this directory and its subdirectories

Next, install poetry and make:

curl -sSL https://install.python-poetry.org | python3 - # install poetry
brew install make # install make
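
Before going further, it can help to confirm that the prerequisites are actually in place. The following is a small preflight sketch rather than anything shipped with PrivateGPT; the function names are hypothetical, and the version floor and tool names are taken from the steps above.

```python
import shutil
import sys


def python_is_recent_enough(minimum=(3, 11)) -> bool:
    """Return True if the running interpreter meets the minimum version."""
    return sys.version_info >= minimum


def missing_tools(tools=("poetry", "make")) -> list:
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def preflight_report() -> str:
    """Summarize what is still missing, in one human-readable line."""
    problems = []
    if not python_is_recent_enough():
        problems.append("Python 3.11 or later is required")
    problems += [f"{tool} is not on PATH" for tool in missing_tools()]
    return "; ".join(problems) if problems else "all prerequisites found"
```

Calling `print(preflight_report())` inside the cloned directory shows whether pyenv selected the right interpreter and whether poetry and make were installed successfully.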

With poetry installed, we can use it to install the modules PrivateGPT needs at runtime, including the LLM, Embeddings, Vector Store, and even the UI:

poetry install --extras "…"

A concrete example:

poetry install --extras "ui llms-ollama llms-openai embeddings-ollama embeddings-openai vector-stores-qdrant"

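In the example above, poetry installs the UI, llms-ollama (the module for serving PrivateGPT with an Ollama LLM), llms-openai (the module for serving PrivateGPT with OpenAI), embeddings-ollama (the Ollama-based embedding module), embeddings-openai (the OpenAI-based embedding module), and vector-stores-qdrant (the Qdrant-based vector store). Other modules can be installed as well; see the official documentation for details.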
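
If the install step is scripted, the extras string can be assembled from a list, which catches typos early. This is a hypothetical helper, not part of PrivateGPT: `KNOWN_EXTRAS` contains only the extras named in this post, and the project defines more.

```python
import shlex

# Only the extras mentioned in this post; the project offers others as well.
KNOWN_EXTRAS = {
    "ui",
    "llms-ollama",
    "llms-openai",
    "embeddings-ollama",
    "embeddings-openai",
    "vector-stores-qdrant",
}


def poetry_install_command(extras) -> list:
    """Return the argv list for `poetry install --extras "<extras>"`."""
    unknown = [extra for extra in extras if extra not in KNOWN_EXTRAS]
    if unknown:
        raise ValueError(f"unknown extras: {', '.join(unknown)}")
    return ["poetry", "install", "--extras", " ".join(extras)]


# Show the command as it would be typed in a shell.
print(shlex.join(poetry_install_command(["ui", "llms-ollama", "embeddings-ollama"])))
```

The returned list can be passed directly to `subprocess.run`, which avoids shell quoting problems with the space-separated extras.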

Once installation finishes, the next step is configuration.

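First, set the environment variable PGPT_PROFILES, which specifies the configuration file to load when PrivateGPT starts. By default it loads the settings.yaml file in the PrivateGPT directory. If we set it as follows:

PGPT_PROFILES=ollama

then settings-ollama.yaml is loaded. Its contents look roughly like this: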

server:
  env_name: ${APP_ENV:ollama}

llm:
  mode: ollama
  max_new_tokens: 8192
  context_window: 3900
  temperature: 0.1     #The temperature of the model. Increasing the temperature will make the model answer more creatively. A value of 0.1 would be more factual. (Default: 0.1)

embedding:
  mode: ollama

ollama:
  llm_model: qwen
  embedding_model: nomic-embed-text
  api_base: http://localhost:11434
  tfs_z: 1.0              # Tail free sampling is used to reduce the impact of less probable tokens from the output. A higher value (e.g., 2.0) will reduce the impact more, while a value of 1.0 disables this setting.
  top_k: 40               # Reduces the probability of generating nonsense. A higher value (e.g. 100) will give more diverse answers, while a lower value (e.g. 10) will be more conservative. (Default: 40)
  top_p: 0.9              # Works together with top-k. A higher value (e.g., 0.95) will lead to more diverse text, while a lower value (e.g., 0.5) will generate more focused and conservative text. (Default: 0.9)
  repeat_last_n: 64       # Sets how far back for the model to look back to prevent repetition. (Default: 64, 0 = disabled, -1 = num_ctx)
  repeat_penalty: 1.2     # Sets how strongly to penalize repetitions. A higher value (e.g., 1.5) will penalize repetitions more strongly, while a lower value (e.g., 0.9) will be more lenient. (Default: 1.1)
  request_timeout: 120.0  # Time elapsed until ollama times out the request. Default is 120s. Format is float.

vectorstore:
  database: qdrant

qdrant:
  path: local_data/private_gpt/qdrant

We can change the values under ollama to use the model and settings we want. The local Ollama service must be started first, and the nomic-embed-text model must be downloaded: ollama pull nomic-embed-text.
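
To check that the models named in the configuration are actually available, one option is to ask the Ollama server which models it has. The sketch below assumes Ollama's `/api/tags` endpoint at the `api_base` shown above, which returns a JSON object with a `models` list; the helper names are hypothetical, and comparing names without their tag (for example `qwen` against `qwen:latest`) is a simplification.

```python
import json
import urllib.request


def fetch_tags(api_base: str = "http://localhost:11434") -> dict:
    """Fetch the list of local models from Ollama (needs a running server)."""
    with urllib.request.urlopen(f"{api_base}/api/tags", timeout=10) as response:
        return json.load(response)


def missing_models(tags: dict, required: list) -> list:
    """Return the required model names that do not appear in the tags response."""
    # Strip the ":tag" suffix so "qwen:latest" matches "qwen".
    installed = {model["name"].split(":")[0] for model in tags.get("models", [])}
    return [name for name in required if name.split(":")[0] not in installed]
```

For the configuration above, `missing_models(fetch_tags(), ["qwen", "nomic-embed-text"])` returns the models that still need an `ollama pull`.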

Setting PGPT_PROFILES=openai means using OpenAI to back the PrivateGPT service. settings-openai.yaml looks roughly like this:

server:
  env_name: PGPT_OPENAI

llm:
  mode: openai

embedding:
  mode: openai

openai:
  api_key: ${OPENAI_API_KEY:}  # read from the OPENAI_API_KEY environment variable; never publish a real key
  model: gpt-3.5-turbo

PrivateGPT can of course be configured for the other LLM providers it supports; see the official documentation for the full list.
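
The profile mechanism can be pictured with a few lines of Python. This is a simplified sketch of the idea, not PrivateGPT's actual loader: it assumes that settings.yaml is always read first, that PGPT_PROFILES may hold several comma-separated names, and that values such as `${APP_ENV:ollama}` mean "use the environment variable APP_ENV, or fall back to ollama". The function names are hypothetical.

```python
import os
import re


def settings_files(profiles_env: str) -> list:
    """Map a PGPT_PROFILES value to the settings files it selects, in load order."""
    profiles = [name.strip() for name in profiles_env.split(",") if name.strip()]
    return ["settings.yaml"] + [f"settings-{name}.yaml" for name in profiles]


# Matches ${NAME:default}; the default may be empty.
_PLACEHOLDER = re.compile(r"\$\{(\w+):([^}]*)\}")


def expand(value: str, env=None) -> str:
    """Replace ${NAME:default} with the environment value or the default."""
    env = os.environ if env is None else env
    return _PLACEHOLDER.sub(lambda m: env.get(m.group(1), m.group(2)), value)


print(settings_files(os.environ.get("PGPT_PROFILES", "")))
```

With no profile set, only settings.yaml is listed; with `PGPT_PROFILES=ollama`, settings-ollama.yaml is added after it.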

Running PrivateGPT

With configuration done, start the PrivateGPT service with:

make run

If the Gradio UI component is enabled, open http://localhost:8001 in a browser after startup.
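
When the UI is disabled, or in a script, a quick way to confirm the service is up is to query its health endpoint. The sketch below assumes a `/health` route that answers with a JSON body like `{"status": "ok"}`; treat both the path and the response shape as assumptions to verify against your installation. The function names are hypothetical.

```python
import json
import urllib.error
import urllib.request


def parse_health(body: bytes) -> bool:
    """Return True only if the body is a JSON object with status == "ok"."""
    try:
        return json.loads(body).get("status") == "ok"
    except (ValueError, AttributeError):
        # Not JSON at all, or JSON that is not an object.
        return False


def server_is_up(base_url: str = "http://localhost:8001", timeout: float = 5.0) -> bool:
    """Query the assumed /health endpoint; any connection problem counts as down."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as response:
            return parse_health(response.read())
    except (urllib.error.URLError, OSError):
        return False
```

Polling `server_is_up()` in a loop is a simple way to wait for `make run` to finish loading models before sending the first request.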

The Gradio UI page shows three modes: Query Files, Search Files, and LLM Chat:

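Query Files: answers the question posted in the chat using context from the ingested documents. It also takes previous chat messages into account as context. Uses the /chat/completions API with use_context=true and no context_filter.
Search Files: a fast search over the documents that returns the 4 most relevant text chunks together with their source document and page. Uses the /chunks API with no context_filter, limit=4, and prev_next_chunks=0.
LLM Chat: a simple, non-contextual chat with the LLM. Ingested documents are ignored; only previous messages are considered. Uses the /chat/completions API with use_context=false.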
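
The three modes differ only in which endpoint they call and which flags they send, which the following sketch makes explicit. It is an illustration of the descriptions above, not the UI's source code: the `/v1` path prefix and the `text` field name for the chunks request are assumptions, and `request_for_mode` is a made-up helper.

```python
def request_for_mode(mode: str, text: str, history=None):
    """Return (endpoint, JSON payload) for one of the three UI modes."""
    # Previous chat messages plus the new user message.
    messages = list(history or []) + [{"role": "user", "content": text}]
    if mode == "Query Files":
        # Answer from ingested documents; no context_filter is sent.
        return "/v1/chat/completions", {"messages": messages, "use_context": True}
    if mode == "Search Files":
        # Return the 4 most relevant chunks, without neighbouring chunks.
        return "/v1/chunks", {"text": text, "limit": 4, "prev_next_chunks": 0}
    if mode == "LLM Chat":
        # Plain chat: ingested documents are ignored.
        return "/v1/chat/completions", {"messages": messages, "use_context": False}
    raise ValueError(f"unknown mode: {mode}")


endpoint, payload = request_for_mode("Search Files", "installation steps")
print(endpoint, payload)
```

Seen this way, Query Files and LLM Chat are the same request with `use_context` flipped, while Search Files skips the LLM entirely and returns raw chunks.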

Calling the API

Once the PrivateGPT service is running, it also exposes API endpoints we can call. They fall into two groups:

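1. High-level APIs, which abstract away all the complexity of a RAG (retrieval-augmented generation) pipeline implementation:
1) Document ingestion: internally handles document parsing, splitting, metadata extraction, embedding generation, and storage.
2) Chat and completion using context from ingested documents: abstracts context retrieval, prompt engineering, and response generation.
2. Low-level APIs, which let advanced users implement their own complex pipelines:
1) Embedding generation: based on a piece of text.
2) Contextual chunk retrieval: given a query, returns the most relevant text chunks from the ingested documents.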
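
To make the two low-level primitives concrete, here is a toy version of what chunk retrieval does conceptually: embed the query, compare it with the stored chunk embeddings, and return the closest ones. In PrivateGPT the real work is delegated to the embedding provider and the vector store; the cosine-similarity ranking below is a common technique used here for illustration, with tiny hand-written vectors standing in for real embeddings.

```python
import math


def cosine_similarity(a, b) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_chunks(query_vec, chunk_vecs: dict, limit: int = 4) -> list:
    """Return the ids of the `limit` chunks most similar to the query."""
    ranked = sorted(
        chunk_vecs,
        key=lambda chunk_id: cosine_similarity(query_vec, chunk_vecs[chunk_id]),
        reverse=True,
    )
    return ranked[:limit]


# Hand-written 2-D vectors standing in for real embeddings.
chunks = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
print(top_chunks([1.0, 0.0], chunks, limit=2))
```

A production vector store uses approximate nearest-neighbour indexes instead of sorting every chunk, but the ranking idea is the same.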

For Python, we can call the PrivateGPT API by installing the pgpt_python package, which requires Python 3.12 or later:

pip install pgpt_python

The API is called as follows:

from pgpt_python.client import PrivateGPTApi

# Create a PrivateGPT client
client = PrivateGPTApi(base_url="http://localhost:8001")

# Check the service health
print(f"client status: {client.health.health()}")

# Call the prompt completion endpoint
prompt_result = client.contextual_completions.prompt_completion(
    prompt="Who are you?"
)
print(f"prompt completion result: {prompt_result.choices[0].message.content}")

# Call the prompt completion endpoint with streaming output
for i in client.contextual_completions.prompt_completion_stream(
    prompt="Who are you?"
):
    # A chunk may carry no content (typically the last one), so fall back to ""
    print(i.choices[0].delta.content or "", end="")

# Call the chat completion endpoint
chat_result = client.contextual_completions.chat_completion(
    messages=[{"role": "user", "content": "Answer with just the result: 2+2"}]
)
print(f"chat completion result: {chat_result.choices[0].message.content}")

# Call the chat completion endpoint with streaming output
for i in client.contextual_completions.chat_completion_stream(
    messages=[{"role": "user", "content": "Who are you?"}]
):
    # A chunk may carry no content (typically the last one), so fall back to ""
    print(i.choices[0].delta.content or "", end="")

# Call the embeddings endpoint
embedding_result = client.embeddings.embeddings_generation(input="Who are you?")
print(f"embeddings generation result: {embedding_result.data[0].embedding}")
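
The calls above do not touch the document-ingestion side of the high-level API. The sketch below shows one way ingestion and a contextual completion might be combined. The method and parameter names (`ingestion.ingest_file`, `use_context`, `context_filter` with `docs_ids`, `include_sources`) are written from memory of the pgpt_python README, so treat them as assumptions and check them against the installed version. `ask_about_file` is a made-up helper that takes an already created client.

```python
def ask_about_file(client, path: str, question: str) -> str:
    """Ingest one file, then ask a question answered only from that file.

    `client` is expected to be a PrivateGPTApi instance; the attribute and
    parameter names used here are assumptions based on the pgpt_python README.
    """
    # Upload the file and keep the id of the first ingested document.
    with open(path, "rb") as f:
        doc_id = client.ingestion.ingest_file(file=f).data[0].doc_id

    # Ask with context enabled, restricted to the document just ingested.
    result = client.contextual_completions.prompt_completion(
        prompt=question,
        use_context=True,
        context_filter={"docs_ids": [doc_id]},
        include_sources=True,
    )
    return result.choices[0].message.content
```

With the `client` created earlier, a call such as `ask_about_file(client, "example.txt", "What is this document about?")` would ingest the file and answer from it, where example.txt is a placeholder for any local document.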

That covers the basics of using PrivateGPT. If you're interested, try building a PrivateGPT application yourself.

PrivateGPT GitHub: https://github.com/zylon-ai/private-gpt

pgpt_python GitHub: https://github.com/zylon-ai/pgpt-python

Documentation: https://docs.privategpt.dev/
