
Model Configuration

Learn about the Different Models Supported by Dify.

Dify supports major model providers like OpenAI's GPT series and Anthropic's Claude series. Each model's capabilities and parameters differ, so select a model provider that suits your application's needs. Obtain the API key from the model provider's official website before using it in Dify.

Model Types in Dify

Dify classifies models into 4 types, each for different uses:

1.System Inference Models: Used in applications for tasks like chat, name generation, and suggesting follow-up questions.

Providers include OpenAI, Azure OpenAI Service, Anthropic, Hugging Face Hub, Replicate, Xinference, OpenLLM, iFLYTEK SPARK, WENXINYIYAN, TONGYI, Minimax, and ZHIPU (ChatGLM).

2.Embedding Models: Employed for embedding segmented documents in knowledge and processing user queries in applications.

Providers include OpenAI, ZHIPU (ChatGLM), and Jina AI (Jina Embeddings 2).

3.Rerank Models: Improve the relevance of search results in LLM applications.

Provider: Cohere.

4.Speech-to-Text Models: Convert spoken words to text in conversational applications.

Provider: OpenAI.

Dify plans to add more LLM providers as technology and user needs evolve.

 

Hosted Model Trial Service

Dify offers trial quotas for cloud service users to experiment with different models. Set up your model provider before the trial ends to ensure uninterrupted application use.

  • OpenAI Hosted Model Trial: Includes 200 invocations for the GPT-3.5-turbo, GPT-3.5-turbo-16k, and text-davinci-003 models.

Setting the Default Model

Dify automatically selects the default model based on usage. Configure this in Settings > Model Provider.


Model Integration Settings

Choose your model in Dify's Settings > Model Provider.


Model providers fall into two categories:

 

1.Proprietary Models: Developed by providers such as OpenAI and Anthropic.

2.Hosted Models: Third-party platforms, such as Hugging Face and Replicate, that host a variety of models.

Integration methods differ between these categories.

Proprietary Model Providers: Dify connects to all models from an integrated provider. Set the provider's API key in Dify to integrate.


Dify uses PKCS1_OAEP encryption to protect your API keys. Each user (tenant) has a unique key pair for encryption, ensuring your API keys remain confidential.
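
For readers who want a concrete picture of this scheme, the snippet below is a minimal sketch of RSA encryption with PKCS1_OAEP using the pycryptodome package; it is illustrative only and does not mirror Dify's actual key-management code (the key size, key storage, and values are assumptions):

# Illustrative only: encrypting an API key with PKCS1_OAEP (pycryptodome).
# This is NOT Dify's implementation; key size and handling are assumptions.
from Crypto.PublicKey import RSA
from Crypto.Cipher import PKCS1_OAEP

key_pair = RSA.generate(2048)            # hypothetical per-tenant key pair
public_key = key_pair.publickey()

api_key = b"sk-your-provider-api-key"    # placeholder provider API key
ciphertext = PKCS1_OAEP.new(public_key).encrypt(api_key)    # stored in encrypted form

plaintext = PKCS1_OAEP.new(key_pair).decrypt(ciphertext)    # recovered with the private key
assert plaintext == api_key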

Hosted Model Providers: Integrate third-party models individually.

Specific integration methods for each hosted provider are described in the sections below.

 

Using Models

Once configured, these models are ready for application use.

Hugging Face

Dify supports Text-Generation and Embeddings models from Hugging Face. The corresponding Hugging Face model types are:

  • Text-Generation: text-generation, text2text-generation

  • Embeddings: feature-extraction

The specific steps are as follows:

  1. You need a Hugging Face account (registration page).

  2. Set your Hugging Face API key (obtainable from your Hugging Face account settings).

  3. Select a model from the Hugging Face model list page.


Dify supports accessing models on Hugging Face in two ways:

  1. Hosted Inference API: uses models that Hugging Face deploys and hosts itself. No fee is required, but only a small number of models support this approach (a quick test of this path is sketched after this list).

  2. Inference Endpoint: deploys the model on cloud resources (such as AWS) provisioned through Hugging Face; this is a paid service.
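
If you want to verify outside Dify that your token works and that a given model supports the Hosted Inference API, a minimal sketch with the requests package follows; the gpt2 model name and the token value are placeholders:

# Minimal check of the Hosted Inference API (placeholders: token and model name).
import requests

API_TOKEN = "hf_..."          # your Hugging Face API key
MODEL = "gpt2"                # example model; substitute the model you selected

resp = requests.post(
    f"https://api-inference.huggingface.co/models/{MODEL}",
    headers={"Authorization": f"Bearer {API_TOKEN}"},
    json={"inputs": "Hello, I am a language model"},
    timeout=60,
)
print(resp.status_code, resp.json())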

Method 1: Hosted Inference API

 

1 Select a model

The Hosted Inference API is supported only when a Hosted inference API section appears on the right side of the model details page.


On the model details page, you can get the name of the model.


2 Use the model in Dify

Select Hosted Inference API as the Endpoint Type in Settings > Model Provider > Hugging Face > Model Type.


The API Token is the API key set at the beginning of this section, and the model name is the one obtained in the previous step.

 

Method 2: Inference Endpoint

 

1 Select the model to deploy

Inference Endpoints are only supported for models that show an Inference Endpoints option under the Deploy button on the right side of the model details page.


2 Deploy the model

Click the model's Deploy button and select the Inference Endpoints option. If you have not added a payment method before, you will be prompted to add one; just follow the process. Then adjust the configuration as required and click Create Endpoint in the lower left corner to create the Inference Endpoint.


After the model is deployed, you can see the Endpoint URL.


3 Use the model in Dify

Select Inference Endpoints as the Endpoint Type in Settings > Model Provider > Hugging Face > Model Type.


The API Token is the API key set at the beginning of this section. The name of the Text-Generation model can be arbitrary, but the name of the Embeddings model must match the one on Hugging Face. The Endpoint URL is the one obtained after the model was successfully deployed in the previous step.


Note: The "User name / Organization Name" for Embeddings needs to be filled in according to your deployment method on Hugging Face's Inference Endpoints, with either the ''User name'' or the "Organization Name".

Replicate

Dify supports accessing Language models and Embedding models on Replicate. Language models correspond to Dify's reasoning model, and Embedding models correspond to Dify's Embedding model.

Specific steps are as follows:

  1. You need to have a Replicate account (registration page).

  2. Get your API Key (available from your Replicate account settings).

  3. Pick a model. Select a model under Language models or Embedding models.

  4. Add models in Dify's Settings > Model Provider > Replicate.


The API key is the one set in step 2. The Model Name and Model Version can be found on the model details page.
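
To double-check the API key and the model name/version pair outside Dify, a minimal sketch with the official replicate Python client is shown below; the model reference and input fields are placeholders that depend on the model you picked:

# Minimal check of a Replicate model (placeholders: model reference and input).
# Requires: pip install replicate, and REPLICATE_API_TOKEN set in the environment.
import replicate

output = replicate.run(
    "owner/model-name:version-id",   # placeholder: "<Model Name>:<Model Version>"
    input={"prompt": "Hello"},       # input fields depend on the chosen model
)
print(output)                        # may be a string, list, or iterator depending on the model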


Xinference

Xorbits Inference (Xinference) is a powerful and versatile library designed to serve language, speech recognition, and multimodal models, and it can even be used on laptops. It supports a variety of GGML-compatible models such as chatglm, baichuan, whisper, vicuna, and orca. Dify supports connecting to locally deployed Xinference instances for both large language model inference and embedding capabilities.

Deploy Xinference

Before you start

When using Docker to deploy a private model locally, you might need to access the service via the container's IP address instead of 127.0.0.1. This is because 127.0.0.1 or localhost by default points to your host system and not the internal network of the Docker container. To retrieve the IP address of your Docker container, you can follow these steps:

1.First, determine the name or ID of your Docker container. You can list all active containers using the following command:

docker ps


2.Then, use the command below to obtain detailed information about a specific container, including its IP address:

docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' container_name_or_ID

Please note that you usually do not need to manually find the IP address of the Docker container to access the service, because Docker offers a port mapping feature. This allows you to map the container ports to local machine ports, enabling access via your local address. For example, if you used the -p 80:80 parameter when running the container, you can access the service inside the container by visiting http://localhost:80 or http://127.0.0.1:80.

 

If you do need to use the container's IP address directly, the steps above will assist you in obtaining this information.
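
If you are unsure whether Dify can actually reach the service at a given address, a quick TCP connectivity check such as the sketch below (run from wherever Dify runs, with placeholder host and port values) can save some debugging:

# Quick TCP reachability check for a locally deployed model service.
import socket

host, port = "172.17.0.2", 9997   # placeholders: container/host IP and service port

with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    s.settimeout(3)
    result = s.connect_ex((host, port))

print("reachable" if result == 0 else f"not reachable (errno {result})")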

Starting Xinference

There are two ways to deploy Xinference: local deployment and distributed deployment. Here we take local deployment as an example.

1.First, install Xinference via PyPI:

$ pip install "xinference[all]"

2.Start Xinference locally:

$ xinference-local
2023-08-20 19:21:05,265 xinference   10148 INFO     Xinference successfully started. Endpoint: http://127.0.0.1:9997
2023-08-20 19:21:05,266 xinference.core.supervisor 10148 INFO     Worker 127.0.0.1:37822 has been added successfully
2023-08-20 19:21:05,267 xinference.deploy.worker 10148 INFO     Xinference worker successfully started.

Xinference will start a worker locally by default, with the endpoint: http://127.0.0.1:9997, and the default port is 9997. By default, access is limited to the local machine only, but it can be configured with -H 0.0.0.0 to allow access from any non-local client. To modify the host or port, you can refer to xinference's help information: xinference-local --help.

 

3.Create and deploy the model

Visit http://127.0.0.1:9997 and select the model and specification you need to deploy.


As different models have different compatibility on different hardware platforms, please refer to Xinference built-in models to ensure the created model supports the current hardware platform.

4.Obtain the model UID

Copy the model UID from the Running Models page, for example: 2c886330-8849-11ee-9518-43b0b8f40bea

 

5.After the model is deployed, connect it in Dify.

In Settings > Model Providers > Xinference, enter:

  • Model name: vicuna-v1.3

  • Server URL: http://127.0.0.1:9997

  • Model UID: 2c886330-8849-11ee-9518-43b0b8f40bea

Click "Save" to use the model in the dify application.

Dify also supports using Xinference built-in models as Embedding models; just select the Embeddings type in the configuration box.
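
As a sanity check outside Dify, you can also query the deployed model directly; the sketch below assumes Xinference's OpenAI-compatible chat completions route at the default endpoint and uses the model UID obtained above:

# Sanity check of the deployed model (assumes Xinference's OpenAI-compatible route).
import requests

SERVER_URL = "http://127.0.0.1:9997"
MODEL_UID = "2c886330-8849-11ee-9518-43b0b8f40bea"   # the UID copied in step 4

resp = requests.post(
    f"{SERVER_URL}/v1/chat/completions",
    json={
        "model": MODEL_UID,
        "messages": [{"role": "user", "content": "Hello"}],
    },
    timeout=120,
)
print(resp.status_code, resp.json())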

For more information about Xinference, please refer to: Xorbits Inference

OpenLLM

With OpenLLM, you can run inference with any open-source large language model, deploy to the cloud or on-premises, and build powerful AI apps. Dify supports connecting to the inference capabilities of locally deployed OpenLLM models.

Deploy OpenLLM Model

Before you start

When using Docker to deploy a private model locally, you might need to access the service via the container's IP address instead of 127.0.0.1. This is because 127.0.0.1 or localhost by default points to your host system and not the internal network of the Docker container. To retrieve the IP address of your Docker container, you can follow these steps:

1.First, determine the name or ID of your Docker container. You can list all active containers using the following command:

docker ps

2.Then, use the command below to obtain detailed information about a specific container, including its IP address:

docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' container_name_or_ID

Please note that you usually do not need to manually find the IP address of the Docker container to access the service, because Docker offers a port mapping feature. This allows you to map the container ports to local machine ports, enabling access via your local address. For example, if you used the -p 80:80 parameter when running the container, you can access the service inside the container by visiting http://localhost:80 or http://127.0.0.1:80.

If you do need to use the container's IP address directly, the steps above will assist you in obtaining this information.

Starting OpenLLM

Each OpenLLM Server can deploy one model, and you can deploy it in the following way:

1.First, install OpenLLM through PyPI:

$ pip install openllm

2.Locally deploy and start the OpenLLM model:

$ openllm start opt --model_id facebook/opt-125m -p 3333
2023-08-20T23:49:59+0800 [INFO] [cli] Prometheus metrics for HTTP BentoServer from "_service:svc" can be accessed at http://localhost:3333/metrics.
2023-08-20T23:50:00+0800 [INFO] [cli] Starting production HTTP BentoServer from "_service:svc" listening on http://0.0.0.0:3333 (Press CTRL+C to quit)

After OpenLLM starts, it serves an API on local port 3333, with the endpoint http://127.0.0.1:3333. Since the default port 3000 conflicts with Dify's web service, the port is changed to 3333 here. If you need to modify the host or port, see the help information for starting OpenLLM: openllm start opt --model_id facebook/opt-125m -h.

Note: The facebook/opt-125m model is used here only for demonstration, and its output quality may be limited. Please choose an appropriate model for your actual use case. For more models, please refer to: Supported Model List.
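
Since the startup log above shows the BentoServer exposing Prometheus metrics at /metrics, fetching that path is a simple way to confirm the server is up before connecting Dify:

# Confirm the OpenLLM server is reachable via the /metrics path from the startup log.
import requests

resp = requests.get("http://127.0.0.1:3333/metrics", timeout=10)
print(resp.status_code)   # 200 means the BentoServer is serving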

3.After the model is deployed, connect and use it in Dify.

Fill in under Settings > Model Providers > OpenLLM:

  • Model Name: facebook/opt-125m

  • Server URL: http://<your-OpenLLM-endpoint-domain>:3333 (a host or LAN IP address that Dify can reach)

Click "Save" and the model can be used in the application.

This instruction is only for quick connection as an example. For more features and information on using OpenLLM, please refer to: OpenLLM

LocalAI

LocalAI is a drop-in replacement REST API that is compatible with the OpenAI API specifications for local inferencing. It allows you to run LLMs (and more) locally or on-prem with consumer-grade hardware, supporting multiple model families that are compatible with the ggml format. It does not require a GPU.

Dify allows integration with LocalAI for local deployment of large language model inference and embedding capabilities.

Deploying LocalAI

Before you start

When using Docker to deploy a private model locally, you might need to access the service via the container's IP address instead of 127.0.0.1. This is because 127.0.0.1 or localhost by default points to your host system and not the internal network of the Docker container. To retrieve the IP address of your Docker container, you can follow these steps:

1.First, determine the name or ID of your Docker container. You can list all active containers using the following command:

docker ps

2.Then, use the command below to obtain detailed information about a specific container, including its IP address:

docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' container_name_or_ID

Please note that you usually do not need to manually find the IP address of the Docker container to access the service, because Docker offers a port mapping feature. This allows you to map the container ports to local machine ports, enabling access via your local address. For example, if you used the -p 80:80 parameter when running the container, you can access the service inside the container by visiting http://localhost:80 or http://127.0.0.1:80.

 

If you do need to use the container's IP address directly, the steps above will assist you in obtaining this information.

Starting LocalAI

You can refer to the official Getting Started guide for deployment, or quickly integrate following the steps below:

(These steps are derived from LocalAI Data query example)

1.First, clone the LocalAI code repository and navigate to the specified directory.

$ git clone https://github.com/go-skynet/LocalAI
$ cd LocalAI/examples/langchain-chroma

2.Download example LLM and Embedding models.

$ wget https://huggingface.co/skeskinen/ggml/resolve/main/all-MiniLM-L6-v2/ggml-model-q4_0.bin -O models/bert
$ wget https://gpt4all.io/models/ggml-gpt4all-j.bin -O models/ggml-gpt4all-j

Here, we choose two smaller models that are compatible across all platforms. ggml-gpt4all-j serves as the default LLM model, and all-MiniLM-L6-v2 serves as the default Embedding model, for quick local deployment.

3.Configure the .env file.

$ mv .env.example .env

NOTE: Ensure that the THREADS variable value in .env doesn't exceed the number of CPU cores on your machine.
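
If you are unsure how many cores the machine has, a one-liner reports it:

# Print the number of CPU cores so THREADS in .env can be set accordingly.
import os
print(os.cpu_count())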

4.Start LocalAI.

# start with docker-compose
$ docker-compose up -d --build

# tail the logs & wait until the build completes
$ docker logs -f langchain-chroma-api-1
7:16AM INF Starting LocalAI using 4 threads, with models path: /models
7:16AM INF LocalAI version: v1.24.1 (9cc8d9086580bd2a96f5c96a6b873242879c70bc)

The LocalAI request API endpoint will be available at http://127.0.0.1:8080.

It provides two models:

  • LLM Model: ggml-gpt4all-j

    External access name: gpt-3.5-turbo (This name is customizable and can be configured in models/gpt-3.5-turbo.yaml).

  • Embedding Model: all-MiniLM-L6-v2

    External access name: text-embedding-ada-002 (This name is customizable and can be configured in models/embeddings.yaml).

5.Integrate the models into Dify.

Go to Settings > Model Providers > LocalAI and fill in:

Model 1: ggml-gpt4all-j

  • Model Type: Text Generation

  • Model Name: gpt-3.5-turbo

  • Server URL: http://127.0.0.1:8080

    If Dify is deployed via docker, fill in the host domain: http://<your-LocalAI-endpoint-domain>:8080, which can be a LAN IP address, like: http://192.168.1.100:8080

Click "Save" to use the model in the application.

Model 2: all-MiniLM-L6-v2

  • Model Type: Embeddings

  • Model Name: text-embedding-ada-002

  • Server URL: http://127.0.0.1:8080

    If Dify is deployed via docker, fill in the host domain: http://<your-LocalAI-endpoint-domain>:8080, which can be a LAN IP address, like: http://192.168.1.100:8080

Click "Save" to use the model in the application.

For more information about LocalAI, please refer to: https://github.com/go-skynet/LocalAI
