
Using alternative and local models with the ChatGPT / DALL-E workflow


Recommended Posts

Moderator’s note: The ChatGPT / DALL-E workflow offers the ability to change the API endpoint in the Workflow Environment Variables, enabling the possibility of using local models. This is complex and requires advanced configuration, not something we can officially provide support for. This thread was split from the main one so members of the community can help each other set up their own specific models.

 


 

Thanks vitor! There are many open-source models with performance equivalent to GPT-3.5 or better, without the privacy concerns, the dependency on an internet connection for each question, or the costs. And there are a number of macOS apps that manage them. This unlocks the power of LLMs for everyone. The good news is that most of these tools offer a local API server compatible with the OpenAI API, so all one needs to do is change the URI to switch from the commercial OpenAI service to a privacy-respecting, open-source, and free alternative:

 

https://github.com/nomic-ai/gpt4all — more basic UI; the model is selected via the model key of the API request

 

https://lmstudio.ai — more advanced UI; the model selected in the UI is the one served for API requests.

 

Checking the code, the JS can be tweaked to make the URI redirect to localhost: http://localhost:4891/v1/chat/completions — for GPT4All the model file needs to be specified, whereas for LM Studio the model is chosen in the UI and that is what the API serves. So a feature request: the option to specify the API address, so this workflow can run locally if LM Studio or GPT4All (or several others) are installed. DALL-E is a harder deal: while there are open-source models like Stable Diffusion (and amazing macOS apps like Draw Things to use them), I don't know of a tool that offers an API that would be hot-swappable for the OpenAI commercial APIs...
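
For anyone who wants to test a local server before touching the workflow, both apps answer a plain OpenAI-style request. A rough sketch against the port GPT4All exposes (the model value is a placeholder for whichever model file you have downloaded; LM Studio ignores it and serves whatever is loaded in its UI):

curl http://localhost:4891/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mistral-7b-instruct-v0.1.Q4_0.gguf",
    "messages": [{ "role": "user", "content": "Hello!" }]
  }'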

 

Edited by vitor
Add moderator note at the top
Link to comment

As with most things in computing, using local models has tradeoffs. You’ve mentioned some of the positives, but some of the challenges include having to download multi-GB files, requiring better machines, and being harder to set up. None of that is insurmountable, but it is an extra hurdle that can confuse most users. There are so many different knobs and dials, even when using one specific service, that an early explicit goal of this workflow remains to avoid configuration fatigue. We’re aware alternative models exist and are certainly not averse to them, but for this workflow right this moment the operative word is focus.

 

The great news is that everything in the workflow is built on top of the new Text View, which is content agnostic. In other words, as you’ve noticed, there’s nothing tying Alfred to a particular approach and anyone can build their own!

Link to comment

It would be nice to be able to override https://api.openai.com with a local model. There are many servers that provide the same API but let you run local models.

As an example, I am running LM Studio with Mistral on my Mac, and I was able to modify the chatgpt script to replace https://api.openai.com with http://localhost:1234 and can communicate with my local model.

 

It would also be nice to easily switch between pre-configured prompts/models/URLs.
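
For reference, the change itself is a one-line substitution in the workflow's chatgpt script — a rough sketch, run from inside the installed workflow's folder, with the port adjusted to whatever LM Studio's local server uses:

# replace the OpenAI base URL with the local server (macOS/BSD sed)
sed -i '' 's|https://api.openai.com|http://localhost:1234|' chatgpt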

Link to comment
9 hours ago, outcoldman said:

As an example, I am running LM Studio with Mistral on my Mac, and I was able to modify the chatgpt script to replace https://api.openai.com with http://localhost:1234 and can communicate with my local model.

 

It would also be nice to easily switch between pre-configured prompts/models/URLs.

 

Right, I did the same. Vitor's workflow can work for local use with a simple change. I use both GPT4All and LM Studio, and as both support the same API they can be swapped out without any changes other than the API base URI. But Vitor wants to focus on OpenAI services only, so we either need to make local changes to his workflow, or someone could release a fork of his workflow that adds local use?

Link to comment
Posted (edited)
15 hours ago, vitor said:

@iandol @outcoldman @llityslife Please try this version. Instructions are at the bottom of the About, in the Advanced Configuration section. The update is to be considered experimental and things can change, but this method aims to allow you to use the local models you have set up more easily and not worry about the endpoint being overridden on updates, while at the same time not overwhelming other users.

 

Thanks so much, I think this setting is a nice compromise (env variables are hidden away for most users). I am having problems with the setting though:

 

 

{"error":"Unexpected endpoint or method. (POST //v1/chat/completions)"}

 

 


 

The variable does seem to be sent if I add a debug node:

 

 

[13:33:35.534] ChatGPT / DALL-E[Debug] 'what is the elvish scripting language?', {
  chatgpt_api_endpoint = "http://localhost:4891"
  chatgpt_keyword = "chatgpt"
  dalle_image_number = "1"
  dalle_images_folder = "/Users/ian/Desktop/DALL-E"
  dalle_keyword = "dalle"
  dalle_model = "dall-e-2"
  dalle_quality = "standard"
  dalle_style = "vivid"
  dalle_write_metadata = "1"
  gpt_model = "gpt-4"
  init_question = "what is the elvish scripting language?"
  openai_api_key = "sk-xxxxxxxxxxx"
  system_prompt = ""

}

 

 

The problem is the `//`. I get the same error if I use curl directly:

 

 

▶︎ curl -s -X POST http://localhost:4891//v1/chat/completions
{"error":"Unexpected endpoint or method. (POST //v1/chat/completions)"}⏎

  

 

There is no `/` at the end of my variable, so I'm not sure where the extra `/` is coming from.
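
For reference, a trailing slash anywhere in the base would produce exactly that path once the route is appended — illustrative shell only, not the workflow's actual code:

endpoint="http://localhost:4891/"        # trailing slash on the variable
echo "${endpoint}/v1/chat/completions"   # -> http://localhost:4891//v1/chat/completions
endpoint="${endpoint%/}"                 # strip any trailing slash before joining
echo "${endpoint}/v1/chat/completions"   # -> http://localhost:4891/v1/chat/completions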

Edited by iandol
Link to comment
5 hours ago, iandol said:

There is no `/` at the end of my variable, so I'm not sure where the extra `/` is coming from.

 

I’m not seeing it. You’re in the best position to find out, since you can test with your model. You’ve poked around in the code before, so you can look in the same whereabouts and try different things: console.log the values, remove the / in the code and add it in the variable, use a replace… until the cause is clear and can be properly addressed.

Link to comment

It seems the script got stuck at line 200 (https://github.com/alfredapp/openai-workflow/blob/main/Workflow/chatgpt#L200) and kept returning the first error from stream.txt, so I didn't see any change when editing the code. 🤪 I deleted the files in the workflow data folder and it seems to be working now, though sometimes the model response is slow (when it first loads into memory) and there is a stream error.

 

Anyway, I can confirm that LM Studio + the Hermes 7B model works well with your modified script, with the caveat that you must NOT append a / to the endpoint. I don't know why, once it errors, it cannot recover without manually deleting the files (possibly ⌘↵ would have done this; I didn't try it).

 

GPT4All fails to work, as it doesn't use a streaming API (stream=false). Non-streaming mode is easier to work with (a blocking response is trivial to handle), but your code is optimised for streaming...
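
For comparison, this is roughly the kind of request involved (a sketch only; the port and model string are placeholders for whatever your local server exposes). A server that supports streaming replies with "data: {...}" chunks ending in "data: [DONE]", whereas a non-streaming server returns a single JSON object, which the stream parsing can't handle:

curl -N http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "local-model",
    "stream": true,
    "messages": [{ "role": "user", "content": "Hello!" }]
  }'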

Link to comment

Thanks for the update, but I tested using a third-party API and found errors, even though I've set the API URL and API key.

 

[09:26:05.918] ChatGPT / DALL-E[Text View] Script with argv '翻译hello' finished
[09:26:05.941] ChatGPT / DALL-E[Text View] {"rerun":0.1,"variables":{"streaming_now":true}}
[09:26:06.027] ChatGPT / DALL-E[Text View] Running with argument '翻译hello'
[09:26:06.119] ChatGPT / DALL-E[Text View] Script with argv '翻译hello' finished
[09:26:06.162] ChatGPT / DALL-E[Text View] {"response":" [Connection Stalled]","footer":"You can ask ChatGPT to continue the answer","behaviour":{"response":"replacelast","scroll":"end"}}

 

Link to comment
59 minutes ago, iandol said:

 

What third-party tool are you testing? It needs to support stream=true to work... I see the same connection stalled error with GPT4All (https://gpt4all.io/index.html), which uses a non-streaming OpenAI API, but LM Studio (https://lmstudio.ai), which supports streaming, does work...

Some GPT service providers offer the API at favorable prices; these require a customized API URL and an sk-xxx API key to use.

Link to comment

But does their API support streaming mode or not? I suspect that even with streaming, if their API is slow this will cause connection stalled errors (there is a timeout in the code; if you tweak it perhaps you can recover from the error). I know there are some services that bypass country limitations (I am in China, for example, so must go through a VPN), and this can also add latency to the connection...

Link to comment
22 hours ago, llityslife said:

@iandol I am sure there is no problem with my usage. I can use the third-party API normally with https://github.com/chrislemke/ChatFred.

 

Well, your experience tells us there is a problem. Again, there is not one API method: there are several endpoints, each with different requirements and functions. At a minimum there are two endpoints (/v1/completions & /v1/chat/completions), and there are streaming and non-streaming modes. I have pointed out that, for local use, of two different apps that both "support" the OpenAI API, only one works and the other doesn't (because it needs stream=false). Just because a service says it supports X does not specify X.y or X.z — what API mode does ChatFred use, and is it the same? My point is to help you determine what the problem is; without knowing the problem you have no hope of finding the solution...

Link to comment
  • vitor changed the title to Using local models with ChatGPT / DALL-E workflow

As this is an advanced option which won’t be relevant to most users and can be tricky to set up correctly, I’ve split the conversation into a different thread (this one). Please continue the discussion on local models here. A moderator’s note at the top explains the situation, but the post is otherwise unchanged.

Link to comment
10 hours ago, vitor said:

As this is an advanced option which won’t be relevant to most users and can be tricky to set up correctly, I’ve split the conversation into a different thread (this one). Please continue the discussion on local models here. A moderator’s note at the top explains the situation, but the post is otherwise unchanged.

Now working properly with the third-party API.

Link to comment
On 3/6/2024 at 9:36 AM, iandol said:

 

Well, your experience tells us there is a problem. Again, there is not one API method: there are several endpoints, each with different requirements and functions. At a minimum there are two endpoints (/v1/completions & /v1/chat/completions), and there are streaming and non-streaming modes. I have pointed out that, for local use, of two different apps that both "support" the OpenAI API, only one works and the other doesn't (because it needs stream=false). Just because a service says it supports X does not specify X.y or X.z — what API mode does ChatFred use, and is it the same? My point is to help you determine what the problem is; without knowing the problem you have no hope of finding the solution...

The latest version of the workflow is working fine. Thanks!

Link to comment

Feature request: you have added the custom API address, which is great. There are services like OpenRouter which support the OpenAI API with many other models too: https://openrouter.ai/docs#principles — I think Poe is another example. To get these to work you must specify the model (e.g. "openai/gpt-3.5-turbo"). At the moment you hard-code the model values, so this will fail. If you allow more flexible model input, then services like OpenRouter could also be used by this workflow. The simplest is to use an env var to override the model if set (assume this is advanced user only). Otherwise a text entry option in the workflow UI?
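
To illustrate, the request is identical to the OpenAI one apart from the base URL, the key, and the vendor-prefixed model string (sketch only; check OpenRouter's docs for the exact headers they expect):

curl https://openrouter.ai/api/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENROUTER_API_KEY" \
  -d '{
    "model": "openai/gpt-3.5-turbo",
    "messages": [{ "role": "user", "content": "Hello!" }]
  }'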

Link to comment
14 hours ago, iandol said:

Otherwise a text entry option in the workflow UI?

 

That would add complexity and foot guns for a feature the overwhelming majority will never take advantage of. Most people don’t know (nor should they have to) the exact string of characters representing each model. That configuration is a popover button because it’s what makes the most sense to cover most cases.

 

14 hours ago, iandol said:

use an env var to override the model if set

 

I’m open to the idea of allowing more overrides and even have several ideas in mind on how to do it. But adding hidden variables piecemeal to support an advanced feature for a handful of users isn’t a good way to develop stable software. So please first investigate thoroughly what are the exact customisations that would be beneficial for custom models and then we can make a decision on all of them at once. Some things can go in (like the custom API endpoint) while others probably won’t (e.g. a non-streaming method) but they should be evaluated in bulk so the workflow doesn’t end up like a brittle Frankenstein-type construction. Think in terms of “if there were going to be just one more version, what would be necessary to cover the bases?”

Link to comment
Posted (edited)

I'll have a look. The OpenAI API is pretty simple to be honest; going by their guide:

 

https://platform.openai.com/docs/guides/text-generation/chat-completions-api

 

curl https://api.openai.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{
    "model": "gpt-3.5-turbo",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "Hello!"
      }
    ]
  }'

 

We have the API address, API key, model name & messages as the core components. The messages are obvious, but the address, key, and model name are essential, and they are also required for online alternatives like openrouter.ai and local tools like LM Studio. The hard-coded model names for OpenAI do not work for any other alternative, so a way to override them is needed. These are definitely "if there was only one more version, what should be included" options... I think having the standard drop-down of hard-coded models is great for beginners (your UI is clean and simple), and the env variable as a text field is perfect for more advanced use.

 

There are a bunch of other parameters for fine-tuning the model response: temperature, max_tokens, n, top_p, etc. Of these I think none are really essential, though if I were forced to pick I'd have temperature (guiding the fidelity vs. creativity of the model's responses) and max_tokens (as local models at least have specific token-count limits):

 

https://platform.openai.com/docs/api-reference/chat/create
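
For example, the request body above would just grow by two fields (values here are only illustrative):

{
  "model": "gpt-3.5-turbo",
  "messages": [{ "role": "user", "content": "Hello!" }],
  "temperature": 0.7,
  "max_tokens": 1024
}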

 

These options are certainly very specialist. I agree that stream=false is not worth supporting, as it adds substantial backend complexity for you with minimal gain (while I love GPT4All, I just won't use it with Alfred; LM Studio, Ollama, and others can take its place...)

 

Edited by iandol
typos
Link to comment

I started using the ChatGPT function and it works very well! However, I couldn't help but think of Perplexity in the same form factor. I think it would be an amazing addition and a super nice option to have in the workflow.

 

I have been using Perplexity a lot more as a "Google on steroids". It works very well and gets you to an answer quickly. I love having the option of expanding on certain queries. 

 

What do you guys/gals think?

Edited by deepbit
Link to comment
