What are macOS Services?

macOS Services is a feature that lets users extend the functionality of their Macs by integrating applications and automating everyday tasks. A service is essentially a small program or script, accessible from the Services menu or a contextual menu, that performs a specific action on content within any application.

By creating a custom macOS Service, users can enhance their productivity and streamline their workflow with AI (in this case, Large Language Models). Integrating LLMs into everyday tasks lets users automate actions like translation and summarization, get suggested completions for text, and have questions about a body of text answered in place.

Creating a macOS Service

You can create a macOS Service using Automator. When Automator launches, choose "Quick Action" (previously labeled "Service"). If the quick start menu does not appear on launch, go to File → New (CMD+N).

When Automator is launched, it will allow you to create a new document. You can choose "Quick Action" from the menu to start creating your custom service.

Once you reach the editing interface, scroll through the list of actions in the left panel to find Run Shell Script and drag it into the right panel. Configure the workflow to receive text from any application. Since we want the output to be able to replace the selected text, check the "Output replaces selected text" box, then choose a representative image and color.

You can configure the shell script however you like, but I've chosen to use zsh and pass the input to stdin. The script then runs a JavaScript file that has been made executable (chmod +x) and whose shebang specifies node as the runtime environment.

Configuration for your custom text service.
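
With that setup, the body of the Run Shell Script action can be a single line. A minimal sketch, assuming a hypothetical location of ~/scripts/assistant.js for the JavaScript file:

# Automator → Run Shell Script (Shell: /bin/zsh, Pass input: to stdin)
# The path is hypothetical; point it at wherever you saved the file,
# after making it executable with: chmod +x ~/scripts/assistant.js
~/scripts/assistant.js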

Note: Shell scripts triggered in Automator workflows do not source startup files such as .zshrc or .bash_profile, so the PATH variable may not contain everything you expect. In my case, I use nvm to manage Node.js, which means the top of my assistant.js file needed to account for this: #!/usr/bin/env /Users/{user}/.nvm/versions/node/{node_version}/bin/node instead of the more typical #!/usr/bin/env node.
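
If you're unsure of the absolute path to your node binary, you can ask your regular shell for it and copy the result into the shebang:

$ command -v node
/Users/{user}/.nvm/versions/node/{node_version}/bin/node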

After saving your service (File → Save or CMD+S), you can also make it accessible via a keyboard shortcut by going to System Settings → Keyboard → Keyboard Shortcuts and selecting Services → Text → {Your Workflow}.

Setting up a keyboard shortcut for the new "Assistant" service / workflow.

Integrating with an LLM

At this point, the remaining work is mainly just calling out to the desired LLM API. At the time of writing, a popular choice would be OpenAI's LLMs. If you want to keep the entire workflow running locally, you can use a tool like Ollama, which can run many available models such as Gemma, Llama 2, Mistral, or Phi-2. Ollama also provides an OpenAI-compatible API, so switching is as simple as swapping out the base URL and omitting the API key.
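
For example, to point the script below at a local Ollama instance instead, only the url and model constants need to change (this assumes Ollama is running on its default port and the model has been pulled, e.g. with ollama pull mistral):

// OpenAI-compatible endpoint exposed by a local Ollama instance (assumed defaults).
const url = 'http://localhost:11434/v1/chat/completions';
const model = 'mistral'; // any locally pulled model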

For the most basic integration with an LLM, we can use the following script:

#!/usr/bin/env /Users/{user}/.nvm/versions/node/{node_version}/bin/node
const fs = require('fs');

const url = 'https://api.openai.com/v1/chat/completions';
const model = 'gpt-3.5-turbo';

if (require.main === module) { // this is the main file being run
  const input = fs.readFileSync(process.stdin.fd).toString(); // read in the text selected by the user
  run(input).then(console.log); // process the text and then print the result to the shell
}

async function run(input) {
  const messages = [
    {
      role: 'system',
      content: 'Continue the text where the user left off.',
    }, {
      role: 'user',
      content: input,
    }
  ];

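  // Node 18+ provides fetch globally; older versions need a library such as node-fetch.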
  const response = await fetch(url, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'Authorization': 'Bearer <API_KEY_HERE>',
    },
    body: JSON.stringify({ model, messages }),
  });
  
  if (!response.ok) {
    return input + '\nSomething went wrong'; // prefix our response with the input
  }
  
  const data = await response.json();
  const newText = data.choices[0].message.content;
  
  return input + newText; // since we are replacing the text, prefix our response with the input
}

This script takes the user's selected text and wraps it in a messages array with instructions to simply continue the text from where it leaves off. We then make the API call, check that the response was successful, and, if so, append the text generated by the LLM to the original input.
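
Since the script just reads stdin and prints to stdout, you can test it from a terminal before wiring it into Automator (run from the directory containing assistant.js; the completion shown here is purely illustrative and real output will vary):

$ echo "The quick brown fox" | ./assistant.js
The quick brown fox jumps over the lazy dog without breaking stride.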

Using the newly created Assistant service to generate completion text in Apple Notes.

We can easily change what action takes place by defining custom commands that, when found in the selected text, alter the behavior. For example, if the selected text ends in !summarize, we could use a larger-context model like gpt-3.5-turbo-16k with a system prompt like "Summarize the user's message.", or we could support a !translate:<language> command that uses a prompt like "Translate the user message into <language>.".
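
A minimal sketch of that command dispatch, assuming the commands above (the models and prompts are only examples; run() would be extended to use the returned values):

function parseCommand(input) {
  // If the selection ends in !summarize, switch to a larger-context model.
  if (input.trimEnd().endsWith('!summarize')) {
    return {
      model: 'gpt-3.5-turbo-16k',
      system: "Summarize the user's message.",
      text: input.replace(/!summarize\s*$/, ''),
    };
  }
  // !translate:<language> swaps in a translation prompt.
  const translate = input.match(/!translate:(\S+)\s*$/);
  if (translate) {
    return {
      model: 'gpt-3.5-turbo',
      system: `Translate the user message into ${translate[1]}.`,
      text: input.replace(/!translate:\S+\s*$/, ''),
    };
  }
  // Default: the continuation behavior from the script above.
  return {
    model: 'gpt-3.5-turbo',
    system: 'Continue the text where the user left off.',
    text: input,
  };
}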

You could also bring in other tooling, since ultimately this is just a shell script running on your machine. Instead of summarizing the text in place, you could write the result to an output file, or have the script add any dates it finds to your calendar.
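
As a sketch of the output-file variant (the path is an assumption, and note that whatever is printed to stdout still replaces the selection, so we return the input unchanged):

// Inside run(), after newText is received: write the result to a file
// instead of splicing it into the document. The path is hypothetical.
const os = require('os');
const path = require('path');
fs.appendFileSync(path.join(os.homedir(), 'Documents', 'llm-notes.txt'), newText + '\n');
return input; // the selection is replaced with itself, i.e. left as-is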

Drawbacks

macOS Services are not without drawbacks. While they can be used across many applications, some do not report the selected text correctly. VS Code, for example, seems to limit the number of characters sent to about 450 regardless of how much is selected.

The selected text is also reported as plain text, which means formatting like bolding, italics, and links is stripped before our service has a chance to act on it. As a result, replacing the selected text removes any existing formatting. If you instead configure your service to receive rich text, far fewer apps and fields can use the service.

Additionally, an assigned keyboard shortcut can easily be intercepted by some applications (I could not use the shortcut I assigned in VS Code or Chrome), leaving you with the more tedious right-click menu or application menu. Frustratingly, even if the text is still selected, these menus do not show the Services option once you have moved focus elsewhere.

Future work

Beyond adding functionality like the summarization or translation mentioned above, there are other opportunities to expand on services. We could add services that accept some of the other input types Automator allows, such as PDFs, images, or folders.

This could enable tasks such as summarizing documents, generating embeddings for later search, converting text to images or images to text, or even automatically creating entire articles or presentations from source material.

There is the possibility of Apple implementing this functionality natively in macOS, especially as they've begun to emphasize local AI in their latest product launch. However, that emphasis does not extend to all current hardware: the M1 and M2 processors do not have nearly the same power for running AI, so it may be a while before Apple is comfortable enough with the quality and performance available to the average user to integrate the functionality directly.
