OpenAI language models are now publicly available on Microsoft Azure – and they’re surprisingly easy to use. Artificial intelligence and its subcategory, machine learning, have rapidly moved from the research labs into our IDEs. Meanwhile, tools like Azure Cognitive Services provide API-based access to pre-trained models.
There are many different approaches to delivering AI services. One of the most promising is the Generative Pre-trained Transformer (GPT), which is trained on large amounts of text. OpenAI has done pioneering work with GPT and published a foundational study on the subject in 2018. The model has since gone through several iterations:
- GPT-2 was trained unsupervised, using untagged data to mimic human writing. It was trained on 40 gigabytes of publicly available text, yielding a model with 1.5 billion parameters.
- Its successor can do much more: GPT-3, with 175 billion parameters, was licensed exclusively to Microsoft and forms the basis for tools such as GitHub Copilot, ChatGPT, and DALL-E.
Azure Meets OpenAI
Because a model like GPT-3 requires significant amounts of computing power – on the order of thousands of petaflop/s-days – it is optimally suited to high-performance, cloud-based computing on specialized supercomputer hardware. Microsoft has built its own Nvidia-based Azure servers for this purpose, whose cloud instances also appear in the TOP500 supercomputer ranking. Azure’s AI servers are based on Nvidia Ampere A100 Tensor Core GPUs interconnected via a high-speed InfiniBand network.
OpenAI’s generative AI tools were trained and developed on these Azure servers. As part of a long-term partnership between OpenAI and Microsoft, the tools are now publicly available as the Azure OpenAI Service – including support for GPT-3 text generation and the Codex model. According to Microsoft, support for DALL-E will follow in a future update. Publicly available does not mean freely accessible, however: Microsoft still restricts access to ensure that projects follow ethical guidelines for AI use and are narrowly scoped to specific use cases.
In addition, only direct Microsoft customers get access. Microsoft takes a similar approach with its Cognitive Services, and the guidelines it applies are likely to remain in place. Access to some areas – such as health services – could be subject to additional protective measures for regulatory reasons.
Explore Azure OpenAI Studio
Once your account has been approved to use Azure OpenAI, you can start creating code that uses the API endpoints. The corresponding Azure resources can be created via:
- the Azure portal,
- the Azure CLI or
- ARM templates.
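If you take the CLI route, a sketch of creating and inspecting an Azure OpenAI resource might look like this (the resource name, resource group, and region are placeholder assumptions):

```shell
# Create an Azure OpenAI resource; "my-openai", "my-rg" and the
# region are placeholders - substitute your own values.
az cognitiveservices account create \
  --name my-openai \
  --resource-group my-rg \
  --kind OpenAI \
  --sku S0 \
  --location eastus

# Retrieve the endpoint URL and API keys for use in your code.
az cognitiveservices account show \
  --name my-openai --resource-group my-rg \
  --query properties.endpoint
az cognitiveservices account keys list \
  --name my-openai --resource-group my-rg
```

The endpoint and key returned here are exactly the values your application code will need later.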
If you’re using the Azure portal, create a resource for your application along with any associated Azure services and infrastructure. Once it is assigned to your account and resource group, you can name it and choose a pricing tier (there is only one at the moment, though this may change soon). With that done, you can deploy a model using Azure OpenAI Studio – this is where you will do most of your work with OpenAI. You can currently choose from three model families:
- GPT-3 (Natural Language),
- Codex (Code), and
- Embeddings (Similarity).
Within these families, several different models are available. In the case of GPT-3, “Ada” is the least expensive but also the most limited, while “Davinci” sits at the top end of the scale. Each more capable model is a superset of the ones below it, which has the advantage that for more complex tasks you don’t have to change your code – you simply select a different model. Microsoft recommends developing OpenAI-powered applications with the best-performing model, since you can then fine-tune for price and performance when the application goes into production.
Customize GPT-3 Models
Even though GPT-3’s text-completion features have gone viral, in practice your application needs to be focused much more tightly on your specific use case. After all, running a GPT-3-based support service that regularly gives irrelevant advice is not in your interest. You should therefore create a custom model using training samples of inputs and desired outputs – in Azure OpenAI, these are called “completions”. It is essential to have an extensive training data set; Microsoft recommends several hundred examples.
To manage the training data, you can store all your prompts and completions in a JSON Lines (JSONL) file. With a finished, customized model, you can then use Azure OpenAI Studio to test how well (or poorly) GPT-3 performs for your use case. A simple playground in the form of a console application provides insight into which completion the model outputs for specific prompts. Microsoft recommends making the prompts as explicit as possible to generate optimal results.
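As an illustration of the prompt/completion format, the training records are stored one JSON object per line. The file name and the help-desk examples below are hypothetical:

```python
import json

# Hypothetical training examples: each record pairs a prompt with
# the completion we want the customized model to produce.
training_examples = [
    {"prompt": "Customer: My invoice total looks wrong. ->",
     "completion": " Route to: Billing"},
    {"prompt": "Customer: The app crashes on startup. ->",
     "completion": " Route to: Technical Support"},
]

# Write the examples in JSON Lines format: one object per line.
with open("training_data.jsonl", "w", encoding="utf-8") as f:
    for example in training_examples:
        f.write(json.dumps(example) + "\n")
```

A consistent separator at the end of each prompt (here “->”) helps the model learn where the input stops and the completion begins.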
The playground also helps you train your models. Another helpful feature is the ability to set the intent and the expected behavior early on: if you use OpenAI Services for a help desk triage tool, for example, you can specify that the output should be delivered in a polite and calm tone. The same tools work with the Codex model, so you can see how it behaves as a code-completion tool or dynamic assistant.
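One way to set intent and tone is to state them at the top of the prompt itself. The instruction wording and the help-desk scenario below are illustrative assumptions, not an official template:

```python
# A hypothetical prompt template that fixes the assistant's intent
# and tone up front, followed by the actual user request.
INSTRUCTION = (
    "You are a help desk triage assistant. "
    "Answer politely and calmly, and classify each ticket."
)

def build_prompt(ticket_text: str) -> str:
    """Combine the fixed instruction with a user ticket."""
    return f"{INSTRUCTION}\n\nTicket: {ticket_text}\nClassification:"

prompt = build_prompt("My laptop will not turn on.")
print(prompt)
```

Ending the prompt with “Classification:” nudges the model to answer in exactly the slot you want to fill.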
Write Code For Azure OpenAI
Once you’re ready to code, you can use your deployment’s REST endpoints directly or via the OpenAI Python libraries – the latter is the quickest route to live code. You need three things:
- the URL of the endpoint,
- an authentication key and
- the name of your deployment.
Then set the appropriate environment variables for your code. As usual, the best practice for production is to avoid hardcoding keys and to manage them with a tool like Azure Key Vault. Calling an endpoint is then very simple:
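A minimal sketch of supplying those three values via environment variables – the variable names here are illustrative conventions, not mandated by the SDK:

```python
import os

# Illustrative environment variable names; in production, pull the
# real values from Azure Key Vault instead of hardcoding them.
os.environ.setdefault("AZURE_OPENAI_ENDPOINT",
                      "https://example-resource.openai.azure.com/")
os.environ.setdefault("AZURE_OPENAI_KEY", "placeholder-key")
os.environ.setdefault("AZURE_OPENAI_DEPLOYMENT", "my-davinci-deployment")

endpoint = os.environ["AZURE_OPENAI_ENDPOINT"]
api_key = os.environ["AZURE_OPENAI_KEY"]
deployment = os.environ["AZURE_OPENAI_DEPLOYMENT"]
print(endpoint, deployment)
```

Keeping all three values outside the source code means the same program can run against development and production deployments unchanged.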
- Use the openai.Completion.create method to get a response, and set the maximum number of tokens large enough to contain your prompt and its response.
- The response object returned by the API contains the text generated by your model, which can be extracted, formatted, and used by the rest of your code.
- The primary calls are simple, and additional parameters are available for your code to manage the response – for example, to help keep the answers honest and accurate.
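The steps above can be sketched with the pre-1.0 OpenAI Python library. The endpoint, key, API version, and deployment name are placeholder assumptions; the stubbed response at the end only illustrates the shape of what the API returns:

```python
import os

def extract_text(response: dict) -> str:
    """Pull the generated text out of a Completions response."""
    return response["choices"][0]["text"].strip()

def complete(prompt: str) -> str:
    """Call an Azure OpenAI deployment (placeholder configuration)."""
    import openai  # pip install openai (pre-1.0 SDK shown here)
    openai.api_type = "azure"
    openai.api_base = os.environ.get(
        "AZURE_OPENAI_ENDPOINT", "https://example-resource.openai.azure.com/")
    openai.api_key = os.environ.get("AZURE_OPENAI_KEY", "placeholder-key")
    openai.api_version = "2022-12-01"  # example API version
    response = openai.Completion.create(
        engine="my-davinci-deployment",  # your deployment name, not a model name
        prompt=prompt,
        max_tokens=100,   # must leave room for the full response
        temperature=0.2,  # lower values give steadier, less creative output
    )
    return extract_text(response)

# The response shape, illustrated with a stubbed example:
sample = {"choices": [{"text": " Route to: Billing "}]}
print(extract_text(sample))  # -> Route to: Billing
```

Note that `engine` takes the name you gave your deployment in Azure OpenAI Studio; switching to a more capable model is just a matter of pointing this at a different deployment.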
If you use another programming language, you can use its REST and JSON-parsing tools instead. GitHub also hosts Swagger specifications that you can use to generate API calls and work with the returned data. This approach works well with IDEs like Visual Studio.