Disclaimer: I am not associated in any way with the companies mentioned in this post. In fact, this is the first time in seven years that I am testing cloud providers in relation to my own product. All opinions expressed here are my own, based on the practical experience accumulated during those tests in relation to the products developed by Code Artworks Ltd.
I’m pressed for time and often find myself wishing I could eliminate the need for sleep, though that’s physically impossible. I am experimenting with various approaches, see if they work according to my needs then move on to design a PoC. In this mode I have got no time for a deep dive. My expertise in GCP or any other cloud services so far has been configuration of user account, setup of a cloud instance and related HW resources, installation of software and running of remotely deployed services in development environment. I’m confident that many early-stage startups, similar to me, don’t have the resources to hire cloud specialists (commonly referred to as ‘devops’), especially in the pre-seed stage of a company. This article shares my personal experience in setting up and using Google’s generative AI service called Vertex AI. It’s important to note that the market is rapidly evolving, and any flaws, bugs, or deficiencies I detail further down may have been improved or fixed by the time you read this article.
So, we have been working on a new unannounced product which makes use of generative AI models. Initially we began playing around with OpenAI’s GPT chat API just to learn if it answered our application needs. It met our technical expectations. However, at present, OpenAI lacks the B2B infrastructure necessary for deploying their models in commercial-grade workloads. If your application is intended to be used by thousands of people globally, it requires management of servers, services, regions, storage, security, etc. Naturally, our next step was to explore running OpenAI on Azure. Unfortunately, as the feature was not publicly accessible, we had to fill out a form and submit a request to join the program.
During that period, I had the opportunity to attend the Google Cloud summit, where the company introduced Vertex AI. What appealed to me about Vertex was the concept of dedicated models tailored for specific use cases. Instead of relying on a single, all-encompassing ‘Swiss knife’ model, users could opt for smaller ones trained for problem domains like image recognition, text, chat, etc. This approach is designed to deliver faster response times, lower resource consumption, and ultimately reduce cloud usage costs. An additional advantage for startups leveraging AI services under Google Cloud Platform (GCP) is the seamless integration with other Google services, including App and Compute engines, authentication services, and more — all conveniently housed under the same roof. It sounded promising, so I decided to give it a try.
The GCP console has grown noticeably more complex since the last time I used it about seven years ago. However, the initial setup for Vertex AI isn’t as intimidating. To get started, simply log in to the Google Cloud console, click the navigation menu in the top left corner of the screen, then expand ‘more products’ and select Vertex AI under the ‘Artificial Intelligence’ menu.
My issues began at this stage. As a new user, I was following their video tutorial (which has been removed by the time of this writing), and it skipped the crucial step of clicking ‘Enable all recommended APIs’. Fortunately, this oversight has already been addressed by their documentation.
The consequence of not enabling all the recommended APIs is that when setting up a fine-tuning job and clicking the ‘Start tuning’ button, nothing happens. This was particularly frustrating since the console didn’t provide any errors or warning messages. Out of desperation, I even posted a question on their forums (but received no answers). It turned out to be one of those simple oversights that can consume hours of your time to figure out. In the end, I went through the entire process of setting up Vertex API from scratch and realized I hadn’t pressed that big blue button earlier.
Fine-tuning the model
It took me a week to reach the point where I could successfully execute a fine-tuning job. Here’s why: GCP requires users to manually allocate hardware resource quotas for the services they plan to run. It seems they assume that all users have a background in devops. However, as I mentioned, I don’t have one. While I understand the process of configuring an instance for, let’s say, a rendering engine — ensuring it has not only a CPU but also an activated GPU — in this case, I’m not deploying my own product, nor am I a deep learning engineer. I can only guess what kind of hardware I need to activate in order to train a Vertex AI model.
As I mentioned, Google doesn’t establish default hardware quotas; users are responsible for that. Consequently, when setting up a tuning job and initiating the pipeline, it leads to the failure of one of the pipeline nodes, elegantly visualized on an interactive graph in real-time.
By selecting the failed node, you can see an error dump on the right side of the console.
The ‘RESOURCE_EXHAUSTED’ message doesn’t make sense, at least to me, in this context, as there was no resource set in the first place. I wonder why Google doesn’t provide a default quota for resources that are absolutely required to run a fine-tuning job. In my case I had 3 types of resources missing as their default quotas are set to zero:
Google allows users to choose between training on GPUs or TPUs. Since I experimented with both, I had to request quotas for both GPUs and TPUs. A2 CPUs, as I understand, are required in both cases because, as you know, we still need some kind of CPU on computers these days. The challenge is that those of you who have submitted requests for increasing quotas at GCP know that it doesn’t happen immediately. As mentioned earlier, I had to wait for a week until all the quotas were increased by the Google team. This is something I completely fail to understand: why can’t all the essential hardware quotas be allocated to a bare minimum by default?
Quota allocation is a separate issue, and I faced a challenge as I had no idea what the optimal minimum should be to efficiently run the training — where efficiency, for me, means achieving a good balance between speed and costs. Here, I made another mistake that could have been quite painful had I chosen to train on a large dataset.
During the time I conducted these tests, I couldn’t find any documentation explaining how many units of these resources I needed to allocate in quotas. For GPUs, I requested 1 or 2 — I don’t remember exactly. For TPUs, I requested 64 cores because, according to their documentation, TPUs are scaled by a factor of 64. However, I couldn’t find answers to crucial questions: Does fine-tuning need to run on all 64 cores? Can I allocate fewer? What is recommended as the minimum configuration requirement? The outcome of this experiment was a charge of $86 US dollars, in addition to the $400 Google gave me for the trial, for fine-tuning on a dataset consisting of only 200 lines of text, totaling 35 kilobytes of data.
As shown in the pipeline jobs list above, the successful job (green) took almost 7 hours. So, I contacted Google support to understand how this could happen. The answer was, ‘We see you used 400 hours to train the model…’. I couldn’t get reasonable answers, but they gladly refunded my credit card. By then, I was too frustrated to continue with the service but decided to try the fine-tuned model for which I paid $86. Not only did it fail to provide precise answers to the questions I asked (which were part of the training dataset), but the answers I received were nowhere close to the prompts in my datasets. Moreover, the default context had zero impact on the fine-tuned model. Even setting the temperature to zero resulted in answers that completely ignored the context.
Here is an example:
Default context prompt: “Your name is White Ball. You are a rabbit who lives in a fairy forest…”
Testing the model:
The goal was, of course, to utilize Vertex AI via the REST API in a native application. In this regard, as someone quite distant from web development, I had to grapple with OAuth related procedures. You see, all Google APIs require authentication via OAuth, and the official documentation lists several ways to achieve it. Then I realized the default authentication token expires after one hour, necessitating a refresh upon expiration.
To address this, I created a service account in GCP, allowing me to generate a key file containing all the information needed for creating and refreshing the authentication token. This approach also enables extending the authentication lifetime up to 12 hours. However, despite these efforts, I couldn’t find a practical REST-based example that provided step-by-step guidance on setting up these requests. Eventually, during these experiments, I abandoned attempts to make token refreshment happen automatically and opted to refresh it manually via the command line.
Vertex REST API expects Json formatted payload which looks very similar to OpenAI’s Json.
Vertex AI format:
As you can see, the JSON structure is not very different. The primary distinction is that in the OpenAI context, it is another type of message where ‘author’ is ‘system,’ whereas Vertex places the context as a separate node alongside the messages array. It’s worth noting that the context in the Vertex AI training dataset is optional. You can set a default one during fine-tuning job setup.
All of the above might sound like a rant, but trust me, that is not my intention. The goal of this article is to provide constructive feedback from the perspective of a small startup as a potential user. I invested a significant amount of time testing Vertex AI because GCP was the chosen platform for deploying our product. Unfortunately, my current verdict is that Vertex AI is not ready for commercial use. It feels buggy, challenging to set up, and even harder to fine-tune to meet specific requirements. Business-wise, the complexity of the service set-up could potentially discourage customers who just want to ‘get stuff done.’
As a point of comparison, after testing Vertex, I moved on to testing Azure’s OpenAI. Despite never having used Azure before, it took me around 10 minutes to figure out how to set up the service once accepted into their OpenAI trial. No OAuth, no manual hardware resource allocation. I was able to run REST requests a few hours later after adding all the required infrastructure on the client side. Why should it be so complicated for GCP users?
Hopefully, by the time you finish reading this article, Google has already addressed the problems I mentioned here. And if it hasn’t, well, there are alternatives. If this writing helps people working for Google make their products less cumbersome to use, then the time I spent writing it won’t have been wasted.