Kubeflow Part 3: Running, Managing, and Monitoring Pipelines

We are running a #Kubeflow series where we are sharing our experiences and thoughts on building a Kubeflow-based ML pipeline architecture for production use. This is the third post in the series.

In the last couple of articles in this series, we saw how to create Pipelines in three different ways: through the most common option, a JupyterLab server; through an RStudio server; and through a drag-and-drop UI built atop JupyterLab. Now that the Pipelines have been created, the next step is running them. In this article, we will cover the setup required to run a Pipeline.

In addition, we will look at the different ways a Pipeline can be triggered. Broadly, these fall into three categories: running ad hoc from the UI; running on a schedule, through cron expressions or otherwise; and running on demand, triggered by a call from another job.

Once a Pipeline has been called on to run, we will see how to look at the status of the Pipeline as a whole and of each Component, as they run and succeed or fail. Finally, we will look at the execution details and the logs, where they are stored, and how to view them.

Registering a Pipeline in the UI

If we follow the steps from either of the previous two articles, at this point, we are left with a YAML file that defines the Pipeline in terms that Kubeflow understands. Before the Pipeline can be run, it must be registered with the UI. This has the benefit of allowing users to view the Pipeline in graphical form in the Kubeflow console.

Once in the Kubeflow console, go to the “Pipelines” section and select “New Pipeline.” Give the Pipeline a name and, optionally, a description. Following that, upload the YAML file and create the Pipeline.

This will create a new entry in the table of Pipelines, and clicking on it will display a graph of its flow.

Figure 1. The Use-Case Inference Pipeline in the Kubeflow UI

This is the inference pipeline for the use-case discussed in the previous articles. In this use-case, we attempt to find anomalous logs based on the sequence of logs appearing in the log stream.

The first Component, “Unzip data,” downloads inference data from S3 and unzips it. In the next Component, we preprocess the data in the same way we did during training, and in the third we perform inference. This “Model inferencing” Component also uploads the results to S3.

In our final inference Component, we write information about our data and model into our monitoring system. This is beyond the scope of this article, but we will cover it in a later one.

Note that in the graphical form, each node represents a Component, and each edge represents the outputs of one Component that form the inputs of another. Kubeflow runs Components in parallel where possible. Since, in this case, each Component depends on at least one prior Component, the Pipeline runs linearly, but this need not be the case in every situation.
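
As an illustration of how these edges arise, here is a minimal sketch using the Kubeflow Pipelines v1 SDK. The two Components are toy placeholders (the real ones were built in the earlier articles); passing one Component’s output as another’s input is what draws the edge between their nodes.

from kfp import dsl
from kfp.components import create_component_from_func

# Toy placeholder Components, standing in for the real ones from the
# earlier articles in this series.
def unzip_data(zip_path: str) -> str:
    return zip_path

def preprocess(data_path: str) -> str:
    return data_path

unzip_op = create_component_from_func(unzip_data)
preprocess_op = create_component_from_func(preprocess)

@dsl.pipeline(name="edge-demo")
def edge_demo_pipeline(zip_path: str):
    unzipped = unzip_op(zip_path)
    # Feeding unzipped.output into the next Component creates the edge
    # between the two nodes; Components with no such dependency could
    # run in parallel instead.
    preprocess_op(unzipped.output)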

Running a Pipeline: From the UI

Running a Pipeline from the UI can be viewed as the “default” method of doing so. Once a Pipeline has been registered, the UI is certainly the easiest place to run it.

Kubeflow requires that an Experiment be set up before running any Pipelines. An Experiment is a workspace where you can try different configurations of Pipelines. Experiments are normally used to organize your Runs into logical groups.

So, we head to the “Experiments” section, select “New Experiment,” and give it a name. That will create a new entry in the relevant table for us to use.

A Run can be created from the UI in one of two ways, both of which end in the same wizard. The first is to head to the “Runs” section and select “Create Run.”

Alternatively, a particular Pipeline can be selected from the “Pipelines” section, and “Create Run” selected from its graphical view.

In the wizard, the name and version of the Pipeline can be selected (or are already filled in if the second approach was used). All arguments to the Pipeline function we created in our code are available at the bottom of the wizard.

Figure 2. The Parameters Filled Before the Run

Filling these in allows us to start a Run. Its details can then be viewed under the “Runs” section (or under the relevant Experiment), which we will look at later in the article.

Running a Pipeline: Recurring Runs

Not every Pipeline can be run manually. For instance, say there is a constant stream of data being dropped into some blob storage bucket (e.g., Amazon S3). Let’s also assume that the model consuming that data only needs to run every half-hour. This could be to save the cost of running inference, or for reasons relating to the nature of the data.

The solution in this scenario is to run the inference Pipeline at 30-minute intervals. This can also be set up through the Kubeflow UI.

These kinds of runs are known as Recurring Runs, and there is a special part of the UI dedicated to setting them up. Head to the “Recurring Runs” section in the console, and click on “New Recurring Run.”

This will lead to a form ready to be populated with information regarding the run that needs to recur.

A name must be given for the Recurring Run, and, optionally, a description. Next, the trigger must be set. There are two triggers available in Kubeflow: a schedule and a cron trigger. The schedule is a simple trigger that runs the Pipeline at a regular interval relative to the creation time of the Recurring Run. We can set the unit (minutes, hours, etc.) and the scalar that goes with it. Thus, we can have the Pipeline run every hour from the creation time of the trigger, for example.

Alternatively, the cron trigger takes in a cron expression, which is applied on the UTC time.

The number of runs that can execute concurrently can also be set. This is useful when runs are expected to take a long time to complete and we need to cap how many execute simultaneously.

The catchup setting, if enabled, backfills any runs that were missed while the Recurring Run was paused.

Finally, if required, we can set a start and end time for the Recurring Run, limiting it to running only between those times.

Once the form is filled in, a Recurring Run is created, which should look like this.

Figure 3. A Recurring Run

The Recurring Run displayed in Figure 3 will run our inference Pipeline at the 30th minute of every hour, UTC time. It caps the number of concurrent runs at 2, with no backfilling if the Recurring Run is disabled in between, and its start and end times restrict it to a single day.
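
As an aside, the same Recurring Run can also be created programmatically with the Kubeflow Pipelines SDK, which we cover in the next section. The sketch below is a minimal example using the kfp v1 Client’s create_recurring_run() method; the client, the IDs, the timestamps, and the exact cron format (Kubeflow typically accepts a six-field expression with a leading seconds field) are assumptions to adapt to your own setup.

# Assumes an authenticated kfp client, as created in Code Segment 1 below.
client.create_recurring_run(
    experiment_id='<experiment_id>',
    job_name='<job_name>',
    cron_expression='0 30 * * * *',      # 30th minute of every hour, UTC
    max_concurrency=2,                   # at most two runs at a time
    no_catchup=True,                     # do not backfill missed runs
    start_time='2023-01-01T00:00:00Z',   # hypothetical one-day window
    end_time='2023-01-02T00:00:00Z',
    params={'bucket_zipfile_path': 'test_full.zip', 'bucket_name': '<bucket_name>'},
    pipeline_id='<pipeline_id>',
)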

Running a Pipeline: External Calls

The two ways we’ve discussed so far both require some work in the UI. What if the Pipeline needs to run contingent on some other Job, which may or may not originate from Kubeflow?

The Kubeflow Pipelines SDK allows users to trigger a Pipeline run programmatically. First, ensure that the SDK is installed on the system that will trigger the run. We will use the Python requests library to create a session against the Kubeflow host, and use its auth-session cookie to create a kfp client.

import requests
import kfp

USERNAME = "username@example.com"
PASSWORD = "password"
NAMESPACE = "namespace"  # the user profile namespace; some multi-user SDK calls accept it as a parameter
HOST = "https://kubeflow.host.com"

# Start a session and follow the redirects to the login page.
session = requests.Session()
response = session.get(HOST)

headers = {
    "Content-Type": "application/x-www-form-urlencoded",
}

# Log in and pick up the auth-session cookie that Kubeflow sets.
data = {"login": USERNAME, "password": PASSWORD}
session.post(response.url, headers=headers, data=data)
session_cookie = session.cookies.get_dict()["authservice_session"]

# Create the kfp client against the Pipelines API, reusing the cookie.
client = kfp.Client(
    host=f"{HOST}/pipeline",
    cookies=f"authservice_session={session_cookie}",
)

Code Segment 1. Creating a kfp Client

Note that in the example above, the login details—especially the password—are provided in the code itself. It is much safer to give them as environment variables or through a secrets manager.
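
For example, here is a minimal sketch of reading the same values from environment variables; the variable names are our own choice, not anything Kubeflow prescribes.

import os

# Hypothetical variable names; populate them from your secrets manager
# or deployment configuration.
USERNAME = os.environ["KUBEFLOW_USERNAME"]
PASSWORD = os.environ["KUBEFLOW_PASSWORD"]
NAMESPACE = os.environ.get("KUBEFLOW_NAMESPACE", "namespace")
HOST = os.environ.get("KUBEFLOW_HOST", "https://kubeflow.host.com")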

Once the client is created, we can use all of the functions of the Client class, including the run_pipeline() function.

client.run_pipeline(
    experiment_id='<experiment_id>',
    job_name='<job_name>',
    params={
        'bucket_zipfile_path': 'test_full.zip',
        'bucket_name': '<bucket_name>',
        'sep': ',',
        'decimal': '.',
        'encoding': 'utf-8'
    },
    pipeline_id='<pipeline_id>'
)

Code Segment 2. Running a Pipeline

We provide the ID of the Experiment under which we want the run to appear in the UI, as well as the ID of the Pipeline itself. Both of these need to be known beforehand, but they can be gleaned from the Kubeflow Pipelines screen. We also provide the parameters required by the Pipeline as a dictionary, with the parameter names as the keys and their values as the dictionary’s values.

Finally, we also provide a job name, as it is a required argument.
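
If we would rather not copy the IDs out of the UI, the client can also resolve them by name. Here is a minimal sketch using the kfp v1 Client helpers get_experiment() and get_pipeline_id(); the names are placeholders for our own.

# Resolve the IDs by name instead of reading them off the UI.
experiment = client.get_experiment(experiment_name='<experiment_name>')
pipeline_id = client.get_pipeline_id(name='<pipeline_name>')

# experiment.id and pipeline_id can then be passed to run_pipeline()
# exactly as in Code Segment 2.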

All methods of connecting to Kubeflow Pipelines with the SDK are covered in the Kubeflow Pipelines SDK documentation, including the combinations of whether a full Kubeflow installation is present or not, and whether the run is being triggered from inside the cluster or not.
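
As one example of those combinations, when the caller already runs inside the cluster (say, a notebook server in the same Kubeflow installation), client creation can be much simpler. This is a sketch under the assumption that the pod has been granted access to the Pipelines API, which multi-user installations typically require extra setup for.

# Inside the cluster, kfp can often discover the Pipelines endpoint and
# credentials on its own, so no manual cookie handling is needed.
in_cluster_client = kfp.Client()
print(in_cluster_client.list_experiments(namespace=NAMESPACE))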

The Status of a Run

Once a run has been submitted for execution, whether through the UI, through a Recurring Run, or from an external call, it will appear under the “Runs” tab in the console, as well as under the corresponding Experiment, if one was defined for it.

Once a Component has run successfully, its node is marked with a white check mark on a green background. Clicking on a node opens its details, including its input and output artifacts and its execution logs.
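
The status can also be checked programmatically with the same kfp client from the previous section. This is a minimal sketch using the Client’s get_run() and wait_for_run_completion() methods, with the run object captured from run_pipeline().

# Capture the run object returned by run_pipeline() (see Code Segment 2).
run = client.run_pipeline(
    experiment_id='<experiment_id>',
    job_name='<job_name>',
    params={'bucket_zipfile_path': 'test_full.zip', 'bucket_name': '<bucket_name>'},
    pipeline_id='<pipeline_id>',
)

# Poll the current status ('Running', 'Succeeded', 'Failed', ...).
print(client.get_run(run.id).run.status)

# Or block until the run finishes, here for up to an hour.
finished = client.wait_for_run_completion(run.id, timeout=3600)
print(finished.run.status)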

Conclusion

Creating a Pipeline is only useful insofar as it can be run effectively, whether ad hoc from the UI, on a schedule, or on demand from an external call. Kubeflow allows Pipelines to be run through any of these methods, and its UI provides access to details such as the input and output artifacts of each Component, as well as the execution logs.

Now that we are able to run, manage, and monitor our Pipelines, we can turn to the aspects of deploying a machine learning model. In the next article, we look at hyperparameter tuning and how AutoML can be achieved within Kubeflow. That will allow us to complete our development of the model and leave us ready to look at it in production.

Following that, we will look into model registries and how to bring them into Kubeflow, as well as observability over the model and the data.
