OpenMetadata Deployment on Azure Kubernetes Service Cluster
OpenMetadata can be deployed on Azure Kubernetes Service (AKS). However, it requires some cloud-specific configuration, namely setting up storage accounts for Airflow, which is one of its dependencies.
Prerequisites
Azure Services for Database and Search Engine as Elastic Cloud
It is recommended to use Azure SQL and Elastic Cloud on Azure for Production Deployments.
We support
- Azure SQL (MySQL) engine version 8 or higher
- Azure SQL (PostgreSQL) engine version 12 or higher
- Elastic Cloud (ElasticSearch version 8.11.4)
Once you have Azure SQL and Elastic Cloud on Azure configured, update the environment variables below so that the OpenMetadata Kubernetes deployment can connect to the database and ElasticSearch.
We recommend -
- Azure SQL with multi-zone availability, provisioned for production workloads
- An Elastic Cloud environment spanning multiple zones with a minimum of 2 nodes
Make sure to create the database and Elastic Cloud credentials as Kubernetes Secrets, as mentioned here.
Also, disable MySQL and ElasticSearch in the OpenMetadata Dependencies Helm Chart, as described in the FAQs here.
Step 1 - Create an AKS cluster
If you are deploying on a new cluster, set EnableAzureDiskFileCSIDriver=true to enable the Container Storage Interface (CSI) storage drivers.
For an existing cluster, make sure the CSI storage drivers are enabled.
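For an existing cluster, the drivers can be switched on with az aks update. The sketch below assumes the Azure CLI is installed and logged in. The resource group and cluster names are placeholders, and run is an illustrative helper (not part of any tool) that only prints the command unless APPLY=1 is set.

```shell
RESOURCE_GROUP="MyResourceGroup"   # placeholder: your resource group
CLUSTER_NAME="MyAKSCluster"        # placeholder: your AKS cluster

# Illustrative helper: print the command, execute it only when APPLY=1.
run() {
  echo "+ $*"
  if [ "${APPLY:-0}" = "1" ]; then "$@"; fi
}

# Enable the Azure Disk and Azure Files CSI drivers on the existing cluster.
run az aks update \
  --resource-group "$RESOURCE_GROUP" \
  --name "$CLUSTER_NAME" \
  --enable-disk-driver \
  --enable-file-driver
```

Review the printed command first, then rerun the script with APPLY=1 to execute it.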
Step 2 - Create a Namespace (optional)
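A dedicated namespace keeps the OpenMetadata resources together. The sketch below writes a Namespace manifest; the namespace name openmetadata and the file name namespace.yaml are assumptions reused in the later examples.

```shell
# Write a minimal Namespace manifest (the name "openmetadata" is an assumption).
cat > namespace.yaml <<'EOF'
apiVersion: v1
kind: Namespace
metadata:
  name: openmetadata
EOF
```

Apply it with kubectl apply -f namespace.yaml, or create the namespace directly with kubectl create namespace openmetadata.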
Step 3 - Create Persistent Volumes
The OpenMetadata Helm chart depends on Airflow, and Airflow expects a persistent disk that supports ReadWriteMany (the volume can be mounted as read-write by many nodes). The Azure CSI storage drivers we enabled earlier support provisioning disks in ReadWriteMany mode.
Create the volume claims by applying the manifest.
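A minimal sketch of such a manifest follows. It assumes the namespace openmetadata and the built-in AKS storage class azurefile-csi; the claim names, the requested sizes, and the file name logs_dags_pvc.yaml are illustrative choices, not values confirmed by this guide.

```shell
# Two ReadWriteMany claims: one for Airflow DAGs, one for Airflow logs.
cat > logs_dags_pvc.yaml <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: openmetadata-dependencies-dags   # assumed claim name
  namespace: openmetadata
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: azurefile-csi
  resources:
    requests:
      storage: 10Gi                      # illustrative size
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: openmetadata-dependencies-logs   # assumed claim name
  namespace: openmetadata
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: azurefile-csi
  resources:
    requests:
      storage: 5Gi                       # illustrative size
EOF
```

Apply it with kubectl apply -f logs_dags_pvc.yaml.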
Step 4 - Change owner and update permission for persistent volumes
Airflow pods run as a non-root user and lack write access to our persistent volumes. To fix this, we create a job, permissions_pod.yaml, that runs a pod which mounts the persistent volume claims and changes the owner of the mounted folders /airflow-dags and /airflow-logs to user ID 50000, the default Linux user ID of Airflow pods.
Start the job by applying the manifest in permissions_pod.yaml.
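One way to write such a job is sketched below. The job name, the busybox image, and the claim names are assumptions. AIRFLOW_UID defaults to 50000, the default user ID of the upstream Airflow image; override it if your pods run as a different user.

```shell
# UID that should own the mounted folders (assumption: upstream Airflow default).
AIRFLOW_UID="${AIRFLOW_UID:-50000}"

# One-off Job that mounts both claims and fixes ownership and permissions.
cat > permissions_pod.yaml <<EOF
apiVersion: batch/v1
kind: Job
metadata:
  name: my-permission-pod
  namespace: openmetadata
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: my-permission-pod
          image: busybox
          command: ["/bin/sh", "-c", "chown -R ${AIRFLOW_UID} /airflow-dags /airflow-logs && chmod -R a+rwx /airflow-dags /airflow-logs"]
          volumeMounts:
            - name: airflow-dags
              mountPath: /airflow-dags
            - name: airflow-logs
              mountPath: /airflow-logs
      volumes:
        - name: airflow-dags
          persistentVolumeClaim:
            claimName: openmetadata-dependencies-dags
        - name: airflow-logs
          persistentVolumeClaim:
            claimName: openmetadata-dependencies-logs
EOF
```

Apply it with kubectl apply -f permissions_pod.yaml and wait for the job to complete before installing the charts.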
Step 5 - Add the OpenMetadata Helm repo and set up secrets
Add Helm Repo
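The commands below register the chart repository under the name open-metadata, which matches the chart references used elsewhere in this guide. The run helper is illustrative and only prints the commands unless APPLY=1 is set.

```shell
# Illustrative helper: print the command, execute it only when APPLY=1.
run() {
  echo "+ $*"
  if [ "${APPLY:-0}" = "1" ]; then "$@"; fi
}

# Register the OpenMetadata chart repository and refresh the local index.
run helm repo add open-metadata https://helm.open-metadata.org/
run helm repo update
```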
Create secrets
It is recommended to use an external database and search engine for production deployments. The following implementation uses an external PostgreSQL database from Azure Database. Any of the supported databases can be used; the default implementation uses MySQL.
For production deployments that connect to an external PostgreSQL database, provide the connection details by setting up the appropriate secrets, as below, for use in the manifests.
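A sketch of the secret creation follows. The secret names and keys (airflow-secrets, postgresql-secret, and their keys) are assumptions that must match whatever your Helm values reference; the angle-bracket placeholders stand for your own passwords. The run helper is illustrative and only prints the commands unless APPLY=1 is set.

```shell
NAMESPACE="openmetadata"   # assumed namespace

# Illustrative helper: print the command, execute it only when APPLY=1.
run() {
  echo "+ $*"
  if [ "${APPLY:-0}" = "1" ]; then "$@"; fi
}

# Password used by OpenMetadata to talk to Airflow (assumed secret name and key).
run kubectl create secret generic airflow-secrets \
  --namespace "$NAMESPACE" \
  --from-literal=openmetadata-airflow-password="<AdminPassword>"

# Password of the external PostgreSQL user (assumed secret name and key).
run kubectl create secret generic postgresql-secret \
  --namespace "$NAMESPACE" \
  --from-literal=postgresql-password="<MyPGDBPassword>"
```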
Step 6 - Install OpenMetadata dependencies
The values-dependencies.yaml file is used to override default values in the official Helm chart and must be adapted to your use case. Uncomment the externalDatabase section and fill in meaningful values to connect to an external database for production deployments. We set sensitive information such as the host address, DB name, and DB username through the CLI.
We override some of the default values in the official openmetadata-dependencies Helm chart with values-dependencies.yaml to include an external PostgreSQL database. It is important to set the mysql.enabled flag to false if you are not using the default MySQL database. This can be done either in the YAML file or, as shown, by setting values in the helm install command.
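A minimal sketch of values-dependencies.yaml follows. It assumes the claims and the postgresql-secret from the earlier steps and mounts them at the folders named in Step 4; the volume names are illustrative. The keys come from the airflow-helm chart, nested under the airflow key of the dependencies chart. Verify them against the chart version you install.

```shell
# Values overriding the openmetadata-dependencies defaults (sketch).
cat > values-dependencies.yaml <<'EOF'
airflow:
  airflow:
    extraVolumeMounts:
      - name: aks-airflow-dags
        mountPath: /airflow-dags
      - name: aks-airflow-logs
        mountPath: /airflow-logs
    extraVolumes:
      - name: aks-airflow-dags
        persistentVolumeClaim:
          claimName: openmetadata-dependencies-dags
      - name: aks-airflow-logs
        persistentVolumeClaim:
          claimName: openmetadata-dependencies-logs
  dags:
    path: /airflow-dags
    persistence:
      enabled: false   # we mount our own claim instead
  logs:
    path: /airflow-logs
    persistence:
      enabled: false   # we mount our own claim instead
  externalDatabase:
    type: postgres
    port: 5432
    passwordSecret: postgresql-secret
    passwordSecretKey: postgresql-password
EOF

# Install, passing the remaining connection details on the command line:
# helm install openmetadata-dependencies open-metadata/openmetadata-dependencies \
#   --namespace openmetadata \
#   --values values-dependencies.yaml \
#   --set mysql.enabled=false \
#   --set airflow.externalDatabase.host=<db-host> \
#   --set airflow.externalDatabase.database=<airflow-db-name> \
#   --set airflow.externalDatabase.user=<db-user>
```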
For more information on the Airflow Helm chart values, please refer to airflow-helm.
It takes a few minutes for all the pods to be set up and running.
Step 7 - Install OpenMetadata
Finally, install OpenMetadata, optionally customizing the values provided in the official chart here using the values.yaml file.
Give it a few more seconds for the pod to become ready. Once it is ready, the service can be accessed by forwarding port 8585 of the cluster IP to a port on your local host.
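The install and port-forward steps can be sketched as follows. The release name openmetadata, the namespace, and the resulting service name service/openmetadata are assumptions; check the actual service name with kubectl get svc. The run helper is illustrative and only prints the commands unless APPLY=1 is set.

```shell
NAMESPACE="openmetadata"   # assumed namespace

# Illustrative helper: print the command, execute it only when APPLY=1.
run() {
  echo "+ $*"
  if [ "${APPLY:-0}" = "1" ]; then "$@"; fi
}

# Install the OpenMetadata chart with your (optional) custom values.
run helm install openmetadata open-metadata/openmetadata \
  --namespace "$NAMESPACE" \
  --values values.yaml

# Forward local port 8585 to the service; this command blocks while it runs.
run kubectl port-forward service/openmetadata 8585:8585 --namespace "$NAMESPACE"
```

With the port-forward running, the UI is reachable at http://localhost:8585.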
Troubleshooting Airflow
JSONDecodeError: Unterminated string starting
If you are using Airflow with Azure Blob Storage as a PersistentVolume, as explained in Storage class using blobfuse, you may encounter the following error after a few days:
Moreover, the Executor pods would actually be using old files. This behaviour is caused by the configuration recommended in that documentation:
Disabling the cache helps here. In this case it has no negative impact, since the .py and .json files are small and not heavily used.
The same configuration without cache:
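As a sketch, a blobfuse StorageClass with caching turned off could look like the following. The class name, SKU, and file name are placeholders. The mount options are blobfuse flags in the style of the Azure documentation referenced above, with the timeouts set to zero; treat the exact set as an assumption and compare it with your existing StorageClass.

```shell
# StorageClass for the Azure Blob CSI driver with file and attribute caching disabled.
cat > blobfuse-storageclass.yaml <<'EOF'
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: azureblob-fuse-nocache   # hypothetical name
provisioner: blob.csi.azure.com
parameters:
  skuName: Standard_LRS          # placeholder SKU
reclaimPolicy: Delete
volumeBindingMode: Immediate
mountOptions:
  - -o allow_other
  - --file-cache-timeout-in-seconds=0   # do not serve files from the local cache
  - --use-attr-cache=false              # do not cache file attributes
  - --cancel-list-on-mount-seconds=10
  - -o attr_timeout=0
  - -o entry_timeout=0
  - -o negative_timeout=0
  - --log-level=LOG_WARNING
EOF
```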
You can find more information about this error here, and similar discussions here and here.
FAQs
Java Memory Heap Issue
If your openmetadata pods are not in a ready state at any point in time and the openmetadata pod logs report the issue below -
This is because the default JVM heap space configuration (1 GiB) is not enough for your workloads. To resolve this issue, go to your custom openmetadata Helm values and append the environment variable below.
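A sketch of the override follows, written to a separate file so an existing values file is not overwritten. The 2G sizes and the file name heap-values.yaml are illustrative; size the heap for your own workload. If your values already define extraEnvs, add the entry to that list instead, since Helm replaces lists rather than merging them.

```shell
# Raise the JVM heap for the OpenMetadata server (illustrative sizes).
cat > heap-values.yaml <<'EOF'
extraEnvs:
  - name: OPENMETADATA_HEAP_OPTS
    value: "-Xmx2G -Xms2G"
EOF
```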
The Xmx flag specifies the maximum memory allocation pool for a Java virtual machine (JVM), while Xms specifies the initial memory allocation pool.
Upgrade the Helm charts with the above changes using the following command: helm upgrade --install openmetadata open-metadata/openmetadata --values <values.yml> --namespace <namespaceName>. In this command, replace <values.yml> with your values file name and <namespaceName> with the namespace where you have deployed OpenMetadata in Kubernetes.
PostgreSQL Issue permission denied to create extension "pgcrypto"
If you are facing the below issue with PostgreSQL as Database Backend for OpenMetadata Application,
It seems the Database User does not have sufficient privileges. In order to resolve the above issue, grant usage permissions to the PSQL User.
In the above command, replace <openmetadata_psql_user> with the SQL user used by the OpenMetadata Application to connect to the PostgreSQL Database.
How to extend and use custom docker images with OpenMetadata Helm Charts?
Extending OpenMetadata Server Docker Image
1. Create a Dockerfile based on docker.getcollate.io/openmetadata/server
The OpenMetadata Helm charts use the officially published Docker images from DockerHub. A typical scenario is installing organization certificates for connecting to in-house systems.
For Example -
where docker.getcollate.io/openmetadata/server:x.y.z needs to point to the same version of the OpenMetadata server, for example docker.getcollate.io/openmetadata/server:1.3.1. This image needs to be built and published to the container registry of your choice.
2. Update your openmetadata helm values yaml
The OpenMetadata Application is installed as part of the openmetadata Helm chart. In this step, update your custom Helm values YAML file to point to the image created in the previous step. For example, create a Helm values file named values.yaml with the following contents -
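A minimal sketch of that file follows. The angle-bracket values are placeholders for the registry, repository, and tag of the image you published in the previous step.

```shell
# Point the openmetadata chart at the custom server image.
# Note: this overwrites any existing values.yaml in the current directory.
cat > values.yaml <<'EOF'
image:
  repository: <your-registry>/<your-repository>
  tag: <your-tag>
EOF
```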
3. Install / Upgrade your helm release
Upgrade/Install your openmetadata Helm chart with a single command: helm upgrade --install openmetadata open-metadata/openmetadata --values values.yaml
Extending OpenMetadata Ingestion Docker Image
One use case for a custom ingestion image is shipping custom connectors that you have developed yourself. You can find a complete working example of this here. Once your code is ready, follow these steps:
1. Create a Dockerfile based on docker.getcollate.io/openmetadata/ingestion:
For example -
where docker.getcollate.io/openmetadata/ingestion:x.y.z needs to point to the same version as the OpenMetadata server, for example docker.getcollate.io/openmetadata/ingestion:1.3.1. This image needs to be built and published to the container registry of your choice.
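A sketch of such a Dockerfile is shown below. It assumes your connector is a Python package in a directory named my_connector with a setup.py next to it (both hypothetical names), and that the image provides an airflow user with a home directory at /home/airflow.

```shell
# Dockerfile extending the ingestion image with a custom connector (sketch).
cat > Dockerfile <<'EOF'
FROM docker.getcollate.io/openmetadata/ingestion:1.3.1

# Work as the airflow user inside its home directory (assumed to exist in the image).
USER airflow
WORKDIR /home/airflow

# Copy the hypothetical connector package and install it without touching dependencies.
COPY --chown=airflow my_connector my_connector
COPY --chown=airflow setup.py .
RUN pip install --no-deps .
EOF
```

Build and publish it with docker build -t <your-registry>/<your-repository>:<your-tag> . followed by docker push of the same tag.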
2. Update the Airflow image in the openmetadata-dependencies values YAML
The ingestion container (the one shipping Airflow) is installed as part of the openmetadata-dependencies Helm chart. In this step, we use our own custom values YAML file to point to the image we created in the previous step. You can create a file named values.deps.yaml with the following contents:
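A minimal sketch of that file follows. The image keys belong to the airflow-helm chart and sit under the airflow key of the dependencies chart; the angle-bracket values are placeholders for your published image.

```shell
# Point the bundled Airflow at the custom ingestion image.
cat > values.deps.yaml <<'EOF'
airflow:
  airflow:
    image:
      repository: <your-registry>/<your-repository>
      tag: <your-tag>
      pullPolicy: IfNotPresent
EOF
```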
3. Install / Upgrade helm release
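Upgrade/Install your openmetadata-dependencies Helm chart with a single command: helm upgrade --install openmetadata-dependencies open-metadata/openmetadata-dependencies --values values.deps.yaml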
How to disable MySQL and ElasticSearch from OpenMetadata Dependencies Helm Charts?
If you are using MySQL and ElasticSearch externally, you will want to disable the local installation of MySQL and ElasticSearch when installing the OpenMetadata Dependencies Helm Chart. You can disable the MySQL and ElasticSearch Helm dependencies by setting enabled: false for each dependency. Below is the command to set these values from the Helm CLI -
Alternatively, you can create a custom YAML file named values.deps.yaml to disable the installation of MySQL and ElasticSearch.
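Both variants are sketched below. The mysql key follows the mysql.enabled flag used earlier in this guide. The elasticsearch key follows the wording of this FAQ and is an assumption: chart versions that bundle OpenSearch may expect opensearch instead, so check the chart's default values.

```shell
# YAML variant: turn off the bundled database and search sub-charts.
cat > values.deps.yaml <<'EOF'
mysql:
  enabled: false
elasticsearch:      # assumption: may be "opensearch" in newer chart versions
  enabled: false
EOF

# CLI variant of the same settings:
# helm upgrade --install openmetadata-dependencies open-metadata/openmetadata-dependencies \
#   --set mysql.enabled=false \
#   --set elasticsearch.enabled=false
```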
How to configure external database like PostgreSQL with OpenMetadata Helm Charts?
OpenMetadata supports PostgreSQL as one of its database dependencies. By default, the OpenMetadata Helm Charts do not include PostgreSQL as a database dependency. To configure the Helm Charts with an external database such as PostgreSQL, follow the guide below to change the Helm values and then upgrade / install the OpenMetadata Helm charts with them.
Upgrade Airflow Helm Dependencies Helm Charts to connect to External Database like PostgreSQL
We ship airflow-helm as one of the OpenMetadata Dependencies, with default values that connect to a MySQL database as part of the externalDatabase configuration.
You can find more information on setting externalDatabase as part of the Helm values here.
With OpenMetadata Dependencies Helm Charts, your helm values would look something like below -
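A sketch of these values follows. The externalDatabase keys are those of the airflow-helm chart, and the secret name and key match the sample secret command in this section. The angle-bracket values are placeholders, and the file name values.deps.postgres.yaml is hypothetical.

```shell
# Connect the bundled Airflow to an external PostgreSQL database (sketch).
cat > values.deps.postgres.yaml <<'EOF'
mysql:
  enabled: false
airflow:
  externalDatabase:
    type: postgres
    host: <postgresql_endpoint>
    port: 5432
    database: <airflow_database_name>
    user: <airflow_database_login_user>
    passwordSecret: airflow-postgresql-secrets
    passwordSecretKey: airflow-postgresql-password
EOF
```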
For the above code, it is assumed you are creating a Kubernetes secret for storing the Airflow database login credentials. A sample command to create the secret is: kubectl create secret generic airflow-postgresql-secrets --from-literal=airflow-postgresql-password=<password>
Upgrade OpenMetadata Helm Charts to connect to External Database like PostgreSQL
Update the openmetadata.config.database.* Helm values for the OpenMetadata Application to connect to an external database like PostgreSQL.
With OpenMetadata Helm Charts, your helm values would look something like below -
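A sketch of these values follows. The keys sit under openmetadata.config.database, and the secret name and key match the sample secret command in this section. The angle-bracket values are placeholders, and the file name values.postgres.yaml is hypothetical.

```shell
# Connect the OpenMetadata server to an external PostgreSQL database (sketch).
cat > values.postgres.yaml <<'EOF'
openmetadata:
  config:
    database:
      host: <postgresql_endpoint>
      port: 5432
      driverClass: org.postgresql.Driver
      dbScheme: postgresql
      databaseName: <openmetadata_database_name>
      auth:
        username: <database_login_user>
        password:
          secretRef: openmetadata-postgresql-secrets
          secretKey: openmetadata-postgresql-password
EOF
```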
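For the above code, it is assumed you are creating a Kubernetes secret for storing the OpenMetadata database login credentials. A sample command to create the secret is: kubectl create secret generic openmetadata-postgresql-secrets --from-literal=openmetadata-postgresql-password=<password>
Once you have made the above changes to your Helm values, install or upgrade both charts, replacing the placeholders with your values file names: helm upgrade --install openmetadata-dependencies open-metadata/openmetadata-dependencies --values <dependencies-values.yml> and then helm upgrade --install openmetadata open-metadata/openmetadata --values <values.yml>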
How to customize OpenMetadata Dependencies Helm Chart with custom helm values
Our OpenMetadata Dependencies Helm Chart internally depends on three sub-charts -
- Bitnami MySQL (helm chart version 9.7.2)
- OpenSearch (helm chart version 2.12.2)
- Airflow (helm chart version 8.8.0)
If you want to customize the deployment of any of the above dependencies, refer to the links above for the Helm values each sub-chart accepts.
By default, OpenMetadata Dependencies helm chart provides initial generic customization of these helm values in order to get you started quickly. You can refer to the openmetadata-dependencies helm charts default values here.