How to Use Data Governance for AI/ML Systems

Your organization can use data governance for AI/ML to lay the foundation for innovative data-driven tools.

Image: Gorodenkoff/Adobe Stock

Data governance ensures that data is available, consistent, usable, reliable, and secure. It’s a concept that organizations struggle with, and the stakes are even higher when big data and systems like artificial intelligence and machine learning come into play. Organizations are quickly realizing that AI/ML systems work differently than traditional fixed-record systems.

With AI/ML, the goal is not to return a value or status for a single transaction. Instead, an AI/ML system sifts through petabytes of data looking for answers to a query or algorithm that may even seem a bit open-ended. Data is processed in parallel, with multiple data threads being fed to processors simultaneously. Because large amounts of data are processed concurrently and asynchronously, IT can stage that data in advance to speed up processing.


This data can come from many different internal and external sources. Each source has its own method of collecting, curating, and storing data, and it may or may not comply with your own organization’s governance standards. Then there are recommendations from the AI itself. Do you trust them? These are just some of the questions companies and their auditors face when focusing on data governance for AI/ML and looking for tools that can help.

How to Use Data Governance for AI/ML Systems

Make sure your data is consistent and accurate

If you integrate data from internal and external transactional systems, the data must be normalized so that it can communicate and blend with data from other sources. Application programming interfaces, predefined in many systems to exchange data with other systems, facilitate this. If no APIs are available, you can use extract, transform, and load tools, which convert data from one system into a format that another system can read.
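A minimal sketch of the "transform" step of ETL, assuming two hypothetical source systems (a CRM and an ERP) that report the same customer record under different field names: each source's fields are mapped into one canonical schema so downstream AI/ML sees uniform data.

```python
# Minimal ETL-style normalization sketch. All field names are hypothetical,
# for illustration only.

def normalize(record: dict, field_map: dict) -> dict:
    """Rename source-specific fields to canonical names and tidy string values."""
    out = {}
    for src_key, canonical_key in field_map.items():
        value = record.get(src_key)
        if isinstance(value, str):
            value = value.strip().lower()  # normalize whitespace and case
        out[canonical_key] = value
    return out

# The same customer as seen by two different source systems.
crm_record = {"Cust_Name": "  Acme Corp ", "Cust_Email": "OPS@ACME.COM"}
erp_record = {"name": "acme corp", "contact_email": "ops@acme.com"}

# Per-source mappings into one canonical schema.
crm_map = {"Cust_Name": "name", "Cust_Email": "email"}
erp_map = {"name": "name", "contact_email": "email"}

# After normalization, both sources blend into identical records.
assert normalize(crm_record, crm_map) == normalize(erp_record, erp_map)
```

Real ETL tools add schema validation, type coercion, and error handling on top of this kind of mapping, but the core idea is the same: agree on one canonical format before data from different sources is combined.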

If you add unstructured data such as photo, video, and sound objects, there are object linking tools that can link and relate these objects to each other. A good example of an object link is a GIS system, which combines photographs, diagrams, and other types of data to provide complete geographic context for a particular environment.

Confirm that your data is usable

We often think of usable data as data that users can access, but it’s more than that. If the data you keep has lost its value because it is obsolete, it should be purged. IT and end users should agree on when data should be purged. This will take the form of data retention policies.

There are also other occasions when AI/ML data needs to be purged. This happens when a data model for AI is changed and the data no longer matches the model.

In an AI/ML governance audit, reviewers will expect to see written policies and procedures for both types of data purges. They will also verify that your data purging practices meet industry standards. There are many data purging tools and utilities in the market.
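The retention-policy purge described above can be sketched in a few lines, assuming a hypothetical dataset and an agreed 90-day retention window (both illustrative, not from the source):

```python
# Hedged sketch of a retention-policy purge. The dataset, field names, and
# 90-day window are hypothetical examples.
from datetime import date, timedelta

RETENTION_DAYS = 90  # retention period agreed between IT and end users

def purge_expired(records: list[dict], today: date) -> list[dict]:
    """Keep only records collected within the agreed retention window."""
    cutoff = today - timedelta(days=RETENTION_DAYS)
    return [r for r in records if r["collected_on"] >= cutoff]

data = [
    {"id": 1, "collected_on": date(2023, 1, 5)},   # obsolete: past retention
    {"id": 2, "collected_on": date(2023, 5, 20)},  # still within the window
]
kept = purge_expired(data, today=date(2023, 6, 1))
print([r["id"] for r in kept])  # only record 2 survives
```

In an audit, what matters is that the cutoff in code matches the written retention policy and that purges are logged, not the specific tool used.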

Make sure your data is reliable

Circumstances change: an AI/ML system that once worked quite efficiently may begin to lose its effectiveness. How do you know? By regularly checking AI/ML results against past performance and against what’s happening in the world around you. If the accuracy of your AI/ML system is slipping, you need to fix it.

Amazon’s hiring model is a well-known example. Amazon’s artificial intelligence system concluded that it was best to hire male candidates because the system looked at past hiring practices, and most people hired were men. What the model failed to account for were highly qualified women candidates. The AI/ML system had strayed from the truth and instead began reinforcing hiring biases. From a regulatory perspective, the AI was not compliant.


Amazon eventually retired the system, but companies can avoid these errors if they regularly monitor system performance, compare it to past performance, and compare it with what’s happening in the outside world. If the AI/ML model is out of sync, it can be adjusted.

There are AI/ML tools that data scientists use to measure model drift, but the most direct way for professionals to check for drift is to compare AI/ML system performance with historical performance. For example, if you suddenly find that the weather forecast is 30% less accurate, it’s time to check the data and algorithms your AI/ML system is running.
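The direct drift check described above, comparing current results against historical performance, can be sketched like this (the baseline figures and 30% threshold are illustrative, echoing the weather-forecast example, not values from any real system):

```python
# Hedged sketch of a simple drift check against a historical baseline.
# All numbers are hypothetical, for illustration only.

def accuracy_drift(baseline: float, current: float) -> float:
    """Relative drop in accuracy versus the historical baseline."""
    return (baseline - current) / baseline

def needs_review(baseline: float, current: float, threshold: float = 0.30) -> bool:
    """Flag the model when accuracy has fallen by more than the threshold
    (e.g. a forecast that is 30% less accurate than it used to be)."""
    return accuracy_drift(baseline, current) > threshold

historical_accuracy = 0.90  # what the model achieved when it was deployed
current_accuracy = 0.60     # what recent evaluations show

print(needs_review(historical_accuracy, current_accuracy))  # True: time to
# check the data and algorithms the system is running
```

Dedicated drift-detection tools apply statistical tests to input and output distributions, but even this simple baseline comparison catches the cases that matter most to auditors: a documented threshold and a documented response when it’s crossed.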

Helen D. Jessen