Generative AI in data analytics - how AI is making it easier to access data




Unless you have been living under a rock that is under another rock buried under 10 feet of dirt, you are aware of AI and its potential to change the world we live in. You may have ideas about how AI will impact how we work, but you would need to be a time traveler to predict all the ways it will influence our world. What we can safely assume is that people, automation, and governance will all play important roles in the AI future.

AI is already impacting how humans manage and interact with data. We can ask AI to help us turn data into insights. AI can also be our copilot, helping us manage the underlying data that supports those insights, or operate independently to ensure that the data we rely on for critical decision-making is trustworthy.

For AI to play a greater role in data access and management, humans must remain at the center of the process. That means close monitoring and alerting, along with appropriate training and retraining.

How AI is helping in data consumption and analysis

AI and, more specifically, large language models (LLMs) are taking center stage in helping analysts and decision-makers get the data they need in a consumable format to support quick but thorough decision-making. Text-to-SQL technology lowers the technical barrier between analysts, data, and insights. Analysts and decision-makers no longer need to know SQL to query databases; LLMs can generate SQL queries automatically from plain language. If a sales manager is interested in sales by region and market segment, they can define the parameters in common business terms to pull the needed data.
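To make this concrete, here is a minimal text-to-SQL sketch in Python. The `call_llm` helper, the prompt wording, and the sales table schema are all hypothetical stand-ins for whichever LLM API and warehouse you actually use.

```python
# Minimal text-to-SQL sketch. `call_llm` is a hypothetical stand-in for
# whatever LLM client your stack provides (hosted API, local model, etc.).

SCHEMA = """
Table sales(order_id INT, region TEXT, market_segment TEXT,
            amount NUMERIC, order_date DATE)
"""

def generate_sql(question: str, call_llm) -> str:
    """Ask the model to translate a plain-language question into SQL."""
    prompt = (
        "You are a SQL assistant. Using only this schema:\n"
        f"{SCHEMA}\n"
        "Write a single SQL query answering the question below. "
        "Return only SQL, no commentary.\n"
        f"Question: {question}"
    )
    return call_llm(prompt)

# Example: a sales manager's question expressed in business terms.
# sql = generate_sql("Total sales by region and market segment for Q1", call_llm)
# The generated SQL should still be validated (read-only, allowed tables)
# before it is executed against the warehouse.
```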

AI also helps present data in the most consumable ways. AI-powered data visualization copilots are automating the process of building complex charts and graphs. Decision-makers no longer need to go back and forth with a data analyst to get information presented in a way that is easy to understand. They can simply ask an AI assistant to create a chart instantly. If it is not exactly right, they can instruct the assistant to tweak the visual in seconds. This gets data formatted in a consumable way very quickly and eliminates the need to learn multiple BI tools and platforms.
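A chart copilot can work the same way: the model returns a small chart specification that the tool renders. The JSON spec format below is an assumption made for illustration, rendered here with matplotlib.

```python
# Sketch of a chart copilot: the LLM returns a small JSON chart spec
# (an assumed format, not a standard), which we render with matplotlib.
import json
import matplotlib.pyplot as plt

def render_chart(spec_json: str, data: dict) -> None:
    """Render a bar or line chart from a spec like
    {"type": "bar", "x": "region", "y": "sales", "title": "Sales by region"}."""
    spec = json.loads(spec_json)
    x, y = data[spec["x"]], data[spec["y"]]
    if spec["type"] == "bar":
        plt.bar(x, y)
    else:
        plt.plot(x, y)
    plt.title(spec.get("title", ""))
    plt.xlabel(spec["x"])
    plt.ylabel(spec["y"])
    plt.show()

# data = {"region": ["East", "West"], "sales": [120, 95]}
# render_chart('{"type": "bar", "x": "region", "y": "sales", "title": "Sales by region"}', data)
```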

How AI is helping in data management and governance

AI has great potential to help fetch data for decision-makers, but without trustworthy data to feed these outputs, AI will just help move bad data around quicker. Luckily, there are also a wide variety of applications for AI in data management, governance, and data quality.

AI is being applied in data governance as a copilot or recommendation engine, and it is also poised to govern data and improve quality autonomously in the future.

Data Tagging

AI tools are being incorporated into data governance platforms to streamline the process of exposing higher-quality data and making it available to more analysts and decision-makers. Specifically, the technology is becoming an essential tool in managing data catalogs for greater data discovery and governance. For example, AI supports data governance by helping analysts tag sensitive data such as personally identifiable information (PII). Based on the characteristics of data designated as sensitive in the past, AI can predict which data columns might contain restricted data.
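As a rough sketch of how such a prediction could work, the toy classifier below learns from columns that stewards have already tagged and scores new columns by name. Real governance platforms profile actual values and metadata as well; the column names and labels here are made up.

```python
# Toy sketch: predict whether a column is likely to contain PII based on
# columns that stewards have tagged before. Only column names are used as
# features here; production systems also profile the values themselves.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Historical tags supplied by data stewards (illustrative).
columns = ["email_address", "order_total", "ssn", "region", "phone_number", "sku"]
is_pii  = [1, 0, 1, 0, 1, 0]

model = make_pipeline(
    CountVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
    LogisticRegression(),
)
model.fit(columns, is_pii)

# Suggest tags for new columns; a human approves or rejects each suggestion.
for col in ["customer_email", "warehouse_id"]:
    prob = model.predict_proba([col])[0][1]
    print(f"{col}: PII probability {prob:.2f}")
```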

Data Documentation

Helping to classify data and document data assets is another way AI works with data stewards, analysts, and engineers to make data more easily discoverable by data consumers. To help standardize business terminology and concepts, AI can suggest the most appropriate term to describe data in a data glossary. Similarly, AI can help document data assets by suggesting the best way to describe them.

Data Access

A copilot can also play a key role in data access control rules. AI can suggest which users should be authorized based on how individual users' characteristics and profiles match those of users who are already authorized. Conversely, AI can also flag individuals for whom access may not be appropriate. This capability enables more users with the proper authority to leverage the vast amounts of enterprise data organizations collect to generate business value.
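One simple way to picture this is a similarity score between a requesting user's attributes and those of users who already hold access, as in the sketch below. The attribute encoding and the 0.8 threshold are illustrative assumptions, not a prescribed policy.

```python
# Sketch: suggest access by comparing a requesting user's attributes to
# users who already hold access to a dataset. The encoding and threshold
# are illustrative.
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Attribute vectors: [finance_dept, analyst_role, emea_region, manager]
authorized_users = {
    "alice": np.array([1, 1, 0, 0]),
    "bob":   np.array([1, 1, 1, 0]),
}
candidates = {"carol": np.array([1, 1, 0, 1])}

for name, vec in candidates.items():
    best = max(cosine(vec, v) for v in authorized_users.values())
    action = "suggest granting access" if best >= 0.8 else "flag for review"
    print(f"{name}: similarity {best:.2f} -> {action}")
```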

Data Validation

Helping to ensure data inputs are valid is another way an AI-powered suggestion engine or copilot can support better data governance. Models can learn to identify inputs that may be errors based on what the AI expects to see as an input. For example, if an input is outside a specific range, the field can be flagged, and a suggestion can be made to fix it before the error enters the database. By presenting this option, mistakes can be addressed in real time, avoiding downstream issues.
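A minimal version of such a check might look like the sketch below, where hard-coded expected ranges stand in for what a trained model has learned to expect; the field names and bounds are illustrative.

```python
# Sketch of a learned range check: flag inputs outside the range the model
# expects and suggest a fix before the value lands in the database.
EXPECTED_RANGES = {"unit_price": (0.01, 10_000.0), "quantity": (1, 500)}

def validate(field: str, value: float):
    low, high = EXPECTED_RANGES[field]
    if low <= value <= high:
        return None  # looks fine
    suggestion = min(max(value, low), high)  # naive clamp as a suggested fix
    return f"{field}={value} is outside [{low}, {high}]; suggested value: {suggestion}"

print(validate("quantity", 5000))     # flagged with a suggestion
print(validate("unit_price", 19.99))  # None: passes
```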

Strategies for better AI training

AI models are only as good as the data used to train them. When bad data is used to train AI, the noise confuses the models, leading to poor performance and erroneous output. This is particularly problematic for generative AI models, which are much more opaque and where the effects of bad data are much harder to identify.

Given this, ensuring that the platforms supplying data to AI models work with the highest-quality data is paramount to producing quality downstream AI models. It is key that data practitioners work closely with AI-assisted processes to teach them to monitor and scrub data correctly and more autonomously.

Move documentation closer to data

As data practitioners tag data, that information is used to produce tagging suggestions in the future. Ensuring that the right people conduct data tagging and asset documentation will have compounding effects down the road. Practitioners must tag PII data correctly so AI accurately learns what PII looks like and can flag it in the future. Continuing to teach AI by appropriately approving or denying its suggestions for documentation also helps it grow smarter and more effective over time. Incorporating line-of-business managers and professionals who are close to where data is collected, and who understand its nuances, is important for creating documentation that accurately reflects the context in which the data is collected.


Granular tagging

Tagging data at a more granular level can also help AI models perform better and produce more precise results. With richer granular metadata, AI has more differentiated data that can support more specific rules. For example, AI can suggest rules that pertain to single columns within a table or tailor rules that apply to particular personas. This enables a more nuanced approach to authorizing access to data, providing greater insights to more decision-makers.
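The sketch below illustrates the idea with column-level tags driving persona-specific visibility rules. The tag names, personas, and policy are invented for illustration.

```python
# Sketch: column-level tags drive access rules that differ by persona.
COLUMN_TAGS = {
    "customers.email":  {"pii"},
    "customers.region": {"business"},
    "orders.amount":    {"finance"},
}

PERSONA_RULES = {
    "marketing_analyst": {"business"},                     # business columns only
    "finance_analyst":   {"business", "finance"},
    "privacy_officer":   {"business", "finance", "pii"},
}

def allowed_columns(persona: str):
    """Return the columns whose tags are all visible to this persona."""
    visible = PERSONA_RULES[persona]
    return [col for col, tags in COLUMN_TAGS.items() if tags <= visible]

print(allowed_columns("finance_analyst"))
# ['customers.region', 'orders.amount']
```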

Shift metadata management and governance left

Many data quality issues originate with data ingestion or when data assets are created. Taking a proactive approach through data validation can eliminate issues down the road. The more data quality issues are allowed to fester, the more likely they will taint AI performance throughout your organization, leading to lower competitive performance. The timing of when AI is incorporated into your data governance process can also influence the outcome.

Leveraging AI to support data quality and governance protocols the minute data hits your systems can limit the risk of dirty data degrading your models. By shifting data governance and data quality checks to the left and integrating AI-driven quality checks earlier in your process, many more people will be involved in ensuring the data you use to train your AI models is of the highest quality. Also, by integrating AI into your data management workflow, people can collaborate with AI to improve quality and governance in real time, with no need to step out of the workflow or revisit data quality issues after the fact.
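A shift-left pipeline can be as simple as running those checks on every record at ingestion, before anything lands in the warehouse, as in this sketch. The `validate`, `suggest_tags`, `write`, and `quarantine` hooks are placeholders for your own AI-assisted checks and storage layer.

```python
# Sketch of shifting quality checks left: validate and tag each record at
# ingestion instead of cleaning up after it has landed in the warehouse.
def ingest(records, validate, suggest_tags, write, quarantine):
    for record in records:
        issues = [msg for field, value in record.items()
                  if (msg := validate(field, value))]
        if issues:
            quarantine(record, issues)                 # human reviews the findings
        else:
            write(record, tags=suggest_tags(record))   # clean data lands already tagged
```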

Getting to more autonomous AI

If you have taken adequate steps to integrate AI into your data governance process and trained your AI with clean data, opportunities arise for AI to take a more active role in your data governance strategy.

If we train our models well, we will be more confident that they can handle tasks that a data practitioner might perform. AI has the potential to learn to create data lineage automatically or automate proper data governance.

Spot and fix errors

Automatically identifying anomalies in your data and fixing errors is one area where AI can support data quality more autonomously. AI is particularly good at identifying patterns in large data sets and can pinpoint anomalies large and small. Models can predict what data points should be and, with limited human intervention, adjust a data point that does not fit expectations. With proper training, AI can scrub data sets, find and fill in missing values, or correct inaccurate or inconsistent data. AI can also convert data into standard formats. For example, state abbreviations can be adjusted to the traditional two-letter form, or different address formats can be standardized.
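For example, a simple pass over a table might standardize state names and flag outliers for review, as in the sketch below. The state mapping and the z-score threshold are illustrative; an AI-driven system would learn such rules rather than hard-code them.

```python
# Sketch: standardize state names to two-letter abbreviations and flag
# numeric outliers with a simple z-score for human review.
import pandas as pd

df = pd.DataFrame({
    "state": ["California", "TX", "new york", "CA", "texas", "NY"],
    "order_amount": [120.0, 95.0, 88.0, 110.0, 105.0, 25_000.0],
})

# Standardize state formats (illustrative mapping).
STATE_MAP = {"california": "CA", "texas": "TX", "new york": "NY",
             "ca": "CA", "tx": "TX", "ny": "NY"}
df["state"] = df["state"].str.lower().map(STATE_MAP).fillna(df["state"])

# Flag values more than 1.5 standard deviations from the mean (illustrative threshold).
z = (df["order_amount"] - df["order_amount"].mean()) / df["order_amount"].std()
df["needs_review"] = z.abs() > 1.5

print(df)
```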

With more sophisticated training, AI can be trusted to create its own data quality rules or create metadata to better organize data. By integrating AI chatbots to work with humans, models can learn rule structures and parameters and create frameworks to govern their own processes. Similarly, AI can create metadata and documentation on its own to build richer context around data, making it more usable. One example is identifying PII, such as a Social Security number in unstructured data, and tagging it as a sensitive data point.
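A narrow illustration of that last point: tagging US Social Security numbers found in free text. The regex below only matches the common 123-45-6789 shape; real detectors combine patterns with validation and surrounding context.

```python
# Sketch: tag US Social Security numbers found in free text as sensitive.
import re

SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def tag_sensitive(text: str):
    """Return a tag record for every SSN-shaped value found in the text."""
    return [{"value": v, "tag": "PII:SSN"} for v in SSN_PATTERN.findall(text)]

note = "Customer called about account 4411; SSN on file is 123-45-6789."
print(tag_sensitive(note))
# [{'value': '123-45-6789', 'tag': 'PII:SSN'}]
```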

These processes not only save humans a lot of time, they also reduce the risk of sensitive data reaching the wrong hands while making less sensitive data more accessible to decision-makers.

Monitoring your models

Even if you have done a great job of training and implementing your AI models to automate your data governance processes, humans must stay involved.

Even if your models are working well now, there is no guarantee that they will continue to perform well in the future. Things change, models drift, and biases can emerge. Mechanisms must be implemented so humans are able to monitor AI for errors and degrading performance. This might include asking an AI model for an output and comparing it to real data to see if the model produced the right answers or what we might expect the model to produce.
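One lightweight way to do this is to periodically compare the model's suggestions against human-verified labels and alert when agreement drops, as sketched below. The agreement threshold and the `alert` hook are illustrative.

```python
# Sketch: compare a model's suggestions against human-verified labels and
# alert when agreement falls below a threshold, a possible sign of drift.
def monitor(suggestions, verified_labels, alert, threshold=0.9):
    agreements = sum(s == v for s, v in zip(suggestions, verified_labels))
    rate = agreements / len(verified_labels)
    if rate < threshold:
        alert(f"Model agreement dropped to {rate:.0%}; review for drift or bias.")
    return rate

# rate = monitor(model_tags, steward_tags, alert=print)
```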

Structuring your strategy for optimal AI productivity

Structuring your organization to keep AI models healthy is essential to a successful strategy. It is important to put the professionals closest to the data and its context in a central role in training data governance models. When training AI, the more granular the data, the better, so integrating more ways for practitioners to provide feedback to models will improve performance.

Aligning line-of-business professionals with IT will be essential to an effective training process. Both IT and business people can work together to improve performance. IT can test models and implement training processes to ensure optimal performance while business leaders continue to integrate feedback into their workflows. This constant training and retraining cycle will reduce risk while improving data accessibility.

As models improve, they will become more precise and capable of building greater context around data sets. With greater precision and context, this data becomes much more valuable in driving decision-making and business strategy. Those with the best strategy and decision-making will retain a competitive advantage in the marketplace.
