The danger of AI built on poor data quality
Technology experts understand that ‘garbage in, garbage out’ is how IT systems work. Even the most powerful and sophisticated algorithm will spew out rubbish if it is not supplied with the right inputs. High-quality data is critical to the success of artificial intelligence (AI) and machine learning applications.
Extracting optimal business outcomes and maximising the return on investment (ROI) in AI and machine learning projects starts with an effective data management strategy: one that makes all relevant data available in a timely manner. Without it, AI and machine learning models learn from a flawed foundation and the results will not deliver the outcomes organisations want.
Do data warehouses really deliver?
In the past, access to data relied on complex extraction processes that transformed data and loaded it into centralised stores known as data warehouses or data lakes. That approach rarely delivered what businesses wanted because of its cost and complexity. In simple terms, only data that could be extracted, transformed and loaded (ETL) within time and financial constraints was used, which limited the results.
AI sits at the heart of most digital transformation initiatives, and over half of Australian businesses are investing in AI as part of their digital transformation strategy. To maximise the ROI, they need an effective data management strategy.
The problem is that data is typically scattered across multiple systems, some on-premises and others in cloud services, making it difficult to consolidate into a data set that AI models can use. But rather than identifying all those data sources and building tools to ETL the data into a centralised store, it’s possible to use a decentralised data platform.
Decentralisation for better management
A decentralised platform looks just like a traditional data warehouse or data lake to applications such as AI and data analysis tools. But the data is never actually copied or moved; the system uses metadata to create a single unified view of data from different places that can be queried and used by AI engines. This approach is much faster to deploy than traditional ETL and data pipelining approaches, costs far less and is far more flexible, as new data sources can be added without the need for new ETL processes.
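To make the idea concrete, here is a minimal sketch of querying data in place rather than copying it into a warehouse. It uses DuckDB purely as a stand-in for a federated query layer; the file paths, table names and column names are hypothetical, and a production networked data platform would span many more systems.

```python
# Minimal sketch: query data where it lives instead of ETL-ing it into a
# central store. DuckDB stands in for a federated query engine; the file
# paths and column names are hypothetical.
import duckdb

con = duckdb.connect()  # in-memory engine; no warehouse is created

# Each "table" below is read in place at query time: sales sit in a data
# lake as Parquet, customers are a CSV export from an operational system.
# Nothing is copied into a central repository ahead of time.
result = con.sql("""
    SELECT c.region,
           SUM(s.amount) AS total_sales
    FROM read_parquet('lake/sales.parquet') AS s
    JOIN read_csv_auto('crm/customers.csv') AS c
      ON s.customer_id = c.customer_id
    GROUP BY c.region
    ORDER BY total_sales DESC
""").df()  # hand the unified view to an AI or analytics tool as a DataFrame

print(result.head())
```

Adding another source is just another line in the query, not a new pipeline.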
This new approach to data management allows you to take a more strategic view of your data. Rather than being limited by a tactical view bound by the time and cost of ETL, a networked data platform enables you to use more data sources. The owners of that data manage the governance and rules, as well as specifying how the data can be accessed.
For technology teams, it means less time writing code and troubleshooting transformation and loading processes, and more attention on business outcomes.
The networked data platform provides a federated view of all the data that can be ingested by AI models. By enabling broader access to more data, AI algorithms become more accurate, with important contextual data refining the model. And because data sets can be added and removed easily on a networked data platform, models can be quickly refined and improved: as companies observe and evaluate the outcomes of their AI models, they can act to make them better, as the sketch below illustrates.
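As a sketch of how a model can be refined when a new source joins the federated view, consider retraining on a query that simply adds one more join. The sources, features and target below are invented for illustration; the point is that no new pipeline is built, only a revised query.

```python
# Hypothetical sketch: refining a model by widening the federated query.
# Table and column names are invented for illustration.
import duckdb
from sklearn.linear_model import LogisticRegression

con = duckdb.connect()

def train(query: str) -> LogisticRegression:
    """Pull a training set straight from the federated view and fit."""
    df = con.sql(query).df()
    X, y = df.drop(columns=["churned"]), df["churned"]
    return LogisticRegression(max_iter=1000).fit(X, y)

# First iteration: the model sees only CRM data.
base_model = train("""
    SELECT tenure_months, monthly_spend, churned
    FROM read_csv_auto('crm/customers.csv')
""")

# Refinement: add contextual data from a support system by extending the
# query; no new ETL process, just one more source joined at query time.
refined_model = train("""
    SELECT c.tenure_months, c.monthly_spend,
           t.open_tickets, c.churned
    FROM read_csv_auto('crm/customers.csv')      AS c
    JOIN read_parquet('support/tickets.parquet') AS t
      ON c.customer_id = t.customer_id
""")
```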
A networked data platform also increases data accuracy, because business rules can be defined at the data source and enforced as data quality processes at query time.
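One way to picture ‘rules at the source, enforced at query time’ is a view that the data owner publishes over their raw table with the quality rules baked in, so consumers only ever query the view. This is again a hypothetical sketch in DuckDB syntax, with invented names and rules.

```python
# Hypothetical sketch: the data owner encodes business rules in a view,
# so every consumer query applies the quality checks at read time.
import duckdb

con = duckdb.connect()

# The owner of the orders data publishes a governed view over the raw
# source. Names and rules are invented for illustration.
con.sql("""
    CREATE VIEW clean_orders AS
    SELECT order_id,
           upper(trim(country_code)) AS country_code,  -- normalise codes
           amount
    FROM read_parquet('erp/orders.parquet')
    WHERE amount > 0            -- rule: reject negative or zero orders
      AND order_id IS NOT NULL  -- rule: reject incomplete records
""")

# Consumers (including AI feature pipelines) query the view, never the
# raw file, so the rules travel with the data source.
features = con.sql("SELECT country_code, amount FROM clean_orders").df()
```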
The network effects of this approach to data management bring several benefits. Easier access to a broader range of data can multiply the number of AI applications throughout an organisation, and there is a flow-on benefit: different business groups can collaborate with greater ease.
Centralising data repositories creates friction that stifles collaboration and innovation. Centralising and moving data involves coding transformation scripts that commonly introduce errors. A decentralised data platform removes many of the architectural defects of traditional centralised models. A networked data platform is the nervous system for AI.
As businesses look to capitalise on the benefits that AI can bring, they will need to rethink how they access and use data. A decentralised data platform built on a virtual data warehouse, where data remains at its source and retains its governance and security controls, enables the development of better AI that supports ongoing digital transformation.