
What is a data platform?

Eric Thanenthiran·11 October 2025·5 min read

A data platform is nothing more than a central hub for all your business data. It includes the systems which move data from various sources into the central repository and the layers which generate outputs from this data (such as reports, dashboards or forecasts). In most cases, data in its raw form is difficult to work with and lacks business context. Data modelling is the process that takes this data and reshapes it, often combining data from various sources. This process establishes a single source of truth for all data and gives the business access to holistic insights. For example, combining your marketing and sales data allows a business to understand the success of specific marketing campaigns.

Why do you need a data platform?

Simply put, the time to invest in a platform is when access to data is limiting your ability to make good business decisions. The data platform exists only to provide the business with the data it needs to make informed decisions. To do this with confidence, the data must be accurate, timely and complete. High-quality insights can give a business a nuanced understanding of its current operational and financial health. If this information isn't readily available and its absence is affecting the operation of the business, then it's time to invest in a data platform.

Another key driver is getting everyone in the business to agree on the definitions of metrics, logic and targets, and centralising this source of truth. That way everyone has a shared understanding of key business information and a strong foundation for making good decisions.

What makes up a platform?

Regardless of technology choices, all data platforms are made up of the same fundamental components. These components can range in complexity and size, and the aim of the design phase of any data project is to choose technologies and systems that match the needs and budget of the business. Platforms can range from simple ones that run on your local computer to distributed cloud platforms that allow the entire business to access the same insights from anywhere in the world.

[Figure: data platform schematic]

Data Pipelines

Data pipelines are systems that move data from where it currently resides to a centralised data store. These pipelines break open data silos and ingest the data into the data store on a set schedule. Most business reporting can be run on daily pipelines. Real-time pipelines are worthwhile when the business needs to act on their outputs immediately, for example when a change in the source system needs to trigger some action.
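To make this concrete, here is a minimal sketch of a scheduled ingestion step. It assumes an in-memory SQLite database standing in for the data store, and the `extract_orders` function, the `raw_orders` table and the sample rows are all hypothetical. The point it illustrates is that a pipeline run should be safe to repeat, so a re-run on the next schedule does not duplicate data.

```python
import sqlite3
from datetime import datetime, timezone


def extract_orders():
    """Hypothetical source extract; in practice this might call a CRM or finance API."""
    return [
        {"order_id": 1, "customer": "acme", "amount": 120.0},
        {"order_id": 2, "customer": "globex", "amount": 75.5},
    ]


def load_orders(store, rows):
    """Upsert rows into the central store so that repeated runs are safe."""
    store.execute(
        "CREATE TABLE IF NOT EXISTS raw_orders ("
        "order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL, loaded_at TEXT)"
    )
    loaded_at = datetime.now(timezone.utc).isoformat()
    store.executemany(
        "INSERT OR REPLACE INTO raw_orders VALUES (?, ?, ?, ?)",
        [(r["order_id"], r["customer"], r["amount"], loaded_at) for r in rows],
    )
    store.commit()
    return len(rows)


# An in-memory database stands in for the data store (an assumption for this sketch).
store = sqlite3.connect(":memory:")
load_orders(store, extract_orders())
load_orders(store, extract_orders())  # a second scheduled run replaces rows instead of duplicating them
row_count = store.execute("SELECT COUNT(*) FROM raw_orders").fetchone()[0]
```

In a real platform the schedule itself would come from a scheduler or orchestrator rather than from calling the function twice by hand.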

Data Store

The data store is the single repository for all your business information. By centralising this data you unlock immense value:

  • You take total ownership of your data, rather than leaving it stuck in source systems
  • You can join disparate pieces of information together and so gain a holistic understanding of your business (for example, bringing together all the information you hold on your customers from your finance, marketing and sales systems)
  • You can clean and validate the data, so that the data powering your business is trustworthy
  • You provide a single source of truth for all your data
  • Access to data and insights becomes much easier because everything is centralised
  • You can choose how it is served to you: as reports, dashboards, web portals, spreadsheets and so on

Data stores can hold this data as files, in database tables and now even in a hybrid data storage format. As the data store becomes more valuable, you'll want to ensure that it is hosted in the cloud and protected through modern security practices.

Within the data store, data modelling (or transformation) converts the raw data into actionable insights. This process includes data cleaning and the application of custom business logic. The modelling step requires close coordination with the business, and the resulting definitions should be agreed across the business.
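As a rough illustration of what a modelling step can look like, the sketch below cleans a raw sales extract and combines it with marketing spend to produce one row per campaign. The sample records, the field names and the `return_on_spend` definition are all assumptions made for the example; in a real platform the business would agree on that definition.

```python
# Hypothetical raw extracts from two source systems.
raw_sales = [
    {"campaign": " Spring-Sale ", "revenue": "1200.50"},
    {"campaign": "spring-sale", "revenue": "300"},
    {"campaign": "Autumn-Promo", "revenue": None},  # incomplete record
    {"campaign": "autumn-promo", "revenue": "450"},
]
raw_marketing = [
    {"campaign": "spring-sale", "spend": 500.0},
    {"campaign": "autumn-promo", "spend": 900.0},
]


def clean_sales(rows):
    """Normalise campaign names, cast revenue to float and drop incomplete rows."""
    cleaned = []
    for row in rows:
        if row["revenue"] is None:
            continue
        cleaned.append(
            {"campaign": row["campaign"].strip().lower(), "revenue": float(row["revenue"])}
        )
    return cleaned


def model_campaigns(sales, marketing):
    """Join sales to marketing spend and apply the (assumed) business logic."""
    revenue = {}
    for row in sales:
        revenue[row["campaign"]] = revenue.get(row["campaign"], 0.0) + row["revenue"]

    model = {}
    for row in marketing:
        total = revenue.get(row["campaign"], 0.0)
        model[row["campaign"]] = {
            "spend": row["spend"],
            "revenue": total,
            # Assumed definition: revenue earned per unit of marketing spend.
            "return_on_spend": total / row["spend"] if row["spend"] else None,
        }
    return model


campaign_model = model_campaigns(clean_sales(raw_sales), raw_marketing)
```

The cleaning and the business logic are kept in separate functions so that each can be reviewed with the relevant part of the business.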

Orchestration

The orchestration layer can be omitted for simple data platforms, but once you start ingesting multiple data sources an orchestrator becomes important for coordinating the operation of data pipelines and the modelling of data. It controls how often your pipelines ingest data, how often this data is processed into insights, and how data issues and errors are handled. On more sophisticated data platforms, it can also trigger AI models to process, interpret or run forecasts on the latest data.
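The core idea can be sketched in a few lines using only the Python standard library (`graphlib` requires Python 3.9 or later). The task names and the simulated failure are hypothetical, and this is not how any particular orchestration product works. It only shows the two responsibilities described above: running steps in dependency order and deciding what happens downstream when a step fails.

```python
from graphlib import TopologicalSorter


def run_pipeline(tasks, dependencies):
    """Run tasks in dependency order; skip any task whose upstream did not succeed."""
    status = {}
    for name in TopologicalSorter(dependencies).static_order():
        upstream = dependencies.get(name, ())
        if any(status[u] != "ok" for u in upstream):
            status[name] = "skipped"
            continue
        try:
            tasks[name]()
            status[name] = "ok"
        except Exception as exc:  # record the failure instead of crashing the whole run
            status[name] = f"failed: {exc}"
    return status


def ingest_marketing():
    # Simulated data issue in one source system (hypothetical).
    raise RuntimeError("marketing source unavailable")


# Hypothetical tasks; the no-op lambdas stand in for real work.
tasks = {
    "ingest_sales": lambda: None,
    "ingest_marketing": ingest_marketing,
    "model_insights": lambda: None,
    "refresh_dashboard": lambda: None,
}

# Each task maps to the set of tasks that must succeed before it runs.
dependencies = {
    "model_insights": {"ingest_sales", "ingest_marketing"},
    "refresh_dashboard": {"model_insights"},
}

status = run_pipeline(tasks, dependencies)
```

Here the failed marketing ingest causes the modelling and dashboard steps to be skipped, so stale or partial insights are not published. A real orchestrator would typically add scheduling, retries and alerting on top of this.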

Access Layer

This is how most of the business will receive and interact with insights. It could be as simple as a spreadsheet report generated on a daily, weekly or monthly cadence, or a more sophisticated Business Intelligence tool which allows more interaction with the data. Increasingly, AI tools allow you to query and interact with this data directly and retrieve insights as you need them.

Semantic Layer

This is a relatively new component of data platforms and gives you the ability to attach business concepts and important additional information to your data. It makes it easier for data and AI tools to work effectively with your data. It may not be necessary for smaller platforms; however, the investment could pay off quite quickly if you are then able to use AI to gain additional insights from your data. It can also help business users understand the meaning behind a particular dataset or value.

In practice, in addition to the data in your data store, further information is added to the platform, such as business vocabulary, domain-specific terminology, business logic and relationships between datasets. All of this bridges the gap between the complexity of the source data and the needs of its users (AI or human), as it adds a layer of governance and documentation to the raw data.
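One way to picture this is as a small catalogue of metric definitions that sits alongside the data. The sketch below is an illustration only: the `MetricDefinition` structure, the table names and the SQL-style expressions are hypothetical and not taken from any specific semantic layer product.

```python
from dataclasses import dataclass, field


@dataclass
class MetricDefinition:
    name: str
    description: str   # business meaning, in plain language
    source_table: str  # where the underlying data lives
    expression: str    # agreed business logic
    synonyms: list = field(default_factory=list)  # business vocabulary


# Hypothetical definitions; a real semantic layer would hold many more.
SEMANTIC_LAYER = [
    MetricDefinition(
        name="net_revenue",
        description="Invoiced revenue minus refunds, excluding tax.",
        source_table="finance.invoices",
        expression="SUM(amount) - SUM(refund_amount)",
        synonyms=["revenue", "net sales"],
    ),
    MetricDefinition(
        name="active_customers",
        description="Customers with at least one order in the last 90 days.",
        source_table="sales.orders",
        expression="COUNT(DISTINCT customer_id)",
        synonyms=["active accounts"],
    ),
]


def resolve(term):
    """Map a business term (from a person or an AI tool) to its agreed definition."""
    wanted = term.strip().lower()
    for metric in SEMANTIC_LAYER:
        if wanted == metric.name or wanted in metric.synonyms:
            return metric
    return None


metric = resolve("Net Sales")
```

Whether a user asks for "revenue" or "net sales", they are pointed to the same agreed definition and source table, which is what lets both people and AI tools work from a shared vocabulary.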

Get in touch

There's something genuinely exciting about helping organisations unlock insights from their data. Regardless of company size, if you're wrestling with data challenges or just curious about what's possible, we'd love to have a chat about it.
