In general, a cloud platform is a suite of cloud services (an online toolbox, so to speak) that form the building blocks for implementing business applications in the cloud. Cloud platforms are offered by cloud providers such as AWS, Microsoft Azure and Google Cloud Platform (GCP). Generally speaking, a company will host both its product(s) and its data platform in the cloud.
A mature data and analytics platform built on a cloud platform is composed of various data ingestion services, data storage services, machine learning pipelines and BI layers.
Cloud platforms and data platforms are not the same thing. A modern data platform is a stack of tools designed to help extract business value from data. A data platform can be built on a cloud platform, and cloud vendors have developed a large number of services to facilitate this.
However, it is also possible to build a data platform outside a cloud platform, since many of the services involved, especially open-source offerings, can be hosted in a non-cloud data center. For example, one can use Apache Airflow, Kubernetes and Apache Druid to create an on-premise modern data platform.
A major advantage of implementing your data platform on a cloud platform is that cloud services abstract away most of the work of managing infrastructure. In other words, much of the difficulty of managing complex infrastructure, such as servers and networks, is handled by the provider. In contrast, managing an on-premise data platform is a laborious task that requires a lot of expertise.
Data platforms come with many advantages to your business:
A single source of truth for your data: In many enterprises, an uncontrolled sprawl of source systems and integrations obscures the lineage of your data and undermines its validity. By centralizing your data in a data platform, all source systems push their data to a single destination, which improves:
Data governance: A central platform is the natural place to enforce standardization and data quality checks. It also allows KPI and data definitions to be maintained in a single, shared data catalog, which helps end conflicts over the validity of the KPIs shown in reporting dashboards.
Security: All data access goes through a centralized authentication/authorization mechanism. Moreover, a data platform allows Personally Identifiable Information (PII) to be managed centrally, together with privacy measures such as the Right to Be Forgotten (RTBF).
Business velocity: Business users access a single platform instead of manually unifying data from multiple APIs or data lakes (which, more often than not, turn into data swamps). A single data platform can finally break down data silos.
Higher efficiency and lower costs: A self-service data platform that serves all future applications increases efficiency and reduces costs for your business. This becomes increasingly important as your organization scales, because new use cases built on top of the data will not require custom onboarding or complex setups. As a result, new machine learning use cases or new reporting tools can start adding value immediately.
Adhering to platform and infrastructure best practices and reducing operational costs: By building a single data platform that follows best practices, such as proper CI/CD and managing the platform with Infrastructure as Code (IaC), platform failures become rare. This allows your business to focus on extracting value from data instead of on keeping it under control.
Centralized monitoring of application health, performance and security: In an on-premise data center, many different tools are needed to monitor the health of the various hardware and software components of your platform. Tooling that centralizes this does exist, but cloud platforms largely offer it built in.
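To make the governance and security points above a little more concrete, the following Python sketch shows a toy "central platform" in which a data quality check, a shared KPI definition and RTBF deletion each live in exactly one place. All names (`Platform`, `KPI_CATALOG`, `ingest`, `revenue`, `forget_user`) are hypothetical and the data sits in memory. A real platform would rely on dedicated catalog, data quality and access-control services rather than plain Python objects.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List

# Hypothetical central catalog: one agreed definition per KPI,
# so every dashboard computes "revenue" the same way.
KPI_CATALOG = {
    "revenue": "sum of order amounts, excluding refunded orders",
}


@dataclass
class Platform:
    """Toy stand-in for a centralized data platform (illustrative only)."""

    # dataset name -> list of records; records carry a "user_id" key
    datasets: Dict[str, List[Dict[str, Any]]] = field(default_factory=dict)

    def ingest(self, name: str, records: List[Dict[str, Any]], required_fields: List[str]) -> int:
        """Single ingestion path: keep only records with all required fields set.

        Returns the number of rejected records.
        """
        valid = [r for r in records if all(r.get(f) is not None for f in required_fields)]
        self.datasets.setdefault(name, []).extend(valid)
        return len(records) - len(valid)

    def revenue(self) -> float:
        """Compute the 'revenue' KPI exactly as defined in KPI_CATALOG."""
        return sum(o["amount"] for o in self.datasets.get("orders", []) if not o["refunded"])

    def forget_user(self, user_id: Any) -> int:
        """Handle an RTBF request in one place: drop the user's records everywhere.

        Returns the number of records removed across all datasets.
        """
        removed = 0
        for name in list(self.datasets):
            records = self.datasets[name]
            kept = [r for r in records if r.get("user_id") != user_id]
            removed += len(records) - len(kept)
            self.datasets[name] = kept
        return removed


if __name__ == "__main__":
    p = Platform()
    rejected = p.ingest(
        "orders",
        [
            {"user_id": 1, "amount": 100.0, "refunded": False},
            {"user_id": 2, "amount": 50.0, "refunded": True},
            {"user_id": 3, "amount": None, "refunded": False},  # fails the quality check
        ],
        required_fields=["user_id", "amount", "refunded"],
    )
    p.ingest("profiles", [{"user_id": 1, "email": "a@example.com"}], ["user_id", "email"])
    print("rejected:", rejected)                 # 1
    print("revenue:", p.revenue())               # 100.0
    print("RTBF removed:", p.forget_user(1))     # 2 (one order, one profile)
    print("revenue after RTBF:", p.revenue())    # 0
```

The design point is that each rule is written once. Because every source goes through the same `ingest` step and every consumer reads the same KPI definition, there is a single place to enforce quality and a single place to honor an RTBF request.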
Before jumping into creating a data platform, it is essential to define your business data strategy. Stakeholders first need to clearly answer questions like:
Answering those questions allows you to craft a platform that keeps up with your organization’s future needs while adding maximum value. Take sufficient time for this process, and iterate on it.
Next comes the journey of creating the data platform, which consists of:
Note: When composing your cloud infrastructure, there is often a choice between implementing a system yourself and outsourcing it to the vendor as a managed service. It is important to weigh the tradeoffs between the managed solutions that vendors offer and crafting your own. Managed solutions typically offer easy setup, easy scaling and integration with the vendor’s other services. Crafting your own solution, on the other hand, gives you full control over costs and over every configuration option. Operational cost, management overhead and the expertise required to run custom solutions should all be taken into account when making this choice.
At a high level, every data platform is powered by data storage and by compute resources that transform and move data. In reality, however, a data platform consists of many interdependent parts. Data storage, for example, can take the form of data warehouses, databases or raw file storage. Compute (i.e. the hardware running applications) is responsible for tasks that include, but are not limited to:
A strong platform enables the business to rapidly access valid data, design effective reporting and extract more value from data through, for example, machine learning. Such a platform is the result of a well-planned platform strategy centered on business goals such as cost reduction, time to market, increased security or other business KPIs.
From an engineering perspective, a strong platform, just like strong software, is easily scalable, reliable, secure and maintainable. Any platform strategy should factor in at least these considerations. Moreover, a strong platform is built on sound engineering principles. To achieve a strong data platform, Xomnia recommends the following starting points:
Any data platform strategy should treat scalability as a top priority. Data volumes tend to grow quickly, often compounding year over year as new sources and use cases are added, and this warrants great care when designing a platform.
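A back-of-the-envelope calculation shows why growth deserves this attention. The Python sketch below assumes compound growth, so the volume after n years is start × (1 + growth)^n. The starting volume of 10 TB, the 50% annual growth rate and the 100 TB capacity limit are hypothetical figures chosen only for illustration, not measurements. Under those assumptions, a platform sized with tenfold headroom today would still outgrow that capacity in its sixth year.

```python
def projected_volume_tb(start_tb: float, annual_growth: float, years: int) -> float:
    """Compound growth: volume after `years` = start_tb * (1 + annual_growth) ** years."""
    return start_tb * (1 + annual_growth) ** years


def years_until_exceeds(start_tb: float, annual_growth: float, limit_tb: float) -> int:
    """Return the first whole year in which the projected volume exceeds limit_tb."""
    if start_tb <= 0 or annual_growth <= 0:
        # Without positive growth the loop below would never terminate.
        raise ValueError("start_tb and annual_growth must be positive")
    years, volume = 0, start_tb
    while volume <= limit_tb:
        years += 1
        volume *= 1 + annual_growth
    return years


if __name__ == "__main__":
    # Hypothetical figures, chosen only for illustration.
    start, growth, limit = 10.0, 0.5, 100.0
    for year in range(7):
        print(f"year {year}: {projected_volume_tb(start, growth, year):.1f} TB")
    print(f"{limit} TB exceeded in year {years_until_exceeds(start, growth, limit)}")
```

Real growth is rarely this smooth, but the conclusion holds: a design that only fits today’s volume will need rework sooner than expected.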
To future-proof your data platform, Xomnia recommends:
Any data platform that sits on top of modern infrastructure is well positioned to unlock the potential of its data through machine learning. Most cloud vendors offer managed machine learning services that ease the complex machine learning lifecycle of training, evaluating and deploying models.
Provided that your platform adheres to the principles of scalability, security and governance (including ensuring that AI is used responsibly), machine learning engineers and data scientists can start exploring the potential of your organization’s data.
Components of a platform that are required by a machine learning solution are: