The cloud refers to platforms that host IT infrastructure for other organizations and companies. By storing and processing its data in the cloud, a company can use servers that are owned and maintained by a ‘cloud vendor’ to handle its data and build all its data products. Each of the major cloud vendors typically houses its servers in large data centers.
Moreover, large cloud vendors have expanded their services to let users provision tools and software within their platforms without having to deal with the technicalities of the underlying servers.
Xomnia works with the three biggest cloud vendors: Microsoft Azure, Amazon Web Services (AWS) and Google Cloud Platform (GCP). Other niche players and competitors include the clouds of IBM, Oracle, Tencent and Alibaba; the latter two focus geographically on Asia. Geography also plays a role in the market share of these providers. In the Dutch market, for example, Azure’s share relative to AWS’s is much larger than it is in the US market.
Cloud migration refers to moving IT services and infrastructure into data centers owned by a cloud vendor. In other words, a cloud migration means moving IT services and infrastructure from where they were originally hosted to the data centers of a cloud vendor. These services and infrastructure could initially be hosted ‘on-premise’ in an organization’s own data center, or could already be hosted at another cloud vendor.
One of the main benefits of the cloud over on-premise data servers is that it increases the speed of development, since the necessary infrastructure and services can be provisioned within minutes. This is of growing importance, as the amount of data that companies handle continues to grow exponentially. Without ever-expanding storage and computational capacity, companies will struggle to maintain their efficiency, speed and competitive edge. In contrast to on-premise architecture, where scaling up requires more (manual) effort, clouds can easily or even automatically accommodate extra storage and/or computational power.
In the past, companies hosted most of their data and tools in ‘on-premise’ data servers or warehouses, which they built themselves. Storing and working on data and tools on-premise, however, requires many in-house engineers to maintain the IT infrastructure. These engineers also need to purchase server equipment themselves, and to constantly maintain and update the on-premise data storage facilities.
Each of the cloud vendors offers similar services to work with data, such as:
Databases / Data Warehouses
Data storage / Data Lakes
Services to run data pipelines and do data transformation
Services to schedule data pipelines
Services to perform data analyses and machine learning
Services to support data governance
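To make the pipeline services above concrete, the core of a data transformation job can be sketched in plain Python. This is a minimal, cloud-agnostic illustration with hypothetical field names; in practice, a managed pipeline service like those listed would schedule and run such a step against real cloud storage.

```python
# Minimal, cloud-agnostic sketch of an extract-transform-load (ETL) step.
# All names are hypothetical; a managed pipeline service would schedule and
# run a step like this against a real data lake and warehouse.

def extract(raw_records):
    """Extract: read raw records (an in-memory stand-in for a data lake)."""
    return list(raw_records)

def transform(records):
    """Transform: clean and enrich each record."""
    cleaned = []
    for rec in records:
        if rec.get("amount") is None:  # drop incomplete rows
            continue
        cleaned.append({
            "customer_id": rec["customer_id"],
            "amount_eur": round(rec["amount"] / 100, 2),  # cents -> euros
        })
    return cleaned

def load(records, warehouse):
    """Load: append the transformed records to a warehouse table (a dict here)."""
    warehouse.setdefault("sales", []).extend(records)
    return warehouse

raw = [
    {"customer_id": 1, "amount": 1999},
    {"customer_id": 2, "amount": None},  # incomplete record, filtered out
]
warehouse = load(transform(extract(raw)), {})
```

A scheduling service (such as a managed Airflow) would then trigger this step on a cadence and handle retries and monitoring.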
There are a number of common strategies for cloud migration. Usually, different strategies are chosen for different components of the existing platform, leading to a hybrid approach combining the following strategies:
“Improve and Move”: This is the preferred way of migrating to the cloud. In this strategy, you modernize the existing component, taking advantage of cloud-native products. It is an intensive approach, but it also has the greatest benefits: better performance, cost, features and maintainability. We stress again that it is intensive: it requires knowledge of cloud-native products, strategies and architecture, and the ability to implement them all together.
“Lift and Shift”: This is a less intensive way of migrating to the cloud. In this strategy, components are moved into the cloud with only minor modifications. Many companies follow this strategy, and the clouds provide Platform as a Service (PaaS) solutions to support them. This approach requires far less cloud knowledge and architecture than “Improve and Move”, and implementation is relatively fast. But while you benefit from the scale of the cloud (cost-wise), you do not benefit from modern, cloud-native components.
“Remove and Replace”: This strategy takes full advantage of cloud benefits, but also requires the most knowledge and work. It is a full reboot - the current platform is simply replaced by a novel, cloud-native platform.
A mix of all the above: As an example of mixing strategies, one could “Lift and Shift” several ETLs so that they now write their data to cloud storage, while the existing on-premise data server is retired and a new cloud-native data lakehouse is built in its place (“Remove and Replace”). The A/B testing platform, meanwhile, might need only minor modifications and can therefore be moved into a serverless cloud function setup (“Lift and Shift”).
Setting solid migration timelines amid evolving business requirements: Because of the many dependencies, it is difficult to get concrete about deadlines. For instance, a data engineering team cannot move forward without a source connection, and the team behind that source has keeping the business running as its highest priority - not a migration to a new platform.
Data security: When migrating to a new platform, you should treat security as a first-class citizen. Make sure to reserve sufficient resources to carry out vulnerability scans, automated software deployment and comprehensive testing.
Overhead of running two data platforms at the same time: Only after the new data platform is up and running can a migration happen. A cloud migration, however, is a tedious process that can take a long time, since it involves migrating infrastructure, processes and data to the cloud. Moreover, both the old and the new platform need to run in parallel, since the business might still rely on the data in the original platform. In addition, no data may be lost, and the ingestion pipelines need to stay up and running. Consequently, a migration will probably demand resources to keep the old platform alive in production while integrating the same sources into the new one.
Validation of functionality in the new platform: Building a platform on the cloud often means that operations will be done differently than in the past; services used in the old platform might be deprecated, outdated or simply no longer best practice. This means that different tools and services have to be used.
Insights into current limitations: When migrating a platform to the cloud, the old platform needs to be evaluated in full, which might be the first such review in quite some time. As a result, limitations in the current data, business processes or strategies might surface. This can radically affect further project planning and business operations.
Consequently, a validation process is crucial to make sure the data in the new platform is identical to that in the old platform. This step is critical: there is a big chance the users of the old platform know the data inside out, and they will not trust the data in the new platform if it differs in any way from what it used to be.
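Such a validation can start with something as simple as comparing row counts and order-independent content checksums between the two platforms. The sketch below is a minimal illustration, assuming both extracts fit in memory as lists of rows; the function names are hypothetical.

```python
import hashlib
import json

def table_fingerprint(rows):
    """Order-independent checksum of a table: hash the canonical JSON of
    each row, sort the digests, then hash the concatenation."""
    digests = sorted(
        hashlib.sha256(json.dumps(row, sort_keys=True).encode()).hexdigest()
        for row in rows
    )
    return hashlib.sha256("".join(digests).encode()).hexdigest()

def tables_match(old_rows, new_rows):
    """Validate a migrated table: same row count and identical content."""
    return (len(old_rows) == len(new_rows)
            and table_fingerprint(old_rows) == table_fingerprint(new_rows))

old = [{"id": 1, "value": "a"}, {"id": 2, "value": "b"}]
new = [{"id": 2, "value": "b"}, {"id": 1, "value": "a"}]  # same data, new order
```

For tables too large to fit in memory, the same idea applies with streaming hashes or per-partition counts and aggregates computed on each platform.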
Scalability: Migrating to the cloud unlocks scalability in a way that an on-premise system simply cannot accommodate. By migrating their work to the cloud, companies can increase their computational capacity tenfold with one click; on-premise, this would require ordering new hardware, waiting for it to be delivered, and then connecting it.
Flexibility: Suppose you need a temporary scale increase - for instance, your organization needs many more resources for a short period. With an on-premise data center, you have to invest in hardware that stays. With a cloud setup, however, capacity can be increased temporarily and scaled down again, either automatically or by changing a few settings. Besides decreasing costs, this flexibility lets you explore new machine learning or big data analytics use cases more often, because they do not require additional setups or investments.
Reduced incident management: With cloud vendors providing high service level agreements (SLAs) on mission-critical infrastructure, incident management (restoring normal operations in the fastest way possible after an interruption with the least possible impact) will become a thing of the past. This guarantees (almost) no downtime of your heavily accessed databases or authentication mechanisms.
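The automatic up- and downscaling mentioned under Flexibility boils down to a control loop around a target utilization. The toy decision rule below illustrates the idea only - the thresholds and bounds are hypothetical, not any vendor's actual autoscaling policy.

```python
import math

def desired_instances(current, cpu_utilization, target=0.6, min_n=1, max_n=100):
    """Toy autoscaling rule: choose the instance count that moves average CPU
    utilization toward the target. All thresholds here are hypothetical."""
    if current <= 0:
        raise ValueError("current instance count must be positive")
    # Proportional scaling: 4 instances at 90% CPU with a 60% target -> 6 instances.
    needed = math.ceil(current * cpu_utilization / target)
    # Clamp to the configured bounds so the fleet never scales to zero or runs away.
    return max(min_n, min(max_n, needed))
```

An on-premise setup can compute the same number, but only in the cloud can you act on it within minutes rather than a procurement cycle.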
There is no one-size-fits-all answer to this question. The best cloud provider for your business is the one that fits its requirements best. Fundamentally, there are four aspects to consider when determining how well a cloud provider suits your business:
Price: Xomnia recommends considering different solution architectures and having them checked and verified externally. We recommend making extensive use of the pricing calculators provided by cloud providers such as AWS, Azure and GCP to estimate your cloud bill. Due to differences in pricing structures, one cloud might be cheaper for your particular use case. When making the calculations, do not forget to factor in scalability and initial migration costs.
Setup: Which cloud platform offers the easiest migration process for your applications and data? And which allows you to get started quickly and rapidly prototype? The answer to those questions is important in determining the best cloud provider for your business.
Expertise: Ask yourself questions like: Which platform is my technical personnel most familiar with? Which one would they like to learn? Can it be easily integrated into existing enterprise software? Is there perhaps a platform that more people work with, offering a larger pool of (future) personnel?
Use case(s): Certain providers have more mature offerings for specific use cases than others. For example, AWS offers two solutions for big data platforms (Redshift and Athena), which tie into different use cases. Azure, on the other hand, offers a more integrated data lake(house) service (Synapse). GCP, in turn, offers a high level of integration with various (third-party) services and managed hosting, e.g. Apache Airflow hosted as GCP Composer. This ties back into expertise, since your team will want to tackle the use case in the space they are most familiar with.
There are obviously other considerations to take into account when choosing the best cloud provider, such as the availability of professionals who are experienced in a certain cloud provider, the ease of attracting talent that has experience with a provider, the accessibility and comprehensiveness of documentation on setting up and configuring cloud services, or the possibility of integrating a cloud with the other tools that are used in the organization. A common example of the latter is the Microsoft Suite, which ties in nicely with the Azure ecosystem.
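The kind of back-of-the-envelope estimate a pricing calculator produces can be sketched as follows. All rates and figures below are placeholders for illustration, not actual vendor prices; use the official calculators for real numbers.

```python
def monthly_cost(vm_hours, rate_per_vm_hour, storage_gb, rate_per_gb_month,
                 egress_gb=0.0, rate_per_egress_gb=0.0):
    """Rough monthly cloud bill: compute + storage + data egress.
    All rates are placeholders; real figures come from the vendor's calculator."""
    return (vm_hours * rate_per_vm_hour
            + storage_gb * rate_per_gb_month
            + egress_gb * rate_per_egress_gb)

# Hypothetical scenario: 2 VMs running a full month (~730 hours each),
# 500 GB of storage and 100 GB of data egress, at made-up example rates.
estimate = monthly_cost(vm_hours=2 * 730, rate_per_vm_hour=0.10,
                        storage_gb=500, rate_per_gb_month=0.02,
                        egress_gb=100, rate_per_egress_gb=0.09)
```

Repeating such a calculation per provider, and per candidate architecture, makes the price comparison concrete before committing to a migration.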
Any data platform that sits on top of modern infrastructure is just seconds away from tapping into its immense potential through machine learning. Most cloud vendors offer managed machine learning services that will ease the complex machine learning lifecycle of training, evaluating and deploying models.
Given that your platform adheres to the principles of scalability, security and governance (i.e. AI is used responsibly), machine learning engineers and data scientists can start exploring the potential of your organization’s data.
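The train-evaluate-deploy lifecycle that these managed services automate can be illustrated locally with a toy model. The sketch below uses only the Python standard library and hypothetical function names; a managed service would wrap each of these steps with experiment tracking, a model registry and hosted, autoscaled endpoints.

```python
from statistics import mean

def train(xs, ys):
    """Fit y = a*x + b by ordinary least squares (a toy stand-in for training)."""
    mx, my = mean(xs), mean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

def evaluate(model, xs, ys):
    """Mean absolute error on held-out data (the 'evaluate' step)."""
    a, b = model
    return mean(abs(a * x + b - y) for x, y in zip(xs, ys))

def deploy(model):
    """'Deploy' the model as a callable prediction endpoint; a managed service
    would expose this behind an authenticated, autoscaled API instead."""
    a, b = model
    return lambda x: a * x + b

model = train([1, 2, 3, 4], [2, 4, 6, 8])  # learns y = 2x
predict = deploy(model)
```

However simple, the same three steps - train, evaluate, deploy - structure every managed ML offering, from Azure Machine Learning to AWS SageMaker and GCP Vertex AI.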
Components of a platform that are required by a machine learning solution are: