Philip Williams
on 10 November 2022
Object storage is a type of storage in which data is managed as distinct units called objects. It has accompanied the cloud computing revolution: S3 (Simple Storage Service) was the first generally available AWS service, and its API later became the de facto standard for most object stores.
Object stores have a very simple interface and do not require you to manage complicated SCSI and HBA drivers, multipathing tools, or volume managers embedded in your operating system. Access to storage becomes an application integration: you point your application at an HTTP endpoint and use a small set of verbs to describe what you want to do with a piece of data. Users and applications can be given access to buckets, a bucket being somewhat analogous to a folder. However, there is no hierarchy: buckets cannot be nested, and the objects inside a bucket sit in a flat namespace.
As an example of how these verbs work: do you want to PUT an object somewhere for safekeeping? Do you want to GET an object so that you can do some work with that piece of data? Or do you want to LIST the contents of your bucket? These three verbs are perhaps an oversimplification of what is possible with object storage, but this is loosely where cloud object storage began: an initiative to make storage more economical by removing proprietary technologies and creating a simple, scalable storage solution without the complexities of legacy technologies.
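To make the flat namespace and the PUT/GET/LIST verbs concrete, here is a toy, in-memory model of a bucket in Python. It is a sketch under stated assumptions, not a client for any real store: the `Bucket` class and its `put`, `get` and `list_objects` methods are hypothetical names. "Folders" exist only as key prefixes; listing with a delimiter groups keys that share the next segment, similar in spirit to the `Prefix` and `Delimiter` parameters of S3's ListObjectsV2 call (in the real S3 API, listing is itself a GET request on the bucket).

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Bucket:
    """Toy in-memory bucket: a flat mapping from key strings to object bytes."""

    name: str
    objects: dict[str, bytes] = field(default_factory=dict)

    def put(self, key: str, data: bytes) -> None:
        # PUT stores the whole object under its key, overwriting any previous one.
        self.objects[key] = bytes(data)

    def get(self, key: str) -> bytes:
        # GET returns the whole object; an unknown key raises KeyError.
        return self.objects[key]

    def list_objects(
        self, prefix: str = "", delimiter: Optional[str] = None
    ) -> tuple[list[str], list[str]]:
        """Return (keys, common_prefixes) for keys starting with `prefix`."""
        keys: list[str] = []
        common_prefixes: set[str] = set()
        for key in sorted(self.objects):
            if not key.startswith(prefix):
                continue
            rest = key[len(prefix):]
            if delimiter and delimiter in rest:
                # Collapse everything up to the next delimiter into one "folder" prefix.
                common_prefixes.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
            else:
                keys.append(key)
        return keys, sorted(common_prefixes)
```

For a bucket holding `2022/11/cat.jpg` and `readme.txt`, `list_objects(delimiter="/")` returns `(["readme.txt"], ["2022/"])`: the apparent folder `2022/` exists only because a key happens to start with it.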
Now that we have a basic understanding of object stores, let’s explore some use cases.
Uses of Object Storage
When building a new application, design it with object storage in mind. Instead of relying on cluster-aware filesystems and quorum devices, the application itself must handle failover and data consistency to remain available during infrastructure failures.
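A minimal sketch of what handling failover in application code can look like, assuming the application knows several S3-compatible endpoints (for example, replicas in different sites). Each upload function is hypothetical and assumed to return an MD5 hex digest reported by the store; the helper retries each endpoint with exponential backoff, checks that the digest matches the data it sent, and fails over to the next endpoint when every attempt fails. It illustrates the pattern only and is not part of any SDK.

```python
import hashlib
import time
from typing import Callable, Sequence

# Hypothetical uploader: stores `data` under `key` on one endpoint and returns
# the MD5 hex digest the store reports for what it received.
Uploader = Callable[[str, bytes], str]


def put_with_failover(
    uploaders: Sequence[Uploader],
    key: str,
    data: bytes,
    attempts_per_endpoint: int = 3,
    base_delay: float = 0.1,
) -> int:
    """Write `data` to the first endpoint that stores it intact; return that endpoint's index."""
    expected = hashlib.md5(data).hexdigest()
    for index, upload in enumerate(uploaders):
        for attempt in range(attempts_per_endpoint):
            try:
                if upload(key, data) == expected:
                    return index  # stored and verified on this endpoint
            except ConnectionError:
                pass  # transient network failure: fall through to the backoff
            # Failed call or mismatched digest: back off exponentially
            # before the next attempt (or before failing over).
            time.sleep(base_delay * 2 ** attempt)
        # Every attempt on this endpoint failed, so move on to the next one.
    raise ConnectionError(f"no endpoint stored {key!r} intact")
```

Verifying a checksum after each write is one simple way for the application, rather than the storage layer, to take responsibility for integrity.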
Many off-the-shelf applications now have native deployment models for cloud-native infrastructure and, most importantly, for object storage. When your application has finished processing or creating a piece of data, it can write that data to an object store for safekeeping and retrieve it as and when needed.
We can even use object storage buckets to trigger events. Imagine a mobile app that uploads photos or videos, which are then processed before publication. Once a photo or video is uploaded to an object store, an event is triggered to let your backend application know that there is a new object to process. Once that object has been processed, the output can be written to another bucket, which triggers a further job to push it to your Content Distribution Network (CDN).
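The first step of that pipeline, reacting to the upload notification, might look like the sketch below. It assumes notifications arrive as JSON in the S3 event notification format (a `Records` list whose entries carry `eventName`, `s3.bucket.name` and a URL-encoded `s3.object.key`); check what your store actually emits. The `<bucket>-processed` naming rule in `output_location` is a hypothetical convention for the second bucket whose own notifications would trigger the CDN job.

```python
from __future__ import annotations

import json
from urllib.parse import unquote_plus


def new_objects(event_json: str) -> list[tuple[str, str]]:
    """Return (bucket, key) pairs for each object-created record in an S3-style event."""
    event = json.loads(event_json)
    created = []
    for record in event.get("Records", []):
        if not record.get("eventName", "").startswith("ObjectCreated:"):
            continue  # ignore deletions and other event types
        bucket = record["s3"]["bucket"]["name"]
        # Keys are URL-encoded in the notification (a space arrives as '+').
        key = unquote_plus(record["s3"]["object"]["key"])
        created.append((bucket, key))
    return created


def output_location(bucket: str, key: str) -> tuple[str, str]:
    """Hypothetical convention: processed media lands in '<bucket>-processed' under the same key."""
    return f"{bucket}-processed", key
```

Keeping each stage small and driven purely by bucket events means the upload, processing and publication steps can scale and fail independently.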
Where can I get Object Storage?
There are lots of options available: every major public cloud has an object storage offering. Some of the best known are Azure Blob Storage, Google Cloud Storage, and Amazon S3. Each of these offerings has its own API, but the most commonly used is the S3 API.
The S3 API has been implemented in other storage solutions, such as Ceph and, to a certain extent, OpenStack Swift. However, Swift's implementation is not as feature-complete as Ceph's and lacks some features around object lifecycle management and notifications.
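Because these stores share the S3 API, moving an application between them is largely a matter of changing the endpoint it points at. The sketch below only builds object URLs for an S3-compatible endpoint in the two common addressing styles; it deliberately omits authentication (real S3 requests are signed, typically with AWS Signature Version 4), and the host `rgw.example.internal` is a placeholder. Whether a particular Ceph or other S3-compatible deployment accepts virtual-hosted-style URLs depends on its DNS configuration.

```python
from urllib.parse import quote, urlsplit


def object_url(endpoint: str, bucket: str, key: str, path_style: bool = True) -> str:
    """Build the URL of an object on an S3-compatible endpoint (no request signing)."""
    parts = urlsplit(endpoint)
    encoded_key = quote(key, safe="/")  # keep '/' in keys, percent-encode everything else
    if path_style:
        # Path-style: the bucket is the first path segment.
        return f"{parts.scheme}://{parts.netloc}/{bucket}/{encoded_key}"
    # Virtual-hosted style: the bucket becomes part of the host name.
    return f"{parts.scheme}://{bucket}.{parts.netloc}/{encoded_key}"
```

For example, `object_url("https://rgw.example.internal:8080", "photos", "2022/11/cat 1.jpg")` returns `https://rgw.example.internal:8080/photos/2022/11/cat%201.jpg`; pointing the same code at a public cloud endpoint changes only the first argument.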
Major storage vendors, such as Dell EMC and NetApp, also offer object storage products, which have largely standardised on the S3 API. Compared with open source solutions, however, these remain cumbersome and expensive.
Public or private cloud object storage?
The public cloud might not always be the right choice for every workload, or for storing all of your data. It is instantly accessible, which makes it a great way to get started, but over time, as your data set grows, it can become cost-inefficient. Public clouds were built around the notion that you can scale up and down on demand, but storage tends only to scale up. Cloud provider charges cover not only storing data but also retrieving it, and some providers additionally charge for the number of API requests you make, with network transfer costs on top.
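The cost components listed above add up as in the sketch below. Every price is a placeholder supplied by the caller, not any provider's real rate; substitute figures from your provider's current price list. The `PriceSheet` and `monthly_bill` names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PriceSheet:
    """Placeholder unit prices; substitute your provider's current price list."""

    storage_per_gb_month: float
    retrieval_per_gb: float
    per_1000_requests: float
    egress_per_gb: float


def monthly_bill(
    prices: PriceSheet,
    stored_gb: float,
    retrieved_gb: float,
    requests: int,
    egress_gb: float,
) -> float:
    """Sum storage, retrieval, API request and network transfer charges for one month."""
    storage = prices.storage_per_gb_month * stored_gb
    retrieval = prices.retrieval_per_gb * retrieved_gb
    api_calls = prices.per_1000_requests * requests / 1000
    transfer = prices.egress_per_gb * egress_gb
    return storage + retrieval + api_calls + transfer
```

Calling `monthly_bill` for successive months with a growing `stored_gb` shows why storage costs compound: that term only climbs, while the retrieval, request and transfer terms track how actively the data is used.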
A privately hosted Ceph solution can provide significant savings when you have predictable capacity requirements, and it lets you manage your own transit costs more effectively, whether into a public cloud via products such as AWS Direct Connect or Azure ExpressRoute, or at no cost within your own data centre or colocation facility.
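With predictable capacity, one rough way to judge the savings is a payback calculation: how many months of lower running costs it takes to recover the upfront investment in a private cluster. This sketch assumes flat monthly costs for both options and leaves it to the caller to fold hardware, power, staffing and transit into `private_monthly`; the function name and any figures you pass are illustrative, not a costing method from this article.

```python
import math


def payback_months(upfront_cost: float, private_monthly: float, cloud_monthly: float) -> float:
    """Months until an upfront private-cluster investment is recovered by lower running costs."""
    monthly_saving = cloud_monthly - private_monthly
    if monthly_saving <= 0:
        return math.inf  # at these rates the private option never pays back
    return upfront_cost / monthly_saving
```

For example, a hypothetical 120,000 upfront with running costs of 3,000 per month against a cloud bill of 8,000 per month pays back after 24 months.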
Is S3 on Ceph a solution for you?
A Ceph cluster that is compatible with both the AWS S3 API and the OpenStack Swift API can be a cost-effective way to provide object storage to your applications, by combining open-source software with commodity hardware to meet performance, availability and capacity needs.