Hi guys, welcome to OpenIO Storage Talk. I am Enrico Signoretti, and in this video I want to talk about the different types of storage and compare their main characteristics, to better understand the use cases and the kinds of applications each one is best suited for.

Let's start with block storage. This kind of storage system is usually located close to the server, in the same data center, and it organizes data into separate volumes, each accessed by one or very few servers simultaneously. The volumes have the layout of local hard disks, presented to the server as a flat array of fixed-size blocks (sectors). The most common access protocols are FC and iSCSI, and all communication happens on a dedicated Storage Area Network (SAN) built on lossless Ethernet or FC equipment. This type of storage is very good for databases and virtual machines and, more in general, for all those workloads that require low latency and high IOPS.
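To make that "local disk" semantic concrete, here is a minimal Python sketch of how an application addresses a block volume by offset. /dev/sdb is a hypothetical device path for a LUN exposed over iSCSI or FC; in practice databases and hypervisors do this through their own I/O engines, but the addressing model is the same.

```python
import os

# The OS sees the SAN volume exactly as if it were a local disk,
# addressed by block offset. /dev/sdb is a hypothetical device path.
SECTOR_SIZE = 512

fd = os.open("/dev/sdb", os.O_RDONLY)
try:
    # Seek to sector 2048 and read one sector, just as a database or
    # hypervisor would address the volume by block number.
    os.lseek(fd, 2048 * SECTOR_SIZE, os.SEEK_SET)
    data = os.read(fd, SECTOR_SIZE)
    print(f"read {len(data)} bytes from sector 2048")
finally:
    os.close(fd)
```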
Unfortunately, block storage systems are also characterized by a very high cost per gigabyte, and most of the implementations on the market still rely on scale-up, dual-controller designs whose total capacity can hardly reach the petabyte level while maintaining consistent performance. This limited scalability drives up complexity and cost in large-capacity scenarios.

In the last few years, all-flash arrays have become more common, and they now account for a good share of primary storage sales. Flash memory is still about ten times more expensive than hard drives, though. But again, the goal here remains absolute performance, and thanks to flash's very high speed, and to the kind of data stored on these systems, it is possible to implement advanced techniques to reduce the data footprint, including deduplication, compression, and thin provisioning, which help bring the cost down to a reasonable level.

The second option is file-based storage, or NAS. This type of storage system is connected to the Local Area Network and accessed by servers as well as other types of clients, such as PCs. Data is organized in a hierarchy of files and directories, shared through protocols like SMB and NFS.
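As an illustration of that file-level access model, here is a minimal Python sketch; /mnt/nas is a hypothetical mount point for an NFS or SMB share. The point is that clients need no storage-specific API, just the ordinary file hierarchy.

```python
from pathlib import Path

# Hypothetical mount point for an NFS/SMB export; once mounted, the
# share behaves like any local directory tree.
share = Path("/mnt/nas/projects")
share.mkdir(parents=True, exist_ok=True)

# Create a file through the normal file API; the NAS handles locking,
# permissions, and sharing across clients.
report = share / "q3-summary.txt"
report.write_text("Quarterly capacity report\n")

# Any other client mounting the same export sees the same hierarchy.
for entry in share.iterdir():
    print(entry.name, entry.stat().st_size, "bytes")
```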
Because of the additional file system and network layers, performance figures are not as consistent as for block storage, but this kind of storage system can easily reach very high throughput, making it a good fit for a large set of use cases and for all non-latency-driven workloads where scalability is not the key requirement. In fact, most of the systems available on the market are still scale-up, and only a few of them show scale-out capabilities, due to limitations in their architectures and file systems.

The dollar-per-gigabyte of NAS systems is usually lower than that of block devices, also because they often come in a hybrid configuration: a limited amount of flash memory serves frequently accessed data and metadata quickly, backed by a large pool of hard disks. Caching mechanisms and automated tiering functionalities take care of data movement between the different storage tiers.
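Tiering policies vary by vendor; as a rough illustration of the idea only, here is a toy Python sketch (all names and thresholds are hypothetical) that promotes frequently read items to a flash tier and demotes cold ones back to disk.

```python
from collections import Counter

# Toy model of two tiers: a small, fast flash tier and a large disk
# tier. Real arrays do this transparently, below the file interface.
flash_tier, disk_tier = set(), {"f1", "f2", "f3", "f4"}
reads = Counter()
PROMOTE_AFTER = 3   # hypothetical threshold: reads before promotion
FLASH_CAPACITY = 2  # hypothetical number of flash slots

def read(name: str) -> str:
    reads[name] += 1
    # Promote hot data to flash once it is read often enough.
    if name in disk_tier and reads[name] >= PROMOTE_AFTER:
        if len(flash_tier) >= FLASH_CAPACITY:
            # Demote the coldest flash resident back to disk.
            coldest = min(flash_tier, key=lambda f: reads[f])
            flash_tier.discard(coldest)
            disk_tier.add(coldest)
        disk_tier.discard(name)
        flash_tier.add(name)
    return "flash" if name in flash_tier else "disk"

for _ in range(3):
    print("f1 served from", read("f1"))
```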
Multi-petabyte installations are now more common than in the past, but the cost of these systems is still high, and they can become complex to manage as capacity increases.

Object storage is a totally different story. This kind of storage is designed to offer the best accessibility and reliability at scale, making it possible to connect any type of device, from anywhere, over any sort of network connection that can support protocols like HTTP. Depending on the implementation, access parallelization also becomes a key characteristic, allowing millions of devices to access the same information simultaneously. It is no accident that object stores are sometimes referred to as the storage backbone of the cloud.

Data is stored in the form of objects: each object is made of the data itself, usually a standard file, associated with rich metadata fields and a unique key. Contrary to what happens in a file system hierarchy, objects live in a flat namespace and can be retrieved by searching the metadata or by knowing the key.
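A flat namespace is easy to picture in code. Here is a toy Python sketch (not any real object-store API): a single map from keys to objects, with lookup either by key or by metadata filter. Note that the keys may look like paths, but there are no directories, just strings.

```python
# Toy flat namespace: one key -> object map, no directory tree.
store = {
    "2021/invoices/0042.pdf": {
        "data": b"%PDF-1.4 ...",
        "meta": {"customer": "acme", "year": "2021"},
    },
    "backups/db-monday.dump": {
        "data": b"\x00\x01...",
        "meta": {"type": "backup"},
    },
}

# Retrieval by key is a single lookup, however many objects exist.
obj = store["2021/invoices/0042.pdf"]

# Retrieval by metadata is a search over the metadata fields.
acme_keys = [k for k, o in store.items()
             if o["meta"].get("customer") == "acme"]
print(acme_keys)
```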
This is also why object storage is considered a very good option for storing large sets of unstructured data.

The most common protocol for accessing an object store is S3, which is also the name of the public cloud service provided by Amazon AWS.
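For instance, here is a short sketch using the boto3 S3 client against an S3-compatible endpoint; the endpoint URL, bucket name, and credentials are hypothetical placeholders.

```python
import boto3

# S3-compatible endpoint; URL, bucket, and credentials are placeholders.
s3 = boto3.client(
    "s3",
    endpoint_url="https://object-store.example.com",
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)

# Store an object: data plus user-defined metadata under a unique key.
s3.put_object(
    Bucket="demo-bucket",
    Key="2021/invoices/0042.pdf",
    Body=b"%PDF-1.4 ...",
    Metadata={"customer": "acme", "year": "2021"},
)

# Retrieve it by key, over plain HTTP(S).
resp = s3.get_object(Bucket="demo-bucket", Key="2021/invoices/0042.pdf")
print(resp["Metadata"], len(resp["Body"].read()), "bytes")
```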
Even though S3 is not as common as file or block interfaces, the number of solutions supporting it is increasing by the day, providing seamless access to the object store for legacy applications, by adding a gateway for example, or directly from the applications themselves. Because of its scale-out nature, the access protocol involved, and potential network issues between the storage system and the devices accessing the data, latency is not consistent at all, but throughput can be massive.

Many modern object stores have strong multi-tenant capabilities, enabling end users to consolidate several types of data and workloads on a single system. This helps simplify the entire infrastructure and drive down its TCO.

Common object storage implementations show similar characteristics and very high scalability when compared to block and file storage. The dollar-per-gigabyte figure is also very low, and multi-petabyte installations are pretty common. These systems are scale-out, with each node in the cluster taking care of a certain amount of capacity. Flash memory is used for storing metadata only; hybrid or all-flash configurations are very rare and aimed at very specific use cases.

Traditional data-footprint reduction techniques like deduplication or compression are not really applicable here, mostly because unstructured data more and more often arrives already compressed and encrypted. On the other hand, erasure coding, or more recently distributed erasure coding, improves overall efficiency and durability, because data chunks can be dispersed across several nodes or locations.
I hope this video gave you an idea of the differences between these storage systems. You can find more information on our website, www.openio.com, or you can follow us on Twitter. And see you soon for a new episode of OpenIO Storage Talk. Bye bye!