Cassandra vs MongoDB vs HBase | Difference Between Popular NoSQL Databases | Edureka

Captions
[Music]

Big data is revolutionizing the world of the IT industry, and according to Forbes, analysts estimate upward of 80% of enterprise data is unstructured. Unstructured data cannot always be handled in real time. If we try to store this data in an RDBMS, do you think it will be able to scale up the data in real time and give you 100% performance? Obviously not. That is why NoSQL databases came into the picture, to store and handle this data in real time.

Hello everyone, this is Neha from Edureka, and in this video I will be comparing the most prominent NoSQL databases: Cassandra, MongoDB and HBase. First, let's understand the basics of NoSQL databases and then dive into the comparison.

So what is a NoSQL database? Also known as a "not only SQL" database, it is an alternative to the SQL database which does not require any kind of fixed table schema, unlike SQL. It generally scales horizontally and avoids major join operations on the data. A NoSQL database can also be referred to as structured storage, which consists of the relational database as a subset. So now you can think of SQL databases as just a subset of NoSQL databases. Not only that, it also covers a multitude of databases, where each of them has a different kind of data storage model.

Next, what is the need for NoSQL databases? Compared to relational databases, NoSQL databases are more scalable and provide superior performance. A few of the solutions provided by NoSQL databases are: they can scale out the data easily and have a shared-nothing architecture, which is capable of running on a large number of nodes. Next, they also provide a non-locking concurrency control mechanism, so that real-time reads will not conflict with the writes. Next, they can scale and replicate across thousands of machines with distributed data, and the architecture of a NoSQL database provides higher performance per node than an RDBMS and has a schema-less data model.

So let's see what the types of NoSQL databases are. There are four different types, and
the first is the key-value store. It has a big hash table of keys and values, for example Amazon S3. Next, the column-based store: in this case, each storage block contains data from only one column, like Cassandra and HBase. Next, the document-based store: it stores documents that are made up of tagged elements, for example CouchDB or MongoDB. Next, graph-based: in this case, a network database uses edges and nodes to represent and store the data, for example Neo4j. So this is all about the basic fundamentals of NoSQL databases.

Now let's jump into the core topic of the discussion and compare the three prominent databases based on these parameters. First, let me introduce you to these terms. Cassandra: Apache Cassandra is the leading NoSQL distributed data management system that drives many of today's modern business applications by offering continuous availability, high scalability and performance, strong security, and operational simplicity, while lowering the overall cost of ownership. Next, MongoDB: it is a document-oriented database. All the data in MongoDB is treated in JSON format, and it is a schema-less database which goes over terabytes of data in the database. And Apache HBase is a NoSQL key-value store which runs on top of HDFS. Unlike Hive, HBase operations run in real time on its database rather than as MapReduce jobs.

Next, let's see the data model of these databases. Cassandra is a wide-column store model based on the ideas of BigTable and DynamoDB. It consists of keyspaces, which are the outermost container in Cassandra, and a column family contains an ordered collection of rows. Next, MongoDB is a document store architecture, and data in MongoDB has a flexible schema: documents in a collection don't need to have the same set of fields or structure, and common fields in a collection's documents may hold different types of data. Next, in the case of HBase, it is partitioned into tables, and tables are further split into column families. Column families must be declared in the schema and grouped together
by a certain set of columns; that is, columns don't require a schema definition. And HBase works by storing data as key and value.

Now let's see the CAP theorem and check where these databases lie. The CAP theorem is the concept that a distributed database system can only have two of the three: consistency, availability and partition tolerance. The CAP theorem is very important in the big data world, especially when we need to make trade-offs between the three based on our unique use case. Coming to Cassandra, it has a decentralized architecture and any node can perform any operation. In this case it provides AP from the CAP theorem, that is, availability and partition tolerance. And MongoDB and HBase provide CP from the CAP theorem, that is, consistency and partition tolerance. So this is how all the databases can have only two of the three from the CAP theorem.

So the next parameter on the list is implementation language. Cassandra and HBase are implemented using one of the most popular object-oriented programming languages, Java, and MongoDB is implemented using the C++ programming language. Though these databases are implemented using object-oriented concepts, they also provide wide support for all other programming languages.

Next, query language. Just like SQL, Cassandra has its own query language called Cassandra Query Language. MongoDB is queried using a dynamic object-based language and JavaScript, and HBase can be queried using MapReduce.

Next, when it comes to performance, it should be noted that there is no single winner among the top NoSQL databases. Depending on the use case and deployment conditions, it is almost always possible for one NoSQL database to outperform another and yet lag its competitor when the rules of engagement change. Comparing on the benchmarks of performance, I would say Cassandra is more durable and performs slightly better among the three of them.

The next parameter on the list is security. Again, in security, all the
three NoSQL databases are secure in their own aspects. Cassandra, MongoDB and HBase all provide client authentication and authorization. Cassandra provides SSL encryption, MongoDB provides encryption, governance and auditing, and HBase provides the Thrift server role as the means of security layers.

Next, let's see what the replication methods are. Cassandra and HBase support selective replication, where you can restrict the amount of information that is exchanged between the replicas, and MongoDB supports the master-slave replication method, where replica sets are recommended for new production deployments to replicate the data in the cluster.

Next, competitive advantages. For failure handling in Cassandra, every node contains a replica, and in case of failure the replica takes charge, so there is no chance of failure and it ensures 100% availability. It also offers the lowest total cost of ownership, and Cassandra has the best-in-class scalability and performance of NoSQL platforms. Coming to MongoDB: by offering the best of traditional databases as well as the flexibility, scale and performance that is required by today's applications, MongoDB lets innovators deploy apps as big as they can possibly dream. From startups to enterprises, for the modern and mission-critical, MongoDB is the database for giant ideas. And HBase can store large datasets on top of HDFS and will aggregate and analyze billions of rows present in HBase tables. For online analytics operations, HBase is used extensively.

There is a wide range of applications of all the three databases; I have jotted down a few of them. Cassandra is used in Internet of Things, fraud detection applications, recommendation engines, product catalogs and messaging applications. Twitter and Netflix are the top companies that use Cassandra as the database. And MongoDB is also used in Internet of Things, mobile, single view, MetLife, real-time analytics, catalogs, personalization, et cetera. And coming to HBase, it is used in the medical field to store genome sequences
and the history of patient data, and for storing match histories for better analytics and prediction in sports. Also, in order to store user history and preferences, the web also uses HBase for better customer targeting. So these are some of the application areas where all these three databases are used.

Now, talking about the last parameter, market metrics: according to the Forbes estimate, around 40% of the Fortune 100 companies are using Cassandra, MongoDB has over 40 million downloads, and 7% of the companies in the world are using Apache HBase only. So this is where all three databases stand at the top of the list in the market.

So I hope you understood the similarities and differences between Cassandra, MongoDB and HBase. Depending on your requirements, you can choose one among them. So that's all for this session. Thank you, and have a nice day.

I hope you have enjoyed listening to this video. Please be kind enough to like it, and you can comment any of your doubts and queries and we will reply to them at the earliest. Do look out for more videos in our playlist and subscribe to the Edureka channel to learn more. Happy learning!
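
To make the data-model comparison in the captions a bit more concrete, here is a minimal sketch in plain Python, with no database drivers involved. It only mimics the shapes described in the video: a Cassandra-style keyspace holding a column family of ordered rows, a MongoDB-style collection whose documents need not share fields or types, and an HBase-style table whose column families are declared up front while columns are not. Every name here (`keyspace`, `users_by_id`, `collection`, `HBaseLikeTable`, and the sample data) is hypothetical and is not part of any of these databases' APIs.

```python
from collections import OrderedDict

# Cassandra-style shape: keyspace -> column family -> ordered rows -> columns.
keyspace = {
    "users_by_id": OrderedDict([
        ("u1", {"name": "Ann", "city": "Pune"}),
        ("u2", {"name": "Bob"}),  # rows may carry different columns
    ])
}

# MongoDB-style shape: a collection is a list of JSON-like documents.
collection = [
    {"_id": 1, "name": "Ann", "tags": ["iot", "mobile"]},
    {"_id": 2, "name": "Bob", "age": 31},      # different set of fields
    {"_id": 3, "name": ["Carol", "C."]},       # same field, different type
]


class HBaseLikeTable:
    """Toy table: column families are fixed at creation, columns are not."""

    def __init__(self, families):
        self.families = frozenset(families)
        self.cells = {}  # (row_key, "family:qualifier") -> value

    def put(self, row_key, family, qualifier, value):
        # Only the column family is checked against the declared schema.
        if family not in self.families:
            raise KeyError(f"undeclared column family: {family}")
        self.cells[(row_key, f"{family}:{qualifier}")] = value

    def get_row(self, row_key):
        # Collect every stored cell that belongs to the given row key.
        return {col: v for (rk, col), v in self.cells.items() if rk == row_key}


table = HBaseLikeTable(families=["info", "history"])
table.put("patient1", "info", "name", "Ann")
table.put("patient1", "history", "2018-10-17", "checkup")  # ad hoc column
```

Calling `table.put("patient1", "genome", "seq", "ACGT")` would raise `KeyError` in this toy model, because `genome` was never declared as a column family, whereas new column qualifiers inside `info` or `history` are accepted without any declaration.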
Info
Channel: edureka!
Views: 111,398
Keywords: yt:cc=on, Cassandra vs MongoDB vs HBase, cassandra vs mongodb, Cassandra vs Hbase, MongoDB vs Cassandra, MongoDB vs hbase, mongodb vs hbase vs cassandra, hbase vs cassandra, hbase vs mongodb, hbase vs cassandra vs mongodb, mongodb vs hadoop, NoSQL databases, NoSQL, hbase nosql, cassandra architecture, hbase architecture, cassandra nosql, cassandra overview, cassandra data model, mongodb architecture, mongodb data modeling, mongodb nosql, NoSQL Databases edureka, edureka
Id: QlqylUeqeis
Length: 9min 33sec (573 seconds)
Published: Wed Oct 17 2018