In-depth: ClickHouse vs PostgreSQL

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
comparing clickhouse to postgres is like comparing grapes to grapefruit it's a bit ridiculous clickhouse was created for a specific purpose and postgres was designed to be flexible and all-purpose so why even compare them because of most companies that use an online analytical processing database like clickhouse originally used an online transaction processing database like postgres and decided to switch this video will help you understand if that switch is also right for you at post hog this is exactly our journey we originally used postgres to store data but as usage grew rapidly it couldn't handle the scale eventually we migrated to clickhouse and solved many of the scaling problems [Music] to start let's compare the database types oltp which postgres is a rose-based database it stores data and objects as rows like a phone book clickhouse is a o l a p which is a column based database it ingests data as columns meaning there is a large number of them the big difference is how each stores data you can continue to use SQL to interface with either of them but query performance differs significantly for example clickhouse is amazing at Aggregates this is great for products such as analytics Finance ad bidding content delivery and log management which have lots of data that doesn't change often on the other hand clickhouse is terrible at mutations or modifying existing data because of this clickhouse is rarely used Alum it's often used with postgres or other rows-based relational databases which handle the non-specialized bits of the product to make this comparison clearer let's go over some simple cases where postgres is the obvious choice then somewhere clickhouse is the obvious choice a situation where postgres wins over clickhouse is if you run a career-based social network and you want to update the employer for Dennis postgres accesses Dennis's row Alters the employer value and then writes the new value clickhouse needs to load every employer go to Dennis's index alter it and write the entire column back into Data a simple case where clickhouse crushes postgres is at that same career-based social network calculating the average age of all your users rapidly postgres needs to incrementally retrieve every entry grab the age add it to the total average across the count and return the value while clickhouse calculates this all in a single read by getting the values in the age column together both these are crude comparisons they ignore caching and under the hood optimizations like parallelizing requests and utilizing specialized merge tree engines but they give you an idea of what each is good at clickhouses optimizations are made possible by their insert now and optimize later philosophy it is constantly merging data in the background to collapse series data into single values to expedite future queries because it doesn't expect mutations aggregate values won't need to be recalculated because clickhouse is an opinionated solution comparisons often go like this clickhouse does X really well but postgres can achieve acts with Y and Z modifications and a and b setbacks it's like comparing a MacBook with a music synthesizing you can make decent music on your Mac but you can't run Excel on your synthesizer now let's dive into some big differences between the two one clickhouse scales better than postgres when evaluating infrastructure resources we typically think about CPUs RAM and storage for postgres Ram and storage matter but CPU has limited benefit this is because postgres is primarily a single process product meanwhile clickhouse is optimizing data in parallel you can scale the power of your clickhouse Instant by improving any of the dimensions including CPU 2. click House Is Not So secret weapon is materialized views they are saved SQL queries that re-execute at runtime to expose in ephemeral table to query from it is a derived independent table generated at a specific point in time for example going back to our career social network we could have a materialized view for the average age in the industry although both postgres and clickhouse have the functionality to build materialized views the difference is that postgres needs to be manually updated clickhouse auto updates thanks to its insert now and optimize later philosophy this enables it to compress data in a materialized view in its idle time the third big difference is how each treats engines postgres technically only operates using a single engine while a new one designed to optimize the update function is being built for clickhouse engines are a core feature you should instruct clickhouse to utilize a special engine depending on your needs and that engine could dramatically optimize results these engines include aggregating merge tree summing merge tree materialize view merge dictionary and more each is used ephemerally to move merge and export data the fourth difference is postgres's sharding problem this has been a big constraint for a long time even now its most used shining solution declarative table partitioning isn't exactly sharding as splitting operates at a table by table level proper sharding requires a foreign data wrapper but this adds considerable overhead for lackluster reward as an example notion took months to implement robust sharding for postgres clickhouse's approach is a more fleshed out upon sharding clickhouse utilizes an umbrella table knowing the location of shards and replicas to do a Federated query across an entire data set this uses Apache zookeeper to manage the shards clickhouse can Shard prematurely and each Shard can parallelize queries to expedite results in summary clickhouse is made to handle lots of aggregate data while postgres is more flexible but doesn't keep up at high volumes clickhouse optimizes data at every layer from inception to storage to caching to return this enables it to boast about 1000x improvements over postgres however clickhouse is rarely used in isolation four day-to-day app needs like updating or reading single lines databases like postgres still have a critical role that's all for this video on clickhouse versus postgres if you have any thoughts or questions make sure to leave them in the comments below make sure to like And subscribe for more videos like this and check out postdog.com blog for the written version of this video and more thanks for watching [Music] foreign [Music]
Info
Channel: PostHog
Views: 599
Rating: undefined out of 5
Keywords: posthog, analytics, product analytics, website stats, events, event tracking, open source
Id: iLXXoDaFoxs
Channel Id: undefined
Length: 7min 33sec (453 seconds)
Published: Wed Jul 12 2023
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.