Getting Started with Amazon FSx for NetApp ONTAP - AWS Online Tech Talk

Captions
Hi, thanks for joining today's session, Getting Started with Amazon FSx for NetApp ONTAP. My name is David Stein, I'm the principal business development manager, and we're joined today by Amit Berlikar, our senior solutions architect, who'll be doing a demo of the service at the end. For the agenda today, we'll first talk about why fully managed file storage in the cloud matters. We'll then give an introduction to our new service, Amazon FSx for NetApp ONTAP, go into a feature overview, talk about different migration and hybrid workflows with your on-premises environments, and then close out with a demo of the service. First, let's start with why fully managed file storage in the cloud matters. File data is everywhere: it's rapidly growing and it rarely shrinks. A lot of this data is unstructured data in the form of things like office documents, media files, and engineering files stored in various user shares and team or department shares. This data can also be part of media processing environments, database backups, content repositories for application data, or archive data that needs to be kept around for compliance purposes or company policy, and it can often be spread across multiple storage silos in different locations throughout your environment. Customers tell us that managing these on-premises file storage arrays comes with a unique set of challenges. A storage array holds the storage for a large number of applications and needs to be properly sized to meet your storage needs for the next three to five years. As data continues to grow, so do the infrastructure costs, including storage hardware, storage networking, data center costs, and the aging disks that the data is housed on. On-premises storage is also limited to your immediate capacity; this not only affects the amount of data that you can store but also the amount of compute that can access the data. If the storage can't keep up, your 
applications or end users won't be able to either. Lastly, keeping the storage up and running to ensure your end users and customers have continuous access to this data requires a lot of operational overhead: setting up all the data center infrastructure, the power, the cooling, the networking, refreshing your hardware and software, and replacing failed drives. Customers that require a shared file system are looking for a simple, fast, and cost-effective cloud storage solution that not only matches the features and functionality they have on premises, with minimal disruption to their business applications, but also provides the added cloud benefits of durability, availability, scalability, and security to accelerate file-based application migrations. They want it to be simple, easy to integrate, and compatible with their existing environments. Cloud-based storage options should provide the features and performance that customers expect: similar to or better performance than on-premises for their applications and users, native integration with things like Active Directory, and meeting data durability needs with a file system that can take snapshots and backups and replicate data for disaster recovery, as well as managing security with encryption at rest and in transit. They want a range of options that maximize performance and cost with SSD and HDD options, they want to reduce over-provisioning by not having to pay for storage they're not currently using, and they want to eliminate the need to refresh hardware every three to five years, which can be a time-consuming task. To help with this, back in 2018 we introduced a service called Amazon FSx. FSx is a feature-rich and highly performant file system service that's fully managed by AWS using third-party file system software. We launched in 2018 with Amazon FSx for Windows File Server, built on a Windows OS, as well as Amazon FSx for Lustre, a high-
performance, Linux-based file system built on the popular open-source file system software called Lustre. And now, introduced just last month in September of 2021, the third addition to the FSx family: Amazon FSx for NetApp ONTAP, built on the popular on-premises NetApp software called ONTAP. The idea is that we're going to give you like-for-like capabilities and performance with the software you're already used to using, as a service. We're here to provide fully managed hardware and software using software that you're already familiar with, so you get like-for-like features and you don't have to re-architect your workflows or your setup to use it in AWS. When you look at the things you don't have to manage with Amazon FSx, on the left-hand side you have the hardware column, and this is pretty inherent to anything you might be doing in AWS: just by using native AWS services, you no longer have to worry about setting up and managing your data centers, dealing with power, cooling, and networking, or dealing with procurement and installation of your hardware, which includes all of your servers and storage. What customers are asking us is how we can take that to the next level: we also don't want to have to install our file system software and do firmware and OS installation and upgrades, we don't want to deal with configuration or release compatibility by doing constant software patches and updates, we don't want to have to pay for our licenses separately and manage that as a separate cost, and we don't want to have to manage our own backups or security and things like end-to-end encryption. So, introducing our new service: Amazon FSx for NetApp ONTAP. Built on the FSx platform, Amazon FSx for NetApp ONTAP offers a fully featured NetApp ONTAP OS in AWS. ONTAP is a storage operating system built by NetApp that's trusted by many enterprise customers, and it supports a wide array of enterprise-level data management features such as 
multi-protocol access from Linux and Windows, snapshot and cloning technology for data durability and dev workflows, replication, and storage efficiencies to make sure that you're able to store a lot of data in less disk space than you previously had to. But it gives you the simplicity, agility, and scalability of an AWS service, meaning you pay as you go, you can spin up and spin down as needed, you get a one-click deployment option, and you get things like tiering built in and integration with other AWS services. The use cases that customers have for NetApp ONTAP, regardless of where they're running it, tend to fall into a couple of different buckets. You've got user share environments, which are things like your corporate E: or F: drive that has your user shares or team or department shares, also known as home directories; a lot of these are Windows-based and may be referred to as SMB or CIFS shares. We also see a lot of IT applications and databases that need shared file storage: this could be CRM or ERP systems, VMware environments, or databases like SQL Server, Oracle RAC, or SAP. We also see various line-of-business applications: in the financial services sector we see things like risk modeling or grid computing, in healthcare and life sciences we see things like PACS imaging and genome sequencing, in media and entertainment this could be video editing or rendering, and in manufacturing workflows you have CAD files. All types of industries have use cases where you need shared file storage. And underneath all of that is data protection and replication: regardless of your use case, you need to back up and protect your data, and being able to replicate data between NetApp arrays or leverage the existing snapshot and backup technology is huge for customers, not only within AWS but even from on-premises to AWS as a backup or disaster recovery target, so you no longer have to worry about setting up and managing a secondary data center. From a feature standpoint, there are a lot of 
different benefits to FSx for NetApp ONTAP. The primary benefit is that it's fully managed, meaning that within a few clicks in the console you say how much capacity and how much performance you want, and we set all of this up for you. You no longer have to worry about the installation of the software, the provisioning of the compute and the storage, or the ongoing maintenance with patches and updates, and you don't have to worry about backups or encryption; these are all things that are managed by us. This also includes licensing: when you provision an FSx for NetApp ONTAP file system, all the licensing is included in the cost, and you don't have to go procure it from somewhere else. This is truly an AWS-first service in that we're using all AWS infrastructure but with the popular NetApp software, and the support all comes through AWS as a fully managed platform. You get to use a variety of AWS and NetApp tools: when you provision a file system, you can create your volumes and storage virtual machines, or SVMs, all through the AWS console, along with your capacity and performance profiles, and you can use NetApp tools to go deeper and manage things like data replication with SnapMirror, vaulting with SnapVault, caching with FlexCache, or cloning with FlexClone; all of this is available and managed through the NetApp toolset. From a performance perspective, we perform really well, and we work closely with NetApp to ensure that you're going to get similar or better performance than on-premises environments. We offer multiple gigabytes per second of throughput; you have three throughput options per file system, and you can provision 512 megabytes per second, 1 gigabyte per second, or 2 gigabytes per second. You get hundreds of thousands of IOPS and sub-millisecond latencies, and we have a lot of in-memory caching that really helps enhance performance. We also offer automatic tiering to low-cost elastic storage, so for workflows where you have a lot of infrequently accessed storage, this 
could be a very cost-effective solution, as we're able to intelligently move data down; we'll show more of that in the coming slides. You also get storage efficiencies with things like data deduplication, compression, compaction, and thin provisioning. From an accessibility standpoint, we offer multi-protocol access, so you can use NFS, SMB, or iSCSI, and the file system is accessible from Linux, Windows, or macOS in AWS, with native integration with AWS services like ECS, EKS, EC2, WorkSpaces, and AppStream. You get concurrent multi-protocol access as well, so if you're doing workflows where you need both NFS and SMB, ONTAP has been doing this for years, and you can do it just as well in Amazon FSx. We're secure and compliant, offering full encryption at rest and in transit; we integrate natively with third-party antivirus and auditing software like Varonis or Trend Micro; we integrate natively with your own Active Directory environment for identity-based authentication; and we offer security and compliance with things like ISO, PCI, and SOC compliance, and we're HIPAA eligible. Really, anything that you can do on ONTAP today you can do with Amazon FSx, because you get the full version of ONTAP that you know and expect from using it on premises, but with all the benefits and accessibility of AWS. From a tiering perspective, we mentioned on the previous slide that if you have infrequently accessed data, we can tier it to a colder class of storage called our capacity pool tier, and this slide previews a little bit of how that works. When you provision an FSx for NetApp ONTAP file system, you provision an SSD tier, and this can be up to 192 terabytes in size. All of our FSx for NetApp ONTAP file systems are multi-AZ, meaning that when you create a file system, we automatically set up two file servers in two AZs, replicate the data synchronously, and do all the failover and failback, ensuring that you have high 
availability. This tier is optimized for performance, and you want to size it to meet the needs of how much frequently accessed data you have at any given time; in most customer environments, we expect that to be 20 percent of the total data set. You also get a capacity pool tier. Tiering policy is set at the volume level, and each volume can have a different tiering policy: snapshot-only, meaning you only send snapshots to the capacity pool tier; none, meaning all data lives in SSD and none is migrated down; auto, where data is migrated down to the capacity pool tier after 31 days by default, adjustable from as low as 2 days to as long as 183 days; or all, meaning that all data automatically goes to the capacity pool tier, and the SSD tier just needs to be large enough to handle your metadata and some frequently used operations. For most data sets, like I mentioned, you'll see 20 percent in SSD, which means 80 percent would live in the capacity pool tier. So, not taking into consideration data dedupe or any other storage efficiencies, if you have a petabyte of data, you can provision a file system just shy of 200 terabytes, set your tiering policy to auto, and all that data will get migrated down: you would have an estimated 800 terabytes in your capacity pool tier and 20 percent, or 200 terabytes, in your SSD tier at any given point, and we'll handle all the data movement bidirectionally so you don't have to. Speaking of storage efficiencies, you have a lot of different options to choose from with FSx for NetApp ONTAP. These are all things that NetApp has been doing for many years and is very good at, including snapshots: ONTAP snapshots are very space efficient, the first snapshot doesn't take up any space, and each additional snapshot is 
really just taking up space for the changes that you have, giving you point-in-time recovery points for your data that can get down to the file and folder level. You have data deduplication, which can help dedupe the data, and you have compression as well, which shrinks the data to reduce overall capacity requirements even more; we'll show in the next slide that we expect anywhere from 30 to 65 percent efficiency savings with data deduplication and compression. Also, all volumes are thin provisioned by default, which helps you grow as you go and prevents over-allocating up front. You can take FlexClones to get writable snapshots with zero capacity penalty, which are great for test/dev workflows, only storing the writes. And lastly, compaction, which puts more data into every 4 KB block before it's written to disk. So let's look at pricing. We have two examples here, and the example we use is 100 terabytes of file storage. We're multi-AZ by default, but we just want to show that we're actually storing twice the amount of storage behind the scenes. In this example, if you look at the left-hand side with no storage efficiencies, in the 100 terabyte example we're using 20 terabytes of SSD storage priced at 25 cents per gigabyte per month, and the other 80 percent lives in the capacity pool tier, so we have about 80 terabytes in the capacity pool tier for just a little over four cents per gigabyte per month. Then you have provisioned throughput: I mentioned earlier you have throughput options of 512 megabytes per second, 1 gigabyte per second, or 2 gigabytes per second; in this example we're going to choose 1 gigabyte per second, or just over a thousand megabytes per second, and that's priced at $1.20 per megabyte per second per month. You can see the total price is just a little under $10,000 a month; the blended rate, if you take into consideration the full 100 terabytes, comes out to 9.7 cents per gigabyte per month. But if you're 
able to enable data dedupe and compression: our SSD storage supports both dedupe and compression, and in this case we expect 65 percent savings, so you only need to provision 7 terabytes of storage; for the capacity pool tier, which supports data deduplication, we expect 30 percent savings, so you only need about 56 terabytes of storage. Also note that the SSD storage is provisioned, while the capacity pool tier is used: you don't have to provision any storage there, you only pay for what you use, so it can grow or shrink and you're only paying for the storage that's in there at any given point. In this example on the right-hand side, with dedupe and compression, you can see your blended rate goes from 9.7 cents per gig to 5.4 cents per gig; a lot of savings and a really good price point for 100 terabytes of storage. If you scale this to petabytes, just throw another zero on there; the blended rate is going to stay about the same, but your per-month cost goes up. All right, now let's dive a little into our feature set. We have four categories to cover: getting started and management of the file system, accessibility, availability, durability, and data protection, and then we'll wrap up with security. Starting with the high-level resources: when you provision an FSx file system, this is your AWS resource. This means going into the AWS console and provisioning the file system by saying how much SSD storage you want, what your throughput capacity is, and things like your VPC, subnets, and networking, and your administrative password for the file system. Remember, the capacity pool tier is not something you have to provision; it gets used based on a tiering policy that you set at the volume level. All the way on the right-hand side, once you provision your FSx file system, you automatically get a single storage virtual machine that has a single volume attached to 
it. A storage virtual machine is a virtual file server that serves data on your network within the FSx file system; it gives you an IP address for access per SVM, an administrative password per SVM, and optionally Active Directory integration and configuration if you're working with SMB shares. Underneath that storage virtual machine you have volumes, which are the data containers for your various files, folders, and directories, each defined by a name, a size, and a tiering policy. If we look at this as an illustrative example, you've got your FSx file system, which is the box here on the left; this is what you create by saying how much capacity and throughput you want and what your networking and security groups are. Within that FSx file system you've got multi-protocol access via NFS, SMB, or iSCSI, your management layer, the storage virtual machine you create, and the volume down here at the bottom. But let's look at an example where you might have multiple SVMs. By default we'll create one SVM for you, and here we've got two: SVM 01 and SVM 02, each with its own multi-protocol layer and its own volumes within it. In SVM 01, for example, we have just one volume, and in SVM 02 we have two volumes. Again, this is all configurable through our console; you don't need to use the NetApp tools for this, and you can slice it a couple of different ways: your SVMs can be by organization with volumes for different departments within that, or the SVMs can be per department and the volumes per team. Customers carve it up several different ways, but you've got a lot of flexibility in how you create your FSx file system infrastructure and the storage virtual machines and volumes within it. With AWS management tools, you have access to the AWS Management Console, or optionally you can use the CLI, API, or SDK; again, this is where you provision 
your file system resources with capacity and throughput, your SVMs, and your volumes. Then you have NetApp management tools: you can use NetApp Cloud Manager for GUI-based access, or the ONTAP CLI and API, and here's where you'll use advanced features like SnapVault, SnapMirror, or FlexClone, and you can also use this to integrate seamlessly with other NetApp arrays, whether FSx-based or on-premises. From an accessibility standpoint, you've already heard me mention that this is multi-protocol, which means we support NFS for Linux-based file storage, SMB (or CIFS) for Windows-based, and iSCSI for block-based. Within AWS we support any EC2 instance running Linux, Windows, or Mac, and you also have integration with native container services like ECS and EKS; you can use end-user compute things like WorkSpaces for virtual desktops, Amazon AppStream, or VMware Cloud on AWS. If you need to access it between VPCs, you can integrate with AWS Transit Gateway, and you can access it from on premises using VPN or Direct Connect, typically in use cases where you've got end-user clients like laptops or desktops for user shares or department shares. We also have some hybrid environments that we'll show a couple of slides forward. Let's talk about availability and durability. With a multi-AZ file system, the file system is comprised of two file servers in two separate AZs, each with its own storage, and data is synchronously replicated across AZs in an active-passive model. With a multi-AZ setup, if you have a hardware component failure at the file server layer in one AZ, we'll automatically repoint your network settings to go to the other Availability Zone, and since the replication is synchronous, all the data that was written in the one AZ is going to be available in the other. So in this case you would just reroute your 
compute to go to the second file server. We'll bring up a new file server in the primary Availability Zone and then fail back to that AZ. And if we have a disk failure, that's completely transparent to you; we'll replace the disk on the back end without any disruption to your services. For data protection, I mentioned snapshots earlier: snapshots are used by the ONTAP OS to create point-in-time copies. They're copy-on-write, so they don't take up any additional space for the first one, only changes thereafter, and they allow a self-service model where users can restore files or folders without having to go to administrators to restore backups. This is a great first line of defense; it's what customers use to recover near-term files and folders, or issues within a couple of weeks to a month or so, but snapshots live on the file system itself. For long-term backup and compliance, we recommend using FSx backups. FSx backups are managed by AWS: you set them up in the console, and you can take a daily backup and keep it for a certain period of time. These are entire file system backups and restores, they're crash-consistent, and they're incremental, so after the first backup, where we take a copy and put it in the FSx backup repository, each backup after that is incremental. They're stored multi-AZ, so you can use them to restore to any AZ within a region, and because they live independently of the file system, you can use them to keep backups for compliance reasons for several years. Then you also have data replication: in addition to snapshots and backups, if you need to keep a copy of your data in another region, you can use SnapMirror or SnapVault to replicate between different ONTAP file systems. This is also crash-consistent and incremental, and can be between on-premises NetApp ONTAP and FSx, or between two FSx for NetApp ONTAP file systems. From a security standpoint, 
we're fully encrypted at rest and in transit; you can use the AWS Key Management Service or bring your own keys, and use Kerberos for encryption in transit. We integrate natively with Amazon VPC security groups, which control network traffic, as well as AWS IAM to control access to the file system itself. From a compliance standpoint, we're PCI, ISO, SOC, GDPR, and ITAR compliant, as well as HIPAA eligible. We also offer file- and folder-level access control and auditing capabilities, so you can audit access using policy or integrate with third-party software like Varonis. We integrate natively with your own Active Directory, so if you're doing any user authentication for Windows shares, you can have users authenticate through Active Directory. We also support antivirus scanning using Vscan, and lastly, you can monitor and log API calls with AWS CloudTrail as well as the ONTAP audit logging features. All right, before we get to the demo, just a few more slides on migration and hybrid workflows. This is sort of a decision tree: if you're deciding how best to migrate to AWS and you realize FSx for NetApp ONTAP is exactly what you need, maybe as a primary migration or maybe as a burst workload, you can go ahead and start migrating. If it's an offline migration, you can use the Snow family of products: with the Snow family you'll copy the data onto a Snow device, Snowball for example, upload that to S3, and then transfer it to Amazon FSx for NetApp ONTAP. If it's online, you have a couple of options. If you're migrating from an existing NetApp, you can use SnapMirror; it's going to be the easiest way to copy your data over and sync all the changes. In fact, you could use a combination of the Snow family for bulk data and SnapMirror to sync the most recent changes. But if you're not migrating from NetApp ONTAP and you prefer a GUI-based migration, you 
can use NetApp Cloud Sync, which works with a variety of different vendors, not just NetApp, or you can use something like Robocopy or rsync: Robocopy for Windows workloads, rsync for Linux workloads. There's also NetApp XCP. Illustratively, you could use SnapMirror for migration from ONTAP: in this case you've got an on-premises environment here on the left, which is migrating to FSx for NetApp ONTAP over your Direct Connect or VPN. The benefit here is that SnapMirror performs quick, efficient, incremental transfers of data, and there's no need to refactor or re-architect; you can use this for DR purposes or for primary data migration. Lastly, we have caching, which is great to consider for hybrid environments. If you don't have a cache, any on-premises client or workstation will connect to FSx for NetApp ONTAP using AWS Direct Connect or VPN, and this works really well if you have user shares, for example, and they're all located in a central or regionally located office not too far from the AWS region; we'll have people, for example, migrate their Boston users to our US East regions in Ohio or Virginia, and for typical office documents the performance works really well over Direct Connect or VPN. But occasionally you might be dealing with users that have particularly large files, like engineering or CAD files, or maybe media files like video or pictures, or maybe you have users that are spread across the country or across the globe. In this case you can deploy a cache: if you have an existing NetApp on-prem, you can use NetApp FlexCache; if you don't, you can deploy a virtual machine, and NetApp has another product called Global File Cache, licensed separately, that you can use to cache data locally. This is great for user shares and SMB environments where you don't want to have a NetApp at the existing location but you are okay installing a 
virtual machine that will hold a cache of the more commonly read files, so you get that local access, and it's tied back to the FSx file system as your main repository. Cloud bursting is another use case. Without a cache, you would just have your on-premises NetApp file storage, and compute instances in AWS would go back to on-premises to get that data, which can introduce latency. But by using FlexCache, you can have FSx for NetApp ONTAP cache your on-premises storage, and your compute instances then have a copy of your NetApp data locally in AWS. So, for example, if you need to spin up a bunch of AWS compute because you don't have any on-prem to run a certain type of job, you can have this data cached into AWS on FSx for NetApp ONTAP, spin up as many compute resources as you need to complete the task, and then spin them back down when you're done. With that, I'll hand it over to Amit, who will now give you a demo of the service. Thanks, David, for the comprehensive overview. Hi everyone, this is Amit Berlikar, I'm a senior solutions architect at Amazon Web Services. Let's roll up our sleeves and see Amazon FSx for NetApp ONTAP in action. In this demo we'll be creating an FSx file system, going over the various configuration options. Next we'll create a file share and access it from a Linux client using the NFS protocol; then we'll access the same file share using the SMB protocol from a Windows client, thereby demonstrating multi-protocol access. We'll then conclude the demo with a FlexClone operation. Without further ado, let's see this in action. You can use the AWS Management Console, the AWS CLI, or the APIs to create the file system; in this demo we'll be creating it using the AWS Management Console. There are two ways to create the file system: one is Quick create, which uses the recommended best-practice configurations. For this demo we'll be using the Standard create option because we 
want to learn about the various configuration options available. Let's give a name to the file system. The SSD storage capacity corresponds to the high-performance primary tier of the file system; this is the SSD storage for your file system nodes and can go all the way up to 192 terabytes. For this demo we'll be creating a file system with 20 terabytes of storage capacity. The other two performance parameters for the file system are IOPS and throughput: you get three IOPS per GB, so we'll leave this as the default, and for the throughput capacity you can select 512 megabytes per second, 1 gigabyte per second, or 2 gigabytes per second. With the performance parameters configured, let's select where we want to deploy the file system. Like any other AWS resource, an Amazon FSx for NetApp ONTAP file system resides in a virtual private cloud, so select the VPC where you would like to create the file system. The file system is a multi-AZ file system; the preferred subnet corresponds to the subnet where the primary node of the file system resides, the node from which all the data is served, so let's select a private subnet to deploy the primary node, and similarly a different private subnet for the standby node. With the networking configured, let's take a look at some of the administrative management options. We also want to access the file system using the ONTAP CLI as well as the NetApp ONTAP REST APIs, so let's provide a password; this is the ONTAP file system's password, and we'll be using it later in the demo. Let's create something called a storage virtual machine: a virtual file server that has its own administrative access and IPs to serve data. You can think of it from a multi-tenancy perspective on the same file system: if you have a single file system and you have different application teams consuming data from the same file 
system, you can create multiple SVMs, one per application team. Let's create a sample storage virtual machine. Since we will be accessing the data over an SMB share, we have to join this SVM to an Active Directory. Enter the required configuration options for the Active Directory; note that the NetBIOS name here corresponds to the NetBIOS name of the SVM. I have also filled in the remaining Active Directory configuration details for the SVM to join. Next, we'll also create a volume as part of the file system creation. Note that you can always create an SVM and a volume after the file system is created. Let's create a sample volume. Note that all volumes created are thin provisioned. We will also enable storage efficiency for the volume; with storage efficiencies enabled, we can take advantage of ONTAP's deduplication, compression, and compaction, thereby reducing our TCO. Capacity pools allow you to tier data from the high-performance primary tier to the capacity tier. The capacity tier is a flexible, elastic pool that can grow and shrink as needed, which allows you to store virtually unlimited data in your file system. There are four different capacity-pool tiering policies that determine which data blocks are intelligently tiered to the capacity pool; we will leave this at the default, Auto. With that, let's go ahead and create the file system. Note that it takes close to 15 to 20 minutes for the file system to be created. This is the Linux client from which I'm going to access the NFS share. First I need to create a directory where I will mount the share, so let's go ahead and do that. Next I need to mount the NFS share at this location. Let's go back to the console, select the volume that we want to mount, and take a look at the attach instructions. I'm just going to copy this; note that this corresponds to the DNS name of the SVM, and demo_vol is the volume that we created. Let's go ahead and mount it. Great, looks like the
mount operation has succeeded. To verify write capabilities, let's create a file on this share. Okay, great. Now let's try to access this share from the Windows client. First, let's create a share on the FSx file system by logging in to the file system using the ONTAP CLI; this address corresponds to the management IP. To log in to the FSx file system, we'll use the password that we set when we created the file system. Next, we are going to create a CIFS share on the demo_vol volume that we created earlier. Now that the CIFS share is created, we should be able to mount it from a Windows client. I have logged in to a Windows client; let's map the network drive. The SMB DNS name corresponds to the file share's domain name, so let's go ahead and enter the server and the CIFS share that we created earlier. Great, we were able to connect to the same volume using the SMB protocol, and as we can see, we can open the file that we had written from the Linux client. Thus FSx for ONTAP gives you multi-protocol access, where the same volume can be read and written from Linux, Windows, as well as macOS clients. Now let's go ahead and create a clone of our demo_vol using FlexClone technology. FlexClone technology references snapshot metadata to create a writable, point-in-time copy of a volume. What this means is that the new, cloned volume shares the same data blocks as the parent volume; whenever there is an update to either the parent volume or the clone, the affected data blocks are copied on the write operation. With that, let's see this in action. Let's log in to the file system. As we can see, we have the demo_vol that we created earlier. Now let's create a FlexClone volume of it: let's name it clone_vol, make it read-write, set demo_vol as the parent, and mount it at /clone_vol. Let's go back to the Linux client and mount the clone volume. As we can see, we can access the data files from the
clone volume, and we can also write to the same file. One of the applications of FlexClone is a CI/CD pipeline: you have your different environments (development, test, pre-prod, prod) and you need to quickly copy your database files across them. By FlexCloning volumes, or certain files within the same volume, you can quickly and efficiently copy data across different environments. This concludes the demo. We hope that you found the presentation and the demo useful, and we would like to close with a list of resources, including the official documentation page as well as a link to the hands-on workshop for Amazon FSx for NetApp ONTAP. Thank you for your time.
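For readers who prefer scripting, the Standard Create walkthrough above maps onto the AWS CLI's `aws fsx create-file-system` call. The sketch below builds the command as a string instead of executing it (a live run needs AWS credentials), and also derives the 3-IOPS-per-GB default mentioned in the demo. The subnet IDs and admin password are placeholders, not values from the demo.

```shell
#!/bin/sh
# Sketch: create a 20 TiB multi-AZ FSx for NetApp ONTAP file system.
# subnet-aaaa1111, subnet-bbbb2222, and the password are placeholders.
STORAGE_GIB=20480                   # 20 TiB SSD primary tier
DEFAULT_IOPS=$((STORAGE_GIB * 3))   # 3 IOPS per GiB of SSD storage
echo "Default provisioned IOPS: $DEFAULT_IOPS"

CREATE_CMD="aws fsx create-file-system \
  --file-system-type ONTAP \
  --storage-capacity $STORAGE_GIB \
  --subnet-ids subnet-aaaa1111 subnet-bbbb2222 \
  --ontap-configuration DeploymentType=MULTI_AZ_1,PreferredSubnetId=subnet-aaaa1111,ThroughputCapacity=512,FsxAdminPassword=ChangeMe123!"
# Echoed rather than executed; substitute real IDs and drop the echo to run.
echo "$CREATE_CMD"
```

The `PreferredSubnetId` is where the primary (serving) node lands; the second subnet hosts the standby node, matching the multi-AZ layout described above.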
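The Linux-side NFS mount and the ONTAP CLI CIFS share creation from the demo look roughly like this. The DNS names, SVM name, volume name, and share name are illustrative placeholders; commands that need live infrastructure are echoed rather than executed.

```shell
#!/bin/sh
# Placeholder DNS names for the SVM data endpoint and the file system's
# management endpoint (shown in the console's "Attach" instructions).
SVM_DNS="svm-0123456789abcdef0.fs-0123456789abcdef0.fsx.us-east-1.amazonaws.com"
MGMT_DNS="management.fs-0123456789abcdef0.fsx.us-east-1.amazonaws.com"

# 1) On the Linux client: create a mount point and mount the volume over NFS.
echo sudo mkdir -p /fsx/demo
echo sudo mount -t nfs "$SVM_DNS:/demo_vol" /fsx/demo

# 2) Over SSH as fsxadmin (the password set at creation time), create a
#    CIFS/SMB share on the same volume with the ONTAP CLI.
echo ssh "fsxadmin@$MGMT_DNS" \
  "vserver cifs share create -vserver demo_svm -share-name demo_share -path /demo_vol"
```

On the Windows client you would then map `\\<SMB DNS name>\demo_share`, giving both protocols access to the same files.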
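Finally, the FlexClone step reduces to a single ONTAP CLI command. `demo_svm`, `demo_vol`, and `clone_vol` are the illustrative names used above, and the command is echoed here because it needs a live file system to run against.

```shell
#!/bin/sh
MGMT_DNS="management.fs-0123456789abcdef0.fsx.us-east-1.amazonaws.com"  # placeholder

# Create a writable, space-efficient clone of demo_vol, mounted at /clone_vol.
# Blocks are shared with the parent until either copy is written to.
CLONE_CMD="volume clone create -vserver demo_svm -flexclone clone_vol \
  -parent-volume demo_vol -junction-path /clone_vol"
echo ssh "fsxadmin@$MGMT_DNS" "$CLONE_CMD"
```

Because no data is copied up front, this is what makes the CI/CD use case practical: each environment gets its own writable copy of the database volume almost instantly.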
Info
Channel: AWS Online Tech Talks
Views: 608
Keywords: NAS cloud file storage, netapp ontap cloud storage, ontap cloud file system
Id: mFN13R6JuUk
Length: 42min 32sec (2552 seconds)
Published: Thu Oct 28 2021