AWS re:Invent 2019: How to Build a Digital Bank Using AWS (STP12)

Captions
Hello everyone, thank you very much for coming, it's really great to see so many people here today. We are Monzo Bank, and we're going to talk to you today about how we built a bank in the cloud using open source technology, and specifically how we actually make card payments work: when you go into a store and make a purchase, how that all comes together and combines various technologies to do the things that make Monzo work.

But before we dig into that, a little bit about Monzo. We are a fully licensed and regulated bank in the UK. We have these lovely hot coral cards, and you can open an account with us in minutes from your sofa; it's all done through our app, and it's a really lovely experience. We currently have around three and a half million customers in the UK, and we are growing at a rate of around a quarter of a million customers every single month.

Our goal isn't just to be a cloud-based bank, though; that's just how we operate. What we want to do as a company is to make banking better for our customers. That means providing very rich insights into your incomings and outgoings as a consumer, and more than that, we want to become the financial control centre for your life. That doesn't mean selling you all of Monzo's products; it means that when you have products from various places, say mortgages with different providers, or student debt, you should eventually be able to see all of that information in one place, to reduce the cognitive load that comes with managing your money.

We do all of the normal things you might expect from a bank, so we can do Google Pay and Apple Pay, but being a bank built in the 21st century, we also started with APIs in mind. It's amazing to see what people have done with this: everything from crafting their own UIs for the bank through to very detailed spending analysis and breakdowns built on top of it. We were also one of the first financial institutions to integrate with IFTTT (If This Then That), the automation platform, so you can do some really interesting stuff with that. We have people who have set up applets that do things like moving money out of their savings account and into their spending account when they run 5k, and conversely, when you go to McDonald's and spend, moving money away from your spending to punish yourself. People have done loads of other things as well. At my previous company we used to have ephemeral developer environments running on AWS, and someone built an integration where, when you spend money at the coffee shop in the morning, it would automatically trigger the build and deploy of your developer environment so it's ready by the time you get to your desk, which was kind of cool. And as you can see here, this is a slightly tongue-in-cheek example: it's our API for finding our branches. We have no branches and no physical presence on the high street; everything we do lives on GitHub.

So that's Monzo. We are here from Monzo's platform team. My name is Chris, I'm the team lead, and Suhail, who will be talking shortly as well, is a senior platform engineer on the team.

When you use your card in store, a staggering number of things have to happen in a very short space of time. These are the key players in that sequence; I'll talk through them a little bit now, and we'll come back at the end to see how they all fit together to make this work.
The first piece is the physical side of things: the path from the store where you actually tap your card through to our physical data centre presences, where we have three very small setups. We then have Direct Connect set up into AWS, and from there the call moves into the cloud. We run everything on Kubernetes, and we have a fairly significant microservices architecture: we're running something in the order of 1,500 microservices at the moment, with many replicas of those, so something like 8,000 pods in our Kubernetes clusters. We make use of Cassandra for all of our data storage, so everything we have lives in there. We use a system called etcd for distributed locking and coordination among multiple services, and we use a couple of different queueing technologies for different use cases across the bank, Kafka and NSQ, which we make a lot of use of. Finally, we keep an eye on everything that's going on across the bank, all of the monitoring that we do, and we use Prometheus for that too. We're going to dig into those things and see how they're all set up, starting with the data centre side.

Typically when you think of data centres, your mind probably drifts to something like this: beautifully organised racks with cables that are wonderfully laid out. For Monzo it's a little bit different. We have a very, very small data centre presence, and the reason we have it is that we integrate with payment providers like Mastercard and Faster Payments, which is the bank-to-bank transfer scheme in the UK, and for those systems we need to physically have fibre delivered to us and plugged into a machine somewhere; it doesn't integrate with the cloud. So this is, in reality, what our racks look like, and this is one of our better racks: we've taken the time to colour-code our A and B power rails.

The way the flow works from Mastercard through to our infrastructure is something like this. We get a message in from Mastercard, it gets processed through our physical servers, it gets encrypted and put into a VPN which runs over Direct Connect, and that drops the message out the other side straight into a pod running in our Kubernetes cluster. That pod is responsible for calling other services that run in there to make the decision about whether we can authorise that payment or not, and then the flow works similarly on the way back: we go all the way back through, return to Mastercard, and eventually back to your terminal where you've made that payment.

So that's the physical side of things. I'm going to hand over to Suhail, who will talk a little bit more about the compute that we run in the cloud on AWS.

Hi, my name is Suhail, I'm one of the platform engineers on the team. For the rest of the talk we're going to cover how we leverage these technologies to process payments, starting with our compute cluster built on top of EC2 and Kubernetes. Monzo adopted Kubernetes pretty early on: we started in 2016, when Kubernetes was still quite young and there was no kops and no Elastic Kubernetes Service (EKS) provided by Amazon. We had to build a lot of the tooling ourselves and embark on a journey to build our own Kubernetes cluster and learn all of that expertise from scratch.
We run a single Kubernetes cluster in production on top of EC2 with all of our services running in it, and a single cluster can actually get you quite far: we've been able to deploy over 1,500 microservices with thousands of replicas. I think Chris mentioned 8,000 replicas; I checked this morning and I think we're closer to 9,000 and going strong. Our engineers are constantly shipping new applications, and we are constantly scaling up and down as demand comes in. This cluster contains all of our microservices as well as our monitoring stack, built on top of Prometheus, which gives us metrics for all of our applications; we also ingest data from CloudWatch, which Amazon provides for us, and run a bunch of stateful workloads like Kafka.

We write all of our microservices in Go. For those who haven't used Go before, it's a great programming language: it is statically typed, has great networking and concurrency primitives, and gives you a single binary which you can run on any modern Linux machine. If you were here for the previous talk, I wholly agree that you should keep your Docker builds small and easy to deploy; that means that when they're in your container registry you can pull them down quickly, which helps a lot when you need to scale up really rapidly after an unexpected burst of traffic. Go binaries are statically linked, so there's no dependency management after they are built, which is really good. It's also really simple to get started; it's an easy language to write, and you can be productive and write a production-ready application using just the standard library in very little time.

You may have seen this diagram floating around on Twitter and other social media. This is the traffic flow from the Monzo app; it's the real-life traffic flow that I captured on a Sunday afternoon, based on users who were actually using the app, and it's a large subset of our 1,500 microservices running in production. It takes a lot of things to run a bank, and our aim is to make banking accessible and provide it behind a nice, easy-to-use application: let us, the bank, handle the complexity on the backend. A lot of people are quite surprised by this and struggle to understand why we have so many microservices and how we can run them without issue. A large part of that is consistency: almost all of these services are written in Go, built on top of libraries that we provide as a platform team, using the same frameworks and design patterns, and they are deployed in the same way onto a platform which is optimised for running these sorts of services.

So let's dive a little bit deeper into how that works. We've heavily optimised our deployment flow, and in combination with working in a single repository, our monorepo, and being relatively opinionated about how services are built, we can ship really quickly. All of our services use the same set of common libraries and tooling, so if you're an engineer and you switch between microservices, or you switch between teams, you'll find the exact same design patterns and the exact same infrastructure being used, which means you can get up to speed really quickly. We use the platform to build the platform and to deploy the platform; it all lives within the same infrastructure. We've spent a lot of time building internal tooling, like our system called shipper, to help engineers ship their changes safely and easily, and engineers at Monzo ship hundreds of times a day.
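As a rough illustration of that "the standard library is enough" point, here is a minimal sketch of the kind of small Go service described above. The endpoint name, port, and payload are illustrative assumptions; Monzo's real services are built on their internal shared libraries rather than raw net/http.

```go
package main

import (
	"encoding/json"
	"log"
	"net/http"
)

// balanceResponse is a hypothetical payload; Monzo's real schemas are not public.
type balanceResponse struct {
	AccountID string `json:"account_id"`
	Balance   int64  `json:"balance"` // minor units, e.g. pence
}

func main() {
	mux := http.NewServeMux()

	// A single RPC-style handler, the sort of thing a tiny microservice exposes.
	mux.HandleFunc("/balance", func(w http.ResponseWriter, r *http.Request) {
		resp := balanceResponse{AccountID: r.URL.Query().Get("account_id"), Balance: 1050}
		w.Header().Set("Content-Type", "application/json")
		if err := json.NewEncoder(w).Encode(resp); err != nil {
			log.Printf("encode response: %v", err)
		}
	})

	// Built with CGO_ENABLED=0, this compiles to a single static binary that runs
	// on any modern Linux machine, which keeps container images small and fast to pull.
	log.Println("listening on :8080")
	log.Fatal(http.ListenAndServe(":8080", mux))
}
```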
There are very often times where we redeploy the entire bank online without customers noticing, for example when we want to ship a critical change or need to roll out a security update. Our tooling does all the automated checks to make sure that engineers are shipping code which is safe, reliable, and robust: that they're not introducing bugs into their applications, that they're doing proper error handling and handling failures, and that they have security in mind and are doing the right authorisation and authentication. All of this is built into our tooling, which means engineers tread a well-defined path and it's very hard for them to do something out of the norm.

In an environment where Kubernetes pods are moving around often and things are never where you left them, how do services find and communicate with each other? In this particular example, say you are the transaction service and you want to talk to the account service. You don't know where it lives, you don't know what its IP addresses are, and they could be constantly changing, so how do you figure out where that service lives and send it a network request to get some information? Moreover, in a platform where a single request can fan out to tens of downstream requests, because we believe services should communicate with each other rather than sharing data by going into storage directly, how do we keep that reliable, knowing that the network can be inherently flaky?

The answer for us is to lean on the capabilities of Envoy, which is a service proxy, or a service mesh depending on who you ask and how you deploy it, combined with our own little bit of infrastructure which we call the configuration provider. The configuration provider is responsible for updating all of the Envoy processes. It has a hook directly into the Kubernetes API and watches the Kubernetes API for state changes, so when you deploy a new pod or create a new deployment, the Kubernetes API informs our configuration provider, and that information then propagates to all of the Envoys that are running and listening for updates. That's what this slide is showcasing.

In the Envoy world, service-to-service RPC calls essentially go through Envoy. That means a service itself doesn't need to worry about the complexity of finding other services and where they are; it can just assume they exist, and all the networking capabilities are handled by Envoy. This layer is responsible for things like service discovery and routing, retries, timeouts, circuit breaking, and observability. The network holds a ton of information within your infrastructure, and with systems like Envoy taking that observability and having it consistently pulled into Prometheus in a common format, we can build really nice visualisations of how each service is communicating with the others: request timings, how many retries there were, and so on. When things are going wrong, it allows us to debug really, really easily.
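A minimal sketch of the kind of hook the configuration provider uses, assuming a recent k8s.io/client-go. Monzo's actual component and its push of Envoy configuration are not public, so the namespace and the logging below stand in for real update logic.

```go
package main

import (
	"context"
	"log"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	// Runs inside the cluster, so use the in-cluster service account credentials.
	cfg, err := rest.InClusterConfig()
	if err != nil {
		log.Fatal(err)
	}
	client, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		log.Fatal(err)
	}

	// Watch Endpoints objects; each change says which pod IPs currently back a
	// given service, which is exactly what the Envoy fleet needs to know.
	watcher, err := client.CoreV1().Endpoints("default").Watch(context.Background(), metav1.ListOptions{})
	if err != nil {
		log.Fatal(err)
	}
	for event := range watcher.ResultChan() {
		ep, ok := event.Object.(*corev1.Endpoints)
		if !ok {
			continue
		}
		// In a real configuration provider, this is where updated cluster/endpoint
		// config would be pushed out to the listening Envoy processes.
		log.Printf("endpoints %s %s: %d subsets", event.Type, ep.Name, len(ep.Subsets))
	}
}
```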
So we've spoken about our platform for running microservices and synchronous communication, but what about the data layer? Ultimately, services do need to store their data somewhere. For us that's Cassandra, running outside of Kubernetes on top of EC2. All of our services are stateless, so they need somewhere to store their data, and as a platform team we provide a highly available and durable Cassandra cluster.

Has anyone used Cassandra before? Quite a few hands, but not the majority, so I'll go a little bit into how it works. Cassandra is a highly available and scalable database. Cassandra nodes join together to form a ring, and your data is spread across all the nodes present in the ring. There is no master: Cassandra is a masterless system, so no one particular node is responsible for coordination; all the nodes are responsible. Typically, in your application, when you talk to Cassandra you will do round-robin load balancing across all the nodes, or you might do something a bit smarter, like weighted load balancing based on response times, latency-aware load balancing, or whatever strategy works for you.

In this particular example, the transaction service wants to read some data. It goes, in round-robin fashion, and picks the green node to get that data from Cassandra, and that green node knows where that particular piece of data lives. Here we're reading at local quorum: the data lives across three nodes, so the coordinator can go out to the three nodes responsible for that data, and when the majority agree, or have returned their result, the result is returned to the client. The client doesn't need to know where the data lives, and it doesn't need to make any routing decisions. This means that as we add or remove Cassandra nodes and the data shuffles around, it's all transparent to the application. Now, if you want the fastest response and you're willing to trade off a little bit of consistency, or you don't need the most up-to-date view, maybe because you're doing something like analytics processing in the background, you can read at a consistency level of one, where the node that returns the data fastest is the one that answers your application. That data might be a little stale, because all the updates may not have propagated yet, but that might be fine for your use case. This per-query flexibility allows for a lot of really flexible use cases.

Having the replication and the quorum mechanism means that when a node dies, which does happen on EC2, hopefully to no one's surprise, or when you need to restart one, you can just continue business as usual. We very routinely take exercises in restarting the entire database cluster one node at a time, and killing nodes when we can, to make sure that resilience is there at the Cassandra layer and no errors propagate into our services.
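To make the quorum-versus-one trade-off concrete, here is a minimal sketch using the gocql driver. The hosts, keyspace, and table are illustrative assumptions rather than Monzo's actual schema; their services go through an internal Cassandra library rather than calling the driver directly.

```go
package main

import (
	"log"

	"github.com/gocql/gocql"
)

func main() {
	// Contact points are illustrative; the driver discovers the rest of the ring.
	cluster := gocql.NewCluster("cassandra-1", "cassandra-2", "cassandra-3")
	cluster.Keyspace = "bank" // hypothetical keyspace
	// Token-aware routing on top of round-robin: prefer a replica that owns the
	// data where possible, otherwise rotate across nodes.
	cluster.PoolConfig.HostSelectionPolicy = gocql.TokenAwareHostPolicy(gocql.RoundRobinHostPolicy())
	// Default to LOCAL_QUORUM: wait for a majority of replicas in the local DC.
	cluster.Consistency = gocql.LocalQuorum

	session, err := cluster.CreateSession()
	if err != nil {
		log.Fatal(err)
	}
	defer session.Close()

	// Strongly consistent read path (e.g. checking a balance before authorising).
	var balance int64
	if err := session.Query(
		`SELECT balance FROM accounts WHERE account_id = ?`, "acc_123",
	).Scan(&balance); err != nil {
		log.Fatal(err)
	}

	// Relaxed read for background/analytics work: consistency ONE returns as soon
	// as any single replica answers, trading freshness for latency.
	var count int64
	if err := session.Query(
		`SELECT count(*) FROM transactions WHERE account_id = ?`, "acc_123",
	).Consistency(gocql.One).Scan(&count); err != nil {
		log.Fatal(err)
	}
	log.Printf("balance=%d recent_tx=%d", balance, count)
}
```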
Beyond just satisfying our users, providing an app, and storing their data in Cassandra, there's a lot of work we have to do behind the scenes to satisfy our banking obligations, so let's dive into how that's done. We've talked a lot about direct RPCs and the data flow, but a lot of the compute work happens asynchronously, almost like an event-driven architecture. For asynchronous message processing we provide NSQ and Kafka, both of which are really capable, high-throughput, and highly available message queues. Last I checked, our NSQ and Kafka clusters have a few billion messages flowing around every day. We run the NSQ nodes on i3 instances to get the best possible performance from the instance storage. What this translates to for customers is that we can deliver the notification that you've spent $4.99 at your favourite coffee shop before your coffee order has even been delivered by the barista.

Now, a lot of these flows rely on distributed systems, and distributed systems mean you may have problems around ordering and mutual exclusivity; sometimes you need that mutual exclusivity and ordering across these systems. For that we provide a highly available locking system built on top of etcd. etcd is a highly available, distributed, and consistent key-value store. It has great locking primitives which allow for high-throughput, low-latency distributed locking, and running it on top of AWS's i3 infrastructure also gives us guaranteed performance, because we're using instance storage with its SSDs.

Similar to Cassandra, reads and writes to etcd can come in to any node. The slight difference from Cassandra is that there is a leader, established using a consensus algorithm called Raft. In this particular example you can see all the bubbles swarming around; that's a leader election happening, in green, which eventually converges on s5 becoming the leader. What this means is that all reads and writes go through the leader to ensure the most consistent view, and the leader makes sure a write has propagated to a majority, and that the majority has written that log to durable storage, in this case the disk, before it acknowledges back to the client that the lock is held. If the leader fails, has a network connectivity issue, or there is a partition, another leader election is held, and etcd does some really clever stuff with the algorithm to prefer a leader which already has all the messages replicated to it, so the failover happens gracefully and the old leader steps down. Having distributed consensus is pretty much a required property for locking.
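A minimal sketch of that locking pattern using etcd's Go client and its concurrency package. The endpoints, lock key, and TTL are illustrative assumptions; Monzo wraps this kind of thing behind its own locking service and libraries rather than having every service call etcd directly.

```go
package main

import (
	"context"
	"log"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
	"go.etcd.io/etcd/client/v3/concurrency"
)

func main() {
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"etcd-0:2379", "etcd-1:2379", "etcd-2:2379"},
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		log.Fatal(err)
	}
	defer cli.Close()

	// A session keeps a lease alive; if this process dies, the lease expires and
	// the lock is released automatically.
	sess, err := concurrency.NewSession(cli, concurrency.WithTTL(10))
	if err != nil {
		log.Fatal(err)
	}
	defer sess.Close()

	// One lock per account: only one worker may process this account at a time.
	mu := concurrency.NewMutex(sess, "/locks/account/acc_123")

	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()
	if err := mu.Lock(ctx); err != nil {
		log.Fatal(err) // couldn't acquire the lock in time
	}
	log.Println("lock held: process exclusively for this account")
	// ... critical section: read balance, write the transaction, etc. ...
	if err := mu.Unlock(context.Background()); err != nil {
		log.Fatal(err)
	}
}
```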
Now I'm going to hand back over to Chris to talk a bit about monitoring.

As you've seen, there's a lot going on at the platform level at Monzo, so it's crucially important that we have really good visibility of all of those things, and for that we use a combination of Prometheus and an open-source project called Thanos, which came out of Improbable. We use Prometheus for absolutely everything: request metrics, low-level system metrics, and business-logic metrics. We monitor things like our customer operations, how long the queues are when people are trying to get in touch, we pull in data from CloudWatch, and we even use social media: we have an exporter which looks at Twitter, at how many people are reaching out to us, and if there are spikes or trends there we can alert on that and use it to correlate against things we're seeing internally, which is quite a useful one.

The monitoring setup we have looks a little bit like this. We have Prometheus sharded out into separate functional domains: one that's looking at our microservices, one that's looking at our infrastructure, which might be the EC2 nodes underpinning Kubernetes, and finally one that is looking at Cassandra. We run two replicas of each of those so that, first of all, we can tolerate failure if and when it happens (if we lose a node, whatever was running on it isn't a problem), and likewise we can do routine maintenance on those servers, upgrades and things like that, without losing any visibility.

Now, the simple Prometheus story for running this kind of sharded approach is that you just have multiple sources to go and get your data from, and that's something we really weren't happy doing: we didn't want people to have to know which Prometheus server to go to for the data behind the query they're running. The other approach you could take with Prometheus is a hierarchical setup, where you'd have a Prometheus above the three you see here that scrapes some subset of their data, and you query it at that level. Again, we didn't want the concession of not having all of the data we want to query in one place. That's why we turned to Thanos.

Thanos is a set of different components that combine to give you a unified view of all of your monitoring. The way it works is that we run these Prometheus servers on top of Kubernetes, inside pods, and we have a sidecar container alongside each of those servers which is responsible for periodically taking time-series blocks and uploading them into S3. We then have another component called Thanos Query, which coordinates with the Thanos sidecars and essentially presents itself as if it were a Prometheus server. So when we have a query to run, for whatever metrics we're after, we go to Thanos Query; it fans the query out to all of the sidecars, figures out where the data is, and returns it to the user. What we don't have with that setup alone is any means to look at historical data: we treat our Prometheus servers as quite ephemeral, running with something like 24 hours of retention, and that's where the final component, Thanos Store, comes into play. Thanos Store also presents as if it were a Prometheus server, only rather than holding the data locally it fronts all of the data we've uploaded into S3. The combination of all of these things means we have a seamless view across all of the shards we run, and near-infinite retention of our metrics data, so we can look at trends over a year, for example, or compare today versus last week.

To give you a sense of the scale of our monitoring setup: we have something like eight or nine thousand total scrape targets, something in the region of 42 million active time series across the platform, and we're ingesting somewhere near two million samples every second. People often comment on that and ask how we can possibly make sense of all that data; there's clearly a lot going on. The answer is that we don't look at all of it all the time, clearly, but it's incredibly useful to have all that data as a diagnostic tool for when things do go wrong.
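Those samples come from services instrumented in a common format. As a rough sketch of what that instrumentation can look like in a Go service, here is a hand-rolled example using prometheus/client_golang; the metric name, labels, and port are illustrative assumptions, since Monzo's shared libraries wire this up automatically rather than each service doing it by hand.

```go
package main

import (
	"log"
	"net/http"
	"time"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// A histogram of downstream RPC latency, labelled by target service and outcome.
// Because every service exposes the same metric shape, one Grafana dashboard can
// cover the whole fleet.
var rpcDuration = promauto.NewHistogramVec(prometheus.HistogramOpts{
	Name:    "rpc_request_duration_seconds", // illustrative name
	Help:    "Latency of outbound RPCs by target service and status.",
	Buckets: prometheus.DefBuckets,
}, []string{"target", "status"})

func callAccountService() {
	start := time.Now()
	// ... perform the RPC (via the Envoy sidecar) here ...
	rpcDuration.WithLabelValues("service.account", "ok").Observe(time.Since(start).Seconds())
}

func main() {
	callAccountService()
	// Prometheus scrapes this endpoint on every replica.
	http.Handle("/metrics", promhttp.Handler())
	log.Fatal(http.ListenAndServe(":9090", nil))
}
```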
As you can see here, this is a dashboard we've got in Grafana for our services. Every single one of those microservices we spoke about, because they are built from the same framework from the bottom up, exposes the same core set of metrics. So we know about all of the RPCs they're making to other services in the platform, we know how often they're querying Cassandra, for example, because they're all built using the same Cassandra library, and we can see how much lock throughput they're doing. With this dashboard alone we're able to query a vast quantity of those metrics all in one place, and it's a free resource for anyone shipping a new service.

So, we've spoken about all of the things we have running in the platform, and we'll now revisit how they all come together to actually process a payment. It starts, as I said earlier, with you in a store with your Monzo card. You tap it at a terminal, and that message eventually comes through the Mastercard network into one of our DCs. We then pass it over AWS Direct Connect, and it drops into the VPN endpoint pod running inside our Kubernetes cluster. From there it fans out into multiple services, more than you see here. Some of those take locks against etcd to make sure they're exclusively processing for that person's account, or whatever else it might be; some of them write to Cassandra, and Cassandra is responsible for making sure all of that data is replicated out so it's durably stored. Assuming you have the funds in your account and we can approve it, we send a response back through the whole chain to the terminal, and you're able to make your purchase for your coffee or whatever else it is. Immediately after that, we publish a message to say that the transaction has happened, and some other service, consuming from whichever queue that message gets put on, is responsible for sending out a push notification, which might include what you've spent, how much it was, and what your balance is now. It sounds and looks quite simple, but when you look at the full trace for a request that comes in through Mastercard, it really looks something like this: an incredibly complex process in which an enormous number of services are involved.
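A minimal sketch of that publish-and-consume step using the go-nsq client. The topic, channel, addresses, and event payload are illustrative assumptions; Monzo's real event schemas and push-notification service are internal.

```go
package main

import (
	"encoding/json"
	"log"

	nsq "github.com/nsqio/go-nsq"
)

// transactionCreated is a hypothetical event payload.
type transactionCreated struct {
	AccountID  string `json:"account_id"`
	Merchant   string `json:"merchant"`
	AmountCents int64 `json:"amount_cents"`
}

func main() {
	cfg := nsq.NewConfig()

	// Producer side: the payment flow publishes an event once the authorisation
	// has been approved and durably stored.
	producer, err := nsq.NewProducer("nsqd:4150", cfg)
	if err != nil {
		log.Fatal(err)
	}
	body, _ := json.Marshal(transactionCreated{AccountID: "acc_123", Merchant: "Coffee Shop", AmountCents: 499})
	if err := producer.Publish("transaction.created", body); err != nil {
		log.Fatal(err)
	}

	// Consumer side: a separate service subscribes on its own channel and sends
	// the push notification with the amount and new balance.
	consumer, err := nsq.NewConsumer("transaction.created", "push-notifications", cfg)
	if err != nil {
		log.Fatal(err)
	}
	consumer.AddHandler(nsq.HandlerFunc(func(m *nsq.Message) error {
		var ev transactionCreated
		if err := json.Unmarshal(m.Body, &ev); err != nil {
			return err // message will be requeued and retried
		}
		log.Printf("notify %s: spent %d at %s", ev.AccountID, ev.AmountCents, ev.Merchant)
		return nil
	}))
	if err := consumer.ConnectToNSQLookupd("nsqlookupd:4161"); err != nil {
		log.Fatal(err)
	}
	select {} // block forever while the consumer runs
}
```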
So, what next? We announced a few months ago that we are going to be moving into the States as well, and we actually have cards with us today, so if you're a US citizen and you would like a Monzo account, please do come and see us; you can sign up and take one away with you today. As part of that, we are looking at how we're going to be evolving our structure in the future. Currently we're in one region, eu-west-1 in Ireland, and it's going to be increasingly important, as we go to multiple geographies and expand our customer base, to divide that platform up. So we are actively investigating how we do multi-region, and then, within regions, how we divide things down into more of a cellular-based architecture, and we'll be leaning heavily on AWS for advice there as well. That is all we have, so thank you very much for being here and listening. Are there any questions?

How do you take care of business logic like KYC and fraud, all the banking regulations and rules?

That is a really good question. We have entire teams dedicated to ensuring that financial crime is dealt with. We integrate with a lot of third-party providers, but ultimately all of that information is fed into our own rules-based engine to make sure we can make the best decision for customers and help prevent fraud. To be honest, that's not something we on the platform team are directly involved in; we provide the tooling that allows engineers to ship that kind of thing on our platform, so I don't think we would be the best people to speak about it in detail.

Thank you for the talk. Since you have such a complex call graph, how do you think about service-to-service SLAs?

That is a really good question. Right now we don't have any strictly defined SLAs between services. What we've done is take the step of having excellent monitoring, and I think the next step would be to define SLAs based on what we see in our infrastructure. Naturally, when we run these services in production, we run them in a highly available fashion with multiple replicas, and every single service is deployed across multiple AZs, so we make sure that spread happens at the platform layer. But service-to-service SLAs would really be defined by the service teams, based on the interactions they have; for example, if a service calls out to a third-party provider, that provider's SLA naturally needs to be taken into account for that particular service. We think that by providing all the analysis of the systems running in production, teams can make the best possible decision about whether they're meeting their SLAs, with automated alerting to make sure that when an SLA is breached, someone is paged.

One thing we didn't really capture in the monitoring section is that we also spend a lot of time writing automated alerts to monitor the infrastructure. That involves using external systems to probe our infrastructure, but also using our monitoring stack to alert when things go wrong and link to the right runbooks. For that we use a system called Alertmanager, which hooks into Prometheus and Thanos, and you can essentially write Prometheus-style queries to say: if this has breached a threshold, or if there's a worrying trend here, or if we're seeing 10% more errors than we'd see on a typical day, then send an alert. We can then categorise that and decide which team it should go to; we have smart routing to make sure the right service teams get the alert and act on it. Essentially, first-line and second-line defence.
Thanks for the talk. A quick question about scaling up your teams: you mentioned you started only a few years ago, and your team has grown massively over that time. How do you ensure consistency across those teams, so that they're all pushing things in the same way and developing things consistently? What are some top tips for doing that?

Yes, so as Suhail alluded to earlier, we have kind of accidentally become quite dogmatic about the tools people use, and I think that's been a key differentiator for us. Having a massively consistent toolset and way of doing things means the movement of people between teams is fantastically easy, and we're able to swarm on problems very quickly; I think that's the main one. There are all sorts of things different teams across the company are doing to solve problems in interesting ways as well. The platform team, for example, has grown from around four people when I joined a little while ago to something like twelve or thirteen people now, and the structure of the team has changed: we have this concept of ephemeral squads, where we have problems or things we need to fix and we assemble short-lived squads to fix them. It's a good way to work inside a big team that's able to share context, but then limit that right down for the day-to-day things.

Thank you, I'm a UK Monzo customer, good job guys. Two questions, actually: do you use any other type of storage than the Cassandra and S3 you showed in the presentation, and what protocol do you use between your microservices, is it HTTP or something else?

For the first question, about the storage layer: currently all our microservices talk to Cassandra, and ultimately that access goes through a common library, so if we do want to switch it out in the future it isn't as difficult as it might seem. Some services rely on Cassandra more heavily than others, so switching those may be a bit more difficult, but the answer is that most microservices are using Cassandra. That's why we are very excited about the managed Cassandra offering AWS announced yesterday; that was really exciting, because it's something we really want to look into. Your second question was about RPC communication and how that's handled. Currently we use HTTP, but because all of that is abstracted at the Envoy layer it doesn't really matter: we can make tweaks, like switching to HTTP/2, at that layer. Naturally, what we want to do is move to gRPC at the service layer. All of our services are already defined using protocol buffers; it's just the transport in between that is JSON, because it makes things easy to inspect and easy to read, but that naturally comes with some performance downsides, so we will hopefully be switching to gRPC later on.

Hi, having lived in the UK, as of ten years ago banking in the UK was already so much better than in the US. My question is: now that you're moving into the US, what are some of the challenges you believe you're going to face in the US market that are different from the UK?

The honest answer is that we're probably not the best placed to answer that question.
We have a team that have moved out to the US and are trying to solve those problems. I think we've acknowledged that just because we have a working product in the UK does not mean it's going to work for a different culture, on a different continent, basically. There are some small things that are quite different, like the payment flow when you go out for dinner in the US, where you pay and then they come back and you have to add a tip; those are the kinds of areas where I think Monzo could help. The process at the moment is really about understanding the market and making sure we're solving the problems people actually have out here, so I would expect there'll be a lot of blog posts coming out about the things that will be changing for the product in the US over the coming months as we learn more.

Hi, I'm wondering how you negotiate proposals for changes to the platform, either the architecture itself or libraries, coming from engineers working mainly on business logic to the platform team. Do you have a board or committee or proposals, or how does this work? Or is it all top-down?

So the question was around how we organise proposals, architecture review, and that sort of thing. One thing that's really ingrained at Monzo is our proposals culture. Any engineer can write a proposal, we have a public Slack channel, and we write proposals for everything; it's not only engineering related, so if you want to change some of the offerings in the office, for example the soft drinks on offer, there will be a proposal written for that, up for debate. That means all members of the engineering community, and wider Monzo as well, people not directly involved in engineering, can look at these proposals and learn from them; they're easily searchable in Notion, which we use, and people can feed back directly into the proposal. We also have a really strong one-on-one culture, so if you want to learn more about a particular system you can easily set up a one-on-one with the relevant engineer, who will be happy to whiteboard that system or provide more information on the proposal itself. We also have the architecture forum, which is a gathering where anyone can bring any sort of proposal if they're unsure how they want to implement something, or they want broader review across lots of different areas of expertise and domains. For example, we have people who have read the Mastercard manual back to front and sideways and can recite every single declaration; we are not those people, but if we had a particular question about that sort of payment network, and a platform perspective to bring to it, we would go to that forum, bring our perspective, and they would provide their input as well on what we need to do to make sure we comply, have the best performance, and ultimately the best user experience for everyone at Monzo.

Are you planning to launch more banking services, such as savings accounts or transfers to other banks, and how confident do you feel about launching those new products quickly on your platform?

There are no grand plans for us to become like the big banks that have come before us, which get you in with a current or checking account and then cross-sell you all their other products.
That's not Monzo's plan. Monzo's plan is to allow you to have your money wherever it makes sense for you. We want to be the best current or checking account there could possibly be, and then allow you to integrate with various other parties, and we will hopefully streamline that process. So if you need a savings account, you could go to your Monzo app and open one, and we will show you options across various banks so you can choose the one that's right for you. As a customer you end up winning in that sense: you get greater choice and greater flexibility to go where it makes sense.

I don't know if there's PCI compliance in the EU, but obviously there is in the United States, so I'm wondering whether you've looked at it, what you've researched, and what you've had to modify in your platform, if anything, for PCI and things like that.

That is a really good question, around PCI compliance. We definitely have PCI compliance in the UK, and it's something we adhere to, so yes, we are fully PCI compliant. A lot of it has been about the unique shape of our infrastructure: moving to the cloud and running on Kubernetes means we need to take a cloud- and Kubernetes-focused approach to PCI compliance. When the PCI assessor comes along and says you need to have a firewall, what they're expecting is some sort of F5 appliance or a Barracuda firewall or something like that. What we have instead is Kubernetes network policies, which at the iptables layer essentially block packets from routing to services, so at the networking layer we can say with confidence that a packet is not going to make it into a service it was not intended to talk to. We've got an entire blog post online about how we rolled out network policies across our whole infrastructure. All of our microservices adhere to this sort of network policy, essentially doing firewalling at the networking layer, so no service can talk to any other service it's not intended to, and we manage that via code review. If you're calling a sensitive system like the ledger, for example, there's only a very small subset of services allowed to do so, and that goes through an engineering approval process: if you develop a new service, to keep that agility, you seek out a peer review with the team that maintains that particular service to get the authorisation, and then the moment you use shipper, our deployment tool, that network policy gets applied right in the flow of your deployment. The engineer doesn't even have to worry about it beyond the code review.

Okay, so to the extent that I understand it, this is about the authorisation aspect of your transaction processing. How do you deal with network settlement, or invoicing customers? Do you have to do that, or does Mastercard take care of all that for you? How does that fit into the overall design?

That is a really good question, about settlement, invoicing, authorisation, and so on. Unfortunately, I don't think we are the best people to talk about that, because we are quite far removed from the payment systems themselves, apart from providing the infrastructure and essentially running the data centres.
If you write in to our payments teams, I'm sure they will be more than happy to explain. We do have a community forum, and we've also written a lot of blog posts on our engineering blog about how we do these things, because we want a transparent culture; none of this needs to be hidden behind walls. So just ask on our community forum, and I'm sure a payments engineer can explain to the extent that they can.

Hey, you said you were interested in the managed Cassandra service, but it doesn't look like you're using managed services. Is that a conscious decision?

You're right. We have built the bank out of fairly basic components, and we are probably the most boring AWS customer: we use EC2, we use storage, and we use networking. We are possibly going to change that; we are definitely evaluating the more managed offerings. There's managed Kafka, there's managed Cassandra, and there's obviously EKS. For us it's about figuring out where that line is and what's differentiating for us. In the past we have forked Kubernetes code and been able to run our own version of it to fix problems quickly; we would obviously give some of that up if we found issues in EKS, but the flip side is that it's really costly to run all of these things completely yourself. So one of the things we're looking at next year is where we draw that line: which systems we really care about running ourselves, and which ones we would love to hand off. The Cassandra one is really interesting: managing data is super difficult and, frankly, really scary; no one wants to do it if they can avoid it, so we'll be looking closely at that one.

Do you see a situation where your data centres, in turn, will be completely eliminated?

Sorry, do you mean completely automated? Eliminated. When Mastercard allows us to plug directly into an AWS feed, then yes. Running data centres, even though we have only one or two racks across multiple locations, is still really painful and difficult; it means we have to reinvent a lot of the stuff that AWS provides and that we take for granted. Running data centres is really hard and not a pleasant thing to do. Right now we see it as a necessary evil, and we would love to have AWS-managed systems for that. It really depends on whether AWS keeps pace with our rate of innovation: there are lots of different payment schemes, and we want to make sure that, if we do give up that control, AWS can hook into all of those payment schemes around the world, which is our ambition. So there are going to be a lot of considerations, but fingers crossed, hopefully we can eliminate those data centres at some point. Does that answer the question, or was it about whether we ever take the data centres offline? Yeah, sadly, it's literally that we have to plug the cable in somewhere.

Since you've grown so big now, is there any pressure from the regulators to go multi-cloud or multi-region?

It's a really good question. No, there's no direct pressure; I don't think the regulator is concerned at that level of abstraction. Essentially they want us to prove that we are resilient and that we're engineering things in a way that leads to good outcomes for our customers, so there hasn't been any pressure for us to go multi-cloud.
We internally want to go multi-region when it makes sense for us. I don't think we want to go too early; if you go too early it can be really difficult and it can slow things down, which is a net negative for your customers. So, in short, no is the answer. Cool, thank you very much.
Info
Channel: AWS Events
Views: 9,261
Keywords: re:invent 2019, fintech, england, UK
Id: NTgB2z0E9ZU
Length: 45min 49sec (2749 seconds)
Published: Mon Jan 13 2020