Hi all, this is Chetan. Welcome to this lecture.

In this lecture we are going to talk about basic AWS services. If you are not familiar with AWS and
you don't know much about its services, this lecture is for you. Before we start, I want you to
understand a few things about AWS.

First: AWS global data centers. Whenever we use an AWS service, we typically deploy it in some
geographic area, and that area is called an AWS region. There are different AWS regions available
across the world: in the U.S. there are seven regions, in India there is one, in Europe there are
several, and in all there are 20 regions as of this recording, with 5 more regions coming soon.
When we deploy services, we can choose which regions to deploy in. Every region in turn typically
comprises two or more data centers, for high availability of AWS services, and those data centers
are called availability zones. We will learn more about them shortly.

AWS global data centers also include something called edge locations. You can think of edge
locations as caching devices spread across 100-plus cities around the world. Content such as the
videos and pictures you watch on, say, Facebook or YouTube gets cached at the nearest edge location,
and from there it is delivered to the user, which improves performance by lowering network latency.

Overall, AWS has 130-plus services. If you have heard about EC2 or S3, these are different AWS
services, and we are going to learn more about them in this lecture.

Okay, so as I said, a region is one geographic area. The blue area you see here is an AWS region,
and every region typically consists of two or more availability zones (AZs) for high availability of
your application. When you design your architecture, you will typically keep your machines in
different AZs, so that if one AZ goes down for some reason, you still have a machine running in
another AZ and your application stays highly available. We'll talk more about this in the EC2
sessions, which are a different course, but for now you just need to know these basics.

Okay, I hope you're now familiar with regions and availability zones. Let's move ahead and talk
about AWS services. Before that, a quick overview of how services, regions, and AZs map to each
other. First, the AWS account is the top-level entity: once you have an AWS account, you can deploy
your infrastructure in any AWS region. As I said, there are 20 regions as of now, and every region
comprises two or more AZs; that's what is shown here.

Now, in AWS there are different services, and they have different scope
with respect to region, AZ, or account level. For example, the billing service works at the account
level: at the end of the month you get one AWS bill, which you have to pay. IAM, which is Identity
and Access Management, also works at the account level: you can create as many users as you want,
and all these users can have access to all AWS regions, AZs, and services, because IAM works
globally. There are more such services that we'll talk about shortly.

Then some other services, like S3 and DynamoDB, work at the region level. That means when you
create an S3 bucket, you select the region in which to create it, and similarly for DynamoDB tables.

And then there are services like EC2, which is a VM; RDS databases; and Elastic Block Store (EBS),
which is a disk. All of these work at the AZ level. That means one EC2 instance cannot be in two AZs
at the same time; it will be in AZ-1 or AZ-2 or AZ-3, depending on where we launch that machine, and
the same goes for the databases and the disks.

We will see more services, but what I want you to take from this is that different AWS services work
at different levels. This is the scope: the AWS account is the top-level entity, under it we have
AWS regions, and then we have AZs within a given AWS region. Now let's move to AWS services. There
are so many AWS services; as I said, there are 130-plus of them, and we can broadly categorize them
into groups such as compute or analytics services, like this. In compute there are EC2, Auto
Scaling, Lambda, load balancers, and the container service. Likewise, for data analytics there is
EMR, which is a Hadoop service, along with Kinesis and Athena. Similarly, there are other categories
like storage services and database services, then some network-related services and management
services, and further you have application services and development services as well. This still
does not cover all AWS services, but we have listed the widely used and popular ones.

Rather than talking about these services category by category, I would like to take an example, so
that you can really map how each one fits into an architecture; that will probably help you recall
which service is used for what. So what I want to do next is build one application and see how to
create the same architecture using AWS services. To understand where different AWS services fit
into an architecture, we will build a simple social media application, maybe a mini version of
Facebook or Instagram, and then see how to design the same architecture using different AWS
services. Okay, so our application is fb.com, for example; our users will access it using this
name. First, if you want to deploy this application in your on-premises data center, the first
thing you will need is a private network. Every company has its private network, and we would
require something like this too, to keep things secure.

The next thing you would require is a web server. To start with, suppose we are a startup: we will
probably write a small amount of code in, maybe, PHP and run it on some kind of application server
or web server. It should work for maybe 100 users or fewer, and that works fine. Our users will
initially access this application using an IP address, so maybe this VM has a public IP and users
access it through that.

What happens over time is that you want to extend your application: you want to add some business
logic, some UI stuff, login functionality, and more. That's where you need a web server as well
as an application server, so that all the front-end stuff is taken care of by the web server and all
the business logic by the application server. Suppose it's a Facebook kind of application; then
maybe you connect with different people, and adding that data and everything else is taken care of
by the application server. And of course, if you want to extend it further, you need some kind of
database, like a relational database: MySQL, or even Oracle, whatever you prefer. Right?
If you have this kind of application, it works well. It's called a three-tier architecture, and
your users are accessing the application using an IP address. Right? So this works well. Now
suppose the app is doing really well, your website is getting more traction from users, and at
some point your web servers or application servers become a bottleneck. Maybe they are not able to
handle the increased load on your application. So what's the solution?

Typically we will scale. That scaling can be vertical, which means you increase the capacity of
these machines, or it can be horizontal. In a three-tier architecture you will typically see the
web servers and application servers scaled horizontally, which means you bring in more web servers
and more application servers, right? Like I have shown here.
Okay, that's fine. Now I have multiple web servers and multiple application servers. But multiple
web servers means multiple IP addresses, and this is where we need an intelligent entity that can
distribute the load across these web servers. That's where we bring in the load balancer. If you
have heard about load balancers like HAProxy and Nginx, they do something like this: a user sends
a request to the load balancer, and it distributes the requests evenly across the back-end servers,
like this.

Now that we have load balancers and your application is really catching on, you typically don't
want it to be accessed using an IP address. You want people to access your application with a
domain name, say fb.com, and that's where you need a DNS service, where you can map your domain
name to, probably, the load balancer's IP address. Right? Okay, so far so good; this works fine, right?
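
To make the load balancer idea a bit more concrete, here is a minimal Python sketch of round-robin
distribution, which is one simple way to spread requests evenly. It is only an illustration and
not how HAProxy, Nginx, or any AWS service is actually implemented. The class name, the server
addresses, and the DNS record are all made up for this example.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Toy load balancer: hands out back-end servers in a fixed rotation."""

    def __init__(self, backends):
        if not backends:
            raise ValueError("at least one back-end server is required")
        self._rotation = cycle(list(backends))

    def pick(self):
        """Return the back-end that should receive the next request."""
        return next(self._rotation)


# Hypothetical private addresses of three web servers.
web_servers = ["10.0.1.10", "10.0.1.11", "10.0.1.12"]
balancer = RoundRobinBalancer(web_servers)

# Hypothetical DNS record: the domain name points at the load balancer, not at a web server.
dns_records = {"fb.com": "203.0.113.10"}

# Six incoming requests are spread evenly: each server receives two.
assignments = [balancer.pick() for _ in range(6)]
print(dns_records["fb.com"], assignments)
```

Real load balancers usually also run health checks and may use other strategies, such as sending
each request to the server with the fewest open connections; the sketch leaves all of that out.
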
Your application is three-tier and it is working well. Now it catches on further and you have a
lot of data: the number of friends is growing, the number of connections is growing, the number of
posts is growing, and that's where your relational database cannot really serve this kind of data
storage well. A relational database alone struggles at this scale. For this you need a scalable
database, and for connection information and the like it makes more sense to go for a NoSQL
database. So what will you do? Bring in a NoSQL database like MongoDB or Cassandra, whichever
you want. So some part of the data is stored in relational databases and the rest is stored in
non-relational, or NoSQL, databases.

But your relational databases could still be a performance bottleneck. Maybe there are read-heavy
operations happening on the database, and for that you will typically bring in one more component,
called a database cache. You bring in a cache engine like Redis or Memcached that holds the
frequently accessed data, so that your application servers don't hit the database for it and
those requests are served from the cache engine instead. Okay, so this is a fairly better
architecture than where we started. Now
next, as you know, Facebook might be getting millions of pictures and videos uploaded daily. The
disks attached to the VMs are not really capable of growing on the fly.
They have size limitations, and that's why media such as pictures and videos is typically never
stored on the web servers or application servers. For this you need a practically unlimited kind of
storage, and that's external storage. It does not necessarily have to be block storage like your
disk. It can be file storage, like a shared filesystem, or some external storage like Google Drive,
if you are familiar with that, right? So you need some external storage where you store this
content. Using external storage solves your storage capacity problem.
That's fine, so far so good! Next, when users upload videos or photos, you need some kind of
content filter. Maybe an uploaded video has objectionable content, or some pictures contain nudity.
So you need a content filter that can check this on the fly, and only then should those pictures
and videos actually be stored in the external storage. So we bring in one more component there.
Right?

Okay, that's fine. Now, you also know that Facebook shows a lot of ads, and it continuously watches
what you are doing while you are on the Facebook page: what kind of products you like, what kind
of posts you like. Based on that it gives you suggestions and friend recommendations, and shows
you a lot of ads, right? This is called clickstream analysis. Every click is captured somewhere and
analyzed in real time, so you need some kind of clickstream analysis engine there, right? Take
another example, Twitter: which tweets are trending? What's the mood of the people? All of this is
done using clickstream analysis, and Facebook has something like this too.

All the data that this clickstream analysis engine captures has to be stored somewhere, so you
need external storage for it, like this storage here. Further, you want to take this data and run
some data operations on it: maybe run aggregations, sort the data, and find some meaning in it.
That's where you need some kind of Hadoop platform, which can perform the computation on
distributed systems. Right?

Over time you would also require a data warehouse. Why? Because Facebook does a lot of data
analytics, right? Maybe at the end of the year they want to find out which kinds of users are
accessing Facebook more. What are their ages? Which regions do they come from? How is a
particular feature of Facebook being used, so that they can concentrate more on those kinds of
features? What is trending? All this is worked out by storing the data in some kind of data
warehouse engine and then doing business intelligence on top of it. So you need a business
intelligence tool that can query and analyze this data and generate reports, from which Facebook
can take decisions like "next year this is our strategy" or "we will focus on this area or that
area". So you can drive business decisions based on the analytics results. Okay! So this is more
on the back-end side, which the end user does not really know about, but
this is happening there. Okay, so far so good! We have extended our architecture. Next, all these
photos and videos can be served directly over the internet, because you can think of this storage
like a Google Drive: you can stream your videos and view pictures directly from it. Users might
come from a web browser and watch whatever was posted. Suppose you have posted a video; they can
watch that video here. But nowadays your users often come from mobile devices and will watch your
videos on a mobile phone, and in that case you need the same videos, probably in a different
format, because a mobile device might play a different video format. For this we will typically
need some kind of video converter in between, so that whenever a user uploads a video, it is
immediately converted into a mobile-friendly format. All right? So you need some computing power
here as well. Okay, so we will introduce that as a Video Converter here.
Next, all these photos and videos are typically served, as I said, from the external storage. But
you know that whenever a video goes viral, millions of users watch it. If that video is fetched
from this location every time, it might become a bottleneck, or you may pay a price, because your
data is flowing out to the internet and there is a lot of data transfer for your videos. To solve
this problem, you need something called a CDN - Content Delivery Network - which caches these
videos and pictures on the caching devices nearest to where the user is accessing them from. Right?
So when all the users in that geography want to watch the same video, it is served from the cache
here, not from the original storage, and the user experiences low latency and gets a better
experience. Applications like Instagram, Facebook, or YouTube largely rely on content delivery
networks to serve their content. Okay, so far so good! We have extended the architecture further.

Now, you know Facebook also sends you mobile notifications, right? There is a new friend request,
or there are likes on your post. For this we need some kind of notification service, right? Maybe
you get an SMS or a mobile push notification, so you need that service. It also sends you emails
for various activities, right? You can disable that, but there is an option to opt in to email as
well, so you need an email service. And further, you can also chat with your friends, and for this
a queue is typically used. If you have heard about messaging queues like RabbitMQ, JMS queues, or
IBM MQ, these are all queue services that give you a first-in, first-out kind of data structure.
So for chatting, maybe you require some kind of queue service as well.
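
The first-in, first-out behaviour mentioned above can be sketched in a few lines of Python. This
is only an in-memory illustration of the data structure, assuming a single process; it is not how
RabbitMQ, IBM MQ, or any managed queue service works internally. The names chat_queue, send, and
receive are made up for this example.

```python
from collections import deque

# In-memory stand-in for a message queue; a real queue service would be a separate system.
chat_queue = deque()


def send(sender, text):
    """Producer side: add a message to the back of the queue."""
    chat_queue.append((sender, text))


def receive():
    """Consumer side: take the oldest message from the front, or None if the queue is empty."""
    return chat_queue.popleft() if chat_queue else None


send("alice", "Hi!")
send("alice", "Are you there?")
send("bob", "Yes, hello.")

# Messages come out in exactly the order they went in.
delivered = []
while (message := receive()) is not None:
    delivered.append(message)
print(delivered)
```

The point of putting a queue between sender and receiver is that the two sides do not have to be
online or equally fast at the same moment; the queue holds the messages in order until they are read.
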
Okay, so if we consider all these services, it's a bare-minimum social media application. I'm
sure there must be many more components, but we are sticking to this for now. And finally, if you
deploy this architecture, you want to monitor it continuously: How are my VMs doing? How are my
databases doing? How is my storage doing? How much storage is left? For all this, you need some
kind of monitoring service and dashboards, like a production dashboard, where you can monitor the
health of your application. Okay, so overall this will be your architecture, and this would
probably be deployed on-premises.

Now let's see how we would do the same thing on AWS. First, the private network you see here: in
the AWS world it is called a VPC - Virtual Private Cloud. It is not exactly the way it is shown
here, because some of these services sit outside the VPC and I cannot accommodate that in the
diagram, but consider the VPC as one private, isolated network that AWS gives you. You would then
have to manage a public network for the web servers and load balancer and a private network for
the databases; that is a separate discussion, but the VPC is the network service.
All these VMs that we are talking about are nothing but EC2 machines, right? And the disk that we
attached is called EBS - Elastic Block Store - and EBS volumes have a maximum size limit. So EC2
and EBS solve your problem of VMs, on which you will typically deploy your applications, whether
web servers or app servers. Further, you can enable auto scaling for EC2. That means if the load
on these EC2 instances increases, they can scale out horizontally, automatically, and if the load
decreases they can scale back in. Maybe from 2 machines they go to 10 machines, and from 10 they
come back to 2, depending on the load. You can configure that using the Auto Scaling feature of
AWS EC2.

Further, for relational databases there is a service called RDS, and for NoSQL databases there is
a service called DynamoDB. For DB caches there is a service called ElastiCache, and it comes with
Redis and Memcached engines. Okay, further, as you see, there is a load balancer, so in Amazon
there is a service called ELB - Elastic Load Balancing - which can distribute the incoming traffic
across multiple back-end EC2 machines, like this. And if you want your domain name mapped to your
load balancer, you need a DNS service, which is called Route 53.

Okay, great! Now let's talk about the other stuff that we have. For external storage there is
Amazon's S3 service - Simple Storage Service, right? It is practically unlimited storage: you can
just go on dumping data into it, it is accessible directly over the internet, and there is no
limit on how much data you can store in your S3 buckets.
You also need a content filter, and for that there is a service called Rekognition, which can
detect objects in images, so that you can filter content out before you upload it to, say, S3 buckets.
Okay, now as I said, you need some kind of service where your videos get converted from one format
to another, like from MP4 to some mobile-friendly format. One option is to run some EC2 machines
that continuously watch your S3 buckets for new videos; as a new video comes in, they download
it, convert it, and put it back into another bucket. That's one option, but there is a better
option for this: the Lambda service. Lambda is a serverless service from Amazon where you just
write code; in that code you specify how to, maybe, convert a video, and you can have this Lambda
function executed whenever a new upload happens in your S3 bucket. So a new video comes in, Lambda
gets triggered, it converts your video, and maybe you have put in logic to store the result in
another S3 bucket. So here there are no servers to manage! Everything is taken care of by Lambda
functions, and they scale automatically. Okay, so we have Lambda there.

Now let's talk about clickstream analysis. For this there is a service called Kinesis, which can
capture your clickstream data; then you can analyze that data, you can even store it in S3, and
you can do much more with whatever data you capture. Right? For the Spark or Hadoop platform
there is a service called EMR. With EMR you can do operations like aggregation and sorting, and
you can run distributed jobs - Spark jobs, Flink jobs - all in this managed Hadoop cluster.

You also need to do ETL operations on data from your DynamoDB tables. Maybe you want to work out
who all your friends are, who your friends' friends are, what activities they are doing, and
continuously push new posts onto your wall; all of that is done in real time using clickstream
analysis. And at the end of the year, maybe you want all this data to be extracted, converted
into a different format, and cataloged, and then processed further using EMR. For these extract,
transform, and load operations - ETL operations - you need the Glue service, right?

And then finally, all the data that you process, or whatever data you have, can be stored in the
data warehouse service, which in Amazon is Redshift. Redshift is a data warehousing service that
can store petabyte-scale data and lets you run analysis on it. To perform this analysis and see
the results you need BI tools. There are various BI tools in the market, but in Amazon you would
use Amazon QuickSight. You can also use Athena, which is a SQL query interface: you can pull
data from S3, run a SQL operation on it, and view the results in QuickSight.
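
To give a feel for the kind of SQL operation meant here, the following Python sketch runs an
aggregate query over a few made-up clickstream rows. It uses Python's built-in SQLite module
purely as a local stand-in so the example can run anywhere; it does not call Athena or S3, and
Athena's SQL dialect differs in places. The table name, column names, and rows are all
hypothetical.

```python
import sqlite3

# Hypothetical clickstream rows: (user_id, region, feature_used).
rows = [
    ("u1", "IN", "video"),
    ("u2", "US", "video"),
    ("u3", "IN", "chat"),
    ("u1", "IN", "photos"),
    ("u2", "US", "video"),
]

# An in-memory SQLite database stands in for the data that would live in external storage.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clicks (user_id TEXT, region TEXT, feature TEXT)")
conn.executemany("INSERT INTO clicks VALUES (?, ?, ?)", rows)

# "How is a particular feature being used?" expressed as an aggregate query.
feature_usage = conn.execute(
    """
    SELECT feature, COUNT(*) AS uses
    FROM clicks
    GROUP BY feature
    ORDER BY uses DESC, feature ASC
    """
).fetchall()
conn.close()
print(feature_usage)
```

A result like this, one row per feature with its usage count, is the sort of table a BI tool
would then turn into a chart.
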
You can build graphs and charts and get insights into your data, and based on those you will take
business decisions. So it's a BI service from Amazon. Okay, so far so good! We have introduced a
lot of AWS services here; now let's move to this side. As I said, there is a content delivery
network that can cache your static content, and for this Amazon has a service called CloudFront.
CloudFront stores, or caches, your data in edge locations. Like I said, these edge locations are
in 100-plus cities across the world, and when you use CloudFront, your data from S3, or wherever
you store it, gets cached in the edge location nearest to where the user is coming from, and it
is then served from that edge location for all the users in that geography. Okay, so that's the
CloudFront service.

Now let's talk about this side. As I said, you need to send messages and mobile push
notifications; in Amazon there is a service called SNS - Simple Notification Service - for that.
If you want to send emails, bulk emails, then there is SES - Simple Email Service. For messaging
queues, for the chat application, Amazon has built its own queue service, which is called SQS -
Simple Queue Service. And finally, to monitor all this infrastructure - how are my EC2 instances
doing? What is the CPU utilization of EC2? How is the database doing? - all of this can be
monitored in near real time using a service called CloudWatch. You can even set alarms: if the
average CPU utilization goes beyond, say, a certain percentage, send an email or alert to the
administrator, or take some action, such as auto scaling. All this can be done using CloudWatch
alarms. Okay, so I think we have completely replaced what we did on-premises with AWS services,
and I hope you got some idea about all these basic AWS services. Okay, next we want to see some
more AWS services, so let's look at some application services. As you know, Facebook, Twitter,
other web services, and even Amazon itself expose their services through API calls, so that
third-party applications can integrate with them, and for that they need a REST API service where
they can expose all their APIs. In Amazon you have the managed API Gateway service, which takes
care of scaling, throttling, and so on; you just write the code and definitions for your APIs
and deploy them in API Gateway.

Also, as mobile usage is increasing, you need to manage the identities of your mobile and web
users. When you develop an application, your users must sign up to it, right? That means you need
to manage your user pools, their access, and everything else, and for that you need a user
management service. In AWS that service is called Cognito. Right? So these are some more
application services that we can use here. Now let's move ahead and talk about the security
services in this architecture. As you know, there is one primary service for managing all access
in your AWS account: all your AWS users, what access they have, what services they can use. Even
when one AWS service, say EC2, wants to upload data to S3, EC2 needs permissions to do that. All
of this access, authentication, and authorization is managed using Amazon's IAM service - Identity
and Access Management. It's one of the most important services for securing your AWS account as
well as your services.

Next, you can also encrypt your data that is stored at various storage locations. EBS is block
storage, like a disk attached to EC2, and you can encrypt that data. Data stored in S3, in EMR,
in Redshift, queue messages, databases, caches: all of this you can encrypt using Amazon's KMS -
Key Management Service. It manages all the encryption keys for you. You don't need to maintain
your own secure location to store your keys and do the encryption.
Further, as you know, this application will probably be accessed over HTTPS, which is an
SSL/TLS-enabled connection, because obviously if users are doing transactions, or they don't want
their important information exposed, you would secure that communication, and for this you need
digital certificates, right? You deploy the certificate either on the load balancers or on
CloudFront so that your communication is secure. For this Amazon has a service called ACM - AWS
Certificate Manager.

Okay, next, as you know, we can also have application firewalls. In AWS the application firewall
is called WAF - Web Application Firewall. It takes care of common web attacks: it can help
prevent cross-site scripting, SQL injection, and even application-layer DDoS attacks. You will
typically deploy WAF on CloudFront, on load balancers, or in front of the API Gateway APIs that
we saw on the earlier slide, so that you are safe. There are various other ways to secure the
VPC, such as public and private subnets, which we will see in the detailed VPC session in the
Networking in AWS lectures, but here we are talking about application-level firewalls, and
that's WAF.

And if you're going for some kind of compliance, for example PCI DSS or HIPAA compliance, your
machines need to be patched properly and should be free from known vulnerabilities, right? Or
CVEs, as you know. For that there is a service called Amazon Inspector. What does it do? It puts
an agent inside your machines, scans them for known vulnerabilities, and then gives you reports
saying, you know, on these machines we found these vulnerabilities; go and fix them. So Inspector
can give insights into what's there inside our machines.
Okay, so these are the primarily used security services. There are more, but I think we will
restrict our discussion to these for now. Next, we want to see some development and DevOps
services. As you can see, this architecture has a lot of AWS services, and they are all connected.
If you want to deploy everything by hand, manually, I would say it will take maybe a couple of
days to do this without errors; with detecting errors and fixing them, all done manually, it will
probably take two or three days. But AWS gives you the ability to code your infrastructure;
that's called infrastructure as code. For that you have a service called CloudFormation. What
does it do? It takes a template from you, in JSON or YAML format, and creates this infrastructure
from scratch for you, and that too within maybe 30 minutes, depending on its size; typically I
have seen it create all these resources in 30 minutes at most. It's a very powerful service that
can provision your infrastructure from scratch. Right? And now this CloudFormation template
will be written by some DevOps people. At the same time you would have your developers and QA,
where developers are writing code for your product and maybe QA engineers are writing test cases
and automated tests. Everybody needs some kind of code repository, like a Git repository, and
for that AWS has the CodeCommit service, where they can check in the code. Even the CloudFormation
template is nothing but JSON or YAML code, so your DevOps people will write it as a template and
check it in, and the CloudFormation service will take that template and create the
infrastructure.

Once you have this infrastructure up, you need your actual product to be built, and for that you
need the CodeBuild service. AWS CodeBuild takes the source code, in whichever language you have
written it, Java or whatever, and builds it using some kind of build tool like Ant or Maven.
While building it can also run unit tests, and finally it produces artifacts. Artifacts are
your EXEs or binaries, basically your application executables. So CodeBuild will build and test,
and then you have to deploy. That means you have to put the EXEs and binaries it produces on the
EC2 machines where your application is actually running. So you require a deployment step, and
for this you have the CodeDeploy service. All right? So
if you know about DevOps, you have heard the terms CI and CD; this is your CI pipeline -
continuous integration pipeline - or continuous delivery pipeline, you could say. If you want
this automated, so that when developers write code and check it in, it automatically gets
built, automatically tested, and automatically deployed to the corresponding application servers
running on EC2, then you can use the CodePipeline service. Right? So you can build your complete
CI platform here using these services. If you want to further integrate all of this with project
management tools, like JIRA or some bug-tracking tool, to track things like the speed of your
development, there is the CodeStar service, which integrates well with Atlassian JIRA and other
tools. So you have complete SDLC control if you use these development and DevOps services.
Okay, so I think it is clear now where these development and deployment services are used.

If you have come this far, you now know about most of the core AWS services for compute,
analytics, storage, security, application, and deployment. Thank you.