End-to-End Visibility with Cisco ThousandEyes Integrations

Video Statistics and Information

Captions
Hi everyone. As Tom mentioned, I'm part of the ThousandEyes product team, where I lead one of our core product areas around Cloud and Enterprise Agents, collecting synthetic network data, and the visualization that goes along with it. I'm really excited to be here to announce two key integrations now that we're part of Cisco, and also to cover some ThousandEyes updates you may have missed if you're not on the product all day like I am.

For folks who may not be familiar with ThousandEyes, let me start with a quick overview of who we are and what problem we solve. This image is a good illustration of the problem, and of what we've seen customers use ThousandEyes for over time. In the past, and even today as customers continue to migrate their infrastructure toward SaaS, you tended to have applications deployed within your own data centers; these could be business-critical applications like SAP and Oracle. On the left-hand side you have a set of locations where corporate traffic originated, traversing either your MPLS network or a backup Internet DMVPN to reach those applications. What has changed enormously as these applications move to the cloud is a much wider dependency on the Internet and on other third-party networks, which are now part of the extended network that corporate IT needs to understand in order to ensure users get the best possible experience of their applications. This applies whether the application is moving completely to SaaS and being consumed as a subscription, or moving to the cloud via one of the IaaS providers, with the Internet as the middle mile; and with cloud security gateways, the extended network beyond your four walls is now responsible for the end-to-end user experience.

What we also realized early on is that the existing tool sets, which continue to provide a ton of value within the corporate perimeter, such as SNMP, flow-based, and pcap-based solutions, become less useful as the network becomes more widespread and complex across the Internet. That's the problem we set out to solve: giving back control and visibility in an environment where applications are no longer inside the perimeter. That shift also requires network teams to re-architect the network to better serve users accessing these applications, and it requires a change in the security model to support them. I'd point to Marc Andreessen's essay from ten years ago about how software is eating the world: we see this today, where software-as-a-service puts added pressure on IT teams to understand the true end-to-end experience when neither the application nor most of the network and security stack is within their control. That pressure pushes these teams to work together to better understand the end-to-end user experience as they operate in this new world.

That's the problem we wanted to solve, and now that we're part of Cisco, as was mentioned, that mission continues. We see a tremendous amount of value we can provide customers, and being part of Cisco lets us leverage the broader Cisco ecosystem to further accelerate access to the visibility ThousandEyes provides.
I want to walk through the core pillars of our vision within Cisco. The first is making ThousandEyes agents as ubiquitous as possible across the corporate infrastructure environment, and I'd like to tell you a story of how this evolved. I joined the company in 2015, and even at that time Cisco IT was one of the earliest ThousandEyes customers. They used ThousandEyes to understand Salesforce performance, as they were having problems with return traffic from Salesforce to some of their corporate sites, and they also used it to understand Webex performance. What became clear as we moved beyond the pilot phase was the need to deploy agents across the corporate environment. This was hundreds of sites, but only 10 or 15 percent of those sites actually had compute available to deploy agents, which is required to run the synthetic tests that produce the visibility. That's when we started the earliest conversations with Cisco IT, and with Cisco in general, about hosting ThousandEyes agents on a Cisco ISR, which at that time had just gained app-hosting capability. We worked with them and certified our agent to deploy on the ISR, so that network teams could retain control and have hardware of their own on which to run ThousandEyes agents and get the visibility they need. Now that we're part of Cisco, we're really excited to turbocharge that effort, and that's what we'll be discussing in more depth: how the ThousandEyes agent becomes a native part of the Cisco Catalyst switching and routing infrastructure.
The second element of the vision behind becoming part of Cisco is providing a common operating language for application and network teams: full-stack domain isolation and problem resolution across the application and the network, with business context along the way. Again, this is something that grew organically as we worked with customers. Whether customers use AppDynamics, New Relic, or Datadog for their APM needs, they generally have a growing set of third-party dependencies. Say your financial application relies on a third-party payment processor via an API: there's a visibility gap between wherever your application is hosted and that payment processor, which must be crossed to serve the user of your application. Or say you're a healthcare company with part of your application stack served from AWS, but the authentication systems remain on premises due to HIPAA and other requirements, so you need visibility across the interconnect between AWS and your data center networks, whether that's Direct Connect or the Internet. There is a growing need to understand dependencies beyond the application itself that traverse multiple networks, including your private WAN, and ThousandEyes data can help close the visibility gaps in those environments.
Thirdly, as we make agents more ubiquitous across the corporate IT landscape and make it easier for customers to understand visibility gaps across the full stack, we want to take the data we capture and use it to power the decisions customers are looking to make: decisions that help them better serve their users, whichever Internet service provider or third-party dependency they rely on.

With that said, let's walk through the two new announcements, the first phase of our early work within Cisco. The first, which we announced at Cisco Live, is the ability to deploy a ThousandEyes Enterprise Agent on a Catalyst 9300 or 9400 device, providing visibility across your SaaS apps and across SD-WAN; we'll talk about that. The other, in the spirit of that common operating language, is surfacing ThousandEyes data about third-party dependencies in a single dashboard with AppDynamics Dash Studio.

For those who aren't familiar with the options that already existed for ThousandEyes before we began discussing this problem with the Catalyst team, we've long supported agents in many form factors, as you see here. Customers deploy virtual appliances in VMware and Hyper-V environments; they use Linux packages to deploy on Ubuntu, RHEL, and CentOS; they use Docker containers within their private LAN or in AWS; and we offer an AWS-based AMI with a CloudFormation template for launching an agent in AWS. While all of that is useful, the problem we consistently ran into was the time it takes to deploy an agent: having to find compute and interface with server and virtualization teams delayed how quickly customers could get spun up.
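As one concrete illustration of those pre-existing options, a container-based agent can be brought up with a single command. The image name, environment-variable name, and hostname below are assumptions for the sketch; consult the ThousandEyes Enterprise Agent documentation for the exact, current invocation.

```shell
# Illustrative Docker launch of a ThousandEyes Enterprise Agent.
# The image name and TEAGENT_ACCOUNT_TOKEN variable are assumed here --
# verify against the official agent docs before using.
docker run -d \
  --name thousandeyes-agent \
  --hostname branch-sfo-01 \
  -e TEAGENT_ACCOUNT_TOKEN=<your-account-group-token> \
  thousandeyes/enterprise-agent
```

The account-group token is the same credential that identifies the agent's tenant in the DNA Center flow shown later.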
So we saw an opportunity to partner with the Catalyst team and make ThousandEyes agents native to the Catalyst 9300 and 9400, and that's effectively what we launched. There are two parts to this integration. The first is making it easy for any existing or new customer to deploy an Enterprise Agent on a 9300 or 9400 device without requiring any additional hardware, providing the visibility they need. The second is that once they have the agent, they can actually run the tests they need, because some ThousandEyes test capacity is included as part of the Cisco DNA subscription that goes along with the Catalyst 9300/9400.

To go into a bit more detail: the agent now runs natively on a 9300 or 9400 device, and by natively we mean no additional hardware is required. Traditionally you needed an SSD to install any application; in this case the agent runs on the boot flash that already exists, which generally has enough space for the Enterprise Agent. The only real prerequisite is the IOS XE 17.3.3 update. If you're a new install, meaning you purchased a 9300 or 9400 switch on or after April 5, 2021, the agent is already embedded in manufacturing: when the switch arrives at whatever site you ordered it for, the agent's Docker image is already there and just needs to be instantiated and started, using either the CLI or Cisco DNA Center for orchestration. If you're an existing install, once you're on that IOS XE release there are multiple ways to get the agent onto the 9300 or 9400: you can pull the image directly to the switch from a web link where we host it, or, if you don't have direct Internet access, download it locally and upload it onto boot flash.
From there you just make sure the agent can connect out to ThousandEyes. All the agent needs is outbound TCP 443 to the thousandeyes.com domains and subdomains, to authenticate itself and to receive updates about the types of tests it should be running; that's consistent with how we've operated the agent for a long time.

We completely understand that IOS XE upgrades aren't always easy; they're a long process that requires vetting across customers and users. But this is a one-time update, needed only to allow our agent to be installed on a 9300/9400. Going forward, all Enterprise Agent updates are independent of IOS XE. That's an important point: ThousandEyes is a SaaS subscription operating on a SaaS model, and we publish updates to the agent and its capabilities every two weeks, so you can keep updating the agent independently of IOS XE as you use this at scale.

For certain customers there's also a need to proxy connectivity out to ThousandEyes, and the agent can proxy both its management connectivity to ThousandEyes and its test traffic to SaaS applications across your enterprise branch and data center networks. The one caveat is that an SSD is still required when users need to run browser-based tests. These are the tests that render an entire web page, or a series of pages, inside the agent environment for application-layer telemetry, correlated down to the network. We call these Page Load and Transaction tests; they run an instance of Chromium embedded on the agent, in tandem with the suite of network tests we run.
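Since outbound TCP 443 to the ThousandEyes domains is the agent's only connectivity requirement, a firewall pre-check can be sketched in a few lines. The endpoint hostnames below are illustrative; the authoritative list is in the ThousandEyes firewall documentation.

```python
import socket

def can_reach(host: str, port: int = 443, timeout: float = 3.0) -> bool:
    """Return True if an outbound TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers DNS failure, refusal, and timeout
        return False

# Hypothetical collector endpoints -- the real list lives in the
# ThousandEyes docs for your deployment.
for host in ["api.thousandeyes.com", "app.thousandeyes.com"]:
    print(host, "reachable" if can_reach(host) else "BLOCKED")
```

Running this from the agent's VLAN before provisioning catches a blocked egress path before the agent silently fails to phone home.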
For those browser-based test types, the SSD is required mainly for their I/O requirements and for the more resource-intensive test cases involved in running them.

"Strictly 9300 and 9400? What about the 9500 platform?"

Currently we've prioritized the 9300 and 9400. The architecture is the same across the Catalyst 9k switching line, which lets us extend this later, but we prioritized these because they make up the bulk of deployments, and the use cases we see across customers call for an agent as close to the user as possible; the 9300s typically sit at the access layer. Most needs are really met with the 9300/9400. The 9500 and 9600 are something we're discussing, although that's further out on the roadmap, but generally the 9300s and 9400s are where most agent deployments will happen.

"So the 9400s get the attention because they're often collapsed access/distribution."

Exactly.

"That makes sense, thanks."

So let me walk you through deploying the agent on a Catalyst 9k through DNA Center, which is the simplest path for bulk agent management and deployment. You can also use the CLI, for lab environments for example, or deploy in bulk at scale using the Catalyst APIs. In DNA Center, you use the provisioning service catalog to access the app-hosting catalog, which lets you deploy the ThousandEyes agent onto the available switches. The service catalog holds a series of application templates that can be deployed at scale across your Catalyst infrastructure.
In this case I have a predefined ThousandEyes template, including the agent, that was uploaded into DNA Center. As I move through the deployment, the main thing the template specifies is how we identify this agent and which tenant it belongs to in ThousandEyes, which is done via an Account Group Token. (Don't worry, the token shown here has been refreshed; this was recorded prior to the meeting, so nothing is compromised.) You can also specify the hostname if needed, which we typically see done for naming consistency across the corporate IT environment, and there are a host of other options, such as proxies and Kerberos authentication, that some customers need so the agent can authenticate through to ThousandEyes.

As I move forward in the install process, DNA Center can quickly show me which devices are compatible to run the agent and which aren't, by checking the IOS XE version and the memory, CPU, and disk capacity needed. In this case the 9300-2 device is ready for install, so I click Next and go through provisioning the application. Here I specify the network the agent container connects through, specifically the VLAN; I've also chosen DHCP so the Docker container can grab an IP, and there are Docker runtime options you can set as well. Once I finish provisioning, choose auto-start for the app, and kick off the deployment, you'll see a progress bar as the application deploys.
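For the CLI path mentioned above, the equivalent configuration on the switch uses the IOS XE app-hosting commands. This is a sketch only: the VLAN, addresses, token, and package filename are hypothetical, and exact syntax can vary by IOS XE release, so check the configuration guide for your version.

```shell
! Illustrative IOS XE app-hosting config for a ThousandEyes agent
! (values are placeholders; verify syntax for your release).
app-hosting appid thousandeyes
 app-vnic AppGigabitEthernet trunk
  vlan 10 guest-interface 0
   guest-ipaddress 10.0.10.50 netmask 255.255.255.0
 app-default-gateway 10.0.10.1 guest-interface 0
 app-resource docker
  run-opts 1 "-e TEAGENT_ACCOUNT_TOKEN=<your-account-group-token>"

! Then, from exec mode, install and start the container:
app-hosting install appid thousandeyes package flash:thousandeyes-agent.tar
app-hosting activate appid thousandeyes
app-hosting start appid thousandeyes
```

DNA Center automates exactly this sequence in bulk; the CLI form is handy for labs or for scripting against the Catalyst APIs.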
In the interest of time, I actually deployed an agent myself prior to the meeting. Once the agent is up, ThousandEyes pops up a notice that a new agent has been deployed, and here is that agent. You can see it's running on a specific Catalyst 9300-24T device, information we get from IOS XE and propagate throughout the application, and it's identified with the application-hosting deployment type. More importantly, once the agent is installed you can go into the app and run the relevant tests across your environment.

So let's look at an interesting example of what you can do once the agent is on a Catalyst device and running a set of tests: what can you do with the data ThousandEyes captures? This is based on a situation we recently ran into, where a set of devices deployed in South America were accessing key applications across a manufacturing environment, and users were complaining that the applications the manufacturing team needs were unresponsive, with the problems recurring every day. The network team needed to come in and work out where the problems were occurring. Here we have a set of manufacturing sites, numbered 2, 3, 4, and so on, and I'm mainly interested in site number 2, where the reports of problems are coming from. Looking at the overall trend over time, there's a time of day when loss is observed. This is end-to-end loss, from the agent residing within the manufacturing site to a particular internal application.
That application runs on port 1521, and end to end I'm seeing anywhere from 20 to 30 percent loss, with latency that also shoots up during that window, as you'd expect. If I then open the path visualization to understand where this is happening, and since I care about site number 2, I can filter out everything else and look specifically at site 2's path to the application. I can see where some of the delay might be happening, although that alone doesn't pinpoint the problem. One thing I do notice in the data is a TCP maximum segment size and MTU mismatch: the agent, acting as a client on this network, sends traffic to the application; the application's SYN-ACK advertises a maximum segment size of 1460 bytes, but from the agent's path MTU detection, the minimum path MTU is actually 1441 bytes. Using some of the quick links, I can see the path MTU decreasing across a particular link; judging by where the delay occurs, it's probably in the WAN, possibly some sort of tunnel, although it's hard to know exactly. I am seeing a decrease in path MTU at that time that could potentially be contributing to this delay and loss. Going back and forward in time, what we're trying to do is identify where in the path the loss originates. In this case it's not immediately clear, although we do start to see some of the white nodes that typically appear when there is loss and we can pinpoint it.
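The mismatch above is simple arithmetic once you recall that an IPv4 TCP segment carries 40 bytes of headers (without options). A quick sketch of why an advertised MSS of 1460 clashes with a 1441-byte path MTU:

```python
IPV4_HEADER = 20  # bytes, no options
TCP_HEADER = 20   # bytes, no options

def mss_for_path_mtu(path_mtu: int) -> int:
    """Largest TCP payload that fits in one unfragmented IPv4 packet."""
    return path_mtu - IPV4_HEADER - TCP_HEADER

advertised_mss = 1460              # from the server's SYN-ACK (implies 1500-byte MTU)
observed_path_mtu = 1441           # from the agent's path MTU detection
usable_mss = mss_for_path_mtu(observed_path_mtu)

# Full-sized segments overshoot the narrowest link by this many bytes,
# so they must be fragmented or dropped (an ICMP "fragmentation needed"
# that never arrives is a classic cause of stalls on tunneled paths).
overshoot = advertised_mss - usable_mss
print(usable_mss, overshoot)
```

So any full-sized segment is 59 bytes too large for the narrowest hop, which is consistent with the tunnel hypothesis in the path view.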
One thing we recently added is the ability to pull in data from other tests running from the same site, because the question I'm trying to answer is whether this is a problem with one application from this site or a site-wide problem across every application. So I've applied a filter showing only the data captured from site number 2, and I can add in data from the other tests at that site to better understand where the loss might be happening. As I add them, the data is dynamically aggregated across those tests, giving a much wider set of packets sent during that window, and the ThousandEyes visualization surfaces the source of the loss. What I'm seeing is that most of the loss happens at one node that is unable to forward packets, and it happens to sit on the same link where the path MTU was decreasing. I see similar behavior across applications: in the table view, grouping everything by site number 2 and looking specifically at loss, I see it across all the applications being traversed.

"I've got a comment first and then a question. I think the MTU visualization is great; there's so much tedious work that normally goes into troubleshooting that, so it's a great feature. And looking at the way you've laid out the flow and the hop-by-hop analytics: you said this is down in South America, and I actually do a fair amount of work down there doing ISP design. Let's say this is in Hong Kong, I'm running a dual-stack network, and I've got loss, but I don't know whether it's on v4 or on v6, or maybe only on one of them. Can the agent run dual stack and see, say, that I'm clean on v6 but losing packets on v4, because maybe there's a congested CGNAT gateway? Are those types of analytics possible?"
Yes, the agent is capable of running in what I'd call dual-stack mode: it can have both an IPv4 and an IPv6 address. In this case the agent is hosted in a particular network with a set of DNS resolvers, and if the application you're testing has, say, a AAAA record, what we typically recommend is running two tests: one that forces IPv6, using the AAAA record to target the IPv6 endpoint, and one that forces IPv4. Combining the two tests lets you see whether there are IPv4-specific or IPv6-specific problems of exactly that nature.

"And these are ad hoc tests that you'd set up separately?"

In the settings of a particular test, under advanced settings, you choose whether the agent should prefer IPv4 when it has both IPv4 and IPv6 and the target has a AAAA record, or whether it should force a specific protocol. For the purpose of comparing the two, we typically recommend forcing IPv4 in one test and IPv6 in another, so you have both data sets to compare.

"Got it. I think it's great that you have a strong focus on dual stack and looking at both sides of it. Thank you."

Absolutely. Now, I'd be remiss not to mention that we started working on the ability to deploy agents across the Cisco portfolio last summer, as soon as it became clear we were going to be part of Cisco, and the world has obviously changed a lot since then.
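The forced-v4/forced-v6 test pair starts from the same DNS split the agent performs: resolving the target's A and AAAA records separately. A minimal sketch of that split using the standard library (the helper names are mine, not ThousandEyes API calls):

```python
import socket

def partition(addrinfo):
    """Split getaddrinfo() results into sorted IPv4 and IPv6 address lists."""
    v4 = sorted({sa[0] for fam, *_, sa in addrinfo if fam == socket.AF_INET})
    v6 = sorted({sa[0] for fam, *_, sa in addrinfo if fam == socket.AF_INET6})
    return v4, v6

def resolve(host, port=443):
    """Return the host's A and AAAA answers -- the two target sets a
    forced-IPv4 and a forced-IPv6 test would probe."""
    return partition(socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP))
```

A test that forces IPv6 would then probe only the second list; comparing loss between the two lists is what separates a v4-only problem (such as a congested CGNAT) from a v6 one.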
Getting enterprise-wide visibility is only part of the picture now that we all work from home, and will probably continue to for the foreseeable future. So I also want to cover how we get visibility from work-from-home environments, where you can't deploy an Enterprise Agent, but which now account for a very large percentage of the workforce needing to access applications. We launched a product a few years ago called the Endpoint Agent. It was still in its early stages then, but in the last 12 months it has become much more topical and relevant for users trying to understand end-user performance from home: how do I separate the user's machine from the local network, from the ISP, from the applications themselves? What's interesting here is that you have to understand end-user performance where none of the application or network stack traverses the corporate boundary; the only thing the IT team can control is the end user's machine itself.

What we're seeing here is data captured from Endpoint Agents, which our customers typically deploy in bulk via Group Policy Objects across their home-office employees. The Endpoint Agent is a two-part application: a host system application for Windows or Mac, running as a package on the host OS, paired with a browser extension that triggers collection of network data, local network data, and local system information, all brought together in a single view of end-user application performance. We've made quite a few changes to this application stack to better serve home-office users. By default we collect no data, for obvious privacy reasons; the IT team specifies a list of business-critical domains worth collecting data against, which triggers network data collection. There's also the option of scheduled testing regardless of browser activity: "always collect data every five minutes against the ten sites I know my users need."

In this case I'm looking at my employee base accessing a set of applications I care about, and I notice a dip in what we call the Experience Score, which we compute from signals like page load times, network latency, and local network and device performance to summarize overall experience. The scenario here is that a user, or set of users, is complaining they can't access an application: a user at Acme was complaining that their performance to Workday was suffering, and looking at Workday we do see that page speed in general was worse at that particular time. So let's look more deeply at this user's experience of Workday. There's a wealth of information about the end user: they're on a Dell PC, with memory usage, CPU usage, and the browser used to access the application all visible; they're on a wired connection going through their gateway; and we can see they're using AnyConnect.
They're connecting to Workday over the AnyConnect VPN, whose public gateway IP we can see. More importantly, for Workday at that time we also saw an increase in page load time coinciding with when the user started complaining. If I filter down to just that user and Workday, I can look at the network layer to understand whether that latency was caused by some network change event or by something application-centric. In this view I'm looking specifically at the path from my user to Workday, and going back and forth in time and looking at latency, prior to the network change event this user's traffic went from their home network through the local gateway, across their ISP, which looks like Sonic, and into AWS, where CloudFront was ultimately serving Workday's traffic. Then, for whatever reason, the user decided to get on the VPN. Let me remove a network filter I had applied: they were indeed using the VPN at that time to access the application. Looking at the underlay, I can use the public IP of the VPN gateway to see underlay performance across the VPN provider, and the VPN gateway actually turned out to be in the UK. So this user in the San Francisco Bay Area is VPNing all the way to the UK and then reaching a CloudFront instance that looks to be in Amsterdam, because that's where the traffic exits to the Internet.
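The Experience Score dip that started this investigation blends several of the signals we just walked through. The real ThousandEyes scoring model isn't public, so the weights and formula below are purely illustrative of the idea of collapsing page load, latency, and loss into one 0-100 health number.

```python
def experience_score(page_load_ms, latency_ms, loss_pct,
                     page_load_target=2000, latency_target=100):
    """Toy 0-100 score blending page-load, latency, and loss signals.
    Weights and targets are invented for illustration -- not the
    actual ThousandEyes Experience Score formula."""
    def ratio_score(value, target):
        # 1.0 when at or under target, degrading as the metric worsens.
        return min(1.0, target / value) if value > 0 else 1.0
    score = (0.5 * ratio_score(page_load_ms, page_load_target)
             + 0.3 * ratio_score(latency_ms, latency_target)
             + 0.2 * (1 - min(loss_pct, 100) / 100))
    return round(100 * score)
```

With a composite like this, a user whose page loads double while latency spikes (the VPN-to-the-UK case above) drops visibly even if no single metric alone looks alarming.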
We've seen a ton of value in capturing this data for home-office users as well, and VPNs tend to be the number one root cause of problems: you need to determine very quickly whether VPN performance is what's causing users heartache, and use that information to understand where problems might be occurring.

Switching now to the second announcement we made; my colleagues will talk about this integration in depth and give a demo. It's the ability to present ThousandEyes network data and AppDynamics data in a single pane of glass, providing the common operating language we talked about. What we saw in our early days, organically across our customers, is the need to understand user performance to the front door of the application, and then, on the back side, from the application to the third-party providers it depends on: typically the third-party payment processors or gateways used to serve your users. We saw the opportunity to bring the data ThousandEyes captures into AppDynamics Dash Studio, so application and network teams can better serve their end users by determining whether a problem lies in the application or in the network.

The integration we built with the AppDynamics team pulls ThousandEyes data about the external services an application relies on into a Dash Studio dashboard. There you might have a business indicator showing payment health, a payment gateway that relies on a set of external services, and then an increase in payment-processor back-end times; because you're pulling in live test data from ThousandEyes, you can quickly see the corresponding change in external-service network performance, and use the launch button to jump into ThousandEyes and verify where that network degradation is happening.
This is an example of what you'd typically see in the application flow map that AppDynamics captures, which shows the end-to-end workflow. Here a third-party payment processor was seeing an increase in HTTP response times, which turned out to be an AWS-based reroute: traffic normally served from Ashburn was being routed to Singapore, driving up payment-processor times, so that payments took much longer for end users to complete, causing errors and timeouts.
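The core of the correlation the dashboard makes visually (did back-end time rise because the network path degraded?) can be sketched numerically. The sample series below are invented 5-minute samples: AppDynamics back-end response times for the payment processor alongside ThousandEyes end-to-end latency to the same external service.

```python
def pearson(xs, ys):
    """Pearson correlation of two equal-length, non-constant series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Hypothetical aligned samples around the Ashburn-to-Singapore reroute.
backend_ms = [180, 175, 190, 420, 460, 440, 185]   # AppDynamics back-end time
network_ms = [40, 38, 42, 210, 230, 220, 41]       # ThousandEyes latency

r = pearson(backend_ms, network_ms)
# A correlation near 1 supports "the back-end slowdown is network-driven."
```

This is exactly the judgment the single-pane view lets an operator make at a glance, without exporting data from two separate tools.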
Info
Channel: Cisco
Views: 1,740
Keywords: AppD, AppDynamics, Breana, Cisco Live, Integration, Prab, Switching, TFD, TFDCLVirtual, Tech Field Day, ThousandEyes
Id: u62PucYtD_s
Length: 37min 59sec (2279 seconds)
Published: Mon Apr 19 2021