Introducing the Azure SDK for Python

Captions
The Azure SDK for Python is a collection of libraries that enables you to quickly, easily, and consistently integrate Azure services into your cloud solutions. You authenticate your users and applications with the Azure Identity library. You store your application data with the Azure Storage libraries. You integrate artificial intelligence into your apps with the Cognitive Services libraries. You implement robust messaging systems with the Azure messaging libraries. We have libraries to meet all your cloud solution needs, and they can be found at azure.com/sdk.

As you use the new Azure SDK for Python libraries, you'll notice that they feel just like any other Python library. We put a lot of effort into ensuring that we follow Python design guidelines, conventions, and best practices. We also integrate with the Python ecosystem and allow you to consume packages from either PyPI or Conda.

Today we're going to discuss a common healthcare industry scenario. We have thousands of scanned invoices and health records that we want to extract the text from, convert to JSON, and store in persistent storage such as the Azure Blob Storage service. This example will use our new identity, storage, messaging, and Cognitive Services libraries. We'll divide the application into two parts, the PDF ingester and the PDF processor; that way we'll be able to decouple those two responsibilities and scale them independently. The Azure messaging services are the glue that will bind those application components together.

First, we'll write a program that uploads the PDF files to Blob Storage to be processed later. While uploading the documents, we'll send custom telemetry events to Event Hubs, a service that is commonly used in massive-scale telemetry systems. A BlobCreated event will be triggered when a new PDF is uploaded. We'll configure Blob Storage to forward those events to Event Grid, an Azure-native eventing system where you can listen for Azure events or define your own events. Event Grid will send a message to Service Bus using the official Cloud Native Computing Foundation CloudEvents schema. Our PDF processor will receive the Service Bus message and then call the new Cognitive Services APIs to extract invoice and patient information from the PDF files. It will transform the data into the format needed by later steps in the application, serialize it to JSON, and save it to Blob Storage.

Let's have a quick look at the type of documents we'll be processing. This is an example of an invoice that can be processed by the new Cognitive Services invoices API. As you'll see in a bit, the API is capable of extracting all this data into an object model, including the details of each invoice line item. On the left you can see the original PDF invoice, and on the right you can see the resulting JSON model that would be stored in Azure Blob Storage. The other type of PDF document we'll process is a scanned medical record. The Cognitive Services health APIs can extract all the medical conditions that are described in the doctor's notes. On the left you can see the original doctor's notes in PDF form, and on the right you can see the extracted data in the JSON format needed for our application.

Let's have a look at our PDF ingestion code, which reads our PDF files from disk, uploads them to Blob Storage, and writes events to Event Hubs. For this part of our solution we'll use Azure SDK packages from Conda, so let's open up Conda and install the Microsoft source and the Azure packages. I've opened the Anaconda Navigator and have selected the Conda environment that I previously created. In order to add the Azure packages, I first need to add the Microsoft source: click on Channels, then click Add, enter "microsoft", and hit Enter; then click Update channels. Now select Not installed and search for "azure-". We need to install the Azure storage and Azure event packages, so select both of those and click Apply. It will then show you the dependent packages that will be installed; click Apply. Once that completes, select Installed to verify that all the correct packages were installed. You'll notice that azure-identity and azure-core were automatically pulled in as dependent packages.

Now that our Conda environment is set up, let's switch over to VS Code and run the app. I'm now in VS Code and the app is loaded. You'll notice that I have squiggly lines under the Azure imports; that's because I haven't selected the Python virtual environment yet, so let's go ahead and do that. I'm going to click down here in the toolbar and select the Conda virtual environment. We've successfully created the Conda virtual environment, added the Azure packages, and loaded the virtual environment into VS Code. Now let's run the app.

You'll notice here that we're using the DefaultAzureCredential, BlobServiceClient, EventHubProducerClient, and EventData classes from the Azure libraries. Then we new up an azure-identity DefaultAzureCredential object, which is a convenient wrapper around all the complex OAuth workflows. Azure Identity has many authentication methods available to you. When run locally, DefaultAzureCredential will use your dev credentials, such as credentials from the Azure CLI or PowerShell; when deployed to production, it will use the managed identity or service principal information you have configured via well-known environment variables. We then new up a BlobServiceClient, passing the endpoint and the credential object we created earlier. The client will use that credential object to request a token from Azure Active Directory before performing actions that require authentication. We then get the container clients for both invoices and records and new up an EventHubProducerClient. Event Hubs sends messages in batches, so we'll create a batch. Then we'll iterate through each of the invoice PDFs and the record PDFs and upload them to Azure Blob Storage. After the documents have been uploaded, we'll call the send_batch method of the Event Hubs client to send all those events to the event hub.
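The ingester code is only shown on screen in the video, so here is a minimal sketch of the flow just described, under stated assumptions: the storage account and Event Hubs namespace placeholders, the "telemetry" event hub name, and the local invoices/ and records/ folders are all hypothetical.

```python
from pathlib import Path

from azure.identity import DefaultAzureCredential
from azure.storage.blob import BlobServiceClient
from azure.eventhub import EventHubProducerClient, EventData

# DefaultAzureCredential uses dev credentials (Azure CLI, PowerShell) locally,
# and managed identity / environment-variable service principals in production.
credential = DefaultAzureCredential()

# Hypothetical endpoints -- substitute your own account and namespace.
blob_service = BlobServiceClient(
    account_url="https://<storage-account>.blob.core.windows.net",
    credential=credential,
)
invoices_container = blob_service.get_container_client("invoices")
records_container = blob_service.get_container_client("records")

producer = EventHubProducerClient(
    fully_qualified_namespace="<namespace>.servicebus.windows.net",
    eventhub_name="telemetry",
    credential=credential,
)

# Event Hubs sends messages in batches, so create one batch for all events.
batch = producer.create_batch()

for folder, container in (("invoices", invoices_container),
                          ("records", records_container)):
    for pdf in Path(folder).glob("*.pdf"):
        # Upload the PDF; Azure raises a BlobCreated event on completion.
        with pdf.open("rb") as data:
            container.upload_blob(name=pdf.name, data=data, overwrite=True)
        # Queue a custom telemetry event for this upload.
        batch.add(EventData(f"uploaded {pdf.name}"))

# Send all telemetry events to the event hub in one call.
producer.send_batch(batch)
producer.close()
```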
When the PDF is uploaded, Event Grid will automatically send a message to the Service Bus queue in the CloudEvents format. We actually don't need any code for this step: all you need to do is configure the Event Grid system topic to watch for BlobCreated events and set the destination to the Service Bus queue. You can create this configuration via the portal, the CLI, PowerShell, ARM, Bicep, or any other infrastructure-as-code method.

Let's now dive into the PDF file processors. We have one processor for invoices and one for patient records. Both processors will look for Service Bus messages and use the Cognitive Services APIs to extract the text, convert it to JSON, and then save it to Blob Storage. For the processors we're going to use a standard Python virtual environment and install our packages with pip and a requirements.txt file.

We're back in VS Code, and we have the invoice processor code open. We'll new up all the necessary Azure SDK clients, including the Form Recognizer client, which is the Cognitive Service that allows you to extract text from documents. We'll new up a Service Bus client to monitor messages sent from Event Grid to Service Bus, and by iterating over the Service Bus receiver we'll automatically receive new messages. For each new message, we're going to deserialize it as a CloudEvent and then pass the URL of the PDF stored in Blob Storage to the begin_recognize_invoices_from_url method of the Form Recognizer client. This uses the Azure SDK long-running operation design, which allows you to make a method call and then automatically poll for the results. Once we have the results, we're going to new up a custom invoice object, upload it, and then complete the message.
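Again, the processor code is only shown on screen; the following is a minimal sketch of that invoice processor, assuming a hypothetical "pdf-events" queue, a results container named "invoices-json", and a deliberately simplified flattening of the recognized fields in place of the video's custom invoice object.

```python
import json

from azure.core.messaging import CloudEvent
from azure.identity import DefaultAzureCredential
from azure.servicebus import ServiceBusClient
from azure.ai.formrecognizer import FormRecognizerClient
from azure.storage.blob import BlobServiceClient

credential = DefaultAzureCredential()

form_recognizer = FormRecognizerClient(
    endpoint="https://<resource>.cognitiveservices.azure.com/",
    credential=credential,
)
blob_service = BlobServiceClient(
    account_url="https://<storage-account>.blob.core.windows.net",
    credential=credential,
)
results = blob_service.get_container_client("invoices-json")

servicebus = ServiceBusClient(
    fully_qualified_namespace="<namespace>.servicebus.windows.net",
    credential=credential,
)

with servicebus.get_queue_receiver(queue_name="pdf-events") as receiver:
    # Iterating over the receiver waits for new messages as they arrive.
    for message in receiver:
        # Event Grid delivers blob events in the CloudEvents schema.
        event = CloudEvent.from_dict(json.loads(str(message)))
        pdf_url = event.data["url"]

        # begin_* methods return a poller (long-running operation);
        # result() polls until the recognition completes.
        poller = form_recognizer.begin_recognize_invoices_from_url(pdf_url)
        recognized = poller.result()

        # Flatten the recognized scalar fields into a simple dict,
        # skipping list-valued fields such as line items for brevity.
        invoice = {
            name: field.value
            for form in recognized
            for name, field in form.fields.items()
            if field.value is not None and not isinstance(field.value, list)
        }

        name = pdf_url.rsplit("/", 1)[-1].replace(".pdf", ".json")
        results.upload_blob(name=name, data=json.dumps(invoice, default=str),
                            overwrite=True)
        receiver.complete_message(message)
```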
You can use Azure Storage Explorer to view the files that have been uploaded to Blob Storage. Here's the original PDF, and here is the JSON version of that PDF file. We can double-click those to save them locally, and here are the files side by side: on the left you have the original PDF file, and on the right you have the JSON representation of that file, which was extracted using the Cognitive Services invoice APIs.

Now let's take a look at the patient record processor. We're going to use the Form Recognizer client again to extract the text, and then we're going to use the Text Analytics client to extract the health data from that text. You'll notice that each of these clients is using the DefaultAzureCredential. You also have the option of passing a key to the client instead of a credential object; in this case, we're going to use Azure Key Vault to store that key and then pass it to the Text Analytics client. Once again, we're looping through all the messages sent to the queue, deserializing each into a CloudEvent, and then passing the PDF URL to the begin_recognize_content_from_url method. We'll poll for the results and then deserialize those results into a custom record object, which we'll upload to Blob Storage. We'll then use the begin_analyze_healthcare_entities method to extract the well-known healthcare terms from the patient notes, create a patient object from those results, upload it to Blob Storage, and then complete the message.
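Here is a comparable sketch of the record-processing steps just described, with the same caveats: the queue and container names are hypothetical, a Key Vault secret named "text-analytics-key" is assumed to hold the Text Analytics key, and plain dicts stand in for the video's custom record and patient objects.

```python
import json

from azure.core.credentials import AzureKeyCredential
from azure.core.messaging import CloudEvent
from azure.identity import DefaultAzureCredential
from azure.keyvault.secrets import SecretClient
from azure.servicebus import ServiceBusClient
from azure.ai.formrecognizer import FormRecognizerClient
from azure.ai.textanalytics import TextAnalyticsClient
from azure.storage.blob import BlobServiceClient

credential = DefaultAzureCredential()

# The Text Analytics key lives in Key Vault rather than in app config.
secrets = SecretClient(
    vault_url="https://<vault>.vault.azure.net", credential=credential
)
text_analytics = TextAnalyticsClient(
    endpoint="https://<resource>.cognitiveservices.azure.com/",
    credential=AzureKeyCredential(
        secrets.get_secret("text-analytics-key").value
    ),
)
form_recognizer = FormRecognizerClient(
    endpoint="https://<resource>.cognitiveservices.azure.com/",
    credential=credential,
)
blob_service = BlobServiceClient(
    account_url="https://<storage-account>.blob.core.windows.net",
    credential=credential,
)
results = blob_service.get_container_client("records-json")

servicebus = ServiceBusClient(
    fully_qualified_namespace="<namespace>.servicebus.windows.net",
    credential=credential,
)

with servicebus.get_queue_receiver(queue_name="record-events") as receiver:
    for message in receiver:
        event = CloudEvent.from_dict(json.loads(str(message)))
        pdf_url = event.data["url"]
        name = pdf_url.rsplit("/", 1)[-1].replace(".pdf", "")

        # Extract the raw text of the scanned notes (long-running operation).
        pages = form_recognizer.begin_recognize_content_from_url(pdf_url).result()
        notes = "\n".join(line.text for page in pages for line in page.lines)
        results.upload_blob(name=f"{name}-notes.json",
                            data=json.dumps({"notes": notes}), overwrite=True)

        # Extract well-known healthcare terms from the notes.
        docs = text_analytics.begin_analyze_healthcare_entities([notes]).result()
        patient = {
            "conditions": [
                {"text": entity.text, "category": entity.category}
                for doc in docs if not doc.is_error
                for entity in doc.entities
            ]
        }
        results.upload_blob(name=f"{name}-patient.json",
                            data=json.dumps(patient), overwrite=True)
        receiver.complete_message(message)
```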
Back in Azure Storage Explorer, we can see those documents: here's the original PDF, the raw notes from the record, and the patient data in the schema we need for our application. Once again, you can double-click on these to open them up. On the left you have the original PDF health record form, on the upper right you have the raw notes extracted from the text, and on the lower right you have all the terms that were extracted using the healthcare APIs.

In this video we gave you a brief introduction to the Azure SDK for Python. We learned how to use Azure Identity, Azure Storage, Azure messaging, and Azure Cognitive Services to extract invoice and patient data from PDF files and store the resulting JSON in Blob Storage. You can find links to all the packages, docs, and code at azure.com/sdk. Follow us on Twitter at twitter.com/AzureSDK to get notified of all the new SDK features and releases, and subscribe to our blog at aka.ms/azsdkblog for a behind-the-scenes look at how the Azure SDKs are built. Thanks for watching, and have a great day.

Info
Channel: Microsoft Developer
Views: 23,233
Keywords: python, azure sdk, azure
Id: 4xoJLCFP4_4
Length: 10min 29sec (629 seconds)
Published: Tue May 25 2021