Speech to Text to Document AI in Power Platform | Whisper AI & GPT with Azure OpenAI

Video Statistics and Information

Captions
Hello everyone. In this video I will show you how we can leverage the power of AI and the Microsoft Power Platform to convert speech to a document. We will leverage OpenAI's Whisper API to convert speech to text, and then take advantage of the new Azure OpenAI GPT action to create content for the document based upon the speech-to-text conversion. So let's go ahead and check this out in action. [Music]

OpenAI announced the Whisper model, which performs speech-to-text transcriptions and translations. The speech-to-text API provides two specific endpoints that we can take advantage of. To leverage the speech-to-text capabilities in Power Apps, we will create a custom connector. I'll create a new custom connector from blank. I'll give my connector a name. I can upload an icon for this connector. The host will be api.openai.com. For security, I'll pick API Key; the parameter name is Authorization. For the definition, we'll create a new action. The operation ID, I'll call it SpeechToText. For the request, I will import from sample. It's a POST request to the URL /v1/audio/transcriptions. Headers: Content-Type is multipart/form-data. I'll click Import. If I head over to the Swagger editor, I now need to insert the parameters related to form data, and I will also define that it consumes multipart/form-data. That completes my updates in the Swagger definition. I'll turn this off. For the response, I'll click on default, click on Import from sample, and the response comes back in JSON format, which would include an object with the property text. I'll click Import, and I will go ahead and create the connector.

Once my custom connector is created, I can start leveraging this in Power Apps. I'll head over to Create and create a new blank canvas app. I'll give my app a name and click Create. In this app I will head over to Data, go to Add data, and search for my custom connector. I'll select my custom connector. I will need to insert my API key. The key would be in the format Bearer, then a space, then the API key, and I'll click Connect.

Once I have my connection established, I will insert the microphone control. This allows me to record audio. And I will add a button. On the OnSelect of this button, I will call the API through my Whisper custom connector, so Whisper.SpeechToText. Here it needs the file reference; that is the Audio property from the microphone control that I added, so I will copy the name of the control and point to that control's .Audio. Comma, the name of the file: I'll give it the name audio.webm, since the audio that's generated by the microphone control is in WebM format. Comma, the model, which is whisper-1. And finally I need to provide the content type, which is multipart/form-data. I'll close the curly brace and close the function. This will go ahead and call the Whisper API, and it will provide me a response that I will store in a variable. To showcase the data from the response, I will add a label control. For the Text property of this label control, I will leverage the variable, dot, the property text. Let's test this out. "Hello, my name is Reza Dorrani." The Whisper API converts speech to text.

I will take advantage of a new model in AI Builder called Azure OpenAI Service. This can create text, answer questions, summarize documents, and more with GPT. It also comes with a standard set of templates. In my scenario I will try and create a document, so I'll use the Create blog post template. I will create a new flow. I'll create the flow from blank. I'll delete the trigger action for the flow, pick Power Apps, and pick the new PowerApps (V2) trigger action. I will provide two pieces of input to this trigger: the first of type text, and I'll add a second input of type email. I'll add a new step, pick AI Builder, and here is the new Create text with GPT on Azure OpenAI Service action. I'll select this and leverage Create instructions. In this scenario I'll pick Create a blog post, and I'll click Use instructions in flow.

Now this has a set of instructions that comes pre-baked as part of the template, and this is something that we can change. I'll update the instructions as follows: "Try to create a blog post on the topic below. The blog post should be less than one page and it must be in HTML format with HTML table and inline styling as applicable." And this is where I would like to provide the dynamic content input about the blog post that the Azure OpenAI Service GPT action should generate. I will pick, from dynamic content, the input from my trigger action. Next I will create a file in OneDrive. I'll create this at the root. I can give my file a name; I'll call this GPT blog post.html. The file content would be the dynamic property Text from Azure OpenAI Service. I would like to convert this file. The file reference would be the dynamic content property Id from the Create file action, and I would like to convert this to a PDF that I would like to send as an email attachment to the user who is calling the flow, and I have the email input property. For the subject, I'll call this "Blog post from GPT". The body: "Please find attached PDF document for topic", and here I will put the dynamic content input from my trigger action. To attach the document, I'll go to Show advanced options. The attachment name would be File name; the attachment content would be File content. I'll give this flow a name and click Save.

I'll close the flow. This will add the flow association to my Power App. On my button control, I have changed the text of the button to "Speech to Document". Currently we are calling the Whisper API to convert speech to text. As the next step, I will call the flow, which is GPT generate document, and call the Run method of the flow. The first property is the text; that's the instruction input of the blog post that I would like to generate. I'll leverage the response's text property; that's the output of the speech-to-text API. Comma, the email would be User().Email. And right after this I will leverage the Notify function to notify the user that the document generation request has been sent, with notification type Success, and I will show this for three seconds.

Let's try this out. I'll click Preview. "Best practices in building canvas Power Apps." I'll click Speech to Document. Here is the email that I have received, "Blog post from GPT", based on the instruction provided, and here is the attached document. Your instructions can be dynamic. The type of content that it can generate can be dynamic. "List of countries and capitals of the world." And here is the response. I can also attach an audio file. In this case I have a short recording of a conversation that takes place in a meeting, and the instructions here I have modified to generate talking points based on the response that we will get from the speech-to-text API that will leverage the audio file. I click Generate talking points. Here's the speech-to-text response: it's a conversation between two users about a project. And here's the response, which is a document that has the action points from the audio file. Here I'm leveraging speech to text and the DALL-E model to generate an image. "A cow driving a car." Speech gets converted to text, which gets converted to an image. Pigs offline.

If you enjoyed this video, then do like, comment, and subscribe to my YouTube channel, and thank you so much for watching.
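
For readers who want to see what the custom connector is wrapping, here is a rough Python sketch of the HTTP request described in the captions: a multipart/form-data POST to /v1/audio/transcriptions on api.openai.com, carrying a file field and a model field, with the key sent as "Bearer <key>" in the Authorization header. Python is not used in the video; the function names build_transcription_request and extract_text, the file name, and the placeholder key are assumptions made purely for illustration. The sketch only builds the request and parses a response body; it does not send anything.

```python
import json
import uuid
import urllib.request

# Endpoint named in the video: host api.openai.com, path /v1/audio/transcriptions.
TRANSCRIPTION_URL = "https://api.openai.com/v1/audio/transcriptions"


def build_transcription_request(audio: bytes, filename: str, api_key: str,
                                model: str = "whisper-1") -> urllib.request.Request:
    """Build (but do not send) a multipart/form-data POST with 'model' and 'file' fields.

    The function name and signature are hypothetical, chosen for this sketch.
    """
    boundary = uuid.uuid4().hex
    delimiter = b"--" + boundary.encode()
    body = b"\r\n".join([
        # Part 1: the model name as a plain form field.
        delimiter,
        b'Content-Disposition: form-data; name="model"',
        b"",
        model.encode(),
        # Part 2: the recorded audio as a file upload.
        delimiter,
        f'Content-Disposition: form-data; name="file"; filename="{filename}"'.encode(),
        b"Content-Type: application/octet-stream",
        b"",
        audio,
        # Closing delimiter ends the multipart body.
        delimiter + b"--",
        b"",
    ])
    headers = {
        # Same format the connection prompt expects: "Bearer", a space, then the key.
        "Authorization": f"Bearer {api_key}",
        "Content-Type": f"multipart/form-data; boundary={boundary}",
    }
    return urllib.request.Request(TRANSCRIPTION_URL, data=body,
                                  headers=headers, method="POST")


def extract_text(response_body: bytes) -> str:
    """Return the 'text' property from the JSON response body."""
    return json.loads(response_body)["text"]
```

To actually send the request, one could pass the result to urllib.request.urlopen and feed the bytes it returns to extract_text; that step needs a real API key and network access, so it is left out here. In the app shown in the video, the transcribed text is then passed to the flow as the topic for the GPT instructions.
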
Info
Channel: Reza Dorrani
Views: 26,168
Keywords: speech to text, speech to text app, openai, whisper ai, whisper text to speech, whisper api, power platform, power platform tutorial, ai, artificial intelligence, microsoft, azure openai service, azure openai tutorial, create text with gpt, how to transcribe audio to text, transcribe, transcribe audio, transcribe audio to text, how to convert audio to text, voice to text, mp3 to text, reza dorrani, how to, voice, audio, powerapps, power apps, power automate, gpt, whisper, tutorial
Id: LBYirTVBEe4
Length: 12min 30sec (750 seconds)
Published: Mon Mar 27 2023