Create Stunning Images with Your Voice using OpenAI's ChatGPT, Whisper & DALL-E!

Video Statistics and Information

Video

Captions Word Cloud

Reddit Comments

Captions

Goa Beach and here is the image created for Goa Beach hello everyone Apple ecosystem is a very famous ecosystem but guess what in the AIML World there is one ecosystem that is gaining a lot of popularity that is the open ai's ml ecosystem so in today's video I'll be using open AIS whisper API chat GPD API and Dali API to generate images from voice I'll walk you through the steps of generating images through open ai's powerful AI models with the introduction out of the way let's kick start the video let me start by installing some modules that I'll require throughout the entire process so I'll start by installing radio which will help me create some amazing uis I'll also install open AI to access open AIS model so I'll quickly go forward and install all the modules so what this particular example I'm using Google collab and I'm using the CPU version given that I don't require the high-end models to be deployed in the current session which is where CPU is sufficient for the task the installation is done now I'll quickly go forward and import the necessary modules so for this particular activity I'll require radio I require open Ai and I'll require Json so I'll quickly run this cell the installation is done the Imports are done the next thing that I require is basically the API key access so in order to access the amazing open AIS model you will require API key so you can find that on open ai's website itself you'll have to pay some money in terms of accessing chat GPT API whisper API the costs are very nominal so you can go forward and explore that just to keep my API key private I have created a simple Json file called as GPT underscore secret underscore key.json this contains my API key so I'll quickly load the API key into the variable open air dot API underscore key so let me run the two cells now we have the entire skeleton ready all we have to do is create different functions that kind of access different apis so let's start the first function that I'll take you all through is a chat GPD API function so I'll quickly unite the cell the chat GPT API function expects an input text in the selected line of code I've created a simple list called as messages the messages list is initialized with a default message from the system this message is just a string that says you are a helpful assistant if the input underscore text argument which is something that I pass through this particular function if that is not empty a new message is added to the messages list this new message is a prompt for Dali 2 Modi which will generate an image based on the input text The Prompt is generated using string formatting to insert the input underscore text argument into the message in the next line of code what you see is the chat completion object the chat completion object is created using the openai.chat completion.create method this method takes two arguments model which specifies the name of the gpt3 model that we want to use and messages which is the list of messages generated in Step 1 and step 2. the reply variable is set to the content of the first choice in the chat underscore completion object in this case there is only one choice so we just take the content of that particular choice finally the reply variable is returned from the function as the output of the chatbot so what will come in here is Whispers transcribe text this text passes through chat GPT the output of the chat GPT API will basically give you a Dali prom text the Dali text prompt which in our case is chat completion and finally which is reply will basically be passed through a Dali API function which will kind of accept that prompt and generate an image out of it so now let's move on to the Dali API function this particular code section defines a function called as Dali underscore API that takes in one argument which is Dali prompt which is a string of text used as an input for open ai's Dali API I'll go line by line here the function first calls the open AI api's image.create method with two arguments prompt which is set to the value of Dali underscore prompt and the size which is set to 512 cross 512 this method generates an image using dally based on the provided prompt and Returns the response as a Json object the function then extracts the URL of the generated image from the response by accessing the URL property of the first item in the data array of the Json object this URL is assigned to the image underscore URL variable finally the function Returns the image underscore URL variable as the output of the function which represents the URL of the generated image returned by the open AIS API so essentially what happens here is you take an input prompt and you generate an image and you return the URL of the generated image that is all that this function is basically doing Now we move on to the audio input section of the code which is where whisper underscore transcribe function comes in so this particular section of code defines a function called as whisper underscore transcribe that takes in one argument that is audio audio is a string representing the file part to the audio file that has to be transcribed the function first opens the audio file using the open method with the RB mode and assigns it to the variable audio file next the function calls the open AI api's audio or transcribe method with the two arguments whisper one which is the ID of the body that we have to use for transcription and the audio file which is binary file data of the audio file this method transcribes the audio file using the selected module and Returns the response as a Json object which is assigned to the transcript variable the function then calls the chart GPT underscore API function with the text value of the transcript object as an input this function generates a chat message prompt based on the transcribe text using open ai's GPT module and returns a string representation of the generated message prompt which is later assigned to the Dali underscore prompt variable finally the function calls the Dali underscore API function with the Dali underscore prompt variable as the input the function generates an image based on the provided Dali prompt and Returns the URL of the generated image which is assigned to the image underscore URL variable finally the function returns a tuple containing the actual transcribe text and the image URL that is generated from Dali's output so we've gone through the function I haven't imported any of the functions so I'll quickly import all of them in memory now we've reached the final stage of our output wherein we'll create an interface using radio so I Define two variables output underscore 1 and output underscore 2 the first output will correspond to the text generated by The Whisper model which is where I've given it a label speech to text the second output will correspond to the image generated by the Dali model so I'll quickly run the cell finally in this piece of code is where I Define The Graduate interface I call the function whisper underscore transcribe which will later call the chat GPT API as well as a Dali function I Define the set of inputs which in our case would be the audio that we kind of input through the microphone I go forward and Define the outputs and finally I give a title to the entire web interface which is generate images using Voice so what I'll do next is I'll quickly run the cell so I'll quickly launch this interface so the interface is up and running let me start recording something Goa Beach let's go forward and press submit so here it's able to identify the text so which is Goa Beach and here is the image created for Goa Beach let me try out something different now so I'll quickly clear this Gateway of India so here is the text that is generated which is Gateway of India and here is an amazing photograph of Gateway of India let me try out one more prompt I'm kind of enjoying the entire process so I'll quickly clear this so here it's able to create mountains and here it says uttarakhand so overall the entire ecosystem is working perfectly fine it's quick it's able to understand what I'm speaking using whisper and it's able to generate a corresponding image using Dali The Prompt for Dali is generated by chat GPT API so all the apis are working together to create this amazing solution isn't this amazing so this is all that I had in today's video I hope you enjoyed today's video if you do like the content that I create on data science machine learning open AI based Solutions then it would be really motivating if you can press the Subscribe button and also press the Bell icon to be notified for amazing videos on data science and machine learning thank you so much for watching the video [Music]

Info

Channel: Bhavesh Bhatt

Views: 1,911

Rating: undefined out of 5

Keywords: Bhavesh Bhatt, Data Scientist, Machine Learning, AI, Voice-assisted image generation, OpenAI ecosystem, ChatGPT, Whisper, DALL-E, AI-generated images, Graphics creation with AI, Image generation with voice, Artificial intelligence tools

Id: pPMh8SvOXvo

Channel Id: undefined

Length: 11min 6sec (666 seconds)

Published: Fri Mar 10 2023