Streaming Video and Audio over WiFi with the ESP32

Captions
"Oh! Have we got the video?" "Yes, we've got a video!" Yes, we have got a video. My version of the TinyTV is working. It's even got a remote control. So a big shout out to the channel Patreons, they've been getting a couple of sneak previews of progress along the way on this project. And once again thanks to PCBWay who manufactured the PCBs that I'm using for this project. They've been working flawlessly, but keep an eye out for a follow up video where I fix some of the mistakes that I've made. Now in the previous video I asked if you'd prefer to see streaming over Wi-Fi or streaming from an SD card. And the people's choice was streaming over Wi-Fi. And in a way I was kind of pleased about this. I don't have an SD card slot on my PCB and although we've seen in previous videos how easy it is to wire one up, it would have meant doing a lot of messing around with decoding video files. Streaming over Wi-Fi to me is a little bit easier. It gives me free reign over how to present the data to the ESP32. But before we get to the server, we've got some challenges to solve on the ESP32 side of things. The first question I had was how fast can we actually display an image on the display? We saw in previous videos using animated GIFs that the screen is pretty fast. But just how fast is it? So I've created a full screen uncompressed image and hard coded it into the sketch. Let's see how quickly we can push it to the screen. I'm using the very excellent TFT_eSPI library for this test. So that's pretty impressive. We can push 280 by 240 16-bit pixels in about 17 milliseconds. That would give us a theoretical frame rate of almost 60 frames per second. But there are some issues with this. I'm planning on streaming the frames over Wi-Fi and a single uncompressed frame is over 130K. That is pretty big. It might be doable over a very fast Wi-Fi network, but to get something that works we're going to have to make it much smaller. A really popular thing to do is to use something called Motion JPEG or MJPEG. This is just a stream of JPEG images so it's really simple. The question now is how quickly can we decode and draw a JPEG? My initial attempts were not particularly promising. I used the sample code from the TFT_eSPI library and I'm pretty sure we can actually see the image being drawn. It's taking around 180 milliseconds to draw an image. This gives us a maximum frame rate of about 8 to 9 frames per second. Not very impressive. Fortunately there is another JPEG decoder library called JPEGDEC. This has some impressive performance claims. Any one who needs to use microseconds to measure time is obviously doing something right. If we use this library we're now down to around 49 milliseconds to draw each frame. That's a much more respectable 20 frames per second. That should give us time to download a frame, display it and still get a decent frame rate. We can go even faster by enabling DMA. This gets us down to just 36 milliseconds, which gives us 28 frames per second. The DMA solution works so well because we can hand off the pixels to the DMA controller and decode the next section of the image while the pixels are being sent to the display. We can get even more clever. We've got two cores on the ESP32. We can use one core to download the image and another core to decode and display it. This way, while we're drawing a frame, we can be downloading the next frame to display. That's pretty cool. The target I had in my head for FPS was around 15 frames per second and this definitely seems achievable. 
The limiting factor will probably be my crappy Wi-Fi rather than any technical blockers. So how are we going to stream the frames from the server? I don't want to make anything too complex, as I want to keep the ESP32 code as simple as possible. The easiest thing I can think of is an HTTP server that gives us a JPEG image from the video being played. I'm not trying to do live streaming, so the ESP32 can just request the frame for a particular timestamp (a sketch of this request is below). This makes things very simple: we don't need any complex synchronisation between the server and the client, and the client can just consume and display frames at its own pace. And this works really well. You can see the frames being requested by the client. We do get the occasional long frame due to my Wi-Fi, but it works. On the server side, I'm pre-processing the videos to extract all the frames. This makes the server very fast: all the hard work has already been done, and it just serves the JPEG data for the requested timestamp without needing to do any work.

So it looks like we've got the vision part of the system solved. Now we just need sound. I don't want to deal with any complex codecs on the ESP32, so I'm just going to use 8-bit PCM data at 16kHz. This should still give us reasonable quality, and it won't be a lot of data to transfer. We can pull this audio data directly from the server: we download a chunk of audio and send it to the I2S amplifier, which has its own internal DMA buffers. Once these have space for more data, we fetch the next chunk, which gives us continuously playing audio without any gaps. This also works pretty well. I'm pulling down one second of audio at a time, and I've got 4K of space in the DMA buffers, which gives us around 256 milliseconds to download the next chunk of data (4096 bytes at 16,000 bytes per second). If we needed to, we could increase the number of DMA buffers to allow for poor network conditions and give us more time to download data, but this seems to be working OK on my network.

There's one potentially tricky challenge still remaining. We have our audio stream, which plays sequentially at a rate controlled by the I2S sample rate. And we have our video frame stream, whose rate is controlled by elapsed time: it just draws frames as quickly as it can download and display them. There's a real danger of these two independent streams getting out of sync. If our audio stream hits a network issue and gets delayed, our images won't match up, which would be really irritating. There's a fairly obvious solution to this: we just use the audio stream to calculate elapsed time. Every time we push data out to the I2S peripheral, we know how much time has passed, so we can use this to keep our images in sync (see the sketch below). It's a nice, elegant, simple solution. It won't work for live streams, but it works really nicely for our use case.
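The frame request that the pipeline above hands off to could look like this. This is a sketch assuming a hypothetical /frame?ts= endpoint; the real server's URL scheme may differ, and it assumes the server sends a Content-Length header:

```cpp
#include <WiFi.h>
#include <HTTPClient.h>

// Fetch the JPEG frame for a given timestamp; returns a malloc'd buffer
// the caller must free, or nullptr on failure
uint8_t *fetchFrame(const char *serverIP, uint32_t timestampMs, size_t *length) {
  char url[64];
  snprintf(url, sizeof(url), "http://%s/frame?ts=%lu",
           serverIP, (unsigned long)timestampMs);

  HTTPClient http;
  http.begin(url);
  if (http.GET() == HTTP_CODE_OK) {
    int size = http.getSize();  // relies on Content-Length being present
    uint8_t *buffer = (uint8_t *)malloc(size);
    http.getStream().readBytes(buffer, size);
    http.end();
    *length = size;
    return buffer;
  }
  http.end();
  *length = 0;
  return nullptr;
}
```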
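The audio loop is essentially: download a chunk, widen the 8-bit samples, and let the I2S write block until the DMA buffers have room. A sketch, assuming the legacy ESP-IDF I2S driver (already configured with i2s_driver_install and i2s_set_pin) and a hypothetical fetchAudioChunk() for the HTTP download:

```cpp
#include <Arduino.h>
#include <driver/i2s.h>

const int SAMPLE_RATE = 16000;

// Stand-in for the HTTP code that downloads one second of 8-bit PCM
size_t fetchAudioChunk(uint8_t *buffer, size_t maxBytes);

void audioTask(void *) {
  static uint8_t pcm[SAMPLE_RATE];      // one second of 8-bit samples
  static int16_t samples[SAMPLE_RATE];  // expanded for the I2S amplifier
  for (;;) {
    size_t count = fetchAudioChunk(pcm, sizeof(pcm));
    // Convert unsigned 8-bit PCM to signed 16-bit
    for (size_t i = 0; i < count; i++) {
      samples[i] = ((int16_t)pcm[i] - 128) << 8;
    }
    size_t written;
    // Blocks while the I2S DMA buffers are full - this back-pressure is
    // what paces the downloads
    i2s_write(I2S_NUM_0, samples, count * sizeof(int16_t), &written, portMAX_DELAY);
    // (for A/V sync, add the sample count to a shared counter here - see below)
  }
}
```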
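Keeping the video in sync then just means deriving "now" from the audio. A sketch of the idea:

```cpp
#include <Arduino.h>

const int AUDIO_SAMPLE_RATE = 16000;

// Incremented by the audio task after every successful i2s_write
volatile uint32_t samplesPlayed = 0;

// Elapsed play time derived from the audio clock rather than wall time;
// the video task uses this timestamp when requesting the next frame
uint32_t audioElapsedMs() {
  return (uint32_t)((uint64_t)samplesPlayed * 1000 / AUDIO_SAMPLE_RATE);
}
```

Because frames are requested for whatever timestamp the audio has actually reached, a delayed audio chunk simply pauses the picture too, and the two streams can never drift apart.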
With our syncing solved, we've got a fully functional video streaming system. There's only one thing really lacking: we don't have any controls. We can't change the channel or the volume. It's a TV, so the obvious thing to do is to add a remote control. I've had a couple of these infrared receivers for ages, and it's finally time to put them to use. They're pretty easy to wire up, with only three pins: out, ground and VCC. I've stuck one in a breadboard and hooked it up to my oscilloscope, and it's pretty interesting to see the patterns for the different buttons.

We have power, volume up, volume down, channel up and channel down. That should do us for our TV controls. I've soldered three legs directly to GPIO pins; I definitely need to add something to the next revision of the board to do this properly. Looking at the datasheet, we're supposed to have an RC filter on the power supply and a pull-up resistor, but it seems to be working reasonably well without them. We do get a few spurious commands, but it seems OK. I've hooked up the IRremote Arduino library and we're getting commands detected (a minimal receive sketch is below). It's working! We can now control our little TV.

When I hit the power button, it asks the server for the list of channels and how long each channel is. At the moment I'm just looping around each video, but you could get clever here and automatically move on to the next channel at the end. The volume buttons work nicely, and the channel up and down buttons just move through the channels and reset the play position back to zero.

I've also added a nice little animation to show static when changing the channel. This was pretty interesting to implement. My initial naive approach was to fill a buffer with random grayscale values using the random function, but this turned out to be painfully slow. We don't really need proper random numbers, though: a pretty rubbish pseudo-random number generator is good enough, and it's really quick (see the sketch below). The result is a nice little static effect.

There's nothing really specific to my custom board that makes this work. It should work on any ESP32 and SPI display; you'll just need to add an amplifier to get the sound output. I've tried it out on the Cheap Yellow Display that Brian Lough has been talking about, and it does work. Unfortunately, there does seem to be a bug around using the DAC for audio output at the moment, which needs to be investigated, but the streaming does work. It might be interesting to get it controlled via the touchscreen, but that's probably for another video.

If you want to try it out for yourself, there are a couple of places you need to modify. Almost all the settings for the firmware are set up in the platformio.ini file; the only things you'll need to modify in the code are the Wi-Fi credentials and the IP address of the server. On the server side of things, you just drop any videos you want to play into the movies folder. Keep them fairly short, because, as I said earlier, the server pre-processes them to extract all the JPEG images. You'll also need to change the JPEG size in the code to match the size of your display. The code is pretty rough and ready, and has been hacked together pretty quickly, so use it at your own risk. But as always, have fun and keep making stuff!
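For reference, a minimal IRremote receive sketch. The GPIO number here is an assumption; use whichever pin the receiver's out leg is soldered to:

```cpp
#include <Arduino.h>
#include <IRremote.hpp>

const int IR_RECEIVE_PIN = 21;  // assumption: pick the GPIO your receiver is on

void setup() {
  Serial.begin(115200);
  IrReceiver.begin(IR_RECEIVE_PIN, ENABLE_LED_FEEDBACK);
}

void loop() {
  if (IrReceiver.decode()) {
    // decodedIRData.command identifies the button; map these codes to
    // power, volume and channel actions
    Serial.printf("protocol=%d command=0x%X\n",
                  (int)IrReceiver.decodedIRData.protocol,
                  IrReceiver.decodedIRData.command);
    IrReceiver.resume();  // ready to receive the next code
  }
}
```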
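And the static effect: a sketch using a xorshift32 generator, which is rubbish as randomness goes but more than good enough for TV static, and much faster than calling random() for every pixel. It assumes a 280x240 display and the TFT_eSPI object from the main sketch:

```cpp
#include <TFT_eSPI.h>

extern TFT_eSPI tft;  // the display object from the main sketch

void drawStatic() {
  static uint32_t state = 0x12345678;  // any non-zero seed works
  static uint16_t line[280];           // one row of RGB565 pixels

  for (int y = 0; y < 240; y++) {
    for (int x = 0; x < 280; x++) {
      // xorshift32: three shifts and XORs per "random" number
      state ^= state << 13;
      state ^= state >> 17;
      state ^= state << 5;
      uint8_t g = state & 0x1F;            // 5-bit grey level
      line[x] = (g << 11) | (g << 6) | g;  // same level in R, G and B of RGB565
    }
    tft.pushImage(0, y, 280, 1, line);
  }
}
```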
Info
Channel: atomic14
Views: 17,383
Keywords: ESP32, Arduino, Video
Id: G6MROvlLeKE
Length: 8min 54sec (534 seconds)
Published: Tue Sep 19 2023