Hey guys. What's up?
Right now my voice is being transferred through the INMP441
MEMS microphone on the ESP32-S3. How is the quality? It currently
has a sampling rate of 16000 Hz, a rate widely used in VoIP, so
I believe it's not that bad. Of course, you can increase the
sampling rate up to 44100 Hz, but this adds a lot of delay
when sending data to the server. The goal of this project is to
play the audio from the ESP32 on all connected clients in real time. It's fun because you can connect
from any device with a web browser. Let's get into it. As you can see here, the hardware
configuration is very simple. I simply connected the INMP441
to the ESP32-S3 DevKitC. Its role is to continuously transfer audio
sampling data to the connected server. That's it. First, let's look at the source code for the ESP32.
You can download this project from my GitHub page. If the code is hard to read in the video, please
look at it directly. The basic program flow is very simple: connect to WiFi, then try to
connect to the WebSocket server, and once the connection is ready, transfer audio sampling data to the server.
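The flow above can be modeled as a small state machine. This is a hypothetical TypeScript sketch, not the ESP32 firmware itself: the state names, `nextState`, and `runLoop` are illustrative, and it assumes the firmware checks the WiFi and WebSocket status on every loop pass and sends a chunk only when both links are up.

```typescript
// Hypothetical states of the ESP32 streaming loop.
type State = "CONNECTING_WIFI" | "CONNECTING_WS" | "STREAMING";

// Pick the state for the current link status: WiFi first, then the WebSocket.
function nextState(wifiUp: boolean, wsUp: boolean): State {
  if (!wifiUp) return "CONNECTING_WIFI";
  if (!wsUp) return "CONNECTING_WS";
  return "STREAMING";
}

// Simulate several loop passes, each a [wifiUp, wsUp] reading.
// Audio is sent only while STREAMING; returns how many chunks would be sent.
function runLoop(readings: Array<[boolean, boolean]>): number {
  let sent = 0;
  for (const [wifiUp, wsUp] of readings) {
    if (nextState(wifiUp, wsUp) === "STREAMING") sent++;
  }
  return sent;
}

// WiFi comes up, then the socket, then the socket drops once.
console.log(runLoop([[false, false], [true, false], [true, true], [true, true], [true, false]])); // 2
```

Checking WiFi before the socket matters because a WebSocket cannot reconnect while the WiFi link is down, so a dropped WiFi connection always restarts the whole sequence in this model.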
That's all this program does. The key part of this program is the I2S configuration.
Let's take a quick look at I2S sampling. To use I2S, we need
the correct settings. The sampling rate determines sound quality: it is the number of audio samples
we receive in one second. The bits per sample is the number of bits used
to store each sample, and more bits mean better quality. So the higher the resolution and sample
rate, the larger the amount of data. The DMA buffer settings are also important. With DMA, peripheral devices can
access memory directly without going through the CPU. Once the sampling data is in the
DMA buffer, we can actually use it. Now we need to think about the DMA buffer size. A buffer that holds more samples
means less work for the CPU, because how often the CPU is
interrupted depends on how many samples each buffer holds. With a buffer length of 128, the display here
updates much faster than with 1024: each time a buffer fills, an interrupt wakes the CPU,
and that is when the display is updated. With a length of 1024, the updates are much
slower, meaning the CPU handles fewer interrupts and has more time for other things.
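To put rough numbers on this, the sketch below assumes one interrupt per filled DMA buffer (a simplification; the real I2S driver may behave differently) and measures the buffer length in samples, like the 128 and 1024 above. The function names are illustrative.

```typescript
// Rough model: the CPU is interrupted once each time a DMA buffer fills.
// dmaBufLen is in samples per buffer, sampleRate in Hz.
function interruptsPerSecond(sampleRate: number, dmaBufLen: number): number {
  return sampleRate / dmaBufLen;
}

// Time needed to fill one buffer, in milliseconds.
// This is also the minimum delay one buffer adds before its data can be sent.
function bufferFillMs(sampleRate: number, dmaBufLen: number): number {
  return (dmaBufLen / sampleRate) * 1000;
}

for (const len of [128, 1024]) {
  const irq = interruptsPerSecond(44100, len).toFixed(1);
  const ms = bufferFillMs(44100, len).toFixed(1);
  console.log(`len=${len}: ~${irq} interrupts/s, ${ms} ms per buffer`);
}
// len=128: ~344.5 interrupts/s, 2.9 ms per buffer
// len=1024: ~43.1 interrupts/s, 23.2 ms per buffer
```

Going from 128 to 1024 cuts the interrupt rate by a factor of 8, at the cost of each buffer holding about 23 ms of audio instead of about 3 ms.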
So a larger buffer length is better. Finally, there is the buffer count setting.
I currently set it to 10. Since the DMA buffers hold the samples, the actual
DMA memory comes to about 20 KB in my case. Because there are cases where
the buffers cannot be emptied in time, making the DMA buffer as large as
possible covers the worst cases. But the DMA buffer is allocated in SRAM, and
the ESP32-S3 has only 512 KB of SRAM, so the size of the DMA buffer
must be carefully controlled. This is a transmission test
from the ESP32 to the server. At 44.1 kHz, 16-bit, mono, the
amount of data that needs to be transferred to the server every
second is about 88 KB (44,100 samples × 2 bytes = 88,200 bytes). If this rate is not met, other clients will have
trouble playing the audio. According to the server, it is currently receiving 86 chunks of
length 1024 per second. This isn't bad. However, this
assumes there is no network delay. WebSockets run over TCP, which
adds more overhead and latency than UDP because lost packets are retransmitted and data is delivered strictly in order. Unfortunately, WebSockets cannot run over UDP,
so if you need a better system, I recommend building a WebRTC-based service. WebRTC's RTCDataChannel
API can be configured for UDP-like, unordered and unreliable delivery. For now, though, this system is server-centric.
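The DMA and bandwidth figures above can be checked with a little arithmetic. This sketch assumes 16-bit mono samples, 10 DMA buffers of 1024 samples each, and WebSocket chunks of 1,024 bytes; those sizes are inferred from the numbers in the video rather than taken from the author's code, and the function names are illustrative.

```typescript
// Back-of-the-envelope budget for the ESP32-S3 audio stream.
// Assumptions: 16-bit mono PCM, 10 x 1024-sample DMA buffers, 1,024-byte chunks.
const BYTES_PER_SAMPLE = 2; // 16 bits per sample
const CHANNELS = 1;         // mono

// Total DMA memory reserved in SRAM: count x length x bytes per sample.
function dmaBytes(bufCount: number, bufLen: number): number {
  return bufCount * bufLen * BYTES_PER_SAMPLE * CHANNELS;
}

// Raw PCM bytes the ESP32 must push to the server every second.
function bytesPerSecond(sampleRate: number): number {
  return sampleRate * BYTES_PER_SAMPLE * CHANNELS;
}

// Fixed-size chunks per second needed to keep up with the stream.
function chunksPerSecond(sampleRate: number, chunkBytes: number): number {
  return bytesPerSecond(sampleRate) / chunkBytes;
}

const SRAM_BYTES = 512 * 1024; // ESP32-S3 on-chip SRAM
const dma = dmaBytes(10, 1024);
console.log(`DMA: ${dma} bytes (${((dma / SRAM_BYTES) * 100).toFixed(1)}% of SRAM)`);
console.log(`Needed: ${bytesPerSecond(44100)} bytes/s`);
console.log(`Chunks: ${chunksPerSecond(44100, 1024).toFixed(1)} per second`);
// DMA: 20480 bytes (3.9% of SRAM)
// Needed: 88200 bytes/s
// Chunks: 86.1 per second
```

If the chunks are indeed 1,024 bytes, 86 chunks per second (88,064 bytes) sits just below the 88,200 bytes/s target, which is why the measured rate looks fine but leaves essentially no margin for network delay.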
All clients connect to the server, receive the audio data that the ESP32 sends to
the server, and play it in their own web browsers. Sampling data obtained from the INMP441
can be played directly with a PCM player. I use WebSocket even
though it runs over TCP because it makes it very simple to build a system that
relays data from one client to other clients. If you don't like this, try writing a server
based on UDP or WebRTC, as I mentioned before. I have plans for that too.
Let's see how it's built. This is the server code, based on Node.js.
This server is simply responsible for sending data to the connected WebSocket clients. On the right is the audio_client.html file,
which you open in your web browser to connect. The server handles both HTTP
and WebSocket connections. While the server is running, enter the IP address, port number, and audio path in the web
browser to open audio_client.html.
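The relay logic at the heart of such a server can be sketched without any networking library. This is a hypothetical, library-free version: `Client`, `broadcast`, and `FakeClient` are illustrative names, and skipping the sender is a design choice assumed here so the ESP32 does not receive its own audio back. With the `ws` package, the server's `clients` set could be converted with `Array.from` and passed in.

```typescript
// readyState value of an open connection (WebSocket.OPEN === 1).
const OPEN = 1;

// Minimal shape of a connected client; hypothetical, for illustration only.
interface Client {
  readyState: number;
  send(data: Uint8Array): void;
}

// Forward one audio chunk from the sender (the ESP32) to every other open client.
// Returns how many clients the chunk was sent to.
function broadcast(clients: readonly Client[], sender: Client, chunk: Uint8Array): number {
  let sent = 0;
  for (const client of clients) {
    if (client === sender || client.readyState !== OPEN) continue;
    client.send(chunk);
    sent++;
  }
  return sent;
}

// A fake client that only counts the chunks it receives.
class FakeClient implements Client {
  received = 0;
  constructor(public readyState: number) {}
  send(_data: Uint8Array): void {
    this.received++;
  }
}

const esp32 = new FakeClient(OPEN);
const browserA = new FakeClient(OPEN);
const browserB = new FakeClient(3); // 3 = CLOSED
console.log(broadcast([esp32, browserA, browserB], esp32, new Uint8Array(1024))); // 1
```

Checking `readyState` before sending avoids writing to browsers that have already disconnected but have not yet been removed from the client list.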
Here, all you need to do is update the
IP address of the WebSocket server. Since the server currently runs locally,
it will be a local IP address. You can easily find the IP address
with the ifconfig command on Mac or the ipconfig command on Windows. This PCM player plays the audio
samples obtained from the INMP441. Since the bits per sample is
16, the encoding is Int16. It's mono, so it has 1 channel.
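Playing Int16 PCM in a browser means turning each 16-bit integer into the floating-point values the Web Audio API works with, which lie in the range [-1, 1). The sketch below shows that conversion for little-endian mono data (the ESP32's native byte order); it illustrates the idea and is not the internals of the PCM player used in the video. `int16ToFloat32` is a hypothetical name.

```typescript
// Convert raw little-endian 16-bit mono PCM bytes into Float32 samples in [-1, 1).
function int16ToFloat32(bytes: Uint8Array): Float32Array {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const count = Math.floor(bytes.byteLength / 2); // 2 bytes per sample; a trailing odd byte is ignored
  const out = new Float32Array(count);
  for (let i = 0; i < count; i++) {
    // getInt16(offset, true) reads a signed 16-bit value in little-endian order.
    out[i] = view.getInt16(i * 2, true) / 32768;
  }
  return out;
}

// Samples 0, 16384 and -32768 encoded as little-endian bytes.
const raw = new Uint8Array([0x00, 0x00, 0x00, 0x40, 0x00, 0x80]);
console.log(int16ToFloat32(raw)); // values: 0, 0.5, -1
```

Dividing by 32768 rather than 32767 maps the most negative sample exactly to -1 and keeps every result inside [-1, 1).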
The sample rate is 44100. After setup, audio data is sent to the
speaker through the player's feed function. As you can see here, Plotly
is used to draw the audio graphs. I recommend removing it
if your system slows down. You can download this source
code from my GitHub page. Please give it a try and let me
know if you have any problems. After downloading the source code, install the packages with npm.
Then start the server. That's it; the server is running now. Now power up or reset the ESP32
so that it can connect to the server. Enter the address in the browser to connect to the server.
This is the first client. You can connect from other devices in the
same way. It seems difficult to stream 44100 Hz audio continuously over TCP
without interruptions, because network latency can occur at any time.
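One way to see why a lower sample rate helps: whenever the network stalls, the audio that should have been delivered piles up and must be sent later on top of the live stream. This sketch compares the backlog at each rate, assuming 16-bit mono PCM and a hypothetical 200 ms stall; the stall length is purely illustrative.

```typescript
// Raw 16-bit mono PCM data rate in bytes per second.
function bytesPerSecond(sampleRate: number): number {
  return sampleRate * 2; // 2 bytes per sample, 1 channel
}

// Bytes that pile up while the network is stalled for stallMs milliseconds.
function backlogBytes(sampleRate: number, stallMs: number): number {
  return (bytesPerSecond(sampleRate) * stallMs) / 1000;
}

const stallMs = 200; // hypothetical stall length
for (const rate of [44100, 16000]) {
  console.log(`${rate} Hz: ${bytesPerSecond(rate)} bytes/s, ${backlogBytes(rate, stallMs)} bytes backlog`);
}
// 44100 Hz: 88200 bytes/s, 17640 bytes backlog
// 16000 Hz: 32000 bytes/s, 6400 bytes backlog
```

At 16 kHz the stream needs only about 36% of the bandwidth of 44.1 kHz, so the same connection has far more headroom to catch up after a hiccup.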
If possible, change the rate to 16000 Hz in my code and test it. You'll
see that it works much more smoothly. In my own work, I do a lot of projects
that build a central server, receive data from multiple devices, and analyze and process it. That's why I shared a project
that can run this kind of test.
It's not easy to stream data continuously over
the network; many things must be considered. What I've built is a minimal version. I hope
it becomes a stepping stone for your project. That's it for today.
Thank you for watching. See you in the next project.