Streaming is the key to fast and dynamic web applications. When you look at the largest sites on the web, like Amazon or Google, they all use streaming to progressively send chunks of UI from the server to the client and personalize the page based on the visitor and what they want to see. The faster you can stream something back to your customers, the faster they can click that "add to cart" button. So this is incredibly critical, and I want to walk through a blog post I recently wrote, which has some notes about how streaming helps improve web application performance. Not only does this impact conversion rates, since that "add to cart" button is above the fold and can be clicked instantly, but Google cares about performance as well, as measured by your Core Web Vitals. We have another video on our channel if you want to learn all about Core Web Vitals, but the short story is: faster sites tend to rank higher than their slower competitors.
So you want to make sure your site is fast, and streaming is a huge unlock there. Streaming can prevent slow data sources from blocking the initial display of your UI: if you know something below the fold is going to be way slower, streaming makes sure the initial UI can still come in very quickly. Critical assets, like JavaScript and stylesheets, or really anything else, can also be loaded in parallel, again helping you get that fast initial page load.
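To make the mechanics concrete, here's a minimal sketch (not from the post) of what streaming looks like at the HTTP level: a Next.js Route Handler that flushes chunks as they become available. The route path and the simulated delays are hypothetical:

```ts
// app/api/stream/route.ts (hypothetical path)
export async function GET() {
  const encoder = new TextEncoder();

  const stream = new ReadableStream({
    async start(controller) {
      const chunks = ['<p>header</p>', '<p>main content</p>', '<p>footer</p>'];
      for (const chunk of chunks) {
        // Simulate a slow data source behind each chunk.
        await new Promise((resolve) => setTimeout(resolve, 500));
        // Flush the chunk immediately; the browser can start rendering it
        // without waiting for the chunks still being produced.
        controller.enqueue(encoder.encode(chunk));
      }
      controller.close();
    },
  });

  return new Response(stream, {
    headers: { 'Content-Type': 'text/html; charset=utf-8' },
  });
}
```

React's streaming server rendering works on the same idea: the response stays open, and each ready piece of UI is flushed as a chunk instead of waiting on the slowest one.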
Now, the other thing that ties directly into streaming is colocating the compute that you're using to talk to your data, whether that's the database directly, an API, or your headless ecommerce solution. You want that compute to live right next to wherever that API or database is located, to prevent network waterfalls between your compute and your data, because those round trips make it take even longer to actually get the initial UI back on the page. You can't beat the speed of light, so we want to run that compute close.
Typically, when I talk to customers, they have one, maybe two, maybe three different regions where they're actively replicating data. Usually it's one, and usually it's US East. That's why we default our functions to US East, and all Vercel functions can also stream responses. So you keep your data close to your compute, and then, with customers all around the world, our edge network gets back quick responses to optimize that time to first byte, and then streams the data in.
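In the App Router you can express that colocation per route with segment config. Here's a minimal sketch, assuming your database's primary replica is in US East (Vercel's iad1 region), with a hypothetical route and API; depending on your Next.js version, preferredRegion may require the Edge runtime:

```tsx
// app/product/[id]/page.tsx (hypothetical route)
// Pin this route's compute next to the (assumed) US East database,
// so the data fetch is one short hop before the response streams back.
export const runtime = 'edge';
export const preferredRegion = 'iad1';

export default async function ProductPage({ params }: { params: { id: string } }) {
  const product = await fetch(`https://api.example.com/products/${params.id}`).then(
    (res) => res.json()
  );
  return <h1>{product.name}</h1>;
}
```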
Now, we're also working to take this even further in Next.js with partial prerendering. I'll put a link down below if you want to learn more, but the basic idea is that the static shell of your application, computed from your code, can be placed in all of the edge network regions. Then we use streaming with the Next.js App Router to still have that colocated compute with your data, reducing latency while still getting some initial UI on the screen quickly. So definitely go check out partial prerendering. It's not stable yet, but we're cooking up something there.
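Since it's experimental, opting in is behind a flag; a minimal sketch of enabling it (flag name per the Next.js canary docs at the time, and likely to change before it stabilizes):

```js
// next.config.js
// Opt the app into experimental Partial Prerendering.
module.exports = {
  experimental: {
    ppr: true,
  },
};
```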
Streaming is the default with the Next.js App Router, so you don't have to configure anything to get streaming set up, and it works by default on Vercel as well.
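For example, with the App Router's file conventions, dropping a loading.tsx next to a page automatically wraps the route in a Suspense boundary and streams the fallback first; a minimal sketch with a hypothetical route and API:

```tsx
// app/dashboard/loading.tsx (hypothetical route)
// Streamed to the browser immediately while the page's data loads.
export default function Loading() {
  return <p>Loading dashboard...</p>;
}
```

```tsx
// app/dashboard/page.tsx
// An async Server Component: the shell and the fallback above stream first,
// then this content replaces the fallback once the slow fetch resolves.
export default async function DashboardPage() {
  const stats = await fetch('https://api.example.com/stats').then((res) => res.json());
  return <h1>Visitors today: {stats.visitors}</h1>;
}
```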
And one of the cool things is that your streaming is defined by your Suspense boundaries. So if I want to stream things in order, which is what we see with sites like Amazon where the chunks load as you go down the page, you can absolutely do that too. But a big unlock with this architecture is that, at the component level, I can choose to stream things out of order. We've got two code snippets here that show how we can improve the UX by deciding what we actually want to block and what to progressively reveal: for example, we can load the price and the reviews out of order by wrapping them with Suspense and showing some skeleton or shimmer effect while we're waiting for them to load in. So it's cool that you get the composability and the granularity to define that using React Suspense. Super excited about that.
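Here's a minimal sketch of that out-of-order pattern (the component names, route, and API endpoints are hypothetical): the page shell streams immediately, while Price and Reviews each resolve behind their own Suspense boundary, in whatever order their data arrives:

```tsx
import { Suspense } from 'react';

// Hypothetical async Server Components, each awaiting its own slow data source.
async function Price({ id }: { id: string }) {
  const { price } = await fetch(`https://api.example.com/price/${id}`).then((res) =>
    res.json()
  );
  return <p>{price}</p>;
}

async function Reviews({ id }: { id: string }) {
  const reviews: string[] = await fetch(`https://api.example.com/reviews/${id}`).then((res) =>
    res.json()
  );
  return (
    <ul>
      {reviews.map((review) => (
        <li key={review}>{review}</li>
      ))}
    </ul>
  );
}

export default function ProductPage({ params }: { params: { id: string } }) {
  return (
    <main>
      {/* The shell streams right away; each boundary below shows its
          skeleton fallback until its own data is ready. */}
      <h1>Product {params.id}</h1>
      <Suspense fallback={<div className="skeleton" />}>
        <Price id={params.id} />
      </Suspense>
      <Suspense fallback={<div className="skeleton" />}>
        <Reviews id={params.id} />
      </Suspense>
    </main>
  );
}
```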
If you want to see a demo of this, we have one on partialprerendering.com, with a little embedded video showing how you can choose exactly where you want those bits to stream in.
Now, another really important application of streaming is with LLMs. A lot of these LLMs take 10 to 30 seconds to produce a full response, especially for heavier tasks like image generation, so you want low latency for a good user experience. Rather than blocking the UI, you want to progressively send information back as it's ready. And if you look at the most popular AI chat apps today, from ChatGPT to the rest of the players in the space, they all have that streaming interface. So the App Router in Next.js and Vercel's infrastructure also set you up to build this next generation of AI applications that need streaming. And we built the AI SDK to abstract away all that boilerplate code and make it super easy to start streaming.
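A minimal sketch of a streaming chat endpoint using the AI SDK (the package names, model choice, and exact helper API vary by SDK version; this assumes the streamText helper from recent releases):

```ts
// app/api/chat/route.ts (hypothetical path)
import { openai } from '@ai-sdk/openai';
import { streamText } from 'ai';

export async function POST(req: Request) {
  const { messages } = await req.json();

  // Kick off the model call; tokens are forwarded to the client
  // as they arrive instead of waiting for the full completion.
  const result = await streamText({
    model: openai('gpt-4o'),
    messages,
  });

  return result.toTextStreamResponse();
}
```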
So if you want to learn more about any of this, we've got some links down in the description, including the video on partial prerendering. We can definitely do more content about how streaming impacts your business as well as the performance of your application. Let us know in the comments what you'd like to see. Peace!