How streaming impacts React and Next.js applications

Streaming is the key to fast and dynamic web applications. When you look at the largest sites on the web, like Amazon or Google, they all use streaming to progressively send chunks of UI from the server to the client and personalize the page based on me, the visitor, and what I want to see. The faster I can stream something back to my customers, the faster they can click that "add to cart" button. So, this is incredibly critical, and I want to walk through this blog post that I recently wrote, which has some notes about how streaming helps improve web application performance.

Not only does this impact conversion rates (that "add to cart" button is above the fold and can be clicked on instantly), but Google cares about performance as well, as measured by your Core Web Vitals. We have another video on our channel if you want to learn all about Core Web Vitals, but the short story is: faster sites tend to rank higher than their competitors. So, you want to make sure your site's fast, and streaming is a huge unlock there.

Streaming can prevent slow data sources from blocking that initial display of your UI. So, if you know there's something below the fold that's going to be way slower, streaming makes sure that the initial UI can still come in very quickly. And critical assets, like JavaScript or stylesheets, or really anything else, can also be loaded in parallel, again helping you get that fast initial page load.

Now, the other thing that's directly related to streaming is co-locating the compute that you're using to talk to your data, whether that's the database directly, an API, or your headless ecommerce solution.
You want that to live directly next to wherever that API or that database is located, so you want to prevent these network waterfalls between your compute and your data, because that's going to make the initial UI take even longer to get back on the page. You can't beat the speed of light, so we want to run that close. Typically, when I talk to customers, they usually have one, maybe two, maybe three different data regions where they're actively replicating data. It's usually one; it's usually US East. So, that's why we default our functions to US East, and all of the Vercel functions can also stream responses. So, you keep your data close to your compute, and then, with customers all around the world, you can get back quick responses from our edge network to optimize that time to first byte. And then, you can stream in that data.

Now, we're also working to take this even further in Next.js with partial prerendering. I'll put a link down below if you want to learn more about partial prerendering, but the idea basically is that the shell of your application that you compute based on your code can be taken and placed in all of the edge network regions. And then, we utilize streaming with the Next.js App Router to still have that co-located compute with your data and reduce the latency while still getting some initial UI quickly on the screen. So, definitely go check out partial prerendering. It's not stable yet, but we're cooking up something there.

Streaming is the default with the Next.js App Router. So, by using the App Router, you don't have to configure anything to get streaming set up, and this works by default on Vercel as well. And one of the cool things is, your streaming is defined by your Suspense boundaries.
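To make the idea of boundaries a bit more concrete, here's a rough timing model in TypeScript. It is not React or Next.js code, just a sketch under some assumptions: the `Section` and `Flush` types, the `flushInOrder` and `flushOutOfOrder` helpers, and the millisecond values are all made up for illustration. It compares when each chunk could reach the browser if sections are revealed strictly in page order versus as soon as each one is ready.

```typescript
// Hypothetical model: each section of the page becomes ready after some delay.
type Section = { id: string; readyAtMs: number };
type Flush = { id: string; atMs: number };

// In-order streaming: a section is only revealed once everything above it
// on the page has been revealed, so one slow section holds back the rest.
function flushInOrder(sections: Section[]): Flush[] {
  let gateMs = 0; // earliest time the next section may be revealed
  return sections.map((s) => {
    gateMs = Math.max(gateMs, s.readyAtMs);
    return { id: s.id, atMs: gateMs };
  });
}

// Out-of-order streaming: every section is revealed the moment it is ready.
function flushOutOfOrder(sections: Section[]): Flush[] {
  return sections
    .map((s) => ({ id: s.id, atMs: s.readyAtMs }))
    .sort((a, b) => a.atMs - b.atMs);
}

// Illustrative numbers only: the price lookup is the slow data source.
const page: Section[] = [
  { id: "title", readyAtMs: 50 },
  { id: "price", readyAtMs: 900 },
  { id: "reviews", readyAtMs: 400 },
];

console.log(flushInOrder(page));
console.log(flushOutOfOrder(page));
```

With these made-up numbers, the reviews show up at 400 ms in the out-of-order case instead of waiting until 900 ms behind the slower price lookup.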
So, if I want to stream things in order, which is the example we see with things like Amazon where you see it kind of go down the page and the chunks load in order, you can absolutely do that too. But a big unlock with this architecture is that, at the component level, I can choose to stream things out of order. So, we've got these two code snippets here that show how we can improve the UX by determining what we want to actually block on and progressively reveal more content, or we can say, you know what, we're going to load the price and the reviews out of order by wrapping these with Suspense and having some skeleton or shimmer effect while we're waiting for those to load in. So, it's cool that you get the composability and the granularity to define that using React Suspense. Super excited about that. If you want to see a demo of this, we have one on [...]. We have this little embedded video here as well to show, for example, that you can choose exactly where you want those bits to stream in.

Now, another really, really important part of streaming is its application with LLMs. A lot of these LLMs sometimes take 10 to 30 seconds to produce a response. Especially when doing heavier things like image generation, you want to make sure you have that low latency for a good user experience. You want to avoid a blocking UI and progressively send back information as it's ready. And if you look at the most popular chat apps for AI today, they all have that streaming type of interface, from ChatGPT to really the rest of the players in the space. So, the App Router in Next.js and Vercel's infrastructure also set you up to build this next generation of AI applications that need to utilize streaming. And we built the AI SDK, which abstracts away all that boilerplate code so you can start streaming.
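On the client side, the core idea is small: update the message every time a chunk arrives instead of waiting for the full response. Here's a minimal TypeScript sketch of that. It does not use the AI SDK's actual API; `MessageState`, `applyChunk`, and the sample chunks are hypothetical names and data for illustration, and `null` is just this sketch's way of marking the end of the stream.

```typescript
// UI state for one chat message while its response is still streaming.
type MessageState = { text: string; done: boolean };

// Apply one streamed chunk to the current state.
// In this sketch, `null` marks the end of the stream.
function applyChunk(state: MessageState, chunk: string | null): MessageState {
  if (chunk === null) {
    return { text: state.text, done: true };
  }
  return { text: state.text + chunk, done: false };
}

// Hypothetical chunks, as a model might emit them over several seconds.
const chunks: Array<string | null> = [
  "Stream",
  "ing keeps ",
  "the UI responsive.",
  null,
];

// Each intermediate state is something the user can already read.
const frames: MessageState[] = [];
let state: MessageState = { text: "", done: false };
for (const chunk of chunks) {
  state = applyChunk(state, chunk);
  frames.push(state);
}

for (const frame of frames) {
  console.log(frame.done ? "[done]" : "[streaming]", frame.text);
}
```

The design choice here is to keep `applyChunk` a pure function of the previous state and the new chunk, so the same logic works whether the chunks come from a fetch response, a WebSocket, or a test array like the one above.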
So, if you want to learn more about any of this, we've got some links down in the description. We've also got the video on partial prerendering, and we can definitely do more content about how streaming impacts your business as well as the performance of your application. Let us know in the comments what you'd like to see. Peace!
Published: Fri Feb 16 2024
