The Crazy Computations Inside Your Smartphone Cameras

Reddit Comments

Amazing technology. Innovative, and that's what technology is all about.

— u/carisol71, Jan 31 2022

Just the fact that high-end phone cameras can take pics that look almost as good as pics taken on much bigger and more expensive pro cameras just blows my mind.

— u/Portgas, Jan 31 2022
Captions
I recently got an iPhone 13 Pro. I love it. For me, the thing that sticks out most, literally, is the camera. For a long time, iPhones have had increasingly better cameras, and for the last few generations, camera performance has headlined the marketing message: get the new iPhone because the camera is a lot better. It is one of the few things that gets people to upgrade. And indeed, the 13 Pro's camera takes really good pictures. But it is not just the iPhone; virtually all of today's top smartphones can take images on par with anything you can get from a standalone camera.

The top smartphone makers are investing a lot into the space. Back in 2015, 60 Minutes reported that Apple employed 800 people to work just on the iPhone's camera. In 2021, Xiaomi announced that they are hiring thousands of employees to work on their phones' cameras. How does your mobile phone camera work? How did it get to be so good? In this video, we'll look at the amazing computer and semiconductor engineering that goes into this impressive feature.

But first, I want to talk about the Asianometry podcast. Well, it's more like an audio feed. I've been told that my videos would make for good listening, so here it is. I don't think the podcast will be as funny as the videos, since most of my jokes are visual, but now you can listen to me ramble while you're headed to work or taking a walk. My mom says that my voice can help put people to sleep. You can subscribe and listen to the Asianometry podcast on Apple, Spotify, and more. Alright, on with the show.

Traditional analog cameras work by exposing photographic film to light. Pretty simple. Even though the outcome is still the same, a picture, today's smartphone cameras work far differently than that. It is convergent evolution, but for imaging. Your basic digital camera module has a simple physical structure: a lens placed on top of an image sensor chip. Module makers and integrators often add other things, like sensors and actuators, to help with camera performance, but that's the core structure. This digital camera structure is shared across all digital devices. However, the smartphone setting offers certain advantages and disadvantages compared to larger digital photography equipment like DSLRs.

The biggest downside is that the camera's insides have to be a lot smaller and thinner. Smartphone camera modules are typically housed in areas seven to ten millimeters thick, because the smartphone itself needs to be portable; the device has to fit in a pocket. Thus, modern smartphone sensors are typically about five millimeters by four millimeters in size. In comparison, DSLR image sensors are much larger: full-frame sensors can be 36 millimeters by 24 millimeters. You want the image sensor to be larger so that as much light as possible falls onto a pixel. Smaller sensors risk more motion blur, image noise, and reduced dynamic range.

Smartphone sizes limit smartphone optics as well. They are a lot smaller and less adjustable than what you can find in a typical DSLR. This creates a couple of major engineering challenges that camera manufacturers have to overcome. The first is a fixed and limited aperture. Aperture refers to the opening of a lens diaphragm; it is where light passes en route to the sensor. A smaller aperture means less light falling on the sensor, and as I already said, you want more light on the sensor. The second is a limited zoom function. In the camera industry, zoom refers to the ability to smoothly shift from a long shot to a close-up. Later in the video, we will talk more about how today's smartphones are able to achieve a pretty good zoom.
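To put those sensor dimensions in perspective, here is a quick back-of-the-envelope comparison using the rough figures quoted above (about 5 mm by 4 mm for a phone sensor versus 36 mm by 24 mm for a full-frame sensor). The exact numbers vary by model, so treat this Python sketch as an illustration rather than a spec:

```python
# Rough light-gathering comparison using the approximate dimensions above.
phone_area = 5 * 4          # mm^2, typical smartphone sensor
full_frame_area = 36 * 24   # mm^2, full-frame DSLR sensor

ratio = full_frame_area / phone_area
print(f"Smartphone sensor area: {phone_area} mm^2")
print(f"Full-frame sensor area: {full_frame_area} mm^2")
print(f"The full-frame sensor has roughly {ratio:.0f}x the area to collect light")
```

That factor of roughly forty in light-collecting area is the gap that the computational tricks discussed below are trying to paper over.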
On the flip side, however, the smartphone is capable of offering much more computing power than your discrete camera. Today's smartphone chips are as powerful as some laptops, and that has been one of the key factors in making up for the physical deficits of a smaller sensor.

The science behind silicon photodetectors dates back to the 1920s. However, it would be Willard Boyle and George Smith at Bell Labs who first proposed the idea of charge-coupled devices, or CCDs, in 1969. A year later, the idea became a technical reality, and Boyle and Smith would win the Nobel Prize in Physics in 2009 for their work. CCDs captured the majority share of the early mobile imaging sensor market. However, a new type of photosensing semiconductor, the complementary metal-oxide-semiconductor or CMOS sensor, soon emerged with a lot more potential. Over time, CMOS sensors began to replace CCDs in consumer and professional devices. They consume less power, can be produced by conventional semiconductor foundries like TSMC, and are not as expensive. Their inclusion in smartphones solidified their dominance in the imaging industry.

Considering how important they are becoming to the overall device, smartphone cameras and their sensors are a big business. The imaging chip business is a large one, estimated at about 20 billion dollars in 2020, and it is dominated by three companies: Korea's Samsung, Japan's Sony, and China's OmniVision. Yet due to fundamental physical constraints, image sensor technology can only go so far; at some point, the camera bump gets too ridiculous. Luckily, more powerful computing devices are coming to the rescue. Computational photography is a fast-growing field, and it is where some of the industry's biggest advancements in recent years have been.

So you see a cute cat and you want to take a photo with your iPhone. What happens when you hit the button on the camera app? Let us spend some time walking through the image processing pipeline. I'm not going to go through all the steps, but I'll try to catch the major ones. I also want to note that every smartphone manufacturer has their own unique image processing pipeline. Even when using the same hardware, smartphone companies can modify the parameters and algorithms. They might do this per their own testing, or to differentiate their camera performance from other manufacturers. Alright, let's get started.

The camera image sensor is made up of a 2D grid of photodiodes. A photodiode converts photons into an electrical charge. These electrical charges do not necessarily correspond to color by themselves, so each of the photodiodes has a color filter laid on top of it. Smartphones typically have dedicated image processing hardware; this hardware receives the electrical charges generated by the sensor's photodiodes and maps them to the right colors. The filters' arrangement depends on the sensor's manufacturer. It is referred to as a Bayer filter, named after Bryce Bayer of Kodak, who first proposed it in 1975.
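To make the idea of the color filter array concrete, here is a minimal sketch, assuming a common RGGB Bayer layout, of how each photosite ends up recording only one color channel. This is plain NumPy for illustration, not any sensor vendor's actual readout code:

```python
import numpy as np

def to_bayer_rggb(rgb):
    """Simulate an RGGB Bayer mosaic: each photosite keeps only one channel.

    rgb: (H, W, 3) array with even H and W. Returns an (H, W) mosaic.
    """
    h, w, _ = rgb.shape
    mosaic = np.zeros((h, w), dtype=rgb.dtype)
    mosaic[0::2, 0::2] = rgb[0::2, 0::2, 0]  # red sites: even row, even column
    mosaic[0::2, 1::2] = rgb[0::2, 1::2, 1]  # green sites: even row, odd column
    mosaic[1::2, 0::2] = rgb[1::2, 0::2, 1]  # green sites: odd row, even column
    mosaic[1::2, 1::2] = rgb[1::2, 1::2, 2]  # blue sites: odd row, odd column
    return mosaic
```

Note that green is sampled at half of all sites, twice as often as red or blue, which matches the two-to-one green weighting discussed below.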
Today's smartphone manufacturers group color filters together into multi-cell pixel clusters. These macro pixels offer enhanced light sensitivity and flexibility. The raw output from this stage is referred to as the Bayer pattern image, or Bayer frame, and it looks distorted, like a mosaic with gaps between the different colors. This is because each photodiode can only correspond to one color. So the image processor employs algorithms to help fill in the gaps of the raw Bayer image. These demosaicing algorithms, as they are called, employ a variety of mathematical methods to reconstruct the scene's actual color. The simplest algorithm just fills in the missing RGB values based on the adjacent pixels. It works decently, but it has aliasing issues at the corners, so more sophisticated algorithms have been developed.

This methodology might seem a little weird at first glance. It implies that two-thirds of the image data from our smartphone cameras is essentially computer generated. However, it is mimicking what happens within the human vision system. The brain similarly compares signals from different cone-type photoreceptors to create what we call color vision. Bayer himself even chose the initial pattern to include twice as many green photodiode filters as red or blue, because that's how it is in the human eye.

After this, the image data goes through a variety of processing and finishing steps. First, the image is processed for white balance. Our brain is able to adapt our color vision to different kinds of illumination. Cameras need to accommodate this and correct the colors in the image to look as if the scene were lit by a neutral white light. Without this, the finished image looks unnatural; for example, skin tones might look too warm and orange, or camera flashes will look too cool and blue. This requires an algorithm to estimate the scene's illumination and how the image sensor's color filter will respond to it. The result is something called an illumination value, and it is applied to the pixels' RGB values.

Then the image processor might manipulate the image colors in a proprietary way. This can be based on the user's feedback; for instance, the user might have selected a more "vivid" image setting. Or it can be something pre-programmed into the smartphone camera by the vendor to differentiate the look of their photos. For instance, Samsung phones shoot images that look different from iPhones'.

The image processor might then apply another algorithm to reduce noise in the image. Noise in an image refers to artifacts that are not in the original scene. Too much noise is distracting, but if your camera is too aggressive in the denoising, the result might end up looking way too smooth and fake, like something out of Meitu. Since they matter so much to the camera's perceived performance, a lot of work has been invested in striking the right denoising balance; entire PhD theses have been written on denoising algorithms.

After this, the image processor resizes the data, adjusts RGB values to make it more presentable for the smartphone screen, and then saves it to a JPEG, PNG, or HEIC file. That's your image pipeline in a nutshell.
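To make two of the steps just described more concrete, here is a minimal, illustrative sketch of the simplest approaches mentioned: filling in missing color values from neighboring pixels, and applying an estimated illumination value to the RGB channels. The gray-world assumption used here is just one classic way to estimate that illumination value; the video does not name a specific algorithm, and real phone pipelines use far more sophisticated, proprietary methods:

```python
import numpy as np

def bilinear_demosaic(mosaic):
    """Naive demosaic: fill each missing channel by averaging measured
    neighbors of that channel (image edges wrap around, for brevity).

    mosaic: (H, W) RGGB Bayer frame as floats. Returns (H, W, 3) RGB.
    """
    h, w = mosaic.shape
    r_mask = np.zeros((h, w), dtype=bool)
    r_mask[0::2, 0::2] = True
    b_mask = np.zeros((h, w), dtype=bool)
    b_mask[1::2, 1::2] = True
    g_mask = ~(r_mask | b_mask)

    rgb = np.zeros((h, w, 3), dtype=float)
    offsets = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
    for channel, mask in enumerate([r_mask, g_mask, b_mask]):
        known = np.where(mask, mosaic, 0.0)
        total = sum(np.roll(known, off, axis=(0, 1)) for off in offsets)
        count = sum(np.roll(mask.astype(float), off, axis=(0, 1)) for off in offsets)
        # Keep measured values; interpolate the rest from nearby measurements.
        rgb[..., channel] = np.where(mask, mosaic, total / np.maximum(count, 1.0))
    return rgb

def gray_world_white_balance(rgb):
    """Scale each channel so the channel means match, i.e. assume the scene
    averages out to gray. rgb: (H, W, 3) floats in [0, 1]."""
    means = rgb.reshape(-1, 3).mean(axis=0)
    gains = means.mean() / np.maximum(means, 1e-6)
    return np.clip(rgb * gains, 0.0, 1.0)
```

A real image signal processor does this on raw sensor data with edge-aware interpolation and per-device color calibration, but the structure is the same: reconstruct full RGB, then rescale the channels toward a neutral illuminant.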
Now let's talk a little bit about certain special situations, and how much computation goes into porting optical imaging features to the smartphone world. Zooming in on something is particularly difficult to do with a smartphone. With traditional cameras, you zoom in on something by physically moving the lens along the optical axis. This is not easily possible with your small and skinny smartphone camera. There have been some interesting attempts by Samsung, Nokia, and Asus, but the resulting products have a sizable camera bump, and despite this bump, the zoom performance is rather disappointing. The dissatisfying results of optical zoom meant that most smartphones can only offer digital zoom. This is mostly done by cropping the original image and upscaling what remains. Digital zoom values tend to be pretty modest, since zooming further weakens an already iffy image resolution. To improve the quality of digitally zoomed images, camera makers have employed image algorithms that try to enhance the missing details. A simple one would just try to fill in details using nearby pixel data. This hasn't quite gotten there yet.

Ultimately, the approach most modern manufacturers seem to have gone with is to have multiple rear cameras: one with a wide field of view, the wide camera, and another with a narrow field of view, the telephoto. These dual-aperture zoom cameras, as they are called, were first introduced in 2014 by an Israeli company called Corephotonics, and have been adopted by the leading phone makers: Apple, Xiaomi, Oppo, and so on. Now users can swap between lenses to achieve a zoom. A few companies have gone with a different approach called a folded zoom. You keep a traditional optical axis, but use a 45-degree mirror to bend light sideways so that you can get your zoom without making the camera thicker. A prism helps maintain image stability. A few high-end phones by Oppo, Samsung, and Huawei have adopted this feature, which they headline as allowing for 100x zoom capability. It's a little weird, but there you go.

One of the most impressive things about the iPhone 13 has been its low-light and nighttime performance. It is here where the industry has really pushed the boundaries of computational photography. The reason why I think they can do this is that the human eye kind of sucks at night; when it gets dark, humans become colorblind. As a result, camera makers realized that they didn't have to be so faithful to the scene. Rather, they seek to create the most colorful, noise-free low-light photo they possibly can.

High-quality low-light photography was once considered possible only with DSLRs: they had larger image pixels and adjustable apertures, which, together with tripods, allowed enough light to be captured. Smartphone camera hardware is not capable of this, but smartphone camera makers realized that they could adapt another photography technique to overcome the limitation: burst processing. This means capturing and merging together many image frames. It has its roots in astrophotography, as image stacking, a technique where multiple night-sky exposures are lined up to reduce noise and produce better images. In smartphone burst processing, the smartphone camera captures and stores frames continuously. Then, when the shutter is pressed, it selects a frame close to the moment when you clicked the button, and merges all the other frames together to create a single high-quality image. This has its own challenges, for instance reliably aligning the sequence of images; doing it wrong can result in weird image distortion issues. But when done right, it works really well. The 2018 Google Pixel phone first kicked off this low-light breakthrough with the Night Sight feature. In creating it, the camera designers spent a lot of effort adjusting for jittery hands and moving objects in the scene that could cause image blur, but the results were impressive. Since then, other phone makers have used burst processing as a general tool for better photography.
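The merge step described above can be sketched very roughly as "align each frame to a reference, then average." The version below uses a brute-force global shift search purely for illustration; actual implementations such as Night Sight align small tiles and merge robustly to cope with moving subjects:

```python
import numpy as np

def align_by_shift(reference, frame, max_shift=8):
    """Find the integer (dy, dx) shift that best aligns frame to reference.

    Brute-force global search; a stand-in for real tile-based alignment."""
    best_shift, best_err = (0, 0), np.inf
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            shifted = np.roll(frame, (dy, dx), axis=(0, 1))
            err = np.mean((shifted - reference) ** 2)
            if err < best_err:
                best_shift, best_err = (dy, dx), err
    return best_shift

def merge_burst(frames):
    """Align every frame to the first one, then average to cut noise.

    frames: list of (H, W) float exposures. Averaging N aligned frames
    reduces noise roughly by a factor of sqrt(N)."""
    reference = frames[0]
    aligned = [reference]
    for frame in frames[1:]:
        dy, dx = align_by_shift(reference, frame)
        aligned.append(np.roll(frame, (dy, dx), axis=(0, 1)))
    return np.mean(aligned, axis=0)
```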
Burst photography has been used for denoising, increasing image resolution, and high dynamic range compression. Recent advancements in onboard AI processor technology have allowed camera makers to push imaging performance to new heights. Here are a few prominent examples.

The first has to do with white balance correction. Researchers collected digital photos and then had professional photographers manually white balance them. They then fed this data into a machine learning model to create much more effective color constancy algorithms. This machine learning method has been particularly helpful in low-light conditions, and it was featured prominently in the Google Pixel's Night Sight.

Earlier, I mentioned how camera makers tried simple algorithms to sharpen the blur from digitally zoomed photos. Taking it one step further is to train a machine learning model with high- and low-resolution imagery, so that the model knows how to properly sharpen and enhance blurry edges in a picture. It works pretty well. Nvidia has applied the same concept to digitally upscale video game image assets to resolutions and frame rates the developers did not originally support; they call it Deep Learning Super Sampling, or DLSS.

Another place where machine learning has really made an impact is bokeh. With smartphone cameras, the whole image is either in focus or not; you cannot focus on one thing, like a person, and have the background blurred. Computational photography has allowed mobile phones to generate a synthetic bokeh. Modern cameras can use their second camera or a dedicated depth sensor to figure out how far away the subject is, then introduce depth blur to simulate the depth-of-field effect. This is the basis of the iPhone's Portrait mode, which at first was just for people but now seems to have been extended to other things, like animals. The phone uses AI to recognize a person or dog and blurs out the rest. I've started to notice this bokeh effect leaking into normal camera modes too. For instance, point the iPhone 13 at a cup on the table and it blurs out the background there as well. Really interesting.

The thing that most surprised me when I started this overview was finding out just how much computer processing and image manipulation goes into creating today's digital photos. Whether we are talking about simple math operations or complex machine learning models, the image data that gets saved and uploaded is heavily doctored compared to what the sensor actually "sees". This might be a really weird metaphor, but it kind of makes me think of a quote from the movie Jurassic World. In it, Henry Wu, played by BD Wong, says: "Nothing in Jurassic World is natural. We have always filled gaps in the genome with the DNA of other animals. And if the genetic code was pure, many of them would look quite different. But you didn't ask for reality. You asked for more teeth." Every day we take photos of the things around us. The scene itself is real, but the smartphone camera's image of it has, over time, become increasingly less so. But that is what people want, right? As long as it looks great, who cares if it never looked like the reality we were looking at? Food for thought, man. Food for thought.

Alright everyone, that's it for tonight. Thanks for watching. If you enjoyed the video, consider subscribing, check out the newsletter, or follow the Twitter. Want to send me an email? Drop me a line at john@asianometry.com. I love reading your emails: introduce yourself, suggest a topic, or more. Until next time, I'll see you guys later.
Info
Channel: Asianometry
Views: 319,649
Id: yY8OFp0-UZw
Length: 17min 27sec (1047 seconds)
Published: Sun Jan 30 2022