Image processing on Raspberry Pi 5: our new hardware image signal processor

Captions
I guess one of the biggest changes between Raspberry Pi 4 and its predecessors and Raspberry Pi 5 is imaging — how we deal with data that comes from the camera. Now, we've been doing cameras on Raspberry Pi since about 2013, I think; we had a 5-megapixel OmniVision product which was actually the very first official accessory for the Raspberry Pi. Back in those days, how did we deal with data that came in from a camera?

Well, it's important to understand that these are what are called raw sensors, so you get a very basic form of pixel data from them — not beautiful, nice images that you can just throw straight onto the screen. They need a lot of processing to make them look nice. So we get these very raw numbers coming off the image sensor.

And these are red, green and blue values for every pixel? No — in fact every pixel only has a single colour, and part of the processing we have to do is to interpolate — you could say invent, perhaps, or make up — the missing colours. That's one of the things that happens. But anyway, you get these pixels coming off the sensor, they come down the flat ribbon cables that people will know from the Raspberry Pi, and there's a hardware block on the Pi that receives them — the receiver — which just dumps them straight into memory. That's all that happens. Then everything else happens on the main chip itself, in this piece of hardware called an image signal processor, often abbreviated to ISP — nothing to do with internet service providers.

So it's an image sensor pipeline? That's what I always thought it stood for. Well, it used to be — it's changed names; it's had various versions of the abbreviation, but I think most people now call it an image signal processor. So there's one of those on the Raspberry Pi chip, and it reads these bad images. Because these cameras are quite cost-engineered, aren't they — both the optics and the sensor itself? That's right. So, as we've said, the pixels we have in memory at this point are quite ugly things: they're dark, they're gloomy, they're mostly green — it's a bit of a mess, really. You can't use them for anything. Then this thing called the image signal processor, the ISP, reads everything back from the memory where it was stored when it arrived from the sensor, processes it — it does lots of stuff to it, which we can talk about more in a minute — makes a nice picture, and writes that back out to memory. That's basically what happens.

So it's just processing the pixels into a nice output image; there's no image encoding — nothing to do with image encoders or JPEGs or anything like that, that's later. And this is RGB data coming in from the camera? Well, it's called Bayer data. What's Bayer data? It's one red, one blue and two green pixels for each little two-by-two block.
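To make that Bayer layout concrete, here is a minimal sketch — not Raspberry Pi code, just an illustration in NumPy, assuming an RGGB ordering (real sensors vary) — that builds the kind of one-colour-per-pixel mosaic described above from a full RGB image:

```python
import numpy as np

def make_bayer_mosaic(rgb: np.ndarray) -> np.ndarray:
    """Simulate an RGGB Bayer mosaic: keep one colour sample per pixel.

    rgb: (H, W, 3) array. Returns an (H, W) single-plane 'raw' image in
    which each pixel holds only its red, green or blue sample.
    """
    h, w, _ = rgb.shape
    raw = np.zeros((h, w), dtype=rgb.dtype)
    raw[0::2, 0::2] = rgb[0::2, 0::2, 0]  # R at even rows, even cols
    raw[0::2, 1::2] = rgb[0::2, 1::2, 1]  # G at even rows, odd cols
    raw[1::2, 0::2] = rgb[1::2, 0::2, 1]  # G at odd rows, even cols
    raw[1::2, 1::2] = rgb[1::2, 1::2, 2]  # B at odd rows, odd cols
    return raw

# Each 2x2 block carries one R, two G and one B sample — half of all
# samples are green, matching the "twice as many greens" point below.
```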
So when you say each pixel only has a single colour, that's what you mean: each pixel is either red, green or blue, and there are twice as many greens as there are of the other colours. So this data is coming in as Bayer data — this kind of sparse RGB — it goes through the pipeline, and does it come out as RGB data, or as something else? RGB or YUV. Right — YUV is another colour space: rather than saying this much red, this much green, this much blue, it's this much brightness and this much colour difference. It's a different colour encoding; I think we'd say it's optimised for video encoding. And that's what something like a video encoder or JPEG then uses? Yes — if you're going to take your data, encode it and store it, it's more convenient to have it as YUV, and YUV allows you to — well, not compress exactly, but remove some of the colour samples. And why is it safe to do that? Because your eyes are more sensitive to luminance changes than to colour changes — this is rods and cones. You could argue it's not terribly safe, but we do it anyway; everybody does it. It does create problems: it's one of the reasons why, when you see people in highly stripy shirts on TV, you sometimes get funny colours appearing — it's because the colour data has actually been thrown away.

So that's the classic world, where you have 2835 on a Raspberry Pi 1 and 2711 on a Raspberry Pi 4, and all of it is integrated into one chip. You have the receiver, you have the block that writes data from the receiver to memory — that's called Unicam in the old world — and then you have the ISP, the image signal processor, which is a memory-to-memory bus master, pulling those Bayer images in, doing stuff, and then writing either non-subsampled RGB or YUV, or subsampled YUV, back to memory.

So we should probably talk a little bit about the stuff. What did "stuff" consist of, classically? Obviously one of them would be debayering. Yes — statistics, black level correction, debayer, as you said. We generally start off trying to fix defects in the image, by trying to reduce the noise and to spot defective pixels — although sensors usually do quite a good job of hiding defective pixels themselves, because the sensor manufacturer knows what kind of defects are likely to be present. In a 12-megapixel sensor you may just have some pixels which are dead, and you replace those with a neighbouring signal. So really the first thing we try to do is get rid of noise, by smoothing where the image looks like it ought to be smooth. And what sort of noise is this — thermal noise? There's thermal noise and quantisation noise in the electronics, and there's also shot noise, due to the fact that the photons arrive at random times. So you're trying to build some model of what you think the signal should look like, and then push the signal towards what it should be. There are different ways of doing it, but the general approach is: where the signal looks like it's smooth, up to a certain degree of noise, you make it smoother; where it looks like there are sharp edges, you preserve them.
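As a rough illustration of that "smooth the smooth bits, keep the edges" idea — this is only a toy sketch, not the actual ISP algorithm — you can average a pixel only with those neighbours that sit within an assumed noise threshold of it:

```python
import numpy as np

def toy_spatial_denoise(img: np.ndarray, sigma: float) -> np.ndarray:
    """Toy edge-preserving smoother.

    Each pixel is replaced by the mean of those 3x3 neighbours whose value
    lies within `sigma` of it; neighbours across a sharp edge are excluded,
    so edges survive while flat areas get smoothed.
    """
    h, w = img.shape
    center = img.astype(np.float32)
    acc = np.zeros((h, w), np.float32)
    count = np.zeros((h, w), np.float32)
    padded = np.pad(center, 1, mode="edge")
    for dy in range(3):
        for dx in range(3):
            neigh = padded[dy:dy + h, dx:dx + w]
            keep = np.abs(neigh - center) <= sigma   # "looks smooth" test
            acc += np.where(keep, neigh, 0.0)
            count += keep
    return acc / np.maximum(count, 1.0)
```

A real ISP does something far more sophisticated, and the noise threshold comes from calibration — which is where the tuning discussed later comes in — but the shape of the idea is the same.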
Right — and that's spatial denoise. OK, so you've done some spatial denoise; then what have you got — distortion correction? Yes, though I was going to take a step back and say that, to my mind, there are two kinds of things we tend to do. There's the large-scale processing of images: getting the colours right, getting the gamma curves right, so the thing basically looks like a nice image — and you can think about a lot of that without worrying about pixel-level detail. And then there's the other thing you have to do: getting the pixel-level stuff sorted out as well. That's always very fine-grained — it's the denoising we've talked about, the sharpening, the debayering, all those kinds of things. So there are very much these two things going on, and you have a pipeline of these stages, each doing different operations, and you pass pixels down it. You start off with this Bayer — you know, extremely incorrect — data, and at the end of it you hope to have something which, for some value of "true", is the closest you can get to the ground truth: the actual state of the world that generated the image.

Yes — and I've got to say that a lot of the corrections we do — Nick talked about the defective pixel correction, the denoising, all that kind of stuff — you try to do early, because the pixel numbers that come off the sensor follow statistical distributions you have some expectation of; you've probably calibrated and measured them, so you can do things with them. As soon as you start touching the pixels and changing them, all of that starts to go away, so you can't do it any more — you can't denoise effectively later in the pipeline, because you've done all kinds of things to the numbers and you've got no real idea what you're looking at. So a lot of that sort of stuff happens early. There's a fairly natural order for a lot of it: you fix up the problems early, and then you've got to sort out the colours, the transfer function, sharpen it up a bit, all that kind of stuff. The big milestone in the pipeline is demosaic: before that you have the Bayer data, and after that you have RGB. And after that you also have the gamma, which then removes more of the linearity from the pixels. So that's your classic ISP pipeline, as a memory-to-memory master. Then you have algorithms that run around the outside of it — the "3A" stuff, is that right? So these are your auto algorithms: auto white balance, auto exposure and gain, and autofocus — though these days we do another one as well, don't we? Yes.
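Picking up on demosaic being the big milestone: here is a deliberately naive sketch — bilinear interpolation of the missing colours on the assumed RGGB layout from earlier, followed by a simple gamma curve. Real ISPs use much smarter, edge-aware demosaic; this is only to show the shape of the step.

```python
import numpy as np
from scipy.ndimage import convolve

def naive_demosaic(raw: np.ndarray) -> np.ndarray:
    """Bilinear demosaic of an RGGB mosaic (see make_bayer_mosaic above)."""
    h, w = raw.shape
    r_mask = np.zeros((h, w), bool); r_mask[0::2, 0::2] = True
    b_mask = np.zeros((h, w), bool); b_mask[1::2, 1::2] = True
    g_mask = ~(r_mask | b_mask)

    def interp(mask):
        # Average of the samples of this colour present in each 3x3 window.
        vals = convolve(np.where(mask, raw, 0.0), np.ones((3, 3)), mode="mirror")
        cnt = convolve(mask.astype(float), np.ones((3, 3)), mode="mirror")
        return vals / cnt

    return np.dstack([interp(r_mask), interp(g_mask), interp(b_mask)])

def apply_gamma(rgb: np.ndarray, gamma: float = 2.2, max_val: float = 255.0) -> np.ndarray:
    """Simple power-law transfer function: lifts shadows, compresses highlights."""
    return max_val * (np.clip(rgb, 0, max_val) / max_val) ** (1.0 / gamma)
```

The gamma step is also why the ordering matters: once a non-linear curve has been applied, the nice linear, calibrated statistics that the early denoise relied on are gone.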
The way to think of it, really, is that pretty much every block in the hardware has an algorithm associated with it that controls it — and some algorithms control more than one block. A typical one is the auto exposure and gain algorithm we talked about: that controls the sensor, but it also controls the gain block in the ISP. Then there's a block that applies colour gains, and that again has an algorithm that controls it, which happens to be called the auto white balance algorithm.

So your automatic gain control, automatic exposure algorithm is trying to avoid the image being blown out: if you walk from inside to outside, something has to happen that makes this outer set of algorithms say "too much light — dial down some gains", and if you walk back inside it goes "this image looks almost black — let's step the gains up a bit". Similarly with colour. I saw a really wonderful thing recently — the New York thing, where they had the orange sky from that dust that had come over. Somebody was illustrating that smartphone pictures of it weren't really capturing how incredibly orange it was, because the auto white balance was saying "this image is too orange — let's make it less orange". So they took one of the colour charts outside and calibrated against it: when you say "that square there is white — drive your auto white balance from that square", suddenly you see how incredibly orange the place really was. That's actually a classic, because auto white balance is kind of an impossible algorithm: when you're looking at a white wall under a yellowish light, you can't tell that it's not a yellow wall under a white light. It has statistical properties — you have to look at the statistics and likelihood of the things you're seeing and decide what you think is most plausible. The upshot is that white balance algorithms tend to be calibrated against typical scenes, so you assume normal-ish behaviour, and when you get exceptional circumstances they get caught out. That's exactly it — and the trick is to cover as many of the outlying circumstances as you can without messing up the common ones.

Then the other classic algorithm is autofocus, where you have one of several methods — and with Camera Module 3 we've got several methods, including phase detect — for saying "hang on, this image is out of focus; drive the focus actuator backwards and forwards". Camera Module 3 uses phase-detect autofocus — I don't know if Nick can tell us about that, but it's a particularly cool method: as well as telling you whether the image is in focus or not, it tells you which direction you have to move the lens to get it in focus. There are simpler contrast-detect autofocus methods, aren't there, which rely on the fact that when the image is out of focus it's blurry, so you search for the lens position that gives the most high-frequency detail — the one that makes the image least blurry.
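A minimal sketch of the kind of sharpness metric a contrast-detect method searches over — purely illustrative, just a Laplacian-based focus measure over a frame; the actual metric and search strategy in any real camera stack will differ, and `capture_at` below is a hypothetical placeholder:

```python
import numpy as np

def focus_measure(gray: np.ndarray) -> float:
    """Variance of a discrete Laplacian: higher = more high-frequency detail,
    which for a static scene usually means better focus."""
    lap = (-4.0 * gray[1:-1, 1:-1]
           + gray[:-2, 1:-1] + gray[2:, 1:-1]
           + gray[1:-1, :-2] + gray[1:-1, 2:])
    return float(lap.var())

def contrast_detect_search(capture_at, positions):
    """Sweep candidate lens positions and keep the sharpest one.

    `capture_at(pos)` stands in for grabbing a grayscale frame with the lens
    at `pos` — this sweep is the slow 'hunting' the speakers describe,
    because every candidate position needs a fresh frame.
    """
    scores = {pos: focus_measure(capture_at(pos).astype(np.float32))
              for pos in positions}
    return max(scores, key=scores.get)
```

Phase detect avoids that sweep entirely, because a single measurement also tells you which way to move the lens.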
Yes — and that's very difficult, because you have no definite knowledge that the image is now out of focus, that something is wrong. All you can do in that kind of world is say "the image has vaguely changed a bit — maybe I need to do another autofocus search", and you don't know whether to go forwards or backwards, so you get this hunting behaviour, and it's very hard to calibrate it to hunt for focus only when it really needs to — it's basically impossible. Phase detect nails all of that for you; that's why it's now so common across cameras.

So you've got the ISP in the middle, you've got the Unicam receiver, you've got the algorithms sitting around the ISP — but we also talk about "tuning" sometimes. What is tuning? It's the characterisation of a particular camera. It's everything from calibrating what kind of noise you get from a sensor, so the hardware block can be told what levels of noise it has to try to flatten, to calibrating the colour response of the camera, so it knows how to turn the camera's version of colours into the true colours you would recognise. What else — lens shading? Yes, shading correction — vignetting and colour shading, which are very common across lenses. And gamma: how we shift the luminance in the image to make it look nicer.

So the flow there is: you get a sensor, or some representative sample of sensors, from the manufacturer, you take pictures of some charts and things, and you derive from that a big table of numbers which constitutes the tune. Yes — and that's changed a little bit over time. There's not just an old hardware world and a new hardware world; there's an old software world and a new software world too. In the libcamera world, tuning is a little bit different — a lot simpler, a lot fewer numbers to deal with. There's a lot of configurability in the classic ISP, and we found ways to slim that down to just the meaningful parts. In the old world we had lots of different customers — often quite big ones — and when they wanted a particular feature, you gave them the particular feature they wanted, which meant we had piles and piles of features for every customer we'd ever worked with, and the tuning filled up with numbers like crazy. That made tunings very hard to create — it's one of the reasons the HQ Camera took so long to roll out: someone had to make that tuning file, and it was a nightmare; it was really hard. So we were very keen to drop all the stuff we thought just wasn't useful — generally speaking, most of those numbers you would never touch — and in the new open world, where people want those kinds of particular behaviours, the code is there and they can add them themselves, so we didn't feel we had to put everything in off the bat. But there was also a real wish to make the whole thing a darn sight simpler, so that we could turn a new camera around in a few days or so, rather than months every time.
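Lens shading correction is a nice example of what one of those calibrated tables actually does. Here is an illustrative sketch — a purely radial vignetting model with a made-up strength parameter; real tunings store measured per-channel gain tables rather than a formula:

```python
import numpy as np

def radial_shading_gain(h: int, w: int, strength: float = 0.4) -> np.ndarray:
    """Hypothetical vignetting model: gain grows towards the corners.

    In a real tuning this would be a measured table, often one per colour
    channel, since colour shading differs from plain vignetting.
    """
    yy, xx = np.mgrid[0:h, 0:w]
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    r = np.hypot((yy - cy) / cy, (xx - cx) / cx) / np.sqrt(2)  # 0 centre, 1 corners
    return 1.0 + strength * r ** 2

def correct_shading(img: np.ndarray, gains: np.ndarray) -> np.ndarray:
    """Multiply each pixel by its calibrated gain to flatten the image."""
    return img.astype(np.float32) * gains
```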
So you had these big tuning campaigns: the OV5647 for Camera Module 1, the IMX219 for Camera Module 2, the IMX477 for the HQ Camera. Then the libcamera world appears, along with subsequent cameras — and you went back and retuned those old cameras in the new world, so you can still use a Camera Module 1 in the libcamera world? That's right — and we think it mostly looks better, and we have a much better handle on what's actually going on inside it. I've seen some really nice Camera Module 1 pictures recently that are materially better than I would historically have thought Camera Module 1 could do. I think we do better even on the old platform: the tunings we have are better, in part just because they're less complicated, less of a nightmare to make — it means you can spend the time getting the basic things in order, and it's a much better place to be.

So that's the classic hardware world. What's different on Raspberry Pi 5? I guess the main difference is that it's been split between two chips: the CSI receiver and part of the ISP are now in the RP1 southbridge, and most of the ISP is in the main processor, the 2712. On the MIPI side we've grown some lanes: we have two four-lane MIPI ports that can be either CSI-2 or DSI, or one of each. We have a bit of a challenge finding four-lane sensors — there aren't a huge number of them, and all our standard cameras are still two-lane — but the door is open to start doing four-lane, and there are some super-high-resolution four-lane parts; we'd love to do it, because it would give us much higher resolutions and higher frame rates. The HQ Camera in theory will do four lanes — obviously we'd need a board with four lanes wired up and a driver that supports it, things we don't have now — but the door is open, and then we could drive one of these things, or two of them, at 12 megapixels, 30 frames a second.

So you've got two camera connectors, each twice as wide as before, and as you said they're not connected to the main chip — they're on RP1. Yes, and co-located on RP1 with each of those MIPI interfaces is what we call the ISP front end. It doesn't do a great deal; one of the significant things it does is gather statistics early, on the raw image as it comes in, so we don't have to rely on feedback from the previous image — we've got more data up front. Does that make some of these 3A algorithms easier? Yes — we can converge a bit faster; it removes a bit of the latency. There's also a little bit of defective pixel correction — I don't think we're using it at the moment, as the sensor manufacturers tend to do a good job, but it's there if we need it, really to protect the statistics. And there's optional downsampling in the front end to reduce the memory bandwidth. I think the defective pixel correction is probably more there to protect the downsampling, actually — once you start messing with the image, the pixels become correlated and it's harder to do defective pixel correction. And one of the other things we do in the front end is compression: we have a compression scheme, and it's slightly lossy — 10 or 12 bits down to 8 bits.
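To give a feel for what a lossy raw-compression step like that can look like — this is not the actual scheme used on Pi 5, just a generic square-root-style companding curve that packs 12-bit sensor values into 8 bits and expands them back:

```python
import numpy as np

def compand_12_to_8(raw12: np.ndarray) -> np.ndarray:
    """Illustrative lossy packing: square-root curve, 12-bit -> 8-bit.

    A square-root-ish curve spends the 8-bit codes where noise is small
    (the shadows) and accepts coarser steps in the bright areas, where
    photon shot noise already dominates. Not the real Pi 5 scheme.
    """
    x = raw12.astype(np.float32) / 4095.0
    return np.round(255.0 * np.sqrt(x)).astype(np.uint8)

def expand_8_to_12(packed8: np.ndarray) -> np.ndarray:
    """Inverse of the curve above; the round trip is close but not exact."""
    y = packed8.astype(np.float32) / 255.0
    return np.round(4095.0 * y * y).astype(np.uint16)
```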
It's been carefully designed for our use case, to try to make it so you can't see the difference — because, while we have what in my mind is a lot of bandwidth over PCI Express, two gigabytes a second, this is on a platform that has 16 or 17 gigabytes a second of memory bandwidth. And it helps at the other end too: when the back-end ISP reads the data in, it's reading less. The compression is optional — you can turn it off — but we do mostly turn it on by default, and in a future where we potentially have a pair of four-lane, high-resolution cameras connected, it'll be very helpful.

Right, so that's the stuff in RP1: statistics, compression, a bit of downscaling, and defective pixel correction. Then on the 2712 we've got most of the ISP, which again is a memory-to-memory master: it's now reading the — potentially compressed, somewhat touched-up — data that has come over PCI Express from RP1, doing stuff to it, and getting nice pictures out of the back. What stuff? How is the stuff different? The overall shape of the stuff is quite similar — a lot of it is very similar — but one big difference to start with is that it has much higher throughput. That's a huge change: it's clocked higher and it runs at two pixels per clock. I think it's clocked at something like 800 MHz, so you can get 1.6 billion pixels a second — this thing will do dual 4Kp60 in its sleep, virtually; it really is very capable. We've gone from one pixel per clock at 500 MHz to two pixels per clock at 800 MHz, so it's over a 3x uplift. And in reality, on the old ISP you got maybe two or three hundred megapixels a second through with a bit of luck — you have an overhead of about 30 to 40% because of the overlaps on the tiles, and there's something similar here as well, though actually not as much in the new world. So you'd get about 250 megapixels a second before; now it's vastly greater. The throughput is a really big difference.

And in terms of the algorithms, some of it is better versions of the same thing — better debayering, better spatial denoise? I would say the spatial denoise is certainly better: the filters have wider support, so you get better performance that way. So we talked about spatial denoise: looking at a pixel and its neighbours and saying "that doesn't look right — there's some smooth change I can see across my filter support, and this pixel is sticking up above it, so it's probably not a real edge, it's probably shot noise or thermal noise". That's spatial denoise. But now we have temporal denoise too — fully in hardware. What that does is compare every frame that arrives with, effectively, the previous frame, and where they look like they're the same, it averages them together in some way.
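A minimal sketch of that kind of temporal averaging — an exponentially weighted running average with a simple "has this pixel really changed?" test; the hardware's actual merge logic is certainly more sophisticated, and the threshold and blend weight here are made up:

```python
import numpy as np

class ToyTemporalDenoise:
    """Keep a running 'long-term average' frame and merge each new frame in.

    Where a pixel matches the average (within `threshold`, i.e. probably
    just noise), blend it in; where it differs (probably real motion),
    take the new value so moving objects don't smear.
    """
    def __init__(self, threshold: float = 8.0, alpha: float = 0.2):
        self.threshold = threshold  # "same pixel" tolerance, in code values
        self.alpha = alpha          # blend weight for matching pixels
        self.lta = None             # long-term average frame

    def process(self, frame: np.ndarray) -> np.ndarray:
        f = frame.astype(np.float32)
        if self.lta is None:
            self.lta = f
            return f
        same = np.abs(f - self.lta) <= self.threshold
        blended = (1.0 - self.alpha) * self.lta + self.alpha * f
        self.lta = np.where(same, blended, f)
        return self.lta
```

Feeding successive frames through `process()` leaves static regions progressively cleaner, which is exactly the "spend your bitrate on detail rather than noise" effect described next.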
But this is a long-term process, if you like: all the while it's averaging frames to create what we call the long-term average frame. This is the temporally denoised version of the frame, and every time a new one comes in we compare it with that, merge them together, and that becomes the new long-term average frame — and that's what then goes forward down the rest of the pipeline. So, looking at this blank space behind me: if, in a video, there's significant variation in that colour from frame to frame, it's obviously wrong, because it's a static thing that isn't changing — and that's what temporal denoise tries to capture. It will really clean that sort of thing up, so you get much lower-noise videos, and you get to spend much more of your encoder's bit rate encoding the detail rather than the noise, so you get much better video quality as a result. Temporal denoise is a really good feature to have; things get much cleaner, and it makes a tremendous difference.

So is that the big ticket, in terms of things that are qualitatively different about this ISP? That's one of the big things; HDR is another one — HDR is something we've been looking at. High dynamic range imaging: normally what this means is that you have several images and you combine them together in some way. Exposure bracketing? Yes, it's a bit like exposure bracketing: the dark areas you can bring up, and the highlights you can stop blowing out and bring down. It's the classic badly lit room with a window: you want, simultaneously, not much gain so you can see the tree out of the window and the window doesn't just blow out to white, and quite a lot of gain inside the room so you can actually see things in the room. Our Camera Module 3 does this on-chip, but now we're going to try to get the ISP to do it for the other sensors as well — any sensor. That is quite fun, and there are various different ways we can drive the HDR processing. So is this a commitment to bring HDR to the HQ Camera, then? Well — at least for relatively static scenes. OK, it's all complicated: there are two kinds of HDR we're doing at the moment. One is where we basically run the camera underexposed and combine all the frames that come in — we add them up and average them, so it's as if we had both short exposures and long exposures and munged them all together — and then that goes forward through the tone mapping process. That's quite a nice form of HDR; it's very good for video, and you don't suffer motion artefacts, because it basically relies on temporal denoise to do the adding up. So how does it deal with motion, then — global motion or local motion? What happens is that, because temporal denoise averages where things are the same, where things aren't the same it just takes the most recent frame that's just arrived.
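A very rough sketch of that first, accumulation-style HDR mode — run underexposed, sum several frames, then tone-map. Purely illustrative: the frame counts and the tone curve here are made up, and the real pipeline folds the accumulation into temporal denoise rather than doing it as a separate pass.

```python
import numpy as np

def accumulate_hdr(frames) -> np.ndarray:
    """Sum several deliberately underexposed frames.

    Because each frame is underexposed, highlights don't clip; summing N
    frames recovers the shadows (and averages down the noise, as in
    temporal denoise). The result has a wider range than any single frame.
    """
    acc = np.zeros_like(frames[0], dtype=np.float32)
    for f in frames:
        acc += f.astype(np.float32)
    return acc

def tone_map(hdr: np.ndarray, max_out: float = 255.0) -> np.ndarray:
    """Simple global tone curve (Reinhard-style x/(1+x)): squeeze the wide-range
    image back into display range, lifting shadows and compressing highlights
    rather than clipping them."""
    x = hdr / max(hdr.mean(), 1e-6)   # normalise around the average level
    mapped = x / (1.0 + x)
    return np.clip(max_out * mapped / mapped.max(), 0, max_out)
```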
So, for moving objects, what happens is that you get slight noise halos where things are moving: you still get a perfectly reasonable picture and it's all good, but where something moved there's a slight halo where it's a bit noisier, if you like — a fairly marginal effect. And you can run the whole thing at 60 frames a second or whatever, which is quite nice. Then there's another form of HDR we're doing, which only really works for still scenes at the moment — a lot of this is a bit of a first cut; there's a huge amount of new stuff, quite apart from the fact that the whole thing is new — where you actually capture long and short frames and combine them directly together. That, I would say, is more for still images at the moment; it's slightly a work in progress, and it's not so good on moving images because the artefacts aren't compensated so well. So the next version, whenever that's going to be, will be perfect.

But lots of this capability is there in the system already, latent, and we're gradually opening it up. I think this happens with every major revision of the platform: the major revisions are like treads on a staircase, and in between, the software work is like pouring sand on the staircase — you get a gradual improvement driven by software. What's really happened is that this has raised the ceiling on what you might be able to do in software. As time goes on we'll be able to do some of these things a bit better — not always: if you're doing 4Kp60 video it's hard to do very much in software, just because there are so many pixels flying around, but for still image capture there's quite a lot of scope for adding software processing to improve some of these areas, which we'll then improve in hardware again in a later revision. So there's a whole path of development happening there — because the Arm cores on this thing are stonkingly good, they really are. I ran the numbers last night, actually: if you compare the GPU on the platform with the CPUs, the total floating-point throughput of the Arm cores is about 76 gigaflops, and the total throughput of the GPU is about 50 — mid-40s, even, maybe — so there really is a huge amount of CPU performance available to you.

The thing I always come back to is that the most exciting thing about this platform — all these things we've got, whether it's temporal denoise and HDR and better this, better that, more output formats, more of everything, better at everything, that's all great — but actually the most exciting thing is that it's our platform and we get to develop it going forwards. That's the most exciting thing, because up until now we've had a platform that has never changed.
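As an aside, the earlier "dual 4Kp60 in its sleep" claim is easy to sanity-check against the quoted clock and pixels-per-clock figures — assuming 3840x2160 frames and the roughly 30 to 40% tiling overhead mentioned above:

```python
# Back-of-the-envelope check of the quoted ISP throughput figures.
clock_hz = 800e6                   # quoted clock, ~800 MHz
pixels_per_clock = 2
peak_rate = clock_hz * pixels_per_clock             # 1.6e9 pixels/s peak

tiling_overhead = 0.35             # assumed: "about 30-40%" loss from tile overlaps
usable_rate = peak_rate * (1 - tiling_overhead)     # ~1.04e9 pixels/s

dual_4kp60 = 2 * 3840 * 2160 * 60                   # ~0.995e9 pixels/s needed
old_isp = 500e6 * 1 * (1 - tiling_overhead)         # ~0.33e9; quoted ~250-300 Mpix/s

print(f"peak {peak_rate/1e9:.2f} Gpix/s, usable ~{usable_rate/1e9:.2f} Gpix/s, "
      f"dual 4Kp60 needs {dual_4kp60/1e9:.2f} Gpix/s, old ISP ~{old_isp/1e6:.0f} Mpix/s")
```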
Nothing has ever changed about it: no bug has ever been fixed, no material performance improvement beyond a bit of clock uplift, I suppose — there's been nothing, ever. And this new ISP was developed at Raspberry Pi and then put into the chip — it is ours. We're already making plans for the next version of this thing; some of the cool features we haven't squeezed into this one are going to go into whichever one comes next. Imaging had become an outlier in terms of the hardware: it was the only thing that hadn't moved forwards. Once we upgraded, in the last generation, to VideoCore 6 for 3D and the newer VideoCore scan-out hardware, the HVS, imaging had become just about the only surviving block in the platform — maybe along with video encode — that hadn't seen some attention since the days of 2835 and earlier. So it's hard to exaggerate what a big deal this is. It's not a step change — it's a whole new building; everything is brand new. And now the step changes will go forwards: every time we can find an occasion to do a new version, we'll come to you and say "we want to do this — give us a new chip, we've got fabulous new stuff". Yes, we will. So that's really exciting. Excellent — I'm looking forward to having a play with it. Thank you very much.
Info
Channel: Raspberry Pi
Views: 37,212
Keywords: Raspberry Pi
Id: vWBNKMf6eQI
Length: 31min 32sec (1892 seconds)
Published: Thu Oct 19 2023