Keynote - Mike Bostock

Captions
So we are going to go ahead and get started. That is all the stuff I have to say to you guys for now. Without further ado, I'm excited to introduce Mike. Mike, you can come in and get set up. Mike generally needs no introduction, but I'll attempt that anyway. Mike Bostock, the creator of the D3 library that most of us use on an everyday basis. Formerly a graphics editor at The New York Times. Now working on new and exciting things that all of us are going to see for the first time. So please give him a warm welcome. [ Applause ]

>> Thank you very much for that introduction. I suppose like many of the things I do, this talk and the work I present is born out of frustration and my attempt to make that into something productive. If you have gotten frustrated trying to understand why your code works or doesn't work or trying to understand how someone else's code works or trying to understand how my code works, sorry about that. Well, you're not alone. And this talk is for you. So the release of D3 4.0 was focused on making it easier to learn. More consistent and modular. But despite the changes to the API, it wasn't that different from earlier versions. The selections, scales and shapes were polished, but mostly unchanged. Doing the same thing. This is for continuity. API changes are disruptive. And I don't want to change everything every year. I want to strike a balance between doing innovation and some improvements and keeping things the same. But after the release of 4.0 I wanted to think a little bit more deeply about how not just to make D3 easier, but how to make visualization easier. Yet in seeking to better a tool for visualization, I remembered something. I remembered that visualization is itself a tool. A means to an end. A means to insight, right? A way to think, to understand, to study, to communicate something about the world. And per Ben Shneiderman, the purpose of visualization is insight, not pictures.
If you think only of encodings to construct visualizations, you ignore other challenges: finding relevant data, cleaning it, turning it into efficient structures for analysis, designing that analysis, statistics, modeling, simulation, explaining your findings. And I don't mean to, or I don't wish to, downplay the importance of visualization tools and innovation therein. I have many improvements I plan on making to D3, and I'm excited to see other approaches like Vega-Lite come out. But it's important to step back and consider complementary approaches to related problems. Tasks supporting discovery are often performed by writing code. And coding is famously difficult, right? Even its name suggests impenetrability. It was originally low-level binary instructions to be executed by a processor. Code has come a long way, but it's still hardly human-friendly. To give a sort of comically dense example, here is a bash command that I wrote for generating population density from California's census tracts. So it looks like it starts with geo2topo. It doesn't start with this, it starts with shp2json, converting it to a newline-delimited GeoJSON stream. It's not just bash, these are also JavaScript expressions embedded within bash. And then, you know, anyway. I could spend probably a whole talk just going over how this particular slide works. Now, Bret Victor gives this very concise definition of programming: programming is blindly manipulating symbols. And by blindly, he means that we can't see the results of our manipulation. We can edit a program, rerun it, diff the output. But programs are complex and dynamic. So this is neither a direct nor an immediate observation of the impact of our change. And by symbols, he means we don't manipulate the output directly. We operate in abstractions. They can be difficult to grasp. In Donald Norman's terms, the gulf of execution. And what are the symptoms of inhuman code? The first thing I think of is spaghetti. Code that lacks structure or modularity.
Where in order to understand one part of a program you have to understand the entire program. This is frequently caused by shared mutable state. If you have a piece of state modified by multiple parts of a program, it becomes much harder to reason about the value. And indeed, how do we know what a program does? If we can't track its complete state in our heads, then reading the code is insufficient. We use console.log, debuggers, tests. And as you have experienced, these tools are limited. A debugger can only show a few values at a moment in time. For seeing rich and complex data structures, it's limited. And we have great difficulty understanding what our code does. And sometimes it can feel like a miracle that anything works at all. And despite these challenges, we continue to write code, right? We're still writing code all the time for lots of different applications, more than ever before. And so why is that? Right? Are we masochists? Maybe. Unable to change? Probably. Is there no better solution? And in general, and that is a very important qualifier, no. Code is often the best tool that we have because it is the most general tool that we have. I don't mean best in some sort of absolute sense, but I do mean best for the right here and the right now and for the person that's doing the work. And that is because code is the most general. It has the most unlimited expressiveness. And alternatives to code, whether that's sort of high-level, or if that also includes higher-level programming interfaces and languages, can do well in specific domains, but these alternatives must sacrifice generality for greater efficiency in their domain. And if you can't constrain the domain, it's unlikely that you'll find a viable replacement for code. There is no blanket replacement. As long as humans are still thinking and communicating primarily in language. And it's hard to constrain the domain of science, right? Science is fundamental. We're studying the world.
Trying to extract meaning from empirical observations, to simulate systems. And it must be capable of expressing thought. Just as we don't use phrasal templates and Mad Libs, we can't use a drop-down menu for statistical analysis. We need more than configuration. We need to compose primitives into creations of our own design. And if our goal is to help people gain insight, we must consider the general problem of how people code. Bret Victor had this to say about math. But it applies equally to code. The power to understand and predict the quantities of the world should not be restricted to those with a freakish knack for manipulating abstract symbols. So when I talk about it being hard to code, it's not just a question of making our workflow more convenient or more efficient, it's about empowering people to understand the world. Now, if we can't eliminate coding, can we at least make it easier for our sausage fingers and finite-size brains? And to explore the question, I have been building, prototyping, an integrated discovery environment called D3 Express for exploratory data analysis, algorithms, teaching, and sharing techniques in code and sharing interactive visual explanations. I do want to make visualization easier, but to do that, we need to make coding easier. I cannot pretend to make coding easy. The ideas we wish to express, explore, and explain may be irreducibly complex. But by reducing the cognitive burden of coding, we can make the analysis of quantitative phenomena more accessible to a wider audience. The first principle of D3 Express is reactivity. Rather than commands modifying a shared state, each piece of state defines how it is calculated, and the runtime manages the evaluation. It propagates derived state. If you have written spreadsheet formulas, you have done reactive programming. This is a simple notebook in D3 Express just to illustrate reactive programming. It looks a bit like the browser's development console.
Except our work is saved automatically so we can revisit it, and it's reactive. So in imperative programming, C equals A plus B copies the current value of A plus B into C, right? It's a value assignment. If A or B changes, C keeps the original value until you execute a new value assignment. But in reactive programming, C equals A plus B is a variable definition. That means that C is always equal to A plus B, even if A and B change. If I'm defining A and B and updating them, the runtime is keeping C up to date with all of the active variable definitions. And so reactivity means that as the program author we care only about the current state. And it's the runtime's responsibility to manage changes in state. That may seem like a small thing when you're just adding a couple of numbers, but as your program scales up, this is eliminating a substantial burden. Now, obviously, a discovery environment needs to do more than add a few numbers. So let's try working with data. So I'm going to load D3 and use d3.csv to load this CSV file here. Both of these operations here, requiring the library and downloading the file from GitHub, are asynchronous. But in a reactive program, we hardly notice this. And that's because the definitions that depend on these asynchronous values are not evaluated until their inputs are resolved. You can write most asynchronous code as if it were synchronous. And you can see the result from downloading this file. And d3.csv is conservative about types, it doesn't infer types. This is a few years of Apple stock prices, and they're strings. But to start working with the data analysis, we need to convert those into more precise types. Here I'm passing an accessor, or row function, to d3.csv that maps the strings to more specific types, or changes the format of the data if I wanted to. So the close field is a number. So as I change that, as I put the plus symbol there, that's the plus operator. The purple strings changed into numbers. It's immediately giving feedback on the changes.
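As a rough sketch of the difference between a value assignment and a variable definition described above (not code from the talk, and not how D3 Express is implemented), here is a deliberately naive runtime in plain JavaScript. The `Runtime` class and its `define` and `value` methods are hypothetical names invented for this illustration. Each cell declares its inputs, and the runtime re-evaluates every cell whose inputs are available whenever any definition changes.

```javascript
// Hypothetical miniature "runtime" for illustration only.
class Runtime {
  constructor() {
    this.cells = new Map();  // name -> {inputs, compute}
    this.values = new Map(); // name -> current value
  }

  // Define (or redefine) a named cell, then bring everything up to date.
  define(name, inputs, compute) {
    this.cells.set(name, {inputs, compute});
    this.recompute();
  }

  // Naive strategy: discard all values and re-evaluate until nothing new resolves.
  recompute() {
    this.values.clear();
    let progress = true;
    while (progress) {
      progress = false;
      for (const [name, {inputs, compute}] of this.cells) {
        if (this.values.has(name)) continue;
        // A cell is only evaluated once all of its inputs have values.
        if (inputs.every((input) => this.values.has(input))) {
          const args = inputs.map((input) => this.values.get(input));
          this.values.set(name, compute(...args));
          progress = true;
        }
      }
    }
  }

  value(name) {
    return this.values.get(name);
  }
}

const runtime = new Runtime();
runtime.define("c", ["a", "b"], (a, b) => a + b); // defined before a and b exist
runtime.define("a", [], () => 1);
runtime.define("b", [], () => 2);
const first = runtime.value("c"); // 3

runtime.define("a", [], () => 10); // redefine a; c follows without being touched
const second = runtime.value("c"); // 12
```

Note that `c` is defined before its inputs exist, which mirrors the point that definitions are not evaluated until their inputs are resolved. A real runtime would track the dependency graph and recompute only what changed; this sketch recomputes everything for brevity.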
And likewise, if I want to make the date into a JavaScript date, I have to parse that. But JavaScript doesn't understand that format natively. So I need to write a function. I've called the function before I've defined it. In a reactive program, I can write the code in any order, and as I finish writing the program, the runtime brings it up to date. So I called the parseTime function, and now I'm defining that parseTime function using D3 time format. Passing in that value here. Again, see it updating. And as I substitute the fields with the appropriate percent commands, you can see that it updates and looks correct. Now that the data is in the right format, I can start to ask questions. Say I want to compute the range of dates in the data set. But I made a mistake here. I forgot to give the data a name. So that's going to give me an error. But I'll just go in there and assign it a name and it reevaluates the earlier command. So it becomes much more resilient to error when it's automatically reevaluating things as they're currently defined, rather than you sort of constantly thinking about what state your program is in and how to get it into the right state. You're always operating under current definitions. Okay. Unlike the developer console, cells in D3 Express can render elements by returning DOM elements. We can turn the data into a chart. Specify the size, the width, height, margins, the standard D3 convention. And then we can go back and we can take the domains, the extents of our data that we've computed, and use those to construct scales. So we have a time scale for X, mapping that data to sort of X position. And similarly for Y, a linear scale, taking the domain of the close dimension and mapping that to sort of a vertical position. So those have changed to be scales now. And now I'm going to open up and I'm going to create an SVG element. This one, unlike the other ones, has curly braces on it. Sorry. Skip ahead there.
So when I open up the SVG I'm going to use curly braces so that I have sort of the ability to write an arbitrary block of code there. Not limited to writing a very shorthand expression. So inside of the SVG definition I'm going to use DOM.svg in order to create an SVG element. That's a convenient wrapper on top of document.createElement. And you're working with detached DOM nodes; returned nodes are displayed in the browser. It starts as an empty SVG node, and I start to add structure using D3 selection. I can add an axis here, and by default that's at the top because it's rooted at the origin in the top left corner. Then I can specify my translate function. Or, sorry, my transform attribute to move that down. Take that code, copy it. And make the Y axis, which goes there on the left. And so as I am making these changes, I can sort of immediately see what the effect is on the output, right? I'm not having to sort of constantly switch between my editor and then reloading it in the browser. Likewise, when I want to add the actual path there to draw the line, I can add a path element. I'm going to need a function in order to compute the geometry, so I can use D3 line and pass in the right X and Y accessors by pulling out the appropriate fields. Looks data-driven, but that's not quite right. Paths are filled black by default in SVG. So we will remove the fill and replace it with a blue stroke. That's a basic line chart. But you can see that the program's topology is starting to become more complex. So this is the directed acyclic graph of references in that chart. And that graph was itself made by D3 Express using Graphviz. And there's the unnamed cell, the SVG output. A few observations about the graph. It's now trivial to take our chart definition and make it responsive. The width, the height and the margin feed into the scales and the SVG definition.
They're currently defined as constants, but if we wanted to make this chart responsive to the window size, we could just replace those definitions appropriately and everything else would update. Likewise, if we want to replace the data. If we want to have like a real-time data stream come in there, we're replacing the static definition and the static chart becomes a dynamic chart. I'll show that. But first, the difference between the imperative and the reactive style of coding. So this is your typical D3 code that you might see on bl.ocks.org in some of my examples, where on page load you're defining the scale. But on page load you don't have the data available, so you can't initialize the domain of your X scale. Later, after the data loads, you're defining the scale, or the domain, of your X scale. So if you think about it, you're separating the definition of this object into two places in your code. And you can have sort of arbitrary amounts of unrelated code separating those definitions. If you compare that to the reactive definition, the reactive definition is centralized because we no longer care about the order of execution and the dependencies in our code, right? Those are now managed by the runtime. So we can centralize our definition. So reactive programming is not just sort of about making things more convenient or saving you time, it's also about getting a cleaner code structure. And this is particularly useful if you want to be able to reuse these definitions in another program. Right? Because your definitions are now localized and they're easier to copy and paste or to import into other documents. So lastly, you know, despite the name D3 Express, you're not required to use D3. It's just the DOM and JavaScript. You can use whatever library and format your browser supports to create your visualizations. This is a similar chart, but now using Vega-Lite with the data, with the nice syntax that provides, rather than operating on the low-level objects.
If I wanted to use Canvas, like in this case, we want to make a globe. I've loaded TopoJSON, some topology of world country boundaries. And now I create a new block for the canvas and get the context. If I return that canvas, it's displayed. But I can draw to the canvas to get the display. This is a standard way you might use D3 geo, where I have a geoPath, and pass in the context. And here the projection is defined as a fixed geo orthographic projection. And I can add another outline, which you'll see in a second. But I want to showcase another powerful feature of reactive programming. Which is I can take a variable definition that is static, like this fixed orthographic projection, and replace it with a dynamic definition, like a rotating projection. And I can do that using standard JavaScript. Which is I take the static definition of this projection as just a geo orthographic, and I can replace it with a generator. And, you're seeing Mercator. And it's updating automatically. When I put a star in front, I'm creating a generator that can yield multiple values rather than a constant. If you haven't used them before, this is relatively new. But it's really cool and a lot of fun to sort of take a static definition and replace it with something dynamic like this. So the way that that's working is that the runtime is pulling new values from the generator 60 times a second. So the generator is kind of, it runs once and then it basically gets suspended until the runtime is pulling a new value from it. And the way that the code works is that when it's pulling a new value, it's setting the new rotation angles on that projection and returning it. And, again, because the runtime understands sort of the relationships between all of these different variables, it knows to recompute the canvas whenever that projection changes. It's really easy to take a sort of a static thing and then add a scripted animation on top of it like we have here for this rotating globe.
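To illustrate the generator mechanism described above without depending on D3 or a browser, here is a minimal plain-JavaScript sketch (not code from the talk). The names `rotatingAngles` and `pull` are hypothetical. In the notebook the generator would yield the projection itself after updating its rotation, and the runtime would pull one value per animation frame; here the generator yields only the angle, and a small loop stands in for the runtime.

```javascript
// A generator that yields a new rotation angle each time a value is pulled.
// It suspends at each `yield` until the consumer asks for the next value.
function* rotatingAngles(step = 1) {
  let angle = 0;
  while (true) {
    yield angle;
    angle = (angle + step) % 360;
  }
}

// Stand-in for the runtime: pull a fixed number of "frames" from a generator.
function pull(generator, frameCount) {
  const values = [];
  for (let i = 0; i < frameCount; i++) {
    values.push(generator.next().value);
  }
  return values;
}

const frames = pull(rotatingAngles(90), 5); // [0, 90, 180, 270, 0]
```

The infinite `while (true)` loop is safe because the generator only advances when a value is pulled.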
But one of the things that's going on here, you may not have noticed, is that it's throwing away the canvas and creating a new canvas each time that it's rendering. It can be expensive to throw it away and start over. It's nice that it works by default. But to optimize it, to make it faster, in D3 Express you can refer to the previous value of a variable as this. In this case, I've changed the canvas definition to use the existing canvas rather than creating a new one. And you can see it starts to smear because it's just drawing on it every time rather than having a blank canvas that it's drawing on to. But, of course, if I then change the code, I can clear the old value, I can fix the sort of context line. And the result is you can have these very efficient animations, right? There's little overhead, and you don't have to worry about managing all of that state yourself. Again, to look closely at the code, this is the static definition for it. It's a geo orthographic instance. And the rotating projection. It's a block statement with the star in front so that we can yield values. And we're going into a while-true loop that sets the projection.rotate and yields that value. That's it. We basically didn't have to change anything. So if generators are good for scripted animations, what about interaction? Generators can do this too. And the way that it works, you need a promise that resolves whenever there's new input from the user. I'll illustrate by going back into this sort of rotating projection. If we want to make this into an interactive rotation, the first thing we need is a widget, a slider for the user to drag. We can do that again by creating the right DOM element here. DOM.range is just, again, like document.createElement. It's an input element, min value is -180, max 180. You can do it by hand, but it's nice to have the syntax. We can give it a name and write a generator that can yield the new values whenever you're dragging it. I can write that by hand, but it's common.
So there's a built-in generator, Generators.input. That's going to do it for you. It listens for input events on the DOM element and yields the new value. Now when I'm dragging it, I can see the value in the notebook. Give it another name, take that angle and plug it into the rotation of our projection. Now when I drag the slider, it goes back and forth. You can see that the globe is rotating. Now, again, this is such a common thing to do, there's a shorter syntax for that where you can define both the graphical interface, the DOM element that's being displayed, and the value that's exposed to code. That's what the viewof operator is. And it's two definitions in one. They work exactly like the definitions you saw, it's a slightly more concise definition of that. This is the long-form definition of a projection. We have the projection, we have the angle coming from the DOM element, the input range, and this is the shorthand syntax for the exact same thing, using viewof angle. But the cool thing about this is that we now have the ability to sort of create arbitrary graphical interfaces and design the appropriate sort of programming interface, like the values that get exported along with that. You're not limited to just sort of sliders and drop-down menus and some fixed palette of user interface. I'm going to create a complex compound input to make a color picker. This is a form here with a table inside of it. And there's an input for the hue, saturation, lightness. This is just using DOM.html. It takes a big string and sets the inner HTML of a div and returns that so that I don't have to create all of this stuff in JavaScript. I'm just kind of embedding an HTML fragment in my code. And then likewise, like here, is where I'm sort of defining how this is going to be exposed to my code. I'm setting what the value is on the top-level element. And so that's going to be updated whenever you drag the slider. Whenever there's an input event.
And it's defined here to create a D3 color instance, and it's also sort of updating the outputs to go along with the inputs so you can see what the values are in the variable below. When I drag the hue slider, it's both updating the hue angle and emitting a new color displayed in the cell below. And that is used to set the background color of the div so we can see what the color actually looks like. Okay. Now, that's sort of a toy example, again, of sort of defining a custom interface. Where this starts intersecting with visualization, it starts to get much more interesting. So this is a histogram showing the return of a few hundred stocks over a five-year period. You can see that it's like a bell curve. The mode of that is slightly greater than one. So you're expecting a positive annual return. But there's also this long tail of stocks that did really badly, and a long tail of stocks that did really well. Now, in another environment, it might be difficult to sort of inspect this directly, right? The visualization can kind of be a dead end, where you can see it, but if you wanted to know what the data points behind these individual bars are, you would have to phrase that as a new question in code. And the goal of D3 Express is that you can sort of quickly augment these visualizations so we can start to manipulate them directly. Getting back to what Bret Victor said. We treat it like an input, like with the slider and the table. So now when I'm brushing back and forth, just using the standard D3 brush, it's actually exposing its current selection as the array of data that are there. And I'm just displaying that sort of using the default inspector. But it's good enough to let me know sort of what is in the selected range, and I can drag that back and forth without doing any sort of work.
And just to show there's no real magic going on behind the scenes here, this is the code that adapts your D3 brush, not a special version, but the same D3 brush you're using today, and fits it into this new framework. Where you're receiving your brush event. You're looking at the brush selection. And then you're pulling out the X and Y values. Basically, like the start and the stop. And then you're filtering your data based on those values. And then you're setting the property, this is the value property of your node. That's what gets exposed to the code. And then you're just telling the generator, or you're telling D3 Express, that the value is updated by dispatching an input event. Now, another thing that you can do in D3 Express that's useful is, by default, these reactions to your code are going to be applied instantaneously. Whenever a variable changes, the runtime knows what depends on that and it's going to recompute all the derived variables and update the display instantaneously. Sometimes that's not what you want. You have something that changes and you want to be able to observe what's changed. So we want to use these animated transitions to have consistency. I have rewritten it in a way to use the D3 data join. And it's staggered, so when the data updates it's going to move the bars into the new positions so you can see how the values change. And likewise, this data set here, the frequency of English letters, is defined so that if you change this by-value flag, it's going to be sorted either by descending frequency or lexicographically. So now when I go back up to the top and I change the value of this by using the check box, the code can sort of apply a transition from the old values to the new values. So you can use sort of access to the previous value both to improve performance and to get better visual output, because in the reactive system you can kind of opt into controlling how these changes get applied.
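The brush adapter described above (filter the data to the selection, set the node's value property, dispatch an input event) can be sketched in plain JavaScript as follows. This is an illustration, not the code from the slide. It assumes an environment with the standard `EventTarget` and `Event` globals (browsers and recent Node.js), uses a bare `EventTarget` as a hypothetical stand-in for the chart's DOM node, and uses made-up data. A real D3 brush reports its selection in pixel coordinates, so real code would first convert the selection back to data values through the scale; the sketch skips that and takes the interval in data units.

```javascript
// Stand-in for the chart's DOM node (hypothetical; in a notebook this would be the SVG element).
const node = new EventTarget();

// Made-up data values for illustration.
const data = [0.4, 0.9, 1.1, 1.3, 2.5];

// Called with the brushed interval [start, stop], here already in data units.
function brushed(selection) {
  const [start, stop] = selection;
  // Expose the selected data as the node's value...
  node.value = data.filter((d) => start <= d && d <= stop);
  // ...and announce that the value changed, the same way a slider would.
  node.dispatchEvent(new Event("input"));
}

// Stand-in for the environment: record the value each time an input event fires.
const seen = [];
node.addEventListener("input", () => seen.push(node.value));

brushed([0.8, 1.2]); // seen is now [[0.9, 1.1]]
```

Because the chart announces changes with the same input event that a slider emits, anything that listens for input events can treat the chart as just another input.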
So, again, it's sort of like opt-in complexity as you want to add richness to your implementation. So that was a pretty whirlwind tour of reactive programming in D3 Express. And you saw how to use the inline output of cells to look at the current state of the program. But I want to dive into this a little bit more and show you how to use visualization in D3 Express to improve our ability to scrutinize a program's behavior. So reactive programming, where you can sort of change the code and immediately see how it updates, is also known as interactive programming. And interactive programming lets us investigate how it works by poking. Change it, delete some code and reorder it, and you can see what happens. It lets you get a sense of how that individual code is contributing. You're doing a more direct observation of how that code impacts the program. So in this notebook I have the directed graph of data. And I can add or remove the charge force. And the charge force is causing the nodes to repel, right? Otherwise, if you remove it, they sort of collapse down into the center, where the only force that's really applying to them is the link force. Likewise, I can modify parameters of the force. I can set the strength to be 100. They're attracting each other rather than repelling each other. They collapse to the center. And make it negative, and they expand out. Fifty, 100, whatever. It's not reloading the entire page when you're making these changes. It's a reactive sort of topology. And so when I'm changing the definition of these forces, it's not sort of throwing everything away and starting over. It's just operating on the current definition that's running. You're doing live editing of the program. And so that sort of improves the stability and lets you see more easily how these changes are contributing to the program's behavior. So likewise, if I take out the link force here, they're sort of no longer connected to each other and start spreading out.
Or if I take out the centering force, then it can sort of start floating away. Bye. [ Laughter ] Okay. Now, a more explicit approach to studying program behavior, so rather than sort of tinkering with it, is to try to expose its internal state. And I'm going to illustrate this by showing the computing of a running sum. And we can take a function and turn it into a generator, and now it yields values in addition to the normal return value. And the idea is that the values that we're yielding as the program is running give us an idea of what the code is doing when it's running. The nice thing about having both yield and return is you can essentially take sort of arbitrarily complex functions, at least if they're not already generators, even recursive ones, and you have this extra channel now where you can expose the internal state of your program. And that's really useful for visualizing a program's behavior or studying a program's behavior, because it allows you to completely separate your visualization or analysis of the behavior from the code itself. If I take some code, as I have done before, and put my visualization code directly within this algorithm, it starts to become a mess. If you're doing canvas drawing stuff and using a debugger and switching between the algorithm and the canvas stuff, it gets to be complete chaos. But if you have this approach, you can extract the data while it's running, using generators, or statically, building it up as an array of values. Then it's much easier for you to do that analysis. In D3 Express, this is how to do it. The simplest way is to call the function like before. And because it is now returning a generator, you automatically get an animation. D3 Express knows when it's a generator, and it's going to pull a new value every animation frame. But you can also do it like this, with a spread operator, where you're essentially pulling all of the values out of the generator in one go and putting them into an array.
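As a small illustration of exposing internal state through a generator (a sketch, not the code shown in the talk), here is a running sum written in plain JavaScript. The function name `runningSum` is an assumption. Each intermediate total is yielded, the final total is returned as usual, and the spread operator collects all the yielded values into an array in one go.

```javascript
// A running sum that exposes every intermediate total via `yield`
// while still returning the final total like an ordinary function.
function* runningSum(values) {
  let sum = 0;
  for (const value of values) {
    sum += value;
    yield sum; // extra channel: internal state after each step
  }
  return sum; // normal return value (not included when spreading)
}

// Pull all yielded values out at once for static inspection.
const steps = [...runningSum([1, 2, 3, 4])]; // [1, 3, 6, 10]
```

Spreading only collects yielded values; the return value is available through the final `next()` result when the generator is stepped manually.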
And that's useful if you want to do a static visualization or an interactive visualization where you're scanning in between individual frames. You don't need to study a running sum function. I think we all know how that one works. But I want to use a more real-world example. And this may get a little bit hairy. But I'm going to try it anyway. Looking at circle packing in D3. This is the Flare class hierarchy. And hierarchical circle packing is like treemaps, but you have this extra wasted space because you're nesting circles rather than squares. That extra space is not wasted, though; it indicates the hierarchical structure, which is not always obvious with treemaps. In order to produce these diagrams, you need to pack the individual circles, the set of siblings in each part of your tree. And so this is sort of a little example of how that works. Right? You have a set of circles that you want to pack in order, one at a time, into as small a space as possible without overlap. Sort of like penguins huddling in Antarctica. Your job is to place one of these circles at a time until you have placed all of the circles. And since you want the circles to be packed as tightly as possible, each new circle that you place should be tangent to at least one circle that you have already placed. Two of the circles, really. But if you pick an existing circle at random as your tangent circle when you're placing the new circle, you're going to waste time putting that new circle into the middle of the pack, where it's going to overlap with the other circles you have put down. Ideally, when you're considering the tangent circles, you should only be considering the circles on the outside of the pack. But the problem is, how do you determine which circles are on the outside? So the algorithm used by D3 and other implementations of this layout maintains a front chain. And that's the red line. And the front chain represents the outermost circles.
So when you're placing a new circle, it picks the circle on the front chain that is closest to the origin. And the new circle is placed tangent to this circle and its adjacent neighbor. And if there's no overlap with other circles on the chain, it can move on to the next circle. But if there is overlap, like in this case here, where the big circle overlaps with the other circles on the front chain, then it needs to cut the front chain so it can choose a different pair of tangent circles and effectively move that circle to the outside. If you look closely at the animation, you can see the moments where it's cutting the front chain as the larger circles get squeezed out of the pack and pushed down. I find this mesmerizing to look at. But more than being sort of eye candy, this animation and notebook was extremely helpful for me in fixing a longstanding bug in D3's implementation. There was a little bit of vague wording in the original paper, and it wasn't obvious in some situations where to cut. Where you have a circular structure, in some cases it's not obvious where in the front chain it needs to be cut in order to place the new circle. And having the ability to sort of inspect the program to see what it's doing as it goes along, instead of just seeing the output and it being wrong, made it easier to find the bug and change the algorithm and see how it affected the conditions without changing the output. This is one part of circle packing. The other is, once you have laid out your siblings, you need to compute the enclosing circle for that pack so you can then move on to other parts of the hierarchy. The conventional way of doing this is you sort of scan the front chain and pick the circle farthest from the origin. That works well because the packs are roughly circular. But sometimes the packs can be slightly shifted off to the side. So that doesn't end up being an exact solution.
And I learned that there's this algorithm called Welzl's algorithm that gives the optimal solution and runs in linear time. There's no reason not to do that. If I can do it once and improve the result, even with a slight improvement, that's awesome. And it's fun to understand how these things work. The algorithm is incremental. It works on one circle at a time, in random order. And you can see in this animation that, because of that approach, it very quickly converges on roughly the right enclosing circle. But there is a chance, as you get to the circles on the outside, that it has to expand. So how does this algorithm work? One thing I should say is that what I'm showing is a slightly harder problem than what happens inside the circle-packing layout. Because circle packing already has the front chain, it only needs to compute the enclosing circle of the front chain. But this is the case where you have an arbitrary set of circles and don't know the front chain ahead of time. So, how does this algorithm work? Let's assume we already have an enclosing circle for some set of circles, for circles 0 through i − 1. We're just assuming we know that enclosing circle, but this is how math works, how induction works, and it gives us a starting point to build an algorithm. If we assume we have an enclosing circle for some set of circles, and all we want to do is incorporate the next circle into our enclosing circle, well, if that new circle, which is the black one here, is already inside our enclosing circle, then we don't need to do anything. The enclosing circle is fine and we can move on to the next one. But if the circle we're trying to add is outside, if it's not contained by the enclosing circle, then we need to compute what the new enclosing circle is. But we can actually make an observation about the new circle.
And that is, if this circle is outside the enclosing circle, we know that it's the only circle that's outside the current enclosing circle. And that means that the new enclosing circle must be tangent to the circle that we're placing. So that looks like this. But the problem is we don't know what the other two tangent circles are. They might not be the same as for the previous enclosing circle. But if we know what one tangent circle is to the enclosing circle, we can look at this recursively. Every time we have one outside, we recurse to find the next tangent circle. And there are some other boundary conditions, like what the enclosing circle is when you have one, two, or three circles. That's called Apollonius' problem, but that's geometry. This is already enough: you can get an idea of how the algorithm works and why it's able to terminate. Now that we understand the recursive structure, we can make a visualization that shows a more complete view of how the algorithm works. From left to right, these are the four possible depths of the stack. It can't recurse more than three times because you can't have more than three tangent circles, or you contain all the circles. Again, that's geometry. Before, it was just the circles on the left. But you're seeing that when it has one of these circles that's outside of the red circle, it's going to add the new tangent circle and start descending into the recursive approach. But in addition to showing you how this algorithm works, one of the nice things is you can get a better sense of how much time the algorithm spends in different states of the program. Like in this case, you see the enclosing circle gets bigger very quickly, but you can see that whenever it finds a circle that is outside and needs to recurse, it needs to revisit all of the previous circles to make sure the new enclosing circle actually contains everything. I'm not going to prove that it's linear time. That's too much work. But anyway, you get kind of a sense here.
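The recursive structure described above can be sketched in a few lines if we simplify the problem. This sketch encloses points rather than circles, so the one-, two-, and three-element base cases are elementary and no Apollonius solver is needed. It is an illustration of the recursion, not D3's implementation, and every function name here is made up for the example.

```javascript
// Smallest enclosing circle of 2D points, following the recursive structure of
// Welzl's algorithm. Simplification: inputs are points {x, y}, not circles.
function enclosePoints(points) {
  const shuffled = points.slice();
  for (let i = shuffled.length - 1; i > 0; --i) { // Fisher-Yates shuffle
    const j = Math.floor(Math.random() * (i + 1));
    [shuffled[i], shuffled[j]] = [shuffled[j], shuffled[i]];
  }
  return welzl(shuffled, shuffled.length, []);
}

// Encloses the first n points, given points already known to lie on the boundary.
function welzl(points, n, boundary) {
  if (n === 0 || boundary.length === 3) return fromBoundary(boundary);
  const p = points[n - 1];
  const circle = welzl(points, n - 1, boundary);
  if (circle && contains(circle, p)) return circle;  // p is inside: nothing to do
  return welzl(points, n - 1, boundary.concat([p])); // p is outside: it lies on the new boundary
}

function contains(circle, p) {
  return Math.hypot(p.x - circle.x, p.y - circle.y) <= circle.r + 1e-9;
}

// Circle with the segment from a to b as its diameter.
function throughTwo(a, b) {
  return {x: (a.x + b.x) / 2, y: (a.y + b.y) / 2, r: Math.hypot(a.x - b.x, a.y - b.y) / 2};
}

// Base cases: the circle determined by zero to three boundary points.
function fromBoundary(b) {
  if (b.length === 0) return null;
  if (b.length === 1) return {x: b[0].x, y: b[0].y, r: 0};
  if (b.length === 2) return throughTwo(b[0], b[1]);
  const [p, q, s] = b;
  const d = 2 * (p.x * (q.y - s.y) + q.x * (s.y - p.y) + s.x * (p.y - q.y));
  if (d === 0) { // collinear points: fall back to the widest pair
    return [throughTwo(p, q), throughTwo(p, s), throughTwo(q, s)]
      .reduce((best, c) => (c.r > best.r ? c : best));
  }
  const p2 = p.x * p.x + p.y * p.y, q2 = q.x * q.x + q.y * q.y, s2 = s.x * s.x + s.y * s.y;
  const x = (p2 * (q.y - s.y) + q2 * (s.y - p.y) + s2 * (p.y - q.y)) / d;
  const y = (p2 * (s.x - q.x) + q2 * (p.x - s.x) + s2 * (q.x - p.x)) / d;
  return {x, y, r: Math.hypot(p.x - x, p.y - y)}; // circumcircle of p, q, s
}
```

The `boundary` array plays the role of the tangent circles in the talk. It never grows past three entries, which is why the recursion has at most four depths. The plain recursion here goes one call deep per input point, so it is only suitable for small inputs.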
All right. One way to write less code is to reuse it. And the 440,000 published packages sort of attest to this approach. But libraries are an example of active reusability. You must design a library to be reusable. This is a substantial burden. It's hard to define reusable abstractions. Ask any maintainer. One-off code, like the D3 examples, is much easier. You're only worried about the task at hand. You don't have to generalize to an abstract class of tasks. But with D3 express, I'm trying to explore whether there's an intermediate with better passive reusability. You can use the documents to more easily repurpose code. And one part of that is you can treat any document like a lightweight library. I had this document that defined a color interpolation and a pretty gradient. And now I import that into this document and I'm going to call it. So if I have another color function or utility, I don't have to create a package on npm or a GitHub repo or whatever. I just pull it into the code. And it's pulling in the dependencies automatically. The original definition relied on D3 and a plugin, and I don't have to load that separately. But likewise, even though I'm loading that definition of D3 in the remote document, it's not going to conflict with the local definition of D3. I'm pulling in the functionality. I'm only pulling in the symbols that I explicitly reference in my import statement. You can do cooler things: you can rewire the definitions to inject your local definitions into the remote definition. Excuse me. So this is a case where I have a data set that is going to be streaming over a WebSocket. I'm not going to explain how all this code works. I think the idea is you would have an API for connecting to a socket and getting a real-time data stream. It just has an array of values, of which you keep the last 60 seconds. The result is a generator. It's going to emit a new array of objects with time and value. So the question is, can I visualize this using an existing line chart?
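The streaming definition described above, a generator that keeps the last 60 seconds of values and emits a fresh array each time, might be sketched in plain JavaScript as follows. This is not the code from the talk. It swaps the WebSocket for any iterable of `{time, value}` samples so that it can run anywhere, and the name `slidingWindow` and the millisecond timestamps are assumptions.

```javascript
// Hypothetical sketch: after each incoming sample, yield an array holding only
// the samples from the last `windowMs` milliseconds (60 seconds by default).
// `samples` is any iterable of {time, value} objects ordered by time.
function* slidingWindow(samples, windowMs = 60000) {
  const buffer = [];
  for (const sample of samples) {
    buffer.push(sample);
    // Drop samples that have fallen out of the window.
    while (buffer[0].time < sample.time - windowMs) buffer.shift();
    yield buffer.slice(); // a new array each time, as a snapshot of the window
  }
}
```

Yielding a copy rather than the buffer itself means each emitted array is an independent snapshot, so a consumer such as a chart never sees it change after the fact.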
We had the line chart with Apple's stock price. It has the same structure. This is a slightly different definition: it has a time and a value. So here I'm embedding the chart. You can see it's just that same basic chart. But if I add this with clause here, I'm injecting my definition of data into the chart, right? That static chart becomes a real-time chart. And I didn't have to change any other aspect of the code, because that code was already defined to be reactive, right? Setting the domains of X and Y. But the cool thing is I don't have to stop there. I can also augment other aspects of that chart definition, because they're all part of this topology, which is exposed, and then I can override it. So if I don't like the fact that the Y scale is dynamically adjusting based on the window, and I want to have a fixed window because I know the expected values from my streaming data set, we can do that. I pull in other definitions, like the width and height and margin, and inject the Y definition. That changes it so it's now a fixed scale. And likewise, I can do the same thing with the X scale. So rather than showing you the chunks as it's updating, and I think it updates four or five times a second here, I can have a smoothly sliding X scale that updates 60 times a second and crops the data so that it's slightly outside the window. And it doesn't care whether the X scale is a constant or a generator. I can just plug those things in. All right. So, notebooks in D3 express. They run in the browser, right? Not on the desktop and not in the cloud. There's a server to save your edits, but all the rendering and the computation happens locally in the client. What does it mean to have a Web-first discovery environment? In my view, a Web-first discovery environment embraces Web standards: vanilla JavaScript and the DOM. It works with today's open source, whether snippets on the Web or libraries you're getting from npm. And it limits the specialized knowledge you need to be productive.
There's syntax for reactivity, but I have tried to keep that as small and familiar as possible. So it uses generators to define these dynamic values. These are all of the forms of variable definitions. They're just, you know, an expression, a block statement, a generator block statement, and your standard function definition. Probably more important, though, is that your code can now run everywhere, right? If it can run in your browser, if it's using Web standards, it can run in anybody else's browser. There's nothing to install. And that means it becomes much easier for others to repeat and validate your analysis, right? And by extension, your code for exploration can gracefully transition into code for explanation. You don't have to start over from scratch if you want to communicate your insights. It's great, and I want to commend journalists and scientists for increasingly being open and sharing their data and code. But putting code up on GitHub is not necessarily enough to make it usable. It's a lot of work for those who want to use it to set up the environment. They need the right software installed, and familiarity with the tools that you're using. But if the code is running in the browser, again, there's nothing to install and it works by default. So, again, maybe you should have gotten Bret to give this talk instead. I'm going to end on another Bret Victor quote, from Explorable Explanations, to explain the implications of this approach. An active reader asks questions, considers alternatives, questions assumptions, and questions the trustworthiness of the author. An active reader tries to generalize specific examples and devise specific examples for generalities. An active reader doesn't sponge up information, but uses the argument as a springboard for critical thought and deep understanding. Imagine if our algorithms were not just described in prose and PDFs, but shared as live code with interactive visual diagrams.
It's easier for the reader to look at how they work, to question and tinker with them, and to make modifications. I want to end on a slight disappointment. This is a lot of work, and it's not ready for you to use yet. But I hope it's going to be ready very soon. You can actually sign up for an early access thing, for when it's available, on d3.express. That's the URL. But if you are interested in this stuff, please come talk to me about it. If you want to help me build this so it's available sooner rather than later, please get in touch. And thank you. [ Applause ]
Info
Channel: BocoupLLC
Views: 8,158
Keywords: Open Web, JavaScript, Programming, Open Source, Bocoup
Id: lNbqfQlGkzc
Length: 51min 40sec (3100 seconds)
Published: Mon May 15 2017