So we are going to go ahead and get started. That is all the stuff I have to say to you guys for now. Without further ado, I'm excited to introduce Mike. Mike, you can come in and get set up. Mike generally needs no introduction, but I'll attempt that anyway. Mike Bostock, the creator of the D3 library that most of us use on an everyday basis. Formerly a graphics editor at The New York Times. Now doing new and exciting things that all of us are going to see for the first time. So please give him a warm welcome. [ Applause ]
>> Thank you very much for that introduction. I suppose like many of the things I do, this talk and the work I present are born out of frustration, and my attempt to turn that frustration into something productive. If you have gotten frustrated trying to understand why your code works or doesn't work, or trying to understand how someone else's code works, or trying to understand how my code works (sorry about that), well, you're not alone, and this talk is for you. So the release of D3 4.0 was focused on making it easier to learn: more consistent and modular. But despite the changes to the API, it wasn't that different from earlier versions. The selections, scales, and shapes were polished, but mostly unchanged, doing the same thing. This is for continuity: API changes are disruptive, and I don't want to change everything every year. I want to strike a balance between making improvements and keeping things the same. But after the release of 4.0 I wanted to think
a little bit more deeply about how not just to make D3 easier, but how to make visualization
easier. Yet in seeking to better a tool for visualization,
I remembered something. I remembered that visualization is itself
a tool. A means to an end. A means to insight, right? A way to think, to understand, to study, to
communicate something about the world. And per Ben Shneiderman, the purpose of visualization is insight, not pictures. If you think only of the coding needed to construct visualizations, you ignore other challenges: finding relevant data, cleaning it, turning it into efficient structures for analysis, designing that analysis, statistics, modeling, simulation, explaining your findings. And I don't wish to
downplay the importance of visualization tools and innovation therein. I have many improvements I plan on making to D3, and I'm excited to see other approaches like Vega-Lite come out. But it's important to step back and consider complementary approaches to related problems. Tasks supporting discovery are often performed by writing code. And coding is famously difficult, right? Even its name suggests impenetrability: it was originally low-level binary instructions to be executed by a processor. Code has come a long way, but it's still hardly human-friendly. To give a comically dense example, here is a bash command that I wrote for generating population density from California's census tracts. It looks like it starts with geo2topo, but it doesn't actually start with that; it starts with shp2json, converting a shapefile into a newline-delimited GeoJSON stream. And it's not just bash; these are also JavaScript expressions embedded within bash. And then, you know... anyway. I could spend probably a whole talk just going
over how this particular slide works. Now, Bret Victor gave this very concise definition of programming: programming is blindly manipulating symbols. By blindly, he means that we can't see the results of our manipulation. We can edit a program, rerun it, diff the output. But programs are complex and dynamic, so this is neither a direct nor an immediate observation of the impact of our change. And by symbols, he means that we don't manipulate the output directly; we operate on abstractions, and they can be difficult to grasp. In Donald Norman's terms, this is the gulf of execution. And what are the symptoms of inhuman code? The first thing I think of is spaghetti: code that lacks structure or modularity, where in order to understand one part of a
program you have to understand the entire program. This is frequently caused by shared mutable state. If you have a piece of state modified by multiple parts of a program, it becomes much harder to reason about its value. And indeed, how do we know what a program does? If we can't track its complete state in our heads, then reading the code is insufficient. We use console.log, the debugger, tests. And as you have experienced, these tools are limited. A debugger can only show a few values at a moment in time, and it's limited for seeing rich and complex data structures. And so we have great difficulty understanding what our code does. Sometimes it can feel like a miracle that anything works at all. And despite these challenges, we continue to write code, right? We're still writing code all the time, for more applications than ever before. And so why is that? Right? Are we masochists? Maybe. Unable to change? Probably. Is there no better solution? And in general (and that is a very important qualifier), no. Code is often the best tool that we have because
it is the most general tool that we have. I don't mean best in some absolute sense, but I do mean best for the here and now, and for the person doing the work. And that is because code is the most general: it has the most unlimited expressiveness. And alternatives to code, whether high-level tools or higher-level programming interfaces and languages, can do well in specific domains, but these alternatives must sacrifice generality for greater efficiency in their domain. And if you can't constrain the domain, it's unlikely that you'll find a viable replacement for code. There is no blanket replacement, as long as humans are still thinking and communicating
primarily in language. And it's hard to constrain the domain of science, right? Science is fundamental. We're studying the world, trying to extract meaning from empirical observation, to simulate systems. And our tool must be capable of expressing thought. Just as we don't write prose with phrasal templates and mad libs, we can't use a drop-down menu for statistical analysis. We need more than configuration: we need to compose primitives into creations of our own design. And if your goal is to help people gain insight, we must consider the general problem of how people code. Bret Victor had this to say about math, but it applies equally to code: the power to understand and predict the quantities of the world should not be restricted to those with a freakish knack for manipulating abstract symbols. So when I talk about it being hard to code,
it's not just a question of making our workflow more convenient or more efficient; it's about empowering people to understand the world. Now, if we can't eliminate coding, can we at least make it easier for our sausage fingers and finite-size brains? To explore that question, I have been building, prototyping, an integrated discovery environment called d3.express, for exploratory data analysis, algorithms, teaching and sharing techniques in code, and sharing interactive visual explanations. I do want to make visualization easier, but to do that, we need to make coding easier. I cannot pretend to make coding easy; the ideas we wish to express, explore, and explain may be irreducibly complex. But by reducing the cognitive burden of coding, we can make the analysis of quantitative phenomena more accessible to a wider audience. The first principle of d3.express is reactivity. Rather than a sequence of commands mutating shared state, each piece of state defines how it is calculated, and the runtime manages the evaluation, propagating derived state. If you have written spreadsheet formulas,
you have done reactive programming. This is a simple notebook in d3.express, just to illustrate reactive programming. It looks a bit like the browser's developer console, except that our work is saved automatically so we can revisit it, and it's reactive. In imperative programming, C = A + B copies the current value of A plus B into C, right? It's a value assignment. If A or B changes, C keeps its original value until you execute a new assignment. But in reactive programming, C = A + B is a variable definition. That means that C is always equal to A plus B, even if A and B change. As I'm defining A and B and updating them, the
runtime is keeping C up to date with all of the active variable definitions. And so reactivity means that as the program's author we care only about the current state; it's the runtime's responsibility to manage changes in state. That may seem like a small thing when you're just adding a couple of numbers, but as your program scales up, this eliminates a substantial burden.
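To make the idea concrete, here is a minimal sketch of a reactive runtime in plain JavaScript. This is not d3.express's actual implementation: the cell names and the naive "recompute every dependent" strategy are illustrative only.

```javascript
// A toy reactive runtime: each cell declares its inputs, and redefining
// any cell re-evaluates the cells that depend on it.
function createRuntime() {
  const cells = new Map(); // name -> { inputs, compute, value }
  function define(name, inputs, compute) {
    cells.set(name, { inputs, compute, value: undefined });
    evaluate(name);
  }
  function evaluate(name) {
    const cell = cells.get(name);
    const args = cell.inputs.map((input) => cells.get(input).value);
    cell.value = cell.compute(...args);
    // Propagate: recompute every cell that lists this one as an input.
    for (const [other, c] of cells) {
      if (other !== name && c.inputs.includes(name)) evaluate(other);
    }
  }
  return {
    define,
    set: (name, value) => define(name, [], () => value),
    value: (name) => cells.get(name).value
  };
}

const runtime = createRuntime();
runtime.set("a", 1);
runtime.set("b", 2);
runtime.define("c", ["a", "b"], (a, b) => a + b); // a definition, not an assignment
console.log(runtime.value("c")); // 3
runtime.set("a", 40);            // c updates automatically
console.log(runtime.value("c")); // 42
```

Redefining `a` re-runs only the cells downstream of `a`, which is the spreadsheet-formula behavior described above.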
Now, obviously, a discovery environment needs to do more than add a few numbers, so let's try working with data. I'm going to load D3 and use d3.csv to load this CSV file here. Both of these operations, requiring the library and downloading the file from GitHub, are asynchronous. But in a reactive program, we hardly notice this, because the definitions that depend on these asynchronous values are not evaluated until their inputs are resolved. You can treat most asynchronous code as if it were synchronous. And you can see the result of downloading this file. Note that d3.csv is conservative about types; it doesn't
infer types. This is a few years of Apple's stock price, and the values are strings. But to start working on data analysis, we need to convert those into more precise types. Here I'm passing an accessor, a row function, to d3.csv that I can use to map the strings to more specific types, or change the shape of the data if I wanted to. So the close field should be a number; as I put the plus symbol there (that's the unary plus operator), the strings change into numbers. It's immediately giving me feedback on the changes. And likewise, if I want to make the date into a genuine Date, I have to parse it. But JavaScript doesn't understand that format natively, so I need to write a function. I've called the function before I've defined it: in a reactive program, I can write the code in any order, and as I finish writing, the program is brought up to date. So I called the parseTime function, and now I'm defining that parseTime function using d3-time-format, passing in the format string here. Again, you see it updating. And as I substitute the fields with the appropriate percent directives, you can see that it updates and looks correct.
I can start to ask questions. If I want to compute the range of dates in
the data set. But I made a mistake here. I forgot to give the data a name. So that's going to give me an error. But I'll just go in there and assign it a
name and it reevaluates the earlier command. So it becomes more much resilient to error
when it's automatically reevaluating things as they're currently defined, rather than
sort of constantly thinking about what state is your program in and how to get it in the
right state? You're always operating under current definitions. Okay. Unlike the developer console, cells in D3
express can have elements by returning DOM elements. We can turn the data into a chart. Specify the size, the width, height, margins,
following the standard D3 margin convention. And then we can go back and take the domains, the extents of our data that we've computed, and use those to construct scales. So we have a time scale for x, mapping the date to a horizontal position, and similarly for y, a linear scale, taking the domain of the close dimension and mapping it to a vertical position. So those have changed to be scales now. And now I'm going to create an SVG element. This one, unlike the other cells, has curly braces on it. Sorry, skip ahead there. So when I open up the SVG cell I'm going to use curly braces so that I have the ability to write an arbitrary block of code there, not limited to a short expression. So inside the SVG definition I'm going to use DOM.svg to create an SVG element; that's a convenient helper for creating elements. You're working with detached DOM nodes,
and returned nodes are displayed in the browser. It starts as an empty SVG node, and I start to add structure using a D3 selection. I can add an axis here, and by default that's at the top, because it's rooted at the origin in the top-left corner. Then I can specify my translate function (or, sorry, my transform attribute) to move that down. I take that code, copy it, and make the y-axis, which goes there on the left. And so as I am making these changes, I can immediately see the effect on the output, right? I'm not having to constantly switch between my editor and reloading in the browser. Likewise, when I want to add the actual path
there to draw the line, I can add a path element. I'm going to need a function to compute the geometry, so I can use d3.line and pass in the right x and y accessors, pulling out the appropriate fields. That looks data-driven, but it's not quite right: paths are filled black by default in SVG. So we remove the fill and replace it with a blue stroke. That's a basic line chart.
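The scales doing the work in that chart can be sketched in a few lines of plain JavaScript. This is the idea behind d3.scaleLinear, not its implementation, and the domain and range numbers are hypothetical.

```javascript
// A linear scale maps a value from a data domain to a pixel range.
function scaleLinear([d0, d1], [r0, r1]) {
  return (x) => r0 + ((x - d0) / (d1 - d0)) * (r1 - r0);
}

// Hypothetical chart: close prices 0..200 mapped onto a 400px-tall chart.
// The range is inverted because SVG's origin is at the top-left corner.
const y = scaleLinear([0, 200], [400, 0]);
console.log(y(0));   // 400 (bottom of the chart)
console.log(y(200)); // 0 (top of the chart)
console.log(y(50));  // 300
```

The time scale for x works the same way, except the domain values are dates converted to milliseconds.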
But you can see that the program's topology is starting to become more complex. This is the directed acyclic graph of references in that chart. And that graph was itself made in d3.express, using GraphViz. There's the unnamed cell, the SVG output, and a few operations on the graph. It's now trivial to take our chart definition and make it responsive. The width, the height, and the margin feed into the scales and the SVG definition. They're currently defined as constants, but if we wanted to make this chart responsive to the window size, we could just replace those definitions appropriately, and everything else would update. Likewise, if we want to replace the data, say to have a real-time data stream come in there, we replace the static definition and the static chart becomes a dynamic chart. I'll show that. But first, the difference between the imperative
and the reactive style of coding. This is typical D3 code that you might see on bl.ocks.org in some of my examples, where on page load you're defining the scale. But on page load you don't have the data available, so you can't initialize the domain of your x scale. Later, after the data loads, you're defining the scale (or rather, the domain) of your x scale. So if you think about it, you're separating the definition of this object into two places in your code, and you can have arbitrary amounts of unrelated code separating those definitions. If you compare that to the reactive definition, the reactive definition is centralized, because we no longer care about the order of execution and the dependencies in our code, right? Those are now managed by the runtime, so we can centralize our definitions. So reactive programming is not just
about making things more convenient or saving you time; it's also about getting a cleaner code structure. And this is particularly useful if you want to be able to reuse these definitions in another program, right? Because your definitions are now localized, they're easier to copy and paste or to import into other documents. So lastly, on charts: despite the name d3.express, you're not required to use D3. It's just the DOM and JavaScript; you can use whatever library and format your browser supports to create your visualizations. This is a similar chart, but now using Vega-Lite,
with the nice syntax that it provides, rather than operating on the low-level DOM. And if I wanted to use canvas, like in this case, where we want to make a globe: I've loaded TopoJSON and a topology of world country boundaries, and now I create a new block for the canvas and get the context. If I return that canvas, it's displayed, but I have to draw to the canvas to get the display. This is the standard way you might use d3-geo, where I have a geoPath and pass in the context. And here it's defined with a fixed geoOrthographic projection. And I can add another outline, which you'll
see in a second. But I want to showcase another powerful feature of reactive programming, which is that I can take a variable definition that is static, like this fixed orthographic projection, and replace it with a dynamic definition, like a rotating projection. And I can do that using standard JavaScript: I take the static definition of this projection, just a geoOrthographic, and I can replace it with a generator. And you can see it updating automatically. By putting a star in front, I'm creating a generator that can yield multiple values rather than a constant. If you haven't used them before, generators are relatively new, but it's really cool and a lot of fun to take a static definition and replace it with something dynamic like this. The way that works is that the runtime is pulling new values from the generator 60 times a second. So the generator kind of runs once
and then basically gets suspended until the runtime pulls a new value from it. And the way the code works is that when a new value is pulled, it sets the new rotation angles on that projection and returns it. And again, because the runtime understands the relationships between all of these variables, it knows to recompute the canvas whenever that projection changes. It's really easy to take a static thing and then add a scripted animation on top of it, like we have here for this rotating globe. But one of the things going on here, which you may not have noticed, is that it's throwing away the canvas and creating a new canvas each time it renders. That can be expensive, to throw it away and
restart it. It's nice that it works by default, but to optimize it and make it faster, in d3.express you can refer to the previous value of a variable as `this`. In this case, I've changed the canvas definition to use the existing canvas rather than creating a new one. And you can see it starts to smear, because it's drawing every frame onto the same canvas rather than onto a blank one. But, of course, if I then change the code, I can reuse the old value and clear the context before each draw. And the result is that you can have these very efficient animations, right? There's little overhead, and you don't have to
worry about managing all of that state yourself. Again, to look closely at the code: this is the static definition, a geoOrthographic instance. And the rotating projection is a block statement with a star in front so that we can yield values, and we go into an infinite loop that sets projection.rotate and yields the value. That's it. We basically didn't have to change anything.
So if generators are good for scripted animations, what about interaction? Generators can do this too. The way it works, you need a promise that resolves whenever there's new input from the user. I'll illustrate by going back to this rotating projection. If we want to make this an interactive rotation, the first thing we need is a widget, a slider for the user to drag. We can do that, again, by creating the right DOM element here. DOM.range is, again, like document.createElement: it's an input element, with a min value of -180 and a max of 180. You could do it by hand, but it's nice to have the shorthand. We can give it a name and write a generator that yields the new values whenever you drag it. I could write that by hand, but it's so common that there's a built-in generator, Generators.input, that does it for you. It listens for input events on the DOM element and
yields the new value. Now when I'm dragging it, I can see the value in the notebook. I give it a name, and plug that angle into the rotation of our projection. Now when I drag the slider back and forth, you can see that the globe is rotating. Now, again, this is such a common thing to do that there's a shorter syntax for it, where you can define both the graphical interface, the DOM element being displayed, and the value that's exposed to code. That's what the viewof operator is: it's two definitions in one. They work exactly like the definitions you
saw; it's just a slightly more concise way to write them. This is the long-form definition of the projection: we have the projection, and we have the angle coming from the DOM element, the input range. And this is the shorthand syntax for the exact same thing, using viewof angle. But the cool thing about this is that we now have the ability to create arbitrary graphical interfaces and design the appropriate programming interface, the values that get exported along with them. You're not limited to sliders and drop-down menus and some fixed palette of user-interface widgets. I'm going to create a complex compound input
to make a color picker. This is a form here with a table inside of it, and there are inputs for the hue, saturation, and lightness. This is just using DOM.html, which takes a big string, sets the inner HTML of a div, and returns it, so that I don't have to create all of this stuff in JavaScript; I'm just embedding an HTML fragment in my code. And then likewise, here is where I'm defining how this is going to be exposed to my code: I'm setting the value property on the top-level element. And so that's going to be updated whenever you drag a slider, whenever there's an input event. It's defined here to create a color instance, and it also updates the outputs to go along with the inputs so you can see what the values are. When I drag the hue slider, it's both updating the hue angle and emitting a new color, displayed in the cell below. And that is used to set the background color of the div so we can see what the color actually looks like. Okay. Now, that's a toy example, again,
of defining a custom interface. Where this starts intersecting with visualization, it becomes much more interesting. So this is a histogram showing the returns of a few hundred stocks over a five-year period. You can see that it's like a bell curve, and the mode is slightly greater than one, so you'd expect a positive annual return. But there's also a long tail of stocks that did really badly, and a long tail of stocks that did really well. Now, in another environment, it might be difficult to inspect this directly, right? The visualization can kind of be a dead end: you can see it, but if you wanted to know what the data points behind these individual bars were, you would have to phrase that as a new question in code. And the goal of d3.express is that you can quickly augment these visualizations so we can start to manipulate them directly, getting back to what Bret Victor said. We treat the chart as an input, like the slider
and the table. So now when I'm brushing back and forth, just using the standard d3-brush, it's exposing its current selection as the array of data points within it. And I'm just displaying that using the default inspector. But it's good enough to let me know what is in the selected range, and I can drag that back and forth without doing any sort
of work. And just to show there's no real magic going on behind the scenes here, this is the code that adapts d3-brush, not a special version, but the same d3-brush you're using today, and fits it into this new framework. You're receiving your brush event, you're looking at the brush selection, and you're pulling out the x values, basically the start and the stop. Then you're filtering your data based on those values, and you're setting the value property of your node; that's what gets exposed to the code. And then you're just telling d3.express that the value has updated by dispatching an input event.
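The filtering step at the heart of this can be sketched without any DOM at all. The numbers here are hypothetical, and invertLinear stands in for the inverse of the chart's linear x-scale, mapping brushed pixel positions back into data values.

```javascript
// Invert a linear scale: map a pixel position back into the data domain.
function invertLinear([r0, r1], [d0, d1]) {
  return (px) => d0 + ((px - r0) / (r1 - r0)) * (d1 - d0);
}

const invert = invertLinear([0, 100], [0, 2]); // 100px-wide axis, domain 0..2
const returns = [0.3, 0.9, 1.1, 1.8];          // hypothetical stock returns
const [x0, x1] = [40, 60].map(invert);         // brush selection, in pixels
const selected = returns.filter((d) => x0 <= d && d <= x1);
console.log(selected); // [0.9, 1.1]
```

In the notebook, that `selected` array is what gets assigned to the node's value property before dispatching the input event.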
Now, another thing you can do in d3.express that's useful: by default, these reactions to your code are applied instantaneously. Whenever a variable changes, the runtime knows what depends on it, and it's going to recompute all the derived variables and update the display instantaneously. Sometimes that's not what you want: something changes and you want to be able to observe what changed, so we want to use animated transitions for consistency. I have rewritten this chart to use the D3 data join, and it's staggered so that when the data updates,
it's going to move the bars into their new positions so you can see how the values change. And likewise, this data set here, the frequency of English letters, is defined so that when you toggle a by-value flag, it's sorted either by descending frequency or lexicographically.
So now when I go back up to the top and I change the value using the checkbox, the code can apply a transition from the old values to the new values. So you can use access to the previous value both to improve performance and to get better visual output, because in this reactive system you can opt into controlling how changes get applied. So, again, it's opt-in complexity as you want to add richness to your implementation. So that was a pretty whirlwind tour of reactive
programming in d3.express. You saw how to use the inline output in cells to look at the current state of the program. But I want to dive into this a little more and show you how to use visualization in d3.express to improve our ability to scrutinize a program's behavior. So reactive programming, where you can change the code being studied and immediately see how the output updates, is also known as interactive programming. And interactive programming lets us investigate how a program works by poking at it: change it, delete some code, reorder it, and see what happens. It lets you get a sense of how that individual
code is contributing. You're making a more direct observation of how that code impacts the program. So in this notebook I have the directed graph data, and I can add or remove the charge force. The charge force is causing the nodes to repel, right? If you remove it, they collapse down into the center, where the only force really applying to them is the link force. Likewise, I can modify parameters of the forces: I can set the strength to be 100, and now they're attracting each other rather than repelling each other, and they collapse to the center. Make it negative, and they expand out. Fifty, 100, whatever. It's not reloading the entire page when you're
making these changes; it's a reactive topology. And so when I'm changing the definition of these forces, it's not throwing everything away and starting over; it's operating on the currently running definition. You're doing live editing of the program. And that improves stability and lets you see more easily how these changes contribute to the program's behavior. Likewise, if I take out the link force, the nodes are no longer connected to each other and start spreading out. Or if I take out the centering force, then they can start floating away. Bye. [ Laughter ]
Okay. Now, a more explicit approach to studying program behavior, rather than just tinkering with it, is to try to expose its internal state. I'm going to illustrate this by computing a running sum. We can take a function and turn it into a generator, so that it yields values in addition to the normal return value. And the idea is that the values we're yielding as the program runs give us an idea of what the code is doing while it's running. The nice thing about both yield and return
is that you can take arbitrarily complex functions (at least if they're not already generators), even recursive ones, and you have this extra channel where you can expose the internal state of your program. And that's really useful for visualizing or studying a program's behavior, because it allows you to completely separate your visualization or analysis of the behavior from the code itself. If I take some code, as I have done before, and put my visualization code directly within the algorithm, it starts to become a mess. If you're doing canvas drawing and using a debugger and switching between the algorithm and the canvas code, it gets to be complete chaos. But with this approach, you can extract the data while it's running, using generators, or statically build it up as an array of values, and then it's much easier to do that analysis. In d3.express, this is how to do it. The simplest way is to call the function like
before. And because it now returns a generator, you automatically get an animation: d3.express knows it's a generator, and it pulls a new value every animation frame. But you can also use the spread operator, where you're essentially pulling all of the values out of the generator in one go and putting them into an array. And that's useful if you want to do a static visualization, or an interactive one where you're scrubbing between individual frames. You don't need to study a running sum function; I think we all know how that one works.
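For completeness, here is the running sum as a generator: yield exposes each intermediate state, and the spread operator collects them all at once, exactly as described above.

```javascript
// A running sum whose internal state is exposed one yield at a time.
function* runningSum(values) {
  let sum = 0;
  for (const value of values) {
    sum += value;
    yield sum; // expose the intermediate state without changing the result
  }
  return sum; // the final answer, as before
}

const states = [...runningSum([1, 2, 3, 4])]; // spread pulls every yielded value
console.log(states); // [1, 3, 6, 10]
```

Handed to a cell directly, the same generator would instead be pulled one value per animation frame, animating the sum as it accumulates.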
But I want to use a more real-world example, and this may get a little hairy, but I'm going to try it anyway: circle packing in D3. This is the Flare class hierarchy. Hierarchical circle packing is like treemaps, but you have this extra space because you're nesting circles rather than squares. That extra space is not entirely wasted: it indicates the hierarchical structure, which is not always obvious with treemaps. In order to produce these diagrams, you need to pack the individual circles, the set of siblings at each part of your tree. And so this is a little example of how that works. Right? You have a set of circles that you want to
pack in order, one at a time, into as small a space as possible without overlap, sort of like penguins huddling in Antarctica. Your job is to place one of these circles at a time until you have placed all of the circles. And you want the circles to be packed as tightly as possible: each new circle you place should be tangent to at least one circle you have already placed, ideally two. But if you pick an existing circle at random as your tangent circle when placing the new circle, you're going to waste time putting the new circle into the middle of the pack, where it's going to overlap with the other circles you have put down. Ideally, when you're considering tangent circles, you should only consider the circles on the outside of the pack. But the problem is, how do you determine which
circles are on the outside? The algorithm used by D3 and other implementations of this layout maintains a front chain; that's the red line. The front chain represents the outermost circles. So when you're placing a new circle, the algorithm picks the circle on the front chain that is closest to the origin, and the new circle is placed tangent to that circle and its adjacent neighbor.
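The geometric core of that placement step, finding a center for a new circle of radius r tangent to two already-placed circles a and b, is a circle-circle intersection. This sketch is illustrative and not D3's internal code.

```javascript
// Place a new circle of radius r tangent to circles a and b: its center
// lies at distance a.r + r from a and b.r + r from b.
function place(a, b, r) {
  const dx = b.x - a.x, dy = b.y - a.y;
  const d = Math.hypot(dx, dy);     // distance between the two centers
  const ra = a.r + r, rb = b.r + r; // required distances to the new center
  // Law of cosines: distance from a, along a->b, to the new center's foot.
  const t = (ra * ra + d * d - rb * rb) / (2 * d);
  const h = Math.sqrt(Math.max(0, ra * ra - t * t)); // offset off the a->b line
  return { x: a.x + (t * dx - h * dy) / d, y: a.y + (t * dy + h * dx) / d, r };
}

// Two unit circles side by side; a third unit circle nestles tangent to both.
const c = place({ x: 0, y: 0, r: 1 }, { x: 2, y: 0, r: 1 }, 1);
console.log(c.x);                              // 1 (centered between the two)
console.log(Math.hypot(c.x, c.y).toFixed(3));  // "2.000" (tangent to circle a)
```

The sign of h picks one of the two intersection points; the layout uses the one that grows the pack outward.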
And if there's no overlap with other circles on the chain, it can move on to the next circle. But if there is overlap, like in this case here, where the big circle overlaps with other circles on the front chain, then it needs to cut the front chain so it can choose a different pair of tangent circles, effectively moving that circle to the outside. If you look closely at the animation, you can see the moments where it cuts the front chain, as the larger circles get squeezed out of the pack and pushed down. I find this mesmerizing to look at. But more than being eye candy, this
animation and notebook was extremely helpful for me in fixing a longstanding bug in D3's implementation. There was a little bit of vague wording in the original paper, and because the front chain is a circular structure, it wasn't obvious in some situations where it needed to be cut in order to place the new circle. Having the ability to inspect the program as it runs, instead of only seeing at the end that the output was wrong, made it much easier to find the bug, change the algorithm, and verify that the change handled those conditions without otherwise changing the output.
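To make the placement step above concrete, here is a minimal sketch of the underlying geometry, assuming two anchor circles `a` and `b` taken from the front chain. This is illustrative only, not D3's actual source:

```javascript
// Sketch: position circle c tangent to circles a and b (all externally
// tangent). a and b stand in for a front-chain circle and its neighbor.
function place(a, b, c) {
  const dx = b.x - a.x, dy = b.y - a.y;
  const d2 = dx * dx + dy * dy; // squared distance between a and b
  if (d2) {
    const a2 = (a.r + c.r) * (a.r + c.r); // required squared distance a-c
    const b2 = (b.r + c.r) * (b.r + c.r); // required squared distance b-c
    // Solve for c in coordinates normalized to the a->b axis, then rotate back.
    const x = 0.5 + (a2 - b2) / (2 * d2);
    const y = Math.sqrt(Math.max(0, a2 / d2 - x * x));
    c.x = a.x + x * dx - y * dy;
    c.y = a.y + x * dy + y * dx;
  } else {
    // a and b are concentric; just place c beside a.
    c.x = a.x + a.r + c.r;
    c.y = a.y;
  }
}
```

If the placed circle then overlaps another circle on the front chain, that is the case where the chain has to be cut and a different tangent pair chosen.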
This is one part of circle packing. The other is that once you have laid out your siblings,
you need to compute the enclosing circle for that pack so you can then move on to other parts of the hierarchy. The conventional way of doing this is to scan the front chain and pick the circle farthest from the origin. That works reasonably well because the packs are roughly circular, but sometimes a pack can be slightly shifted off to one side, so it doesn't end up being an exact solution. And I learned that there's an algorithm called Welzl's algorithm that gives the optimal solution and runs in expected linear time, so there's no reason not to use it. And if I can fix this once and get an improvement, even a slight one, that's awesome. And it's fun to understand how these things
work. The algorithm is incremental: it works on one circle at a time, in random order. And you can see in this animation that, because of that approach, it converges very quickly onto roughly the right enclosing circle; there's just a chance, as it encounters circles on the outside, that the enclosing circle has to expand. So how does this algorithm work? One thing I should say is that this is a slightly harder problem than what happens inside the circle packing layout, because in circle packing the layout already has the front chain, so it only needs to compute the enclosing circle of the front chain. But this is the case where you have an arbitrary set of circles and don't know the front chain ahead of time. How does this algorithm work? Let's assume we already have an enclosing
circle for some set of circles, circles zero through i minus one. We just assume we know that enclosing circle; this is how induction works, and it gives us a starting point for building an algorithm. If we have an enclosing circle for some set of circles, all we want to do is incorporate the next circle into it. Well, if that new circle, which is the black one here, is already inside our enclosing circle, then we don't need to do anything: the enclosing circle is still fine and we can move on to the next one. But if the circle we're trying to add is outside, not contained by the enclosing circle, then we need to compute the new enclosing circle. And we can make an observation about this new circle: because everything before it was already enclosed, it's the only circle that's outside the current enclosing circle. And that means the new enclosing circle must be tangent to the new circle we're adding. So that looks like this. But the problem is we don't know what the other two tangent circles are; they might not be the same ones that determined the previous enclosing circle. Still, once we know one tangent circle of the enclosing circle, we can apply the same idea recursively: every time we find a circle outside, we recurse to find the next tangent circle. Add some boundary conditions, like what the enclosing circle is when you have one, two, or three tangent circles, which is Apollonius' problem, and the rest is geometry. This is already enough to get an idea of how the algorithm works and why it's able to terminate.
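Here's a sketch of that recursion, simplified to enclose points rather than circles; the circle version (as in d3.packEnclose) has the same recursive structure, but its base cases require solving Apollonius' problem. Again, this is illustrative, not D3's implementation:

```javascript
// Sketch of Welzl's algorithm: smallest circle enclosing a set of POINTS.
function enclosingCircle(points) {
  const P = points.slice();
  // Shuffle: random insertion order gives the expected linear running time.
  for (let i = P.length - 1; i > 0; --i) {
    const j = Math.floor(Math.random() * (i + 1));
    [P[i], P[j]] = [P[j], P[i]];
  }
  return welzl(P, P.length, []);
}

// Smallest circle enclosing the first n points of P, with every point
// in R (at most 3) required to lie on the circle's boundary.
function welzl(P, n, R) {
  if (n === 0 || R.length === 3) return trivial(R);
  const p = P[n - 1];
  const c = welzl(P, n - 1, R); // enclose the other points first
  if (c !== null && contains(c, p)) return c; // p already inside: done
  // Otherwise p must lie on the boundary of the new enclosing circle.
  return welzl(P, n - 1, R.concat([p]));
}

// Boundary conditions: the enclosing circle of 0, 1, 2, or 3 boundary points.
function trivial(R) {
  if (R.length === 0) return null;
  if (R.length === 1) return {x: R[0].x, y: R[0].y, r: 0};
  if (R.length === 2) return diameter(R[0], R[1]);
  return circumcircle(R[0], R[1], R[2]);
}

function contains(c, p) {
  return Math.hypot(p.x - c.x, p.y - c.y) <= c.r + 1e-9;
}

function diameter(a, b) {
  return {
    x: (a.x + b.x) / 2,
    y: (a.y + b.y) / 2,
    r: Math.hypot(b.x - a.x, b.y - a.y) / 2
  };
}

function circumcircle(a, b, c) {
  const d = 2 * (a.x * (b.y - c.y) + b.x * (c.y - a.y) + c.x * (a.y - b.y));
  const a2 = a.x * a.x + a.y * a.y;
  const b2 = b.x * b.x + b.y * b.y;
  const c2 = c.x * c.x + c.y * c.y;
  const x = (a2 * (b.y - c.y) + b2 * (c.y - a.y) + c2 * (a.y - b.y)) / d;
  const y = (a2 * (c.x - b.x) + b2 * (a.x - c.x) + c2 * (b.x - a.x)) / d;
  return {x, y, r: Math.hypot(a.x - x, a.y - y)};
}
```

Notice how the second recursive call is exactly the "this circle must be tangent to the new enclosing circle" observation: the offending point is pushed onto the boundary set before retrying.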
Now that we understand the recursive structure, we can make a visualization that shows a more complete view of how the algorithm works. From left to right are the four possible depths of the recursion stack; it can't recurse more than three deep, because you can't have more than three tangent circles before they determine the enclosing circle. Again, that's geometry. Before, you were just seeing the circles on the left; now you can also see that when it finds a circle outside the red enclosing circle, it adds a new tangent circle and descends into the recursion. But in addition to showing how this algorithm works, one of the nice things is that you get a better sense of how much time the algorithm spends in different states of the program. In this case, you see the enclosing circle gets bigger very quickly, but whenever it finds a circle that is outside and needs to recurse, it has to revisit all of the previous circles to make sure the new enclosing circle actually contains everything. I'm not going to prove that it runs in expected linear time; that's too much work. But you get a sense of it here. All right. One way to write less code is to reuse it. And the 440,000 packages published to npm sort of attest to
this approach. But libraries are an example of active reusability: you must deliberately design a library to be reusable, and that's a substantial burden. It's hard to design reusable abstractions; ask any maintainer. One-off code, like the D3 examples, is much easier: you're only worried about the task at hand, and you don't have to generalize it into an abstract class of tasks. But with d3.express, I'm trying to explore whether there's an intermediate form with better passive reusability, where you can use documents to more easily repurpose
code. One part of that is that you can treat any document like a lightweight library. Here I have a document that defines a color interpolator and a pretty gradient, and now I can import that into this document and call it. So if I need another color function or utility, I don't have to create a package on npm or publish a GitHub repo or whatever; I can just pull it into my code. And it pulls in the dependencies automatically: the original definition used D3 and a plugin, and I don't have to load those separately. But likewise, even though that remote document loads its own definition of D3, it's not going to conflict with my local definition of D3. I'm only pulling in the functionality, just the symbols that I explicitly reference in my import statement. And you can do cooler things: you can rewire the
definitions to inject your local definitions into the remote definition. Excuse me. So this is a case where I have a data set that is streaming over a WebSocket. I'm not going to explain how all this code works, but the idea is that you have an API for connecting to a socket and getting a realtime data stream, and you keep an array of the last 60 seconds of values. The result is a generator that emits a new array of objects with time and value fields.
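As a rough sketch of that windowing idea (illustrative, not the code from the talk), a generator can consume timestamped samples and yield the trailing 60-second window after each one:

```javascript
// Illustrative sketch (not the talk's actual code): a generator that
// consumes {time, value} samples, e.g. parsed WebSocket messages, and
// yields after each sample a copy of the trailing 60-second window.
function* slidingWindow(samples, windowMs = 60000) {
  const data = [];
  for (const sample of samples) {
    data.push(sample);
    // Evict samples older than the window, relative to the newest sample.
    while (data[0].time < sample.time - windowMs) data.shift();
    yield data.slice();
  }
}
```

In the notebook the source would be an asynchronous stream of socket messages rather than a plain iterable, but the eviction logic would be the same.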
So the question is: can I visualize this using an existing line chart? We had the line chart of the Apple stock price, and this data has the same structure. It's a slightly different definition, but it has a time and a value. So here I'm embedding the chart, and you can see it's just that same basic chart. But if I add this with clause here, I'm injecting my definition of data into the chart, right? That static chart becomes a realtime chart, and I didn't have to change any other aspect of the code, because that code was already defined to be reactive, right? It was already setting the domains of x and y reactively. But the cool thing is I don't have to stop there. I can also augment other aspects of that chart definition, because they're all part of this topology which is exposed, and I can override any of it. So if I don't like the fact that the y scale is dynamically adjusting based on the window, and I want a fixed window because I know the expected values of my streaming data set, we can do that: I keep the other definitions like the width and height and margin, inject my own y definition, and now it's a fixed scale. And likewise, I can do the same thing with the x scale. So rather than showing you the chunks as the data updates, which happens four or five times a second here, I can have a smoothly sliding x scale that updates 60 times a second and crops the data so that it's clipped slightly outside the window. And it doesn't care whether the x scale is a constant or a generator; I can just plug those things in.
All right. So, notebooks in d3.express run in the browser, right? Not on the desktop and not in the cloud. There's a server to save your edits, but all the rendering and the computation happens locally in the client. What does it mean to have a web-first discovery environment? In my view, a web-first discovery environment embraces web standards, vanilla JavaScript and the DOM, so you can use today's open source, whether that's snippets on the web or libraries you're getting from npm. And it limits the specialized machinery you need to learn to be productive. There is syntax for reactivity, but I have tried to keep it as small and familiar as possible, using generators to define these dynamic values. These are all of the forms of variable definitions, and they're just an expression, a block statement, a generator block statement, and your standard function definition.
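The generator form boils down to ordinary JavaScript generator semantics. This hand-driven sketch (not d3.express syntax, just the underlying mechanism) shows how a generator defines a value that changes over time:

```javascript
// A dynamic value defined as a generator: each yield produces the next
// value. A reactive runtime would pull one value per animation frame and
// re-evaluate referencing cells; here we drive it by hand to show the idea.
function* ticker() {
  let i = 0;
  while (true) yield i++;
}

const t = ticker();
const frames = [];
for (let frame = 0; frame < 3; ++frame) {
  frames.push(t.next().value); // what the runtime would observe each frame
}
```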
Probably more important, though, is that your code can now run everywhere, right? If it's using web standards and runs in your browser, it can run in anybody else's browser. There's nothing to install, and that means it becomes much easier for others to repeat and validate your analysis. And by extension, your code for exploration can gracefully transition into code for explanation; you don't have to start over from scratch if you want to communicate your insights. It's great, and I want to commend journalists and scientists for increasingly being open and sharing their data and code. But putting code up on GitHub is not necessarily enough to make it usable. It's a lot of work for those who want to use it to recreate the environment: they need the right software installed, and familiarity with the tools you're using. But if the code is running in the browser, again, there's nothing to install, and it just works by default. So, again, maybe you should have gotten Bret
to give this talk instead. I'm going to end on another Bret Victor quote, from Explorable Explanations, to explain the implications of this approach. An active reader asks questions, considers alternatives, questions assumptions, and even questions the trustworthiness of the author. An active reader tries to generalize specific examples, and devise specific examples for generalities. An active reader doesn't passively sponge up information, but uses the author's argument as a springboard for critical thought and deep understanding. Imagine if our algorithms were shared not just as prose in PDFs, but as live code with interactive visual diagrams. It would be so much easier for readers to see how they work, to question them, and to tinker with and modify them. I have to end on a slight disappointment: this is a lot of work, and it's not ready for you to use yet. But I hope it will be ready very soon. You can sign up for early access when it's available at d3.express; that's the URL. If you are interested in this stuff, please come talk to me about it, and if you want to help me build this so it's available sooner rather than later, please get in touch. And thank you. [ Applause ]