a base-neutral system for naming numbering systems

Captions
one of my interests is alternate numbering systems. I’ve talked about them in a few videos, but one thing I haven’t really talked about in a video yet is my original method for giving names to numbering systems. there’s a full description of this system that’s been on my website for years at this point, but given that I’m currently simultaneously working on like five distinct video projects and I don’t think I’ll get any of them done in time to be the big video I put out for this month, I figured I might as well make a quick little video about this, and maybe some of you will find it interesting. almost everyone uses base ten to count things. that’s usually written like that, “base one-zero”. and if you’re used to seeing numbers written in base ten, writing “base ten” like “base one-zero” is completely normal and understandable. but the thing is, every base is “base one-zero”. by definition, in any given base, one-zero is the way you write the base itself. what it means to be base six is that one-zero means six. when you’re talking about different numbering systems, the fact that every base is base one-zero makes this way of referring to bases extremely ambiguous. you need to specify what base you’re using to talk about a given base. but since the base you’re using is also base one-zero, you need to specify what base you’re using to talk about that base. there’s easy boring solutions to this. you could just write out the English name of the number in full, or just have everyone agree that base ten is the default. but that’s not ideal, especially in the context where people are talking about using other bases as alternatives to base ten. what you’d like is a way of talking about bases in a way that is itself base-neutral. and so, bases are given names. Latin-derived names, typically. there’s binary, ternary, quaternary, quinary, senary, septenary, octal, nonary, decimal, undecimal, duodecimal, and so on. these names are well-established and commonly used. 
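The claim that every base is "base one-zero" can be checked directly. Below is a minimal sketch; the helper `to_base` and the digit alphabet `DIGITS` are illustrative names, not something from the video.

```python
# Digit symbols for bases up to thirty-six (an arbitrary illustrative choice).
DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def to_base(n, base):
    """Write the non-negative integer n in positional notation for `base`."""
    if not 2 <= base <= len(DIGITS):
        raise ValueError("base must be between 2 and 36 for this digit set")
    if n == 0:
        return "0"
    out = []
    while n > 0:
        n, d = divmod(n, base)  # peel off the least significant digit
        out.append(DIGITS[d])
    return "".join(reversed(out))


# Writing each base in its own notation always gives one-zero.
for b in (2, 6, 10, 12, 16):
    print(b, "->", to_base(b, b))
```

Each line printed ends in `10`, which is exactly why "base 10" on its own does not identify a base.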
but the thing is, this still isn’t really base-neutral, at least not any more base-neutral than just using English words for numbers. like English, Latin uses base ten, and Latin names for numbers are rather decimal-centric. for example, base twelve, duodecimal, is a rather popular numbering system, and many people who like base twelve specifically dislike the name “duodecimal”, since it represents twelve as “two plus ten”. if you’re trying to promote a numbering system where twelve is written as one-zero, calling twelve two plus ten doesn’t feel right. for this exact reason, the name “dozenal” is commonly used instead, and it’s a very good name. while etymologically it is technically still from something that at one point meant “two plus ten”, it’s a much more fitting name for base twelve than duodecimal. so, that’s the reason I came up with my own system for naming bases a few years ago. I wanted to create a truly base-neutral method for referring to different numbering systems. it’s a little bit quirky, and not exactly the most elegant possible system, but I’m still happy with it, and I think it’s a lot of fun. for bases where the existing Latin name isn’t super explicitly decimal, their names can just still be from Latin. this includes all the bases up to ten, plus vigesimal for base twenty and centesimal for base one hundred. there’s also a few where instead of the more traditional Latin name I went with an alternative that I think is easier to understand, specifically trinary instead of ternary for base three, and seximal and septimal instead of senary and septenary for bases six and seven. seximal for base six in particular is one that I feel very strongly about. as you may know, base six is my personal favorite numbering system, and I really don’t like the name “senary” for it. 
unless you’re already familiar with the intricacies of Latin number words, “senary” just doesn’t sound like it has much to do with the number six, nor does it really sound like it’s the same type of word as “binary” or “decimal”. what it sounds like is “scenery”, a completely unrelated word that just happens to be spelled kinda similarly. and like, the last thing I want is for people to hear me talking about base six and think I’m actually talking about something completely unrelated. so that’s why I call it “seximal” instead. on top of that set of Latiny names, this system also incorporates a couple names for bases that are already well-known, namely dozenal for base twelve, and hex for base sixteen. “hex” is short for “hexadecimal”, a mixed Greek-Latin name that’s very base ten, but the shorter name “hex” is a commonly used alternative that’s just base-neutral enough to fit in with this system. I also incorporated the name niftimal for base thirty-six. this is from the Ndom word “nif”, meaning thirty-six, which is also used in my English-language implementation of seximal as the word for thirty-six. “niftimal” also happens to sound a bit like the word “nifty”, which is very fun. as I’ll talk about soon, the core of this base-naming system uses factorization, and as such prime numbers are pretty important to it, so the final set of root names are used specifically for prime numbered bases. base eleven is elevenary, obviously from the English word “eleven”. base thirteen is baker’s dozenal, as in a baker’s dozen. and finally, base seventeen, a base which would be very impractical to use, is called suboptimal. some people have pointed out that this name kinda sounds like it’s saying that base eighteen is the optimal base, which isn’t the intention, but oh well. anyway, as I was saying, outside of these ones that get their own simple root names, bases are named according to factorization. 
naming numbering systems according to factorization is really useful, because what factors a base has can tell you a lot about its properties. like, two and five are easy numbers to deal with in decimal, because ten is divisible by two and five. so uh, the first one that doesn’t get its own root name is base fourteen, right after baker’s dozenal. fourteen is two times seven, so it’s called biseptimal. you use the prefix bi- to multiply a base by two. then base fifteen, three times five, is triquinary, with tri- meaning times three. great, so then there’s hex, then suboptimal, then base eighteen. so, eighteen is two times three times three, so you could call it bitritrinary, or you could call it two times nine, binonary. so, since numbers can be factorized in multiple ways, there needs to be some way of figuring out which way you’re supposed to do it. the solution I came up with is that you want to factorize the number into whatever factors are closest together, and put the smaller one first. so, for eighteen, that’s three times six, so it’s triseximal. next is base nineteen, which is a prime base. all smaller prime bases got their own roots, but there needs to be some way of deriving names for prime bases that are arbitrarily large. so, what you do is you take the base one below it, and add the prefix un- to mean “plus one”. so, base nineteen is untriseximal. this might seem like a pretty arbitrary way to do it, and of course it is, all of this is arbitrary. but the benefit of naming prime bases after the factorization of the number one below them is that just like how a base’s factors determine some of their properties, the factors of the number right below a base also determine some of the base’s properties. for example, in decimal, while two and five are the most convenient numbers to deal with, three is also pretty convenient, and that’s because it’s a factor of nine, one less than ten. 
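The point about factors of a base, and factors of the number one below it, can be illustrated with two standard divisibility tests: a last-digit test for divisors of the base, and a digit-sum test for divisors of the base minus one. This is a small sketch with made-up helper names (`digits`, `divisors`, `rules_hold`); it only checks the tests by brute force.

```python
def digits(n, base):
    """Digits of n in the given base, most significant first."""
    out = []
    while n:
        n, d = divmod(n, base)
        out.append(d)
    return out[::-1] or [0]


def divisors(n):
    """Divisors of n other than 1."""
    return [d for d in range(2, n + 1) if n % d == 0]


def rules_hold(base, limit=2000):
    """Brute-force check of both tests for every n below `limit`."""
    for n in range(limit):
        ds = digits(n, base)
        # Divisors of the base: only the last digit matters.
        for d in divisors(base):
            if (n % d == 0) != (ds[-1] % d == 0):
                return False
        # Divisors of base - 1: only the digit sum matters.
        for d in divisors(base - 1):
            if (n % d == 0) != (sum(ds) % d == 0):
                return False
    return True


for b in (6, 10, 12):
    print(b, rules_hold(b))
```

In decimal this covers the familiar rules for two and five (last digit) and for three and nine (digit sum). In base six the digit-sum rule applies to five instead.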
next is twenty, vigesimal, gets its own root; then there’s twenty-one, triseptimal, three times seven; twenty-two, bielevenary, two times eleven; twenty-three, unbielevenary, prime base so it’s one plus two times eleven; then twenty-four can be factored a few different ways but the one with the factors closest together is four times six, so it’s tetraseximal, with tetra- for times four. now’s a good time I think to talk about the rest of the multiplicative prefixes, since I think you get the point. so, every root name also has a corresponding prefix. in general, these prefixes are from Ancient Greek, like how “hexadecimal” combines both Greek and Latin number words together. some of them aren’t though, like bi- for two is a Latin thing. but yeah, bi- is two, tri- is three, tetra- is four, penta- is five, hexa- is six, hepta- is seven, octo- is eight, enna- is nine, deca- is ten, leva- is eleven, doza- is twelve, baker- is thirteen, tesser- is sixteen, mal- is seventeen, icosi- is twenty, feta- is thirty-six, and hecto- is one hundred. another thing that makes naming bases according to their factors a useful thing to do is that when you’re dealing with a really large base, it’s somewhat impractical to actually have unique symbols and names for every single digit. I talked about this in one of the videos I did with Artifexian about number systems, but one solution to this problem is mixed radix. this means that the factor that determines the value of each position in a positional notation relative to adjacent positions isn’t always the same. the go-to example for how this works is a digital clock, where some digits are ten times the value of the digit to the right, but others are six times the value of the digit to the right. this is using base six and base ten together as an intuitive way to use base sixty without actually needing sixty distinct digits. 
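The digital clock example can be sketched as a mixed-radix conversion. The function name `to_mixed_radix` and the choice of a minutes-and-seconds display are assumptions made for illustration.

```python
def to_mixed_radix(value, radices):
    """Split `value` into digits, where radices[i] is the radix of position i
    (most significant first)."""
    out = []
    for r in reversed(radices):
        value, d = divmod(value, r)  # least significant position first
        out.append(d)
    return out[::-1]


# A MM:SS display alternates radix six and radix ten.
clock = [6, 10, 6, 10]
seconds = 12 * 60 + 34
print(to_mixed_radix(seconds, clock))  # [1, 2, 3, 4], i.e. 12:34
```

Each six-then-ten pair of positions acts as one base-sixty digit, so the display needs only ten distinct symbols rather than sixty.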
and, since in my system bases are named according to their factorization, the name itself tells you which radices you can mix together. following the algorithm, the canonical factorization for sixty is six times ten, the exact pair of radices used by digital clocks. so, we just take base ten, decimal, and add the multiplicative prefix for six, hexa-, and we get the name “hexadecimal”. wait. okay, so, the problem we’ve just run into is that if you add a multiplicative prefix to “decimal”, it’ll sound like you’re using the traditional base-naming system to talk about a completely different base. that’s pretty confusing. so to avoid that, when you put a prefix before “decimal”, it becomes -gesimal instead, so base sixty is hexagesimal, not hexadecimal. the other base that’s different when there’s a prefix before it is “baker’s dozenal”. while this is a very good name that I like a lot, it can be a bit cumbersome and awkward to add a prefix to it. so, when there’s a prefix, “baker’s dozenal” gets shortened to just -ker’s dozenal. so, for example, the Excel column numbering system, base twenty-six, two baker’s dozen, is called biker’s dozenal. now for prime numbers. as before, these multiplicative prefixes are derived from the factorization of the number one less than the given prime number. instead of un-, you use the Greek-derived hen- to mean “plus one”. so, the multiplicative prefix form of untriseximal is hentrihexa-. however, this has the risk of being ambiguous. for instance, what would base 646 be called? it factorizes as nineteen times thirty-four. thirty-four is itself two times seventeen, so base thirty-four is bisuboptimal. adding the multiplicative prefix for nineteen, hentrihexa-, we get hentrihexabisuboptimal. but while we know that the hen means to add one to some portion of the prefix before suboptimal, it’s unclear exactly which part. 
while you could figure it out by looking at the different possibilities and seeing that only one of them follows all of the rules of this system, you shouldn’t have to do that. you could rearrange the roots of this name to make something easier to understand, as in bihentrihexasuboptimal, but there’s no guarantee that you’d always be able to do that, and it’s an inelegant solution either way. so what you actually do is you add something to the end of a big prime multiplicative prefix. having spoken parentheses like this is a kinda unnatural thing to do, so to make it as painless as possible, the close-bracket particle is -sna-, which is very fun to say. so, the prefix for nineteen is hentrihexasna-. you could in theory have a system for working out when this close-bracket particle is actually necessary, but I think just always using it no matter what is fine. and that’s almost everything. there’s just a couple more minor rules for optimizing stuff. as I said before, what you do when you’re factorizing a number is you go with whatever two factors are closest together. usually, those factors will themselves need to be broken down, and you just do that recursively. so like, ninety-eight is two times seven times seven, but it’s factorized as seven times fourteen, and then fourteen is two times seven, so base ninety-eight is heptabiseptimal. okay, but now let’s say you wanted to use something like base six hundred. six hundred is twenty four times twenty five. twenty four is four times six, tetraseximal, and twenty five is five times five, pentaquinary, so base six hundred should be tetrahexapentaquinary. but the thing is, that name is way longer than it really needs to be. why bother calling it four times six times five times five when this system has a specific root word for four times five times five? just call it six times one hundred, hexacentesimal. 
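The rules so far can be put together in a rough sketch. It is a reconstruction from the spoken description, not the author's own code. It uses the roots and prefixes listed in the transcript, and it picks factor pairs by "fewest roots first, then closest together", which is the refinement the 600 example leads to. Three things are assumptions: the function names, treating un-/hen- as free when counting roots, and leaving out the vowel-sequence rules entirely (so a name like the one for base 9999 would come out with an extra vowel).

```python
from functools import lru_cache
from math import isqrt

# Root names and multiplicative prefixes as listed in the transcript.
ROOTS = {2: "binary", 3: "trinary", 4: "quaternary", 5: "quinary",
         6: "seximal", 7: "septimal", 8: "octal", 9: "nonary",
         10: "decimal", 11: "elevenary", 12: "dozenal",
         13: "baker's dozenal", 16: "hex", 17: "suboptimal",
         20: "vigesimal", 36: "niftimal", 100: "centesimal"}
PREFIXES = {2: "bi", 3: "tri", 4: "tetra", 5: "penta", 6: "hexa",
            7: "hepta", 8: "octo", 9: "enna", 10: "deca", 11: "leva",
            12: "doza", 13: "baker", 16: "tesser", 17: "mal",
            20: "icosi", 36: "feta", 100: "hecto"}


def factor_pairs(n):
    """All (a, b) with 2 <= a <= b and a * b == n; empty when n is prime."""
    return [(a, n // a) for a in range(2, isqrt(n) + 1) if n % a == 0]


@lru_cache(maxsize=None)
def root_count(n):
    """How many roots the name of n needs (assumption: un-/hen- cost nothing)."""
    if n in ROOTS:
        return 1
    pairs = factor_pairs(n)
    if not pairs:  # prime without its own root: named after n - 1
        return root_count(n - 1)
    return min(root_count(a) + root_count(b) for a, b in pairs)


def best_pair(n):
    """Fewest roots first, then the pair whose factors are closest together."""
    return min(factor_pairs(n),
               key=lambda p: (root_count(p[0]) + root_count(p[1]), p[1] - p[0]))


def prefix(n):
    """Multiplicative prefix for n, with -sna- closing every prime prefix."""
    if n in PREFIXES:
        return PREFIXES[n]
    if not factor_pairs(n):
        return "hen" + prefix(n - 1) + "sna"
    a, b = best_pair(n)
    return prefix(a) + prefix(b)


def base_name(n, after_prefix=False):
    """Name of base n for n >= 2 (vowel-sequence rules not applied)."""
    if n < 2:
        raise ValueError("this sketch only covers integer bases from 2 up")
    if n in ROOTS:
        if after_prefix and n == 10:
            return "gesimal"        # avoids colliding with "hexadecimal"
        if after_prefix and n == 13:
            return "ker's dozenal"  # shortened form after a prefix
        return ROOTS[n]
    if not factor_pairs(n):
        return "un" + base_name(n - 1)
    a, b = best_pair(n)
    return prefix(a) + base_name(b, after_prefix=True)


for n in (14, 18, 19, 24, 26, 60, 98, 600):
    print(n, base_name(n))
```

For the bases named in the transcript that involve no vowel clashes (fourteen, eighteen, nineteen, twenty-four, twenty-six, sixty, ninety-eight, six hundred), this sketch gives the same names as the video does.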
so, what you actually do when selecting how to factorize a number is first try to minimize the number of roots you need to use, then pick the factors that are closest together. so, even though twenty four and twenty five are right next to each other, the fact that six and one hundred both can be expressed with single roots in this system means that that’s prioritized. uh, there’s also some rules for dealing with vowel sequences that show up when one root that ends with a vowel is followed by another root that starts with a vowel. here’s the whole table for what to do with those. all of this stuff is designed so that in theory you could for any given number work out its name in this system by hand, but I also have made stuff that automatically figures out these names. outside of this set of roots, there’s a few extra ones that extend the range of this system to include more types of number. while the smallest base that’s really functional is binary, base one still kinda works, and it’s called unary. base zero can also be described in theoretical terms, but it would be literally impossible to use in any capacity. nevertheless, it’s called “nullary” by this system. this extends the range of numbers that work in this system to include all natural numbers. negative bases can also sometimes be useful, so they have names generated using the prefix nega-, following the established convention of negabinary and negadecimal for bases negative two and negative ten. this extends the range of numbers that work in this system to include all integers. now, outside of the domain of bases that are sometimes useful, there’s non-integer bases. these are incorporated using the prefix vot-. vot- is used for the reciprocal of any integer base. for example, votdecimal is base one tenth. as you might have guessed, this is a reference to the language Vötgil. in Vötgil, numbers are represented in decimal, but with their digits read backwards. 
this could be interpreted as Vötgil using base one tenth, hence votdecimal. these vot- bases can then have the usual multiplicative prefixes added to them to represent any rational number base. so like, base two thirds is bivottrinary. finally, there are the suffixes -nary and -imal. these suffixes can be added to the names of any number, with -nary used for numbers less than six and -imal used for numbers greater than six, which makes it so that this system can work with any number which can be named. and that is the full system. there’s definitely some parts that are kinda awkward to use, and it’s not exactly the most mathematically elegant notation, but I still kinda like it. there’s definitely some things I think I would change if I were to go back and remake this system from the ground up now, but I’m still happy with it. now, one thing you’ve definitely noticed is just how long these names can get. the reason people who commonly use base sixteen call it hex isn’t so it can have its own base-neutral name, it’s because “hexadecimal” is just too long to say every time. so, to go along with this base-naming system, I’ve also come up with a system for abbreviating these base names. unlike the base-naming system itself, this is not designed to be an algorithm a human could do by hand. I haven’t really described anywhere how exactly this thing works, so to close out this video I’ll do that here. broadly speaking, this algorithm cycles through every possible abbreviation for the given base, and picks whichever one it encounters first that doesn’t conflict with the abbreviation for any smaller base. and uh, here’s exactly how it does that. first, it checks if the number it’s given is a positive integer. anything else is outside of the scope of what this algorithm is supposed to work with, so if it’s not a positive integer it just stops here. 
if it is a positive integer, it then looks up if the abbreviation for this particular base has already been saved to memory, since it would be a waste of time to calculate it again. next, it figures out the given base’s name using the process I’ve been describing throughout this video. it then rewrites that to be in all caps, and removes any spaces or apostrophes, specifically so that baker’s dozenal and the like don’t get abbreviations that include non-letter characters. next, it removes any vowels that appear in the name outside of the first three letters. the letter Y at the end of many base names is not removed at this step, even though it’s a vowel. the reason for doing this is that vowels that appear later in these base names aren’t that distinctive, and including them in an abbreviation wouldn’t result in the most intuitive abbreviations. the string which will be the final abbreviation is created, by default being just the first three letters of the base name, which we already had to figure out in that initial vowel removal process. also, an integer k is initialized to zero here. one more number to start keeping track of is the current target length for the abbreviation, “abbvLen”, which is set to three. we are now ready for the main loop of the algorithm. first, check if the current abbreviation is used by any smaller base. if it’s not, we’re done and can move on. otherwise, keep going. now, we take that number k and convert it to binary. we can now use this binary number and our trimmed base name to generate an abbreviation. you take the binary number with its bits ordered from least to most significant, and line them up with the string you’re abbreviating, with that least significant bit lined up with the second character in the string. then, for each bit, if that bit is one, you include the corresponding character in the abbreviation. for the first character, the one that doesn’t line up with any bit, that one is just always included. 
next, we take that abbreviation and check if its length is “abbvLen”, the length we currently want the abbreviation to be. yeah, this part is pretty inefficient. what I should have done when I made this all those years ago is have it check if it’ll be the right length before doing all that working out and actually generating the abbreviation. this wastes a lot of time. oh, right, one more case to worry about. if it turns out that k is too big for that process of lining up the binary number with the string you’re abbreviating to work, then we add one to abbvLen and reset k to zero. for some reason, which might have just been a mistake, the way it checks if k is too big for this actually says that it’s too big if there’s a one corresponding to the final character in the string? so, because of that, these abbreviations just never include the final letter of a base name. I legitimately don’t remember if that’s intentional or not. it probably wasn’t. anyway, now that we have our abbreviation, we add one to k, then go back to the top of the loop. is this abbreviation used by any smaller base? if not, we’ve found the abbreviation, but if it is, we now find the abbreviation corresponding to the new value of k. once we finally have our abbreviation and exit the loop, we store it into that hash map so we can look it up later, and then return it. so, that’s how abbreviations are generated using this pretty messy code I wrote a long time ago. I went through and checked for every base up to base ten thousand, hectocentesimal, just to get some stats on how short these abbreviations are. of those, 29.55% were three letters long, 68.49% were four letters long, and the remaining 1.96% were five letters long, which I think is a pretty good result. and also, even though the majority in that particular range were four letters long, the smallest base that needs four letters is bakerpentanonary, base 585, abbreviated as BAKR. 
the bases people would actually consider using are way smaller than that, and they all get nice three letter abbreviations. and also, bases that do get three-letter abbreviations are found throughout the whole range of numbers I checked, all the way up to EVC for ennalevuncentesimal, base 9999, and there’s three letter abbreviations for names as long as the thirty-five-letter long henbihenbilevasnasnaheptatriquinary for base 4935, which gets comfortably abbreviated as HVQ. HVQ is the most efficient abbreviation in the range I checked, but the least efficient is base sixteen, hex. since “hex” is already three letters long, and since these abbreviations can’t be shorter than three letters, its abbreviation is the only one which isn’t any shorter than the thing it’s abbreviating. if you exclude three-letter abbreviations, no abbreviation within the range was any worse than one third the length of the full name for the base, seen with FETN for base 6948, fetundozahex, and HECTC for base 10,000, hectocentesimal. while this system for generating abbreviations is very fun, and does in fact create effective shorthand names for different bases, a major issue with it is that there’s no way of working out what base a given abbreviation refers to. the other thing is that my code for this is just really inefficient, and has lots of room for optimization. it has a tendency to get stuck for a really long time whenever it has to do an abbreviation that’s more than four letters long. anyway, that’s pretty much it. I wasn’t kidding before when I said I was working on like five other videos right now, so look forward to at least one of those in the coming weeks.
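The abbreviation procedure described above can be sketched roughly as follows. This is a reconstruction from the spoken description only, not the author's code, and some details are guesses: the function names (`trim`, `candidates`, `abbreviate_all`) are made up, the exact order of the length check is an assumption, and the sketch takes a list of base names as input instead of generating them. It keeps the quirk that the final letter is never picked outside the default first-three-letters case.

```python
def trim(name):
    """Upper-case the name, drop spaces and apostrophes, then drop the
    vowels A, E, I, O, U everywhere except in the first three letters."""
    s = "".join(ch for ch in name.upper() if ch not in " '’")
    return s[:3] + "".join(ch for ch in s[3:] if ch not in "AEIOU")


def candidates(trimmed):
    """Yield possible abbreviations in the order they would be tried."""
    yield trimmed[:3]                # default: the first three letters
    middle = trimmed[1:-1]           # the final letter is never selectable
    for length in range(3, len(trimmed)):
        # k's binary digits, least significant first, line up with the
        # string starting at its second character.
        for k in range(1 << len(middle)):
            picked = trimmed[0] + "".join(
                ch for i, ch in enumerate(middle) if (k >> i) & 1)
            if len(picked) == length:
                yield picked


def abbreviate_all(names):
    """Abbreviate names given in increasing base order, skipping any
    abbreviation already taken by a smaller base."""
    used, result = set(), []
    for name in names:
        abbr = next(c for c in candidates(trim(name)) if c not in used)
        used.add(abbr)
        result.append(abbr)
    return result


names = ["unary", "binary", "trinary", "quaternary", "quinary", "seximal",
         "septimal", "octal", "nonary", "decimal", "elevenary", "dozenal",
         "baker's dozenal", "biseptimal", "triquinary", "hex", "suboptimal",
         "triseximal", "untriseximal", "vigesimal"]
for base, abbr in enumerate(abbreviate_all(names), start=1):
    print(base, abbr)
```

In this sketch, triquinary and triseximal both collide with trinary's TRI and fall back to TRQ and TRS. Those outputs are what the sketch produces and have not been checked against the author's tool. Like the original, it is brute force and slows down badly on long names.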
Info
Channel: jan Misali
Views: 133,512
Rating: 4.9162135 out of 5
Keywords: jan Misali, math, numbering systems, seximal, every base is base 10, decimal, dozenal, hexadecimal, programming
Id: 7OEF3JD-jYo
Length: 17min 45sec (1065 seconds)
Published: Thu Jun 24 2021