Mark Jason Dominus – Historical how and why of hash tables

[unclear] ...and I didn't actually get to do any programming with anyone, and that is a regret. [unclear] I'm not going to be using up the whole time. Let's see. OK, right.

So why are we doing hash tables? I've got to make a small confession here. Here's what I was presented with [unclear]: technical talks, about 20 to 30 minutes long, that usually address something that made the speaker a better programmer. That's not a strict requirement, but they're of a technical nature. So, OK. And then I started thinking: what is something that made me a better programmer? And the answer came instantly: suffering. But no, that's not of a technical nature. So I dumped that.

And then the next idea came almost immediately afterwards: oh, the perfect talk, I will talk about SICP, which has had an immeasurable effect on my programming. I spent like three years studying SICP and absorbed as much as I could, and it's totally technical. Except the three years were all between like 12 and 15 years ago, and although all that stuff's still in me, I don't remember anymore what it is. I would have had to go back to the book and reread it, oh yeah, this and this and this, and I didn't have time. So we're not doing SICP tonight.

And then we're starting to scrape the bottom of the barrel. So, oh yeah, this is really good actually: I write some code and then I say, I'll be honest, this is so easy I don't have to write tests for this. And then a little voice in the back of my head says, just write the tests. And then I find seven bugs. That little voice has definitely made me a better programmer, but it's not technical enough. So let's see what's next: Unix, the Unix tools approach, and the fact that Unix supports me constantly in a million ways every day. But everybody already uses Unix, so it is technical enough, but you don't want to hear it because you're already doing it.

So then finally, with like a few days to spare: yes, we're going to talk about hashes, which are sometimes known as associative arrays. Python calls them dictionaries. Perl calls them hashes. Actually Perl originally called them associative arrays also, and then someone got a brainwave about 15 years ago: you know what, that sounds complicated and obscure; let's give it a short, simple name that sounds simple and important, and also, when we talk about it with each other, you won't have to use the long name. Hashes became prevalent in languages, I guess it was a slow rise, roughly around the late eighties maybe. But the idea is actually a lot older than that.

So, the idea of associative arrays. Lisp has these things called property lists. I have not been able to figure out when these were first introduced, but they are ancient; I'm guessing around 1960. Then there's this language SNOBOL. The name is a joke on COBOL; it has nothing to do with COBOL, it's just a joke. The S is for "string", because it's good at manipulating strings. Back in the 1960s you didn't have general-purpose languages yet; there were list-oriented languages, string-oriented languages, math-oriented languages. Now they all have to do everything. SNOBOL was really good at strings, especially for 1965. We're going to see more of that later.

AWK, I think, is the breakthrough here. It had this feature called associative arrays, which are in every respect just like Python dictionaries or Perl hashes, in 1977, and unlike these similar sorts of things it was actually implemented with hash tables. There was also a language called Rexx. Has anyone actually met Rexx? Yeah. It was on an IBM 370 operating system called CMS, and then with the death of mainframes it had a new life as the scripting language for the Amiga personal computer. But anyway, those were sort of like this but a little bit broken. [unclear] And then Perl and Python come on the scene.

That's the late 80s, early 90s, and that's when the idea really reaches its full flowering.

So, the basic idea: what's an associative data structure about? With an array, of course, you have a bunch of data and you want to access it using a numeric index. So we've got some index, say 23, and we store Margaret Hamilton in this array a under the index 23, and then later on we ask, what's at index 23 of this array? It's Margaret. An associative data structure, a hash, is just the same, except the index doesn't have to be a number; it can be an arbitrary string. So here we've got a string, which is Hamilton, that we're going to use as the key, and we'll say, OK, associate Margaret with the string s, which is Hamilton. And then later on we ask, what's associated with Hamilton? Oh, it's Margaret. That's an associative data structure, often called a hash.

Ask questions. If I say something that is utterly confusing and, like, giant purple question marks are shooting out of your head, just stop me. All right.

So the key is almost always some kind of string. Some languages manage to make these associative data structures so that the index doesn't have to be a string; it can be like an arbitrary object or a pointer or a function, anything. And the way that works is that behind the scenes they are taking your non-string and somehow turning it into a string, and then they use that string. That's not actually that hard to do, because everything stored in the computer is a string in one way or another: if you look at the bits of something, they look like a string. That's the way computer memory works. So from now on we're just going to say, OK, it's all strings, because that's really what's going on.

Basic operations. I expect most people are familiar with this, so I'm going to zip through it as quickly as possible. You can store a key-value pair in a hash. You say, OK, I want to store into this hash; here's the key, here's the value I want associated with it. It looks for that key in the structure, and if it finds it, it replaces the value that's already stored with the new value. If it doesn't find it, it adds a new key-value association. The syntax, of course, is different in every language.

Then once you've stored it you can fetch it back. You say, I want to look in this hash for the thing associated with this key, and if it finds it, it gives you the corresponding value that you stored earlier, and if not you get some kind of null or failure. Again, the details vary by language.

Here's where it starts to diverge from arrays. There's a contains: does the hash contain an association for this key? This doesn't even make sense for arrays. It's like asking, this array of 57 elements, does it have an element 33? Oh yeah, always, because it goes from 0 to 56 and 33 is in there. Here you have to ask. So contains is going to return a true or false value: true if the key is in there associated with something, false if not.

Then there's typically some kind of function like keys that gives you back a list of all the keys. Again, not something you would ever need to do with an array, because you don't need a list of the numbers from 0 to 56. But here keys might be in there or might not be: we didn't put Smith in, so Smith is not there. And then sometimes languages extend this with a first-key and next-key, or turn this into an iterator, so that if you have a million keys it doesn't return a list of a million things and use up all your memory, and the system crashes, and the city blows up, and everybody has to go live in huts. So instead you use first-key and next-key and iterate through the keys one at a time.

In Python it looks like this. The thing is called a dictionary. You use square brackets with the key to store a value that is now associated with this key, and square brackets again when you fetch it back, and there's a way to do the contains test. And in Perl it's like this: you declare the hash up front, it's called %hash; you use curly braces, the variables are decorated with dollar signs, store is the same, fetch is the same. [unclear] And this illustration: she's saying, "Perl: too much punctuation." I first used this picture on a slide about computer science, discussing Rice's theorem.
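The four operations described above (store, fetch, contains, keys) can be sketched with Python's built-in dictionary. This is only an illustration: the variable names are made up for the example, and the names Hamilton, Margaret and Smith are the ones used in the talk.

```python
# Store, fetch, contains and keys, using Python's built-in dict.
people = {}                           # an empty associative structure

people["Hamilton"] = "Alexander"      # store: adds a new key-value association
people["Hamilton"] = "Margaret"       # store again: replaces the old value

first_name = people["Hamilton"]       # fetch: gives back the stored value
missing = people.get("Smith")         # fetching an absent key with get() yields None

has_hamilton = "Hamilton" in people   # contains: the key was stored
has_smith = "Smith" in people         # contains: we never put Smith in

all_keys = list(people.keys())        # keys, as a complete list
key_iter = iter(people)               # or as an iterator, one key at a time
first_key = next(key_iter)
```

Using `people["Smith"]` instead of `people.get("Smith")` would raise `KeyError`, which is the "some kind of failure" variant of a fetch on a missing key.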

Rice's theorem says that there's no property of a computer program that you can possibly calculate just from looking at the source, unless it's a trivial property that's always true or always false. You want to know, like, does this function calculate square roots? Can't do that. Does this function halt on a given input? Can't do that. Does this function commit an out-of-bounds array access? Too bad. Well, there she is.

Hashes are huge. If you grew up with them, you cannot imagine how terrible things were before we had them. They totally changed everything. [unclear] All right. Alfred North Whitehead said that civilization advances by extending the number of operations that we can perform without thinking about them, and that is totally what hashes do. They take stuff that's complicated and multi-step and tricky to implement, and they make it trivial.

And so here is just one of a billion examples. There's a piece of blog software and it's got a callback called handle_article. When it's putting together a web page with 17 articles, this gets called 17 times, with the path to the file that contains each article, and the category that each article is in, whether it's math or science or kids or whatever, and the actual text of the article itself, so it can be assembled into the web page. And there's a whole bunch of stuff here to build up the web page with templates or whatever and then publish it. And then we want to add code to this so that it also generates a count of the number of articles in each category, so that we can make a cute tag cloud that nobody knows the meaning of.

If you were to do this in C or Fortran or some ancient language, it's a huge pain. We're going to see later what you have to do to get this done. In C it's possible, but it's a pain. But if you've got associative data structures, hashes, or even one of the three precursors of hashes, all you have to do is this: you allocate a hash, which we'll call category_count; it's going to associate categories with counts. And then each time you handle an article, you say: look up the count for this category and increment it. That's all you have to do.

Well, actually, I think this is wrong. No, I take it back, it works great, but there is a feature that it requires that I didn't describe. Oops: it turns out the counts come out of this wrong, because of something I didn't tell you yet. The blog software might be manufacturing 12 different pages, and some of those pages have the same article on them, and those articles get counted twice, once for each page on which they appear. And so the category counts are all wrong. So how do we fix that? That's a head-scratcher. Oh, I know: we'll use another hash. This one's called seen, and we'll say, if we haven't seen this article's unique path yet, then increment the category count and make a note that we now have seen it, and next time we see it we don't count it again.

All right. So how did we do this before hashes became common in languages? Probably the easiest thing you can do is a linked list, because linked lists are maybe the simplest non-trivial data structure. Linked lists are really simple. So here's one: you have nodes here, each of which has a space for a key, a space for the associated value, and a space for a pointer to the next node. This is pseudocode, but it would be really easy to translate to C.

Here's our implementation of contains, which returns a boolean, true or false, or in this case something a little bit better. [unclear] This will be a pointer to the head node of the list, and here's the key we're looking for. We're going to loop as long as that pointer still actually refers to a node and until we find the key we're looking for in the current node. If both of those are true, we're still looking at a node and the key isn't the right one, then we say, all right, forget the current node, think about the next node instead, and move to the next node. And that's going to keep happening until one of two things happens: either alist becomes empty, becomes null, because we got to the end of the list and we didn't find the key, or we did find the node with the right key.
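The category-counting callback from the blog example can be sketched in Python as follows. The names `handle_article`, `category_count` and `seen` come from the talk; the parameter names, the file paths and the use of plain dictionaries are assumptions made for this illustration, and the page-assembly part of the callback is left out.

```python
category_count = {}   # category -> number of distinct articles in it
seen = {}             # article path -> True once the article has been counted

def handle_article(path, category, text):
    """Called once per article per page; counts each article only once."""
    if path not in seen:
        seen[path] = True
        category_count[category] = category_count.get(category, 0) + 1
    # ... assembling `text` into the page would happen here ...

# The same article handled for two different pages is counted once.
# These paths are hypothetical.
handle_article("math/primes.txt", "math", "...")
handle_article("math/primes.txt", "math", "...")
handle_article("math/graphs.txt", "math", "...")
handle_article("kids/zoo.txt", "kids", "...")
```

Without the `seen` check, the repeated call for `math/primes.txt` would push the math count to 3, which is the double-counting bug described in the talk.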

Either way, at that point: if alist is null, we return null, indicating failure, the key is not in this structure. And if we did find it, then alist is pointing at that node, so we return that node: hey, you asked for the key Hamilton, here's your node, it has Margaret.

And once we've got that, fetch is really easy. Fetch says, OK, ask whether the association list contains the key we're looking for, and if it does, we return the value in the node, the associated value; if not, we return null.

Store has a little trick: you have to pass something by reference, because we're actually going to have to modify the association list in the event that we need to add a new key-value pair to the structure. So this is by reference. Given a key and a value, again it looks to see, OK, is the key in there? If it is, then great, take the node that we just got, which already has a value stored, and replace that with the new value. And if not, then we allocate a new node, we set the new node's next pointer to point to the head of the old list, so we link it onto the front of the list, and we say, oh, and the list now starts at this node, not the node it was starting at before. And that's so simple that you can even do it in C, and there's not a lot of things you can say that about.

All right, let's see more of this. Oh yeah, you can even do it in Fortran, if you're really, really dedicated. This is real Fortran. I get to say that because I actually had a job as a professional Fortran programmer once. I wasn't expecting somebody in the audience to know Fortran, but anything's possible. Yes?

So it's funny, I can't recognize people's faces very well. Often people I know come up to me and I don't know who they are, who's this person walking toward me? And then they speak, and all of a sudden, oh, I recognize the voice. [unclear] All right.

So Lisp had all of this: association lists, alists, or sometimes plists, for "property lists", that were very much like this. Again, I was not able to find out exactly when, but it's real old. They structure them a little differently: instead of a single node carrying a key and a value, each node carries a key or a value, and so when you find the key you're looking for, you just return the next node, and that's the corresponding value. This is really easy to do in Lisp. The syntax is kind of funny. Get: this shows you how old it was, that it's a Lisp function that's not called property-list-association-get, it's called get. Properties are associated with symbols in Lisp; every symbol has a property list. Then for storing, it uses this funny setf thing, which is actually one of the really great features of Lisp: you can give it anything that gets anything, and you just plug that into setf, and that means, figure out where that thing would have been gotten from, and instead of getting it, change it to this. Which is awesome. I could probably give a whole talk just about that, but not today.

All right. So a big drawback of association lists is that they run in O(n) time. If you have a failed lookup, you have to search the entire list, all n elements, one at a time. Even a successful search takes around n/2 time, because on average you have to search half the list. And so to insert n items into an association list takes quadratic time, which is unacceptable unless you've got very few items.

So how to fix this? One answer is, well, just use trees. We can insert the stuff in sorted order; that way it takes O(log n) time to do a fetch or a store, to find the thing you're looking for. Unless you hit the worst-case behavior for the tree. Say, for example, you're inserting a million items into the tree and they are arriving in alphabetical or reverse alphabetical order, and your tree then doesn't come out bushy and leafy the way it's supposed to; it comes out as a long thing. It's a list claiming to be a tree. It's like someone trying to sneak into the bar underage: you're a list, get out of here, kid. And unfortunately the worst-case behavior here is very common. It's not difficult to get input that happens to be sorted already, and then, bang.

All right. So you show a programmer a problem and they are so happy, because programmers love to solve problems. "I can solve that problem!" And then they solve it, and it's like, yeah, but now you've got another problem. "I can solve that one too!" And they go on solving problems all day, 57 of them, none of which is the one that needed solving in the first place. And so now we have a very brief excursion.
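The association-list pseudocode described above (contains, fetch, store) can be sketched in Python roughly as follows. Python has no pass-by-reference for the list head, so in this sketch `store` returns the possibly new head instead; that substitution, the `Node` class and the attribute names are assumptions of the example, not the code from the slides.

```python
class Node:
    """One link of an association list: a key, its value, and the next node."""
    def __init__(self, key, value, next_node=None):
        self.key = key
        self.value = value
        self.next = next_node

def contains(alist, key):
    """Return the node holding `key`, or None if the key is absent."""
    # Keep walking while there is still a node and it is not the right one.
    while alist is not None and alist.key != key:
        alist = alist.next
    return alist

def fetch(alist, key):
    """Return the value associated with `key`, or None."""
    node = contains(alist, key)
    return node.value if node is not None else None

def store(alist, key, value):
    """Associate `value` with `key`; return the head of the updated list."""
    node = contains(alist, key)
    if node is not None:
        node.value = value           # key already present: replace its value
        return alist
    return Node(key, value, alist)   # otherwise link a new node onto the front

head = None                          # the empty list
head = store(head, "Hamilton", "Margaret")
head = store(head, "Franklin", "Rosalind")
```

Every operation walks the list from the head, which is where the linear lookup time and the quadratic cost of n insertions come from.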

This is an excursion into what happens when you try to solve this worst-case problem with trees. There's 2-3 trees, which are a way of keeping the tree balanced: once one side gets too heavy, elements kind of percolate up and try to balance it out. [unclear] And then there's red-black trees, which is where you do 2-3 trees, but three is too high a count, so you're trying to pretend it's a 2-3 tree even though every node only has two children. And then we've got what is my absolute least favorite data structure in the entire world, the AVL tree. What is this good for? Well, it's good for tormenting undergraduates who are taking data structures. Aside from that it is totally worthless. I've even known people to use this, and I was always like, OK, why didn't you use a hash?

Hash tables have a better average time than trees. They have the same worst case: a hash can go terribly wrong and end up with O(n) fetch time. But the worst-case behavior is extremely rare, and unlike a tree it only occurs at random. Unless you know all the details of the hash ahead of time, you cannot construct any particular input that will cause the hash to have worst-case behavior. You just have to hit the lottery of bad luck, and it doesn't happen that often, because you don't hit the lottery that often, whereas with the tree, sorted input will break it. And the algorithms are really simple, unlike, say, well, never mind. So here's the, right, it goes like this. OK, this is only half of it; the other half is just as bad, but on the other side.

All right, so let's see, where are we? Before we get into how a hash actually works, I'm going to describe this thing from SNOBOL, because I promised and because it's so awesome. SNOBOL is a really bizarre language, from 1962 to 1967 or so, and it was a very forward-looking language at the time. [unclear] It was totally the language of the future, but it's a future that we're not in. And so you go back now and it's amazing, and totally different from everything I've seen before. But in some ways it's hopelessly 1965. Like control structures: it doesn't have any; it has goto, and that's it. Oh, actually that's not true, it has really good subroutines; it has recursive subroutines, which, believe it or not, were unusual in 1965.

OK. So SNOBOL, which I reiterate does not have anything to do with COBOL except as a joke, has these associative structures called tables. They are not based on hashes. The SNOBOL people made a decision not to use hashes there, because, here's the weirdest thing about SNOBOL: it's really good for manipulating strings, and each string in a SNOBOL program is unique. If you construct the string Hamilton in one place, and then somewhere else you construct the identical string Hamilton, they are represented in the program as pointers to the exact same object representing the string Hamilton.

And how does it manage that? It has an enormous hash table with an entry for every single string that actually exists in the program at any point. And associated with that is a structure, actually a variable structure, because you can have a variable with that name. So if you have a variable called T or a variable called WORD, those are strings also, and there's an entry in this giant hash table with WORD pointing to the variable structure that contains the value of the WORD variable, and T pointing to the variable structure that contains this table. And Hamilton is a string, and there's an entry there, and it points to a totally unique structure that contains nothing, because you don't have a variable named Hamilton. But you could, and you might. And so one of the things you can do in SNOBOL, which was unusual, is to say, OK, I want to assign to a variable, and the name of the variable is in variable X. And that was really easy to do with these pointers.

So here you've got Hamilton, and when you construct this Hamilton and put it into WORD, WORD ends up with a pointer to the same structure that is used for this one. And this is how you print something out in SNOBOL: you assign it to a special variable.

So these table things, these are associative structures, and this is the key and that's the value. When we insert Margaret under Hamilton in this table, it looks in the enormous hash table, which is not depicted here, and finds the pointer to the unique Hamilton structure, that's this, and it installs that pointer in here, into your table. The table is like an array with two columns, keys and values. And the value was Margaret, so it finds or creates a unique Margaret object and installs the pointer to that.
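The unique-string scheme described for SNOBOL can be imitated in Python. This is a loose sketch under stated assumptions, not how any SNOBOL implementation was written: the class and function names are invented for the example, a Python dict stands in for the one enormous hash table, and the table is a plain list of two-element rows that is searched linearly, comparing object identity instead of string contents.

```python
class UniqueString:
    """The single object that represents one distinct string value."""
    def __init__(self, text):
        self.text = text
        self.variable = None   # room for a variable of that name (unused here)

_all_strings = {}              # stands in for the enormous table of every string

def unique(text):
    """Find or create the one object representing `text`."""
    obj = _all_strings.get(text)
    if obj is None:
        obj = UniqueString(text)
        _all_strings[text] = obj
    return obj

class Table:
    """Two columns, keys and values, each holding a reference to a unique string."""
    def __init__(self):
        self.rows = []         # each row is [key_object, value_object]

    def store(self, key, value):
        k, v = unique(key), unique(value)
        for row in self.rows:
            if row[0] is k:    # identity test: no character-by-character compare
                row[1] = v
                return
        self.rows.append([k, v])

    def fetch(self, key):
        k = unique(key)
        for row_key, row_value in self.rows:
            if row_key is k:
                return row_value.text
        return None

t = Table()
t.store("Hamilton", "Margaret")
t.store("Fish", "Hamilton")    # the value column reuses the same Hamilton object
```

After these two stores, the key in the first row and the value in the second row are literally the same object, which is the sharing the talk describes.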

So the pointer to Margaret is installed. Now we're going to store Hamilton Fish in T; he was governor of New York. So it finds or creates a Fish structure and installs the pointer to that. Then it finds Hamilton, and we've already got a Hamilton object, so it installs a pointer to that Hamilton. [unclear] You can concatenate strings in SNOBOL, but it will manufacture a new unique string object and put that in the hash. And then let's say we store Margaret Mead: it is going to create Mead and install the pointer, and then it's going to find Margaret in the enormous hash table of every string in the program, and it's going to find this one, and it's going to install a pointer to it.

OK, why is this good? It's weird, but why is it good? Well, you want to search the table and find out if it has the key Mead. How does it do that? It does a linear search on the left-hand column. But wait, that's bad! But it's a very fast O(n), because it only has to do a single pointer comparison for each entry. It knows it's looking for Mead, and it has a pointer to that string: is this that pointer? No. No. Yes. So it's like one instruction. So it is lightning fast, although it doesn't scale. But that's OK, because computers in 1960 didn't have much memory. So it's really clever; it was a really good engineering trade.

All right, how much time do I have? Five? All right. So here's how hashes work. I promised this. I think we're going to run over by a few minutes, so everybody's cool, stay in your seats.

All right. So here we've got something to store in your hash: we're going to associate Margaret with Hamilton. Here's how the hash works. You take the string Hamilton and you map it to an array index. You have an array; it's got some number of array elements, we're going to say 16. So you need to map this somehow to a number between 0 and 15, and I will describe later how this works. All right, so then say we got nine: you take the key and the value and you store them into slot number nine of your 16-element array. [unclear] So you've got Hamilton and Margaret in there, and hey, that's constant time, because this computation is constant time, and then looking up an element in an array or setting it is constant time, right? It's a single multiplication and an addition and then a fetch. So it's constant time. Well, actually, I ought to fix that later. I keep doing this to myself.

OK, so three obvious problems, and possibly some less obvious ones. First, "then a miracle occurs": mapping a hash key, a string, to an array index. That's number one. Number two: all right, you've got only 16 array elements and I think there's more than 16 strings, so what's going to happen when you have two strings with the same array index, as you inevitably will? You put Margaret Hamilton in there, and then every time you insert something new into the hash there's a one in 16 chance of hitting Margaret Hamilton, and, OK, what do you do? And finally, what happens when the array fills up? So, three interesting questions. There are many answers to these, none of them very difficult.

So I said to David Alpert that I was going to be talking about hashing, and the response was: that's great, because I would really love to know how hash functions work. It's like mysterious sorcery; I don't know how this could possibly work; people do all this weird stuff. I was going to make this the big reveal of the talk, and then I realized I really didn't want to get into the details, and then last night I realized I can actually tell you the whole story on one slide. So here it is.

Suppose the array has n slots, and you're trying to figure out where key k should go. You take a pseudo-random number generator and you seed it with k, right, because k is a string, so it can be turned into a number; you feed the bytes into the random number generator one at a time, or four at a time, something like that. And once you've seeded the random number generator, you then extract a random number from it, let's call it r, and then you take r mod n, which reduces it uniformly to some number between 0 and n - 1, suitable for an array index i. And that's it. That is the big secret. And there's a theorem, like, an actual mathematical theorem, about the case where you really don't know anything ahead of time about the keys.
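The one-slide recipe (seed a generator with the key k, draw r, take r mod n) can be written almost literally in Python. This is a sketch of the idea only: it uses the standard `random` module as the generator, which is an assumption of the example and not what Perl or Python use internally, and the function name `bucket_index` is made up.

```python
import random

def bucket_index(key: str, n: int) -> int:
    """Map a string key to a repeatable pseudo-random index in 0 .. n-1."""
    rng = random.Random(key)    # seed the generator with the key itself
    r = rng.getrandbits(64)     # extract one pseudo-random number r
    return r % n                # reduce it to a valid array index

index_for_hamilton = bucket_index("Hamilton", 16)
```

The important property is repeatability: the same key always seeds the generator the same way, so it always lands on the same index, while different keys are scattered as if at random.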

If you don't know anything ahead of time about what the keys are going to be like, then this process, if the random number generator is sufficiently random, as if it were actually random, gives you the best possible performance. And all the weird tinkering and sneaking around and dark sorcery and stuff that people do, tweaking the hash function, part of it is because they think they know what the keys are going to be and they're trying to optimize for that. And whether they actually know what the keys are or not, they spend a lot of effort on it. But this is the idea, in one line: the answer is, you pick a repeatable random index and you put it there.

There's a study of what happens when you put stuff at random into a bunch of numbered bins. Because you do not expect, if you put 16 things into 16 bins at random, that they're all going to go in one each, right? On average there's one each, but some of the bins get missed, and some of the bins get lucky and get a few in them. So this is an old set of mathematical problems called balls-and-bins problems. It says, OK, if I put 16 things into 16 bins, what's the maximum load, it's called: how many things are going to be in the most loaded bin? How many bins are going to be empty, on average? There's a lot of work done on this; it's really interesting.

And just a bit of jargon here: when you're trying to put two things, two key-value pairs, in the same array slot, it's called a collision. The array slots are called buckets, because we're putting balls in bins. Let's see. OK, so I think I've got like six slides left.

So here's how Perl handles collisions. Here's the array of buckets, and Perl, instead of putting a single key-value pair into the bucket, has linked lists in the buckets. So you store Hamilton, Margaret in here, and we store Long [name unclear] here, and then Rosalind Franklin collides with Long. Actually, this picture means that Franklin was in first, because then we stick Long at the beginning. You put it at the beginning; you don't want to go to the end of the list, that's a waste of time. At the beginning, also, because you're more likely to look up keys that you put in recently. [unclear] Three, four, five; here we've got another collision, there's two here, there's a bunch of extras at the end here, and fifteen is still empty.

The drawback, of course, is that the worst-case behavior is now O(n) again. But to get that you have to have nearly all the keys in one bucket, and since you're selecting the buckets at random, this is extremely unlikely. And unlike the tree, there is no particular set of obvious patterns for the input that could possibly produce this, because you're selecting them at random. So the typical and by far the most common behavior is that these lists are all relatively short, of length about log n, and so the fetch and store time is O(log n). [unclear]

All right, what comes next? Oh, what happens when you fill up the array. We're going to continue with Perl for a while before Python. So when you fill up the array, at some point Perl sees that the lists are getting long and it decides, OK, we need to do something about this. It's actually quite simple. We had 16 slots; it allocates a new array with twice as many slots, 32 slots, and then it basically rebuilds the entire table. It goes through each one of these keys. OK, Hamilton, I'm going to rehash Hamilton. What's the hash value? The hash value mod 16 was 0; mod 32 it's also 0, so we'll put her here. What happens to Long? Well, its hash value mod 16 was 2; mod 32 it's also 2, so we'll put it here. What happens to Rosalind Franklin? Her hash value mod 16 was 2; mod 32 it's 18, so we put her in a different bucket. And then like the other ones down here: Big Bird stays in the same place, the next one moves to a new bucket. So each of these gets split between two buckets, and therefore the lists are all half as long. These guys, Leonhard Euler and Quincy Jones, ended up in slot 30, both.

And so earlier I had this picture that I said I'd explain later: there's a little number hanging down from Margaret Hamilton. That's her hash value, some big number. There's no point recomputing that, so you store that along with the key, and that way when you rebuild the hash you don't have to do the random number generator thing again. You just take that number mod 32, which is a single instruction, it's just an AND operation, and then you know exactly where she's going to go when you rebuild the table. Rebuilding the table is expensive: you have to go over every single key.
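The chained scheme described for Perl (lists in the buckets, new entries at the front, a stored hash value per entry, doubling and redistributing when the table fills) can be sketched in Python as below. This is an illustration, not Perl's actual implementation: Python lists stand in for the linked lists, the built-in `hash()` stands in for the hash function, and the growth rule "more entries than buckets" is an assumed threshold.

```python
class ChainedHash:
    """Hash table with one chain per bucket and table doubling."""

    def __init__(self, nbuckets=16):           # nbuckets must be a power of two
        self.buckets = [[] for _ in range(nbuckets)]
        self.count = 0

    def _index(self, h):
        # For a power-of-two table size, "mod n" is a single AND operation.
        return h & (len(self.buckets) - 1)

    def store(self, key, value):
        h = hash(key)
        bucket = self.buckets[self._index(h)]
        for entry in bucket:
            if entry[0] == h and entry[1] == key:
                entry[2] = value                # key already present: replace
                return
        bucket.insert(0, [h, key, value])       # new entries go at the front
        self.count += 1
        if self.count > len(self.buckets):      # assumed threshold for "too full"
            self._grow()

    def fetch(self, key):
        h = hash(key)
        for entry_hash, entry_key, entry_value in self.buckets[self._index(h)]:
            if entry_hash == h and entry_key == key:
                return entry_value
        return None

    def _grow(self):
        """Double the bucket array and redistribute using the stored hashes."""
        old = self.buckets
        self.buckets = [[] for _ in range(2 * len(old))]
        for bucket in old:
            for entry in bucket:
                # entry[0] is the saved hash value, so nothing is re-hashed.
                self.buckets[self._index(entry[0])].append(entry)

table = ChainedHash()
for i in range(100):
    table.store("key%d" % i, i)
```

Each entry in old bucket b can only end up in new bucket b or b plus the old size, which is the "split between two buckets" behavior described above.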

single key and redistribute it but here it only happens the table doubles in size each time so you only have to do it login times and if you add up how many reality patients like this crap to do well the last one has to reallocate all of them say n and the one before that well call me have this date so you’d be allocated when they’re only half as many so it’s in over two and the one before that was only an open floor and the only before that disintegrate and that adds up to 2 n so it takes linear time and overall to do all of these reallocations and the amount of time perky is constant constant amount of time per node to do all this reorganizing so it’s cheap python handles collisions in a completely different way it has an array and the erase slot can contain a key and a value and say Hamilton ashes 20 so putting Hamilton and Margaret and then we put in the one hashes 22 so it is Bob long how long and then rosalind Franklin also hashes 22 but that’s lots full so it just goes and it puts her in the next empty slot and then somebody else ends up in slot four and five six seven eight and the nine is mpm we go put in later whether he goes in slot 14 in quickly Jones with some slide 15 because you wanted to go in 14 but that’s also put Jones in the next step you slide and then when you’re going to look something up say what look it up rosalind Franklin okay hash Franklin and you find that the index is 2 or you go there and you don’t compare the strings at this point right we have the hash values scored so we can just compare that with a single integer prepare and see if this person is hash value before the lawn is equal to 1 we’re expecting for Franklin it’s not so we go to the next one pop there she is Oracle’s me for somebody who’s not in there like maybe something after 28 check me that’s not it nine is empty then we give up we know it’s not there really do you give up when you get to an empty slot if we were trying to for somebody who has its 414 but it’s not we 
check this, we check this, we check this, and then, oh, this is empty, so we give up.

So the worst-case lookup time here is also O(n), because if this table got really full you could end up with a really long run of keys that are in there, and have to look at all of them before you find the empty slot. But if the hash table is not too full, that does not happen. If the hash table is only half full, then on average every second slot is empty, and if you're looking at slots, looking for an empty one, you will only see two full ones before the next empty one. So a failed search takes three and a successful search takes at most two. That's if the table is half full. So it's important not to let the table fill up too much, and when the table gets too full, Python does what Perl does: it doubles the number of buckets and rebuilds, just like Perl does.

Deletion in Perl is really simple: you find the linked list that contains the thing you want and you delete it. Easy. There's a special case if the list is empty or something. Python is not so simple. So you want to delete [name unclear] here, and you can't just leave this slot empty, because if you did, when you went to look up Franklin, you'd look here first, it would be empty, and you'd give up. So instead Python replaces it with a thing called a tombstone. It says this thing's been deleted, but it's [inaudible]. Then when you come looking for Franklin, you come in here: okay, well, this is a tombstone, it doesn't count, look at the next slot, just like you would, and up, there she is.

And then... I'm sorry, yes? [Audience question, partly inaudible, about doubling the size of the table.] All right: when you rehash, when you double the size of the table, you're redoing all the slots, so you can skip the tombstones. They're gone; you just leave them out.

All right, I think that's the end of my talk. Thank you very, very much.

We will have a two-minute break [inaudible] questions.

Oh, yeah? [Audience:] I'm [name unclear], and I'm wondering why Python will not let me
change the dictionary while iterating over it?

Oh, that's a really interesting question. Um, I mean, it kind of [inaudible] the iteration. Hold on, let me think about this a second. So, yeah, this is a strange issue with [inaudible]. So you're iterating over the keys one at a time, right? Now, the way that's going to be implemented is there's going to be some kind of pointer or index or something that's keeping track of where you are. In Python I imagine it's an index that just starts with slot 0 and, like, works its way through. So suppose you

insert a new key. Okay, and now your hash table is full, and it allocates new buckets and rearranges everything, and then you continue your iteration from key number 57 or whatever. Well, you're going to skip some, because they got rehashed and put behind you, and you're going to hit some twice, because they got put in front of you. So you're going to lose the guarantee that the iteration hits each key exactly once.

[Audience:] What if you were just trying to pop something out of it? Couldn't it just replace it with a tombstone, without rebuilding the table?

No... oh, I'm sorry, I take it back. It's true: if you add stuff to a Perl hash, you are not guaranteed to get the correct results, but you are allowed to delete. And I would have imagined Python would work that way also, that you could delete while you're iterating. [Audience:] No, it doesn't. It died while I was doing it. Huh. That's, that's Guido van Rossum.

So Perl actually had a prohibition in the manual that said you may not modify the hash in any way while iterating. And I was reading the hash code many, many years ago, and I discovered that deleting is safe. And not only is it safe, but it's safe because when Larry Wall wrote that code back in 1994 he took extra pains to get that case right, so that you could safely delete from a hash, because that's a really useful thing to do, right? Go over all the keys in the hash and delete all the ones that match a pattern. So Larry Wall actually got that right, and then some knucklehead came later and wrote in the manual "you can't do this". So I went, no, you're wrong, here, see. We scrubbed the manual, and now it says you can delete.

I think that Python could work, because, as you pointed out, all it has to do is put in a tombstone, and then later, if you hit the tombstone while you're iterating, okay, you were going to skip it anyway, right? But, you know, that's [inaudible], it's his language, you know, [inaudible] his philosophy, and so you can't.

So that was too long. Who has another question?
[Audience question, partly inaudible: something about a language that stores every string in a hash table, and whether Python does the same, since everything there is eventually stored in a dictionary.]

Huh. Okay, that makes sense, except that this is at run time, so when you construct a string at run time it gets stored in this hash? I don't even know if [inaudible] is compiled; it may be purely interpreted. Did I miss the point of your question? [Audience:] I mean, Python does that too, but it goes even further. Huh. [Audience:] If you make a variable in Python it will be in the local namespace, which is just a dictionary. Yeah. But if you say a equals the string "Hamilton" and b equals the string "Hamilton", they're not going to end up pointing to the identical string "Hamilton", are they? They're going to point to different objects that both happen to contain "Hamilton". They won't end up pointing to the same one and only "Hamilton". [Audience reply inaudible.]

[Audience:] Is there a clear [inaudible] behind [inaudible]? I don't know. [Inaudible exchange.]

[Audience:] You can overwrite it. Yeah, yeah, yeah, certainly, right. I should have mentioned that. After we delete [name unclear], right, and then we go on and insert somebody who needs to go into slot 2, they would overwrite that tombstone. Yes, thank you.

[inaudible] Yes? [Audience question, partly inaudible, apparently about allocating another hash for each bucket instead of a list.] [inaudible] you can end up with, like, a tree of hashes, and you search down the tree of hashes, and the allocation [inaudible], and then the tree of hashes is really complicated, big structures instead of little tiny list

nodes. [inaudible] All right, yes?

[Audience:] The 16 buckets: is that the number that it always starts with? I don't know what Python does. Perl always starts at eight, unless... actually, there's a way to say "I want a hash, and I want you to prepare it with at least this many buckets", and it's rounded up to the nearest power of two. You can do that if you know ahead of time that you're going to be inserting a million keys; you may as well preallocate it with, you know, two to the twentieth buckets. It's not often done, because the reallocation really just doesn't cost too much.

Oh yeah, was there someone other than Sumana who had a question? [Audience:] I have one. Yes? [Audience:] So I believe on slide 31 you were talking about operations on the table being log n. Sorry, say that again: slide 31? [Audience:] Yeah. [inaudible] Yes. [Audience:] So you said that operations on this table, searches of the linked lists, are roughly log n, and what was your justification?

I didn't give any. The justification is that the mathematics of the balls-in-bins problem says that if you throw n items at random into n bins, then almost all the time the most heavily loaded bin, the one with the most balls in it, has around log n balls. And I cannot demonstrate that right now, but I can probably dig up a reference for you. That's the answer: the chance for the balls to cluster way more than that is infinitesimal.

[Audience:] So I was curious about whether other languages, [inaudible] Perl, Python [inaudible]. For example, I'm assuming JavaScript objects are hashes? I would think so, yeah. [Audience:] Do they use a similar method? I know nothing about the internals of [inaudible]. Okay. [Audience:] But are there other, different methods, other than these two that you showed, that are [inaudible]? Actually, I think there's at least one other [inaudible]. [Inaudible exchange.] I think what it does is, if not identical
to, very, very similar to [inaudible]. [Inaudible exchange.]

Yes? [Audience question inaudible.] No, as far as I know there's nothing to stop you from [inaudible]. [inaudible] and you put stuff in a hash, and then you could call the method [inaudible] and draw a picture of the buckets and where all the keys ended up. Yeah, it's not a difficult thing to do, but [inaudible].

[Audience, partly inaudible:] You'd really need to know a lot about what your keys are going to look like. Picking your own hash is useful if you're sure that the built-in one isn't going to work for your data, because you tried it. Yeah, yeah, I can see that.

[Audience:] Have you done Java at all? I was a professional Java programmer for three years, and it was fun. I liked it.

Okay, so, way in the back, blue shirt. [Audience, partly inaudible:] So how many records [inaudible] before you really are certain, like, the customer is going to get angry [inaudible]? So if you only have one thing here, or two,

[inaudible] list, it's probably faster to use a linked list?

Yeah, so here's the thing about that, right: the performance on small amounts of data is not important, because no matter how you do it, it's going to be fast. The only thing that matters is how this performs on large amounts of data. Because with small amounts of data it doesn't matter if you're using, like, an n-squared or an n log n or a linear-time algorithm. They're all going to be like, okay, well, this one took a microsecond and this one took two microseconds. Oh my gosh, that one was twice as fast! Okay, but we live in a real world, where time really elapses, not relative time, and the time is small. So if you know it's small, it doesn't matter what you use. Yeah, the list would be simpler, I guess. I don't know; you'd have to try it.

[Audience, partly inaudible:] This is going to vary from computer to computer [inaudible]; at some point there'll be some situation where [inaudible] thirty-seven seconds [inaudible]. [Inaudible exchange.] No, no, it might matter, but I can't answer that question, because the reasons for which it will matter are not specified and I don't know what they are. And that's my real answer. I'm sorry.

[Audience:] So long as you have a pretty spinny graphic, it doesn't [inaudible]. What? [Audience:] So long as there's a pretty spinny graphic. Larry Wall said that the goal of Perl [inaudible]: a program is successful if you can finish it before your boss fires you.

All right, there was some question somewhere. Yes? [Audience:] Um, do you know offhand what Perl and Python use, like, the random number generators that they're using for the hash function, or how they were designed? [inaudible] Yeah, I cannot remember what it was, but I can easily find out, at least for Perl, because I remember there was, like, a big thing about it. [inaudible] because that's not what I'm trying to express, but okay. So there was, like, suddenly this thing blew up: oh my gosh, you can attack the Perl
hash function, and a web user can [inaudible] keys [inaudible] a denial of service. And, like, [inaudible] pointed this out back [inaudible]. It's really not a big deal, and, like, nobody's ever actually done this, but they changed it around: Perl generates a hopefully secure random number at the beginning of its run, and it uses that to generate the hash function. I think it's mixed into the hash function, so you really can't predict from one run to the next where things are going to go. And that makes it literally impossible to attack the hash. It must have been the early two-thousands, maybe. [inaudible] That's what I say when I have a misunderstanding.

Yes? [Audience:] Tcl [inaudible]. Okay, what is Tcl? Tool Command Language; it's Tcl. Okay. [Audience question, partly inaudible, apparently about whether Tcl does anything interesting with hashes.]

I don't think so, because... okay, so, my knowledge of Tcl is 20 years out of date. With that enormous caveat, my experience of Tcl was that it was underpowered and didn't have any serious data structures at all, much less hashes. They didn't even have lists. The way they told you how to do lists is: oh, well, just concatenate them into a big string with spaces in between, which of course is a terrible answer. And then I would, like, meet people who said, no, that works for them. [inaudible] If you'd like to increment a number, it was stored as a string, and it would have to convert it internally to a number, do the integer arithmetic on it, and then it would stringize it again. And they thought this was okay because it's "just a scripting language", which is, like, a mistake that people have made so many times there's no excuse for it anymore: all scripting languages become programming languages. "So it doesn't have to work right, people will only be writing little stuff in it." And then, right, unless the language is so crappy that it collapses under its own weight when you try to do that. And with Tcl [inaudible] tried hard. So, I hope I'm not making anyone feel bad about Tcl, but I
hate it. All right, so, moving on. I hope we have two minutes left. [inaudible] There was somebody we were going to get to.

[inaudible] Yes, Sumana?

[Audience:] All right, I was confused by the number, which is like this sort of six-ish-digit number [inaudible]. So, for instance, yes, um, where is that stored? You said not to throw that away, which implies it was stored somewhere.

Uh, yeah. So this is something I slipped into the slides after the test run this afternoon, and it's not really organized as well as I wanted it to be. But the idea is, instead of just storing the key and the value, you're also going to store the hash that you computed. [Audience:] Yeah. Right, because that's an expensive computation too.

So here's why; there's a really easy use case here. Suppose we look up [name unclear], and we discover that their hash, when you take it mod 16, sends you to this bucket. You're going to come here, and now we need to find [name unclear] in this linked list. And, well, if you didn't have the stored hash, you'd have to do a string compare between that key and Hamilton, and then you'd have to do a string compare on the next one. And it's much easier, instead of doing a string compare, to compare the numbers. Well, if you kept the hash value, you know what the number is; let's say it's 25. Okay, well, is this 25? And you can check that in one instruction.

[Audience:] Where is it stored? In another cell parallel to [inaudible]? You have a struct, and the struct has a pointer to the key and a pointer to the value and an integer where you stick the hash. I mean, we're just drawing boxes here, so just add another one. [Audience, inaudible.] Are you being sarcastic? [Audience:] I'm not being sarcastic, I'm honestly puzzled by the question. I'm sorry. Okay, I think this might be a [inaudible].

So, okay, right: you allocate a structure, and the structure has to have enough space in it to store two strings, which means two pointers, because that's what strings are: they're pointers to a place where the data really is. Right? That's eight bytes, right?
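The layout described in this answer (a pointer to the key, a pointer to the value, and an integer holding the hash) can be sketched in Python. This is a minimal illustration, not Perl's actual implementation: `Entry`, `ChainedTable` and `hash_value` are made-up names, Python's built-in `hash` stands in for Perl's hash function, and the growth rule (double once there are more keys than buckets) is an assumption. It shows the two uses of the stored hash mentioned in the talk: an integer compare before any string compare during lookup, and a single AND per entry when the table is rebuilt.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Entry:
    key: str                        # "pointer to the key"
    value: Any                      # "pointer to the value"
    hash_value: int                 # full hash, computed once at insert time
    next: Optional["Entry"] = None  # next entry in the same bucket's list


class ChainedTable:
    def __init__(self, n_buckets=16):
        self.buckets = [None] * n_buckets   # bucket count stays a power of two
        self.count = 0

    def _find(self, key, h):
        e = self.buckets[h & (len(self.buckets) - 1)]
        while e is not None:
            # Integer compare first; the string compare runs only on a hash match.
            if e.hash_value == h and e.key == key:
                return e
            e = e.next
        return None

    def insert(self, key, value):
        h = hash(key)
        e = self._find(key, h)
        if e is not None:
            e.value = value
            return
        i = h & (len(self.buckets) - 1)
        self.buckets[i] = Entry(key, value, h, self.buckets[i])
        self.count += 1
        if self.count > len(self.buckets):   # growth threshold is an assumption
            self._grow()

    def lookup(self, key):
        e = self._find(key, hash(key))
        return None if e is None else e.value

    def _grow(self):
        """Double the bucket array, reusing each entry's stored hash."""
        old = self.buckets
        self.buckets = [None] * (2 * len(old))
        mask = len(self.buckets) - 1
        for e in old:
            while e is not None:
                nxt = e.next
                i = e.hash_value & mask      # one AND; the key is never rehashed
                e.next = self.buckets[i]
                self.buckets[i] = e
                e = nxt


table = ChainedTable()
for n in range(100):
    table.insert(f"key{n}", n)
print(len(table.buckets), table.lookup("key42"))   # 128 42
```

With these assumptions the table doubles three times while the 100 keys go in (16 to 32 to 64 to 128), and each rebuild touches every entry once without calling the hash function again.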
Okay, well, okay, allocate [inaudible]: put the two pointers in the first eight bytes and put this number in another four bytes. [Audience:] All right, exactly. That's better. Okay, thank you.

All right, so who's [inaudible]? [Audience, partly inaudible: a question about whether Python stores this number too.] I know it's done in Perl, and I would be shocked if it weren't done in Python; it's such an obvious optimization.

[Audience question, largely inaudible, apparently about how common collisions are.] They're not. No, they're not. So imagine you have 16 balls, and you have a hat with 16 slips numbered from 1 to 16. Then you draw a slip at random and you put the ball in that slot, and you draw 16 numbers. What are the chances that you are not going to get the same number twice?

[Audience, partly inaudible:] But it will expand, right? Like you said, you're going to go from 16 to 32 [inaudible]. Obviously, perhaps, if you kept the number of buckets vastly more than the number of things you're putting in the buckets, then collisions would be unlikely. But that's a waste of memory. You're trading off memory for time, right? [Audience:] And memory's cheaper than time. [inaudible] Anyway, in practice the trade-off is made differently, and there are invariably quite a lot, because you can tolerate quite a lot of collisions without actually slowing down the access.

[Audience:] I looked it up a second ago: Python expands at the two-thirds-full point, and under 50,000 elements it expands by a factor of four. [inaudible] dictionary; it was on Stack Overflow. Python grows at two-thirds full.

Then we're out of time. All right, we think one more? One more. Now, let's take one more. Oh, wait, you already asked you