Worked Example: (Chapter 15)

(smooth music) [Charles] Hello everybody. Welcome to Python for Everybody. We are doing some code walkthroughs. If you wanna follow along with the code, you can download the source code from the Python for Everybody website. Okay? So, the code we’re playin’ with today is twfriends.py. And this is a step beyond the simple twspider. It is a restartable spider, but we’re gonna data model things a little bit differently. We’re gonna have two tables, and we’re gonna have a many-to-many relationship, except that it’s sort of a many-to-many relationship between the same table, which is okay. Twitter friends are a directional relationship.

And so we start out here in twfriends.py. Remember the file hidden.py. I’ll show it to you, but I’m not gonna open it ’cause I’ve got my keys and secrets in it. So, this hidden.py file, you gotta edit that, and you gotta go to apps.twitter.com and get your keys and put them in there, otherwise these things won’t work. But if you have Twitter and you set your API keys up and you put them in hidden.py, then all these things will work. It’s kinda fun, actually, and impressive. Not hard to do, actually.

So, (clears throat) twurl, that’s my library that reads hidden.py and augments the URL and does all the OAuth stuff; then json; and ssl, because Python doesn’t accept any certificates, even if they’re good certificates. So, we kinda crush that. Here’s our friends list that we’re gonna hit. We’re gonna make a database, friends.sqlite.

Now, here we’re doing CREATE TABLE IF NOT EXISTS. So, what this really is saying is, I want this to be a restartable process, and I don’t wanna lose the data. If we’re starting out, we do not have any SQLite files, and so this is going to create the database and create these tables, but the second time we run it, we’re not gonna recreate the tables. We’re gonna
be able to restart this because we’re gonna run out of rate limit before we finish, so we just have to wait however long it takes for the rate limit to reset. We’ll watch the rate limit go down.

And so we’re gonna have a People table, and we’re gonna have a primary key and the name. The name is gonna be unique. And whether or not we’ve retrieved it. That’s kinda from a previous one. But then there’s the who-follows-who: the from_id to the to_id. So, this is a direction, and we’re going to put a uniqueness constraint in, just like we do in many-to-many, that basically says the combination of from_id and to_id has got to be unique. We don’t allow ourselves to put in duplicates of the combination. So, from_id can be one in many records and to_id can be one in many records, but the pair one, one is only allowed once. And this is the crud we have to do to convince Python to accept the Twitter certificate.

And so this is similar to some of the other stuff that we’ve done. We’re going to enter a Twitter account or quit. And if we hit enter by itself, then we will actually go and retrieve a record that was not yet retrieved. And now we’re actually pulling out two values: id and name. fetchone is gonna give us a two-tuple, basically, and we’re gonna store that in id and account; the first of those is the id from the database. LIMIT 1 means we’re only gonna get one of these, or zero of these. If there are zero of these, that means there are no unretrieved Twitter accounts. Retrieved equals zero: well, you’ll see in a second that all the new accounts we put in are the ones which we haven’t retrieved. And again, given our rate limit, we wanna know which ones we’ve retrieved. Okay?
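
As a rough sketch of the setup described so far, here is how the two tables and the "next unretrieved account" query might look. It uses an in-memory database instead of friends.sqlite so it leaves no file behind, the exact table and column spellings are assumptions based on the names spoken in the walkthrough, and "drchuck" is only a sample account name.

```python
import sqlite3

# In-memory database for the sketch; the walkthrough uses friends.sqlite.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Assumed schema: IF NOT EXISTS makes the setup safe to run on every start.
SCHEMA = [
    """CREATE TABLE IF NOT EXISTS People
       (id INTEGER PRIMARY KEY, name TEXT UNIQUE, retrieved INTEGER)""",
    """CREATE TABLE IF NOT EXISTS Follows
       (from_id INTEGER, to_id INTEGER, UNIQUE(from_id, to_id))""",
]
for statement in SCHEMA:
    cur.execute(statement)
# Running the same statements a second time is a no-op, not an error.
for statement in SCHEMA:
    cur.execute(statement)


def next_unretrieved(cur):
    """Return (id, name) for one account not yet retrieved, or None."""
    cur.execute("SELECT id, name FROM People WHERE retrieved = 0 LIMIT 1")
    return cur.fetchone()


print(next_unretrieved(cur))  # the table is still empty here

cur.execute("INSERT INTO People (name, retrieved) VALUES (?, 0)", ("drchuck",))
conn.commit()
(person_id, acct) = next_unretrieved(cur)
print(person_id, acct)

# The uniqueness constraint allows each (from_id, to_id) pair only once.
cur.execute("INSERT INTO Follows (from_id, to_id) VALUES (1, 2)")
try:
    cur.execute("INSERT INTO Follows (from_id, to_id) VALUES (1, 2)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
print("Duplicate pair rejected:", duplicate_rejected)
```

A plain INSERT is used for the duplicate pair here only to make the constraint visible; with INSERT OR IGNORE the second insert would be skipped silently instead of raising.
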
And so what we’re gonna do next is take the person that was just entered, which means the length of the account is greater (mumbles), and check to see if they’re already there. Okay? And we’re gonna SELECT id FROM People WHERE name equals the one we just entered. And we’re gonna fetchone and grab the first thing, ’cause we only got one thing in the SELECT statement here. And if this person that we just asked to see is not in the table, that means this is going to fail, and we’re going to do an INSERT OR IGNORE. This OR IGNORE is kinda redundant because we just checked to see if it was there, but we’ll put that in just to be safe. And we’re gonna put the name in as the new account that we’re looking at, and we’re indicating that retrieved is zero, so that we will know that we haven’t retrieved it yet. You’ll see we’ll update that in a second. We commit it so that later selects will see this.
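
The look-up-or-insert step just described could be sketched like this. The helper name `get_or_create_person` is hypothetical, the schema is assumed from the spoken description, and the sketch tests for `None` rather than catching the failed subscript the way the walkthrough does. It also uses `cur.rowcount` and `cur.lastrowid`, which the walkthrough turns to next.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
# Assumed schema, matching the names spoken in the walkthrough.
cur.execute("""CREATE TABLE People
    (id INTEGER PRIMARY KEY, name TEXT UNIQUE, retrieved INTEGER)""")


def get_or_create_person(conn, cur, name):
    """Return the primary key for name, inserting an unretrieved row if needed."""
    cur.execute("SELECT id FROM People WHERE name = ? LIMIT 1", (name,))
    row = cur.fetchone()
    if row is not None:
        return row[0]  # already in the table
    # OR IGNORE is redundant after the check above, but harmless.
    cur.execute(
        "INSERT OR IGNORE INTO People (name, retrieved) VALUES (?, 0)", (name,)
    )
    conn.commit()
    if cur.rowcount != 1:
        raise RuntimeError("Error inserting account: " + name)
    return cur.lastrowid  # primary key SQLite assigned to the new row


first = get_or_create_person(conn, cur, "drchuck")
again = get_or_create_person(conn, cur, "drchuck")
other = get_or_create_person(conn, cur, "another_account")
print(first, again, other)
```

Either way through the function, the caller ends up holding the primary key of the account, which is what the rest of the program needs.
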

So, you gotta do the commit; otherwise this later select wouldn’t see the one we just inserted. And we’re gonna ask how many rows were affected. And if it’s not equal to one, then we’re gonna complain about the insert. And we are going to do this thing. We’re gonna ask, “Hey, remember there was an id up there?” Do do do. Right here: id INTEGER PRIMARY KEY. And we did not insert this here, but we wanna know what that id is. And every time I was showing you that in lectures, I was saying it’s really easy in Python to do this. And that’s what we’re saying: this cursor did the insert, and after the insert we’re gonna grab the lastrowid, which is the primary key that was assigned by SQL. Okay? And so that means that one way or another, coming through this code here, in line 45, we’re either gonna know the id of the user that was there before, or we just inserted one and so we’re gonna know the primary key of the current user. And you’ll see why we need that. So, id is the primary key of the current user that we entered right here. Okay?

And now what we’re gonna do is the twurl.augment with the OAuth and all the keys and the secrets in hidden.py. And instead we’re gonna go through, let’s count 1000, what the heck, let’s go 200, up to 200 friends. Save. No, let’s do 100. Let’s keep it that way. And then we’re gonna retrieve it. We’re retrieving the account. We’re not gonna print the nasty URL out. We could. Then we’re gonna open the URL with a connection, and then we’re gonna read that, and we’re gonna get the UTF-8 data from this, and then we decode that, and we’re gonna have the Unicode data. So, the data string is an internal Python string with all that data representing all the wonderful characters. And of course we’re gonna ask the urlopen connection to give us back the headers as a dictionary, using this call. And we can see how many we have left for the remaining. Right?
What’s the remaining rate limit that we have? Okay? And so then what we’re gonna do is parse the data with json.loads. Oh wait, I need a continue in here. Continue. Okay. Save. If we can’t parse this data, we’ll print it out. Right? So, that means that this died, which means it’s not syntactically correct JSON, basically. And who knows if we’re ever gonna see that. But at least when it blows up, it’ll print this data out. We’ll catch it, and then it’ll continue. Actually, I’ll make this a break, ’cause if that’s blowin’ up that bad, we should quit.

Now, I don’t yet know what happens when this rate limit says you can’t have it. But I do know that I expect, when it’s successful, that there will be a key of users in this outer dictionary that we’re going to get. And if users is not in the parsed dictionary, then I’m gonna dump out this data so at least I can debug what happens when I’ve got some broken JSON. So, the difference: this code is gonna fail when the JSON is syntactically bad, meaning a curly brace isn’t right or whatever. This code will trigger when I get good JSON, but I don’t have a users key in it. Okay?

So, then once we’ve retrieved it and we’re pretty happy with it, we’re gonna update the account that we are retrieving; we’re gonna set that this is one of our retrieved accounts. Okay?
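
The decode, headers, and two-stage JSON check described above can be sketched without touching the network. The function name `parse_friends` and the sample bytes are hypothetical, and the header name `x-rate-limit-remaining` is an assumption based on the "x-rate-limit ... remaining" wording in the walkthrough.

```python
import json


def parse_friends(raw_bytes, header_pairs):
    """Decode a response body; return (users, remaining), users is None on bad data."""
    data = raw_bytes.decode()      # UTF-8 bytes -> internal Python string
    headers = dict(header_pairs)   # list of (name, value) pairs -> dictionary
    remaining = headers.get("x-rate-limit-remaining")  # assumed header name

    # Stage 1: is it syntactically valid JSON at all?
    try:
        js = json.loads(data)
    except json.JSONDecodeError:
        print("Unable to parse json")
        print(data)
        return None, remaining

    # Stage 2: valid JSON, but does it have the key we expect?
    if not isinstance(js, dict) or "users" not in js:
        print("Incorrect JSON received")
        print(json.dumps(js, indent=4))
        return None, remaining

    return js["users"], remaining


good = parse_friends(
    b'{"users": [{"screen_name": "sample_friend"}]}',
    [("x-rate-limit-remaining", "14")],
)
broken = parse_friends(b'{"users": ', [])       # a curly brace is missing
no_users = parse_friends(b'{"errors": []}', [])  # good JSON, wrong shape
print(good, broken, no_users)
```

Returning `None` for the users lets the caller decide whether to continue or break, which mirrors the choice made in the walkthrough.
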
And then what we’re going to do is write a loop that goes through all the friends of this particular user that we’re asking about, and gets their screen name. Prints it out. And then we’re going to check to see if this one is already in our People table, ’cause this is a spider, we’re grabbing accounts. And so we’ll do a friend_id: do a fetchone, grab the sub-zero thing. If this person’s not in there, this fetchone sub zero is gonna blow up, which means we’re gonna drop down to the except code. But if it does work, we have friend_id, you know, they’re already in our database. Right? They just weren’t retrieved. Okay? And so now, if friend_id wasn’t there, we’re gonna do an insert, setting retrieved to zero. And then we’re gonna commit, right? Now, remember, rowcount is how many rows were affected by this last transaction: cur.rowcount. And we’re gonna die if that (mumbles) insert doesn’t work. This is unlikely, unless somehow we ran out of disk drive or something. And we’re gonna grab the friend_id as the key, the last row that was inserted. We’re only gonna insert one row, so it’s basically the primary key

of the row we just inserted. So, if you look at this code right here, it comes out the bottom one way or another with friend_id set. Right? ’Cause either they’re already in our database or they’re not, and if we insert them, then we have it. And so now this countnew and countold is just so I can print out a nice printout.

Now, we are gonna insert into the friends table, which is called the Follows table in this case. from_id and to_id, those are the two outward-pointing foreign keys. And we have the id of the account that we are retrieving the friends of, and then this particular friend. And so we’re inserting the connection from this person to that person. And then we commit it. We wanna commit these again so that later selects, when the loop goes back up, get all of that data that’s going on. Okay? So, we do want to commit from time to time. And then we close the cursor at the very end. Okay?

So, let’s run this and see what happens. Okay. So, python twfriends.py. Oh, of course I am a refugee from Python 2, so I always forget to type python3. Okay, so we’re gonna start. If we take a look right now, I’m gonna start another tab over here, and ls -l *sqlite. Now, that SQLite file is there, right? And it’s actually made the tables. If you go up here, it ran all this stuff, create the tables, yada yada, and we’re sitting right here at this line. As a matter of fact, I think without causing too much trouble, I can open that database right here, and there is no data in the Follows table and there is no data in the People table. It’s completely empty. Okay? So, we’re waiting for the first one. And I’ll go with mine, Dr. Chuck. So, it’s retrieving the hundred friends and they all were brand new, they were all inserted, right?
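
The friend loop described above, the one that fills the People and Follows tables, might be sketched as follows. The helper name `store_friends` and the sample screen names are hypothetical, the schema is assumed from the spoken description, and the sketch commits once per batch rather than inside the loop.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
# Assumed schema, matching the names spoken in the walkthrough.
cur.execute("""CREATE TABLE People
    (id INTEGER PRIMARY KEY, name TEXT UNIQUE, retrieved INTEGER)""")
cur.execute("""CREATE TABLE Follows
    (from_id INTEGER, to_id INTEGER, UNIQUE(from_id, to_id))""")


def store_friends(conn, cur, person_id, screen_names):
    """Record that person_id follows each name; return (countnew, countold)."""
    countnew = 0
    countold = 0
    for friend in screen_names:
        cur.execute("SELECT id FROM People WHERE name = ? LIMIT 1", (friend,))
        row = cur.fetchone()
        if row is not None:
            friend_id = row[0]      # seen before, possibly not yet retrieved
            countold += 1
        else:
            cur.execute(
                "INSERT OR IGNORE INTO People (name, retrieved) VALUES (?, 0)",
                (friend,),
            )
            friend_id = cur.lastrowid
            countnew += 1
        # Directional link: person_id follows friend_id.
        cur.execute(
            "INSERT OR IGNORE INTO Follows (from_id, to_id) VALUES (?, ?)",
            (person_id, friend_id),
        )
    conn.commit()
    return countnew, countold


cur.execute("INSERT INTO People (name, retrieved) VALUES ('account_a', 1)")
first_batch = store_friends(conn, cur, 1, ["account_b", "account_c"])
second_batch = store_friends(conn, cur, 2, ["account_c", "account_a"])
print("New accounts, revisited:", first_batch, second_batch)
```

The second batch finds both names already present, which is the "revisited" overlap the walkthrough goes looking for.
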
And so now if I hit refresh, we will see that Dr. Chuck is retrieved. Who follows: so these are all the people I follow. If one follows two... So, if we look here, we see that Dr. Chuck follows Stephanie Teasley. Because we grabbed the friends of Dr. Chuck, you know, we’re gonna have a record in Follows for all the ones that I follow. Right? So, these are all the people I follow. And we put them in. Okay?

So, we can go back and, let’s see, grab somebody. Let’s go grab Stephanie Teasley, and let’s pull out her friends. So, we grabbed a hundred of her folks. I got 14 left; that’s my x-rate-limit. So, I did Stephanie Teasley, so let’s go back here. So, you’ll notice there’s 101, there’s probably gonna be, oh, 182, that’s interesting. So, we’ve retrieved Dr. Chuck and Stephanie Teasley, and let’s go take a look in the friends table, the Follows table. Okay. So, we have all of the people I follow, and now all the people Stephanie follows. Okay? So, there we go.

So, let’s go ahead and do somebody else. Let’s see, I think we both follow Tim McKay. Where’s Tim McKay? Yeah, let’s follow Tim McKay. Let’s see who Tim follows. See if we can get like an overlap. Oh, we revisited some. Let’s see if we can see this in Follows. See People. So, we’ve got Dr. Chuck retrieved, and Tim McKay’s somewhere down here. It might take us a while before we get any really good overlaps. Let’s see, let’s do a database call, let’s do a database SQL. SELECT COUNT... Eh. Okay, so let’s just run this some more. It’s clearly working.

Now, one thing I can do here is I can hit enter and it will just pick one randomly. So, it grabbed live E D U TV. And let’s see how many I got left. We got 12 left. And now I can hit enter again, and it picks another one. That was the next one. Oh, it’s kinda pickin’ ’em in order.

Is it pickin’ ’em in order? Let’s go to People. Yeah, it’s pickin’ these. So, we can see that it’s gonna just do the first unretrieved person, who is Nancy. Let’s let it retrieve Nancy. So, it grabbed Nancy. No. So, we’re finding some, and this table’s gettin’ really big. So, if we look at the People table, we now have 455 people. And we have 467 following records. And so there we go. Oops. Hit enter, it does another one. And away we go. So, you get the idea. I can type quit to finish.

And just to give you a little interesting bit of code to show you how to do selects, I’m gonna do this twjoin. Now you’ll notice that we’re not talking... oh, let’s show you one thing: ls -l friends*sqlite. So, the database is there, so I can restart this process and run it again, and the database is still there. And so we just grab, (laughs) swear trek, and so we can keep doing this. And so this data keeps extending. So this is a restartable process. I can run it and then tell it to grab the next unretrieved one. And so away we go, right? And so that’s part of it. So, if I run out of my... I’ve got eight left. Oh, how many do I have left, really? Let’s keep going. How many do I got left?
I’ve got five left. Okay. Wait, oh, I guess we’ll just run it out. So, I got four left. You know what I should do is... I can’t change the code. Yes, I can change the code; I can stop the code and I can quit the code. So, what I’ma do is change this code a little bit really quick, and print the headers of rate limiting at the beginning and at the end. So, now I can run it again. I changed the code; hopefully I didn’t make a Python error. Tell it to go get another one. Hannah Devaro. And so I got three left. Oops. We’ll see what happens when I run outta rate limit. So, we have one left. Hit enter. Hit control k. Open Source dot org. So, we have zero left. That worked. Now, let’s see what happens. I don’t know what happens next.

Oh, we blew up. Too many requests. Oh, we got an HTTP Error 429. So that means that, going from Mark Cuban, that was in line 48, so the right thing to do would be, in line 48, we should really put this in a try/except block, because it gives us an error. Print... Oh, fiddlesticks. How do I print the exception message? I always am forgetting. Print failed to retrieve. Okay. So, we’ll put that in. Now, if I run it... And then I have to put a break here, because that’s not good. Break. Failed to retrieve. Now, I gotta figure out... see, I never know how to print out the error message. Yeah. That’s the weird thing about this stuff: I don’t ever remember enough. I don’t remember the syntax, what I say here to print the error message out. So, I’m gonna go to Google, and I’m gonna say print out the exception message in Python. Oh, Python 3, hello. Okay, so let’s go find it here in the documentation. Except. Except. Is this it? Is this what I say?

I just wanna print out the message. Ah, that’s it. Except. Let’s try this. So, this is part of Python programming, for me at least, ’cause I’m just not like a genius expert at this stuff. (laughs) This is one thing I like about Python: you can guess stuff and sometimes you guess right. So, there we go. We got the nice little error message, and we see Error 429, Too Many Requests. So, that cleans that up nicely. So, we have run out of requests. And on that, it is a good time to say thanks for listening, and I hope that you found this valuable. (smooth music)
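
For reference, the try/except fix arrived at in the walkthrough might look roughly like this. It is a sketch under assumptions: the `fetch` helper, the `opener` parameter, the `rate_limited` stand-in, and the example URL are hypothetical and exist only so the failure can be simulated without a network call or Twitter keys.

```python
import urllib.error
import urllib.request


def fetch(url, opener=urllib.request.urlopen):
    """Return the decoded body, or None after printing the error message."""
    try:
        connection = opener(url)
    except Exception as err:
        # Printing the exception object shows its message.
        print("Failed to Retrieve", err)
        return None
    return connection.read().decode()


def rate_limited(url):
    """Stand-in for urlopen that fails the way the walkthrough's request did."""
    raise urllib.error.HTTPError(url, 429, "Too Many Requests", None, None)


result = fetch("https://example.invalid/friends", opener=rate_limited)
print(result)
```

In the program itself the caller would break out of the main loop when the request fails, since no further requests will succeed until the rate limit resets.
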