Google I/O 2011: Querying Freebase: Get More From MQL

Taylor: Good afternoon. My name is Jamie Taylor, and this afternoon, to wrap things up, we’re going to talk a little bit more about using Freebase and MQL. This talk is really about moving your use of MQL ahead with some more advanced techniques. I know that not everybody here is completely versed in MQL, or as facile with it as some people, so I thought it would be useful to do a bit of a level set first. That way everybody is thinking about things the same way and is familiar with the syntax and some of the capabilities of the query language.

So we’ll start by talking a little bit about Freebase and how MQL works, including some of the different forms a query can take. Then we’re going to push ahead and look quite closely at how links are actually formed in Freebase, and at all of the metadata around them. That’s really the heart of this talk, and where the power is. What we learn about how links are represented suggests a different way of conceptualizing your programming model when you’re working with Freebase data, so we’ll talk a little bit about what I call property-driven programming. And finally, to wind up, we’ll talk about metaschema, which is a fairly new concept in Freebase. The idea is that you can get more power by rolling things together.

With that, I should point out that if you have feedback about this talk, I would really appreciate it if you’d use the short link. I’ll have it available again at the end of the talk, along with the hashtags people are using on their social networks. I should also say that the slides may not be easy to read, but at the end I will give you a URL for an Acre application where all the queries are available and you
can explore them and edit them yourselves, so you don’t have to worry about taking notes.

So let’s start by talking a little bit about Freebase. You’re probably aware that Freebase is a giant graph of entities. Here we’re looking at the neighborhood around Jane Austen. She was an author who was born in a town in England, wrote a couple of books, and has influenced a few people. What’s unique about Freebase is that all of the links between these entities are labeled, so we can tell you what type of relationship the entities have with one another. And this graph is fairly large: right now there are over 22 million topics in Freebase (these are the entities) and over 400 million connections between those 22 million entities. Everything in Freebase is available under a Creative Commons Attribution license, so you’re free to use it in any way you see fit, as long as you give credit back to where the data came from.

Something else I find very interesting about Freebase, and this is where the community contribution part really kicks in, is that Freebase is a very rich source of vocabulary, that is, of ways of talking about entities. Here I’ve created a simple graph that ranks types (types being collections of attributes) by how many instances use them, keeping only the types with more than ten instances. You can see that there are over 6,000 different types you can draw from. Over 1,700 of these are in what we call the commons, an area that is fairly well protected, community managed, and curated; these are the vocabularies you can really depend on. Then there are over 4,000 in what we call bases, areas where individuals or small communities are coming together and
developing a vocabulary. So there’s a very rich source of ways of talking about entities available to you.

So that we’re all using the same terminology and everybody is clear: topics are objects that represent things in the real world. For instance, we have a topic that represents a person, Jane Austen. We have topics that represent places, such as Steventon, where she was born. And we have topics that represent other things in the world, such as Darcy, one of her characters, as well as planets, iPods, and so on. What’s unique about Freebase is that all of these topics have strong identifiers; that is, there is a unique and dependable way to reference these objects using their IDs. But it’s important to keep in mind that while objects have IDs, the IDs are not their names. Names are attributes on the objects.

That leads us to properties, the relationships between these objects. You can see that Jane Austen influenced Henry James, and Henry James was influenced by Jane Austen. These links are bidirectional, but it’s important to understand that a link can have a different name in each direction.

And finally, this is a very subtle point when you’re looking at a graph of entities: it’s the properties that actually create the meaning. When you think about the object that represents Jane Austen, there’s nothing intrinsic to that object that tells you anything about Jane Austen. It’s the relationships that object has with other things in the graph: the fact that a property called “place of birth” leads you to another object, Steventon; the fact that a date of birth leads you to a literal, 1775; and the fact that she has a name, which is her label. These are the things that bestow Jane Austen-ness on that object.

Now that we have a quick understanding of what we’re talking about, let’s press right on into the language. This is MQL, the Metaweb Query Language, which we use to ask this graph of entities questions. We’re going to start off with a bunch of really simple queries that demonstrate the different forms a query can take, looking only at the relationship between Jane Austen and her place of birth. Probably the simplest query we could formulate says that we’re going to start with this object, Jane Austen,
and then we’re going to look at the property whose ID is /people/person/place_of_birth, and we’re going to say “null,” that is, we want the system to fill in that last part. When we run the query, it comes back with the label “Steventon”: it replaces the null with the label of the entity at the other end. We can be a little more sophisticated and say, “Well, we know that the thing at the other end is an object.” By using curly braces, we ask the system to expand that object, and what it gives us back are all of the core object properties at the other end: the fact that it has an ID, that it has a name, and that it has a whole bunch of types. It’s a topic, it’s a location, it’s a city/town, and there’s more. Going further, we can use a wildcard. The star returns the same information we got in the last query, plus a bunch more based on the type of object we should expect at the other end of that link. Here the system knows that place of birth has a location at the other end, so it gives us things like the containment hierarchy: Steventon is contained by the United Kingdom and by Hampshire. If it were in the United States, it might have a USBG name associated with it. We can also ask for one specific property of the object at the other end, such as its ID, and here we get back /en/steventon.

In this next case, I’m going to turn the query around. I know that some people were born in Steventon, and I’m curious who they are. I put this query in square brackets because I expect possibly to get a list of people back. And, in fact, it
turns out that Jane and her siblings were born there, which is not terribly surprising. We can also reverse the meaning of the /people/person/place_of_birth property. Here I’m starting at Steventon, and I know there are links from other objects that use place of birth: show me what’s there. We use the bang operator (!) to reverse the meaning of the property, and, lo and behold, we get Jane and her siblings back here as well. That runs us through the different forms an MQL query can take, and we’ll be using many of them as we work through the rest of the talk.

What’s really interesting about Freebase, though, is that all of these properties stand in a very orderly relationship with one another. This is what we call schema, and it’s how you know what to expect as you navigate the graph. We can start with Jane Austen and ask, “How is it that we know Jane Austen is a person? And furthermore, what makes us think she has this property ‘place of birth’?”

Let’s start with the first question. The first thing to understand is that everything in Freebase is represented as an object, and objects have a set of properties associated with them. You may not be able to read this slide, but the idea is that there is a reference you can go to where you can see all of the properties of /type/object. One of the first ones you’ll probably use is the type property. So we know that Jane Austen, being an object, has a type property, and we can run a query to find out what’s at the other end of that type link. What we discover is another object, with the ID /people/person. This is how it is that we think Jane Austen is a person. But what’s so special about that /people/person object? Well, it’s an object too, so it has a type link, and when we run a query to find out what that link points to, we discover another object at the other end called /type/type. Let’s play this game out a little further and run that query again. What’s special about /type/type is that it is connected to itself: its type is /type/type. So this completes the circuit.
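The query forms walked through above (null, `{}`, the `*` wildcard, a single named property, a bracketed list, and the `!` operator) are all plain JSON, so they can be sketched as Python dicts and serialized with the standard `json` module. This is a minimal illustration, not a Freebase client: `/en/jane_austen` is an assumed ID (the talk only shows `/en/steventon`), pinning Steventon with `{"id": ...}` is one way to constrain an object, and the service these queries were sent to has since been retired, so nothing here talks to a server.

```python
import json

AUSTEN = "/en/jane_austen"  # assumed ID; the talk does not show it
POB = "/people/person/place_of_birth"

# One entry per form from the talk. None serializes to JSON null,
# which asks MQL to fill in that slot.
QUERY_FORMS = {
    # null: the target comes back as its label, e.g. "Steventon".
    "label": {"id": AUSTEN, POB: None},
    # {}: expand the target into its core object properties.
    "object": {"id": AUSTEN, POB: {}},
    # "*": core properties plus those of the property's expected type.
    "wildcard": {"id": AUSTEN, POB: {"*": None}},
    # One named property of the target, here just its ID.
    "id_only": {"id": AUSTEN, POB: {"id": None}},
    # Turned around and wrapped in [] because several people may match.
    "born_here": [{"id": None, "name": None, POB: {"id": "/en/steventon"}}],
    # The bang operator reverses the property, starting from Steventon.
    "reversed": {"id": "/en/steventon", "!" + POB: [{"id": None, "name": None}]},
}


def to_mql(query) -> str:
    """Serialize a query dict to JSON text, with sorted keys for stable output."""
    return json.dumps(query, sort_keys=True)


for form, query in QUERY_FORMS.items():
    print(f"{form:>9}: {to_mql(query)}")
```

For example, `to_mql(QUERY_FORMS["label"])` yields `{"/people/person/place_of_birth": null, "id": "/en/jane_austen"}`. Keeping the forms as data makes it easy to see that the only difference between them is what sits in the value slot of one property.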
We now understand how /people/person became a type, and how Jane Austen becomes a person by being connected to that object. So we have a basic understanding of how the type system comes into being in the graph. But that still doesn’t answer the second question: how is it that we think she has a property called “place of birth”? To answer that, we have to look a little closer at this interesting type called type. One of the things to notice is that types have properties, so let’s go back to our graph. This is the same triad we were just exploring. We know there’s another type out there called property, /type/property, and that, just like person, it is connected to /type/type. We also know that this type has properties, and one of them is expected type. So now we have a property on property: the expected type. You can see how this plays out. Now we can construct a place of birth property: we connect it to /type/property because it’s a property, we make it a property of /people/person, and we give it an expected type of location. You can see how this is all interconnected. We now understand both how types come into being in the system and how properties are represented and connected to their types. What’s really fun about this is that all of it is done in the graph. That means that using the MQL query capabilities we’ve been building up, we can interrogate these structures exactly the same way we’ve been interrogating Jane Austen. In fact, we can do both at the same time. How much fun is that?
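Because the schema lives in the graph, both the circuit described above and the schema questions that come next can be sketched with the same tools. Below, a toy dictionary stands in for a few type links (one type per object, a simplification; a real topic such as Jane Austen carries many types), and the schema queries are written as MQL-shaped dicts. `/en/jane_austen`, `/location/location`, and `/book/written_work/author` are assumed IDs.

```python
import json

# Toy stand-in for a few 'type' links in the graph (one type per object for brevity).
TYPE_OF = {
    "/en/jane_austen": "/people/person",  # assumed ID; the real topic has many types
    "/people/person": "/type/type",
    "/type/property": "/type/type",
    "/people/person/place_of_birth": "/type/property",
    "/type/type": "/type/type",  # the self-loop that closes the circuit
}


def type_chain(obj_id: str) -> list:
    """Follow type links from obj_id until reaching an object that is its own type."""
    chain = [obj_id]
    while TYPE_OF[chain[-1]] != chain[-1]:
        chain.append(TYPE_OF[chain[-1]])
    return chain


# Schema is queried with the same JSON shapes as ordinary data.
# All properties of /people/person, with their expected types:
PERSON_PROPERTIES = {
    "id": "/people/person",
    "type": "/type/type",
    "properties": [{"id": None, "expected_type": None}],
}

# Every type with at least one property that expects a location:
TYPES_EXPECTING_LOCATION = [{
    "id": None,
    "type": "/type/type",
    "properties": [{"id": None, "expected_type": {"id": "/location/location"}}],
}]

# Is the author property a master property, or the reverse of one?
AUTHOR_DIRECTION = {
    "id": "/book/written_work/author",  # assumed ID
    "type": "/type/property",
    "master_property": None,
    "reverse_property": None,
}

print(" -> ".join(type_chain("/en/jane_austen")))
print(json.dumps(TYPES_EXPECTING_LOCATION))
```

Running it prints `/en/jane_austen -> /people/person -> /type/type`, the same circuit as the slides; the loop stops exactly because `/type/type` is its own type.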
So let’s play on and interrogate the schema a little more. For the object /people/person, we know that its type is /type/type, and we want to know all of the properties on this type. We ask a simple query and get back a list of all of its properties, including place of birth; it also has gender, profession, and a whole bunch of others. For place of birth, we know that it’s a /type/property, and we know there’s a property called expected_type, so let’s find out what we should expect at the other end: the answer is a location. So now we can interrogate the graph structure we were looking at before.

Now we can build slightly more interesting structures, and this is one we’re going to come back to. We say: for some type, I want all of its properties, and I want to know the expected type of each of those properties. Then we constrain it: find me all of the types that have properties whose expected type is a location. What we get back includes the fact that the location type has a property called “contains,” and it also has a property called “containedby.” What’s kind of interesting is that this is what we call the phylogeny pattern: properties that expect their own type on the other side, which let you create hierarchies and circuits. We also get human language back, because its region property expects a location. And place_of_birth on /people/person has an expected type of location, along with a whole bunch of other properties.

I told you that properties are bidirectional and that they have different names in each direction, and for the most part you can think of properties as uniform. But it turns out that, under the surface, properties have a bit of directionality associated with them: essentially, where they were defined. Here I’m asking, for the author property of the written_work type, whether it’s a master property or just the reverse of another property. When I run this query, I discover that there is a master property, works_written, defined on /book/author. That’s the same link running in the other direction. The only thing to take away from this is that there is a notion of directionality buried under the covers. For the most part you don’t have to think about it, but when we start playing some of the games we’ll get to a little further down the road, it’s going to come in very handy.

Now that we understand a little more about schema and how to interrogate it, let’s press on and look at connections. Going back to the really simple query we started with about Jane Austen, this time we’re going to look at the languages she spoke. I want to find out what languages are at the other end, and what we discover is that she spoke English, so
that’s the little representation at the bottom. What you really have to think about when you write a simple query like this is that /people/person/languages represents this link: you’re looking from one object across to another object, but you don’t see anything about the link itself. Happily, MQL provides a directive called “link” that we can insert to see into the connection between the two objects. It’s not just that we think these objects are nearby; we think there’s a link, and we want to know more about it. The link directive returns objects of /type/link, and if you look at the properties of /type/link, you discover that it has things like creator and timestamp (Who did this? When did they do it?) as well as a bunch of other properties, which we’re going to look at in depth.

This is the client display of the schema for /type/link, and it’s actually one of my favorite type definitions in the whole Freebase system, because I think its description is the ultimate understatement: “used to access the advanced features of MQL.” It tells you nothing! And yet all of this power is hidden away in these properties, so we’re going to explore them in some depth. These are the properties of /type/link, and they form different bundles, each of which fulfills a different need depending on what activity you’re engaged in. The first set, timestamp, creator, operation, and valid, is useful for exploring the history of links: When were they made? Who made them?
This lets us talk about time. For schema, we can look at master_property and reverse, the things we were looking at before with /book/author. And finally, source, target, and target_value are the connections at each end of the link; these are really useful when we want to reflect on the graph and understand a lot more about its connections. We’ll talk about each of these sets in depth.

Going back to our link directive, we can add in the specific property we want. Who is the creator of this link between Jane Austen and the English language? As it turns out, it was me. Now we can play a bit more of a history game. Starting with Jane Austen, we know that she’s an object, so she has a bunch of types associated with her, and we want to look at the links between Jane Austen and each of those types. Here I’m opening up the link and asking for the timestamp; the operation, that is, whether it was an insert or a delete; and whether the link is currently valid in the system. Finally, I sort the output on the timestamp of the link, so I can see these things in chronological order. What we get back is actually kind of interesting. When Jane Austen was added, she was given the type “common topic,” so she became a topic, and that was in October 2006. And yes, that’s still true: she is a topic. She’s also a person; that link was added in November 2006, and it is still valid as well. But then we get to one that’s a little perplexing: it says she was a film writer. The only problem is that she died long before celluloid was being used to make motion pictures, so it’s a little hard to imagine her as a film writer. Happily, we can see that even though this link was inserted in November 2006, it is not currently valid. If we look further down the chronology, we discover that the link to the /film/writer type was deleted in July 2010. So somebody went in and corrected this, and we can see all of this history because Freebase is an append-only data store: we have all of the information about what has happened to these objects over time.

That takes us through history. Now let’s look a bit more at schema. Starting with Jane Austen and her languages link, we can ask: what is the master property for this link?
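Stepping back to the history game just described, the link directive and its post-processing can be sketched as follows. `LINK_BUNDLES` records the three groups of /type/link properties from the talk, and `link_clause` is a hypothetical helper that turns bundles into a `link` directive. The history query follows the talk’s shape (request `valid`, sort on `link.timestamp`), but treat the exact directive spelling as an assumption. The rows are paraphrased from the Jane Austen example, with timestamps cut down to the months mentioned; the talk doesn’t say how the 2010 delete record itself is flagged, so its `valid` is left as `None`.

```python
from dataclasses import dataclass
from typing import Optional

# The three bundles of /type/link properties, grouped as in the talk.
LINK_BUNDLES = {
    "history": ("timestamp", "creator", "operation", "valid"),
    "schema": ("master_property", "reverse"),
    "reflection": ("source", "target", "target_value"),
}


def link_clause(*bundles: str) -> dict:
    """Hypothetical helper: a 'link' directive requesting every property in the given bundles."""
    return {"link": {name: None for b in bundles for name in LINK_BUNDLES[b]}}


# Who created the link between Jane Austen and each language she spoke?
CREATOR_QUERY = {
    "id": "/en/jane_austen",  # assumed ID
    "/people/person/languages": [{"name": None, "link": {"creator": None}}],
}

# Every type link on Jane Austen, including invalidated ones, oldest first.
HISTORY_QUERY = {
    "id": "/en/jane_austen",
    "type": [{
        "id": None,
        "link": {"timestamp": None, "operation": None, "valid": None},
        "sort": "link.timestamp",
    }],
}


@dataclass
class TypeLink:
    type_id: str
    timestamp: str         # "YYYY-MM" here; ISO-style strings sort chronologically
    operation: str         # "insert" or "delete"
    valid: Optional[bool]  # None where the talk does not say


# Rows paraphrased from the talk, deliberately out of order.
ROWS = [
    TypeLink("/film/writer", "2010-07", "delete", None),
    TypeLink("/people/person", "2006-11", "insert", True),
    TypeLink("/common/topic", "2006-10", "insert", True),
    TypeLink("/film/writer", "2006-11", "insert", False),
]


def chronological(rows):
    """Client-side equivalent of sorting on link.timestamp (stable for ties)."""
    return sorted(rows, key=lambda r: r.timestamp)


def current_types(rows):
    """Types whose insert link is still valid."""
    return sorted(r.type_id for r in rows if r.operation == "insert" and r.valid)


def retracted_types(rows):
    """Types that were inserted once but whose link is no longer valid."""
    return sorted(r.type_id for r in rows if r.operation == "insert" and r.valid is False)


for row in chronological(ROWS):
    print(row.timestamp, row.operation, row.type_id, row.valid)
```

With these rows, `current_types(ROWS)` is `["/common/topic", "/people/person"]` and `retracted_types(ROWS)` is `["/film/writer"]`: the same conclusion the talk reaches by reading the sorted output.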
We can also ask whether it’s the reverse. What we get back is that it’s not the reverse, and that the master_property is /people/person/languages. You might say, “Gee, that’s not terribly informative, because that’s how I asked the query.” True, but there’s more we can do here. For instance, if I were building an application, I might want to open up this link and get the display name of that property, as well as learn what type I should find at the other end, all in one package. And I get back that the display name for this link is “languages,” and that what I should find at the other end is a human language. The only problem is that this is a bit of a cheat, because this could have been the reverse property. So here’s a query about Ridley Scott, the director of “Blade Runner” and other classics. I’m going to look at him as a film director and get back the films he’s connected to. I’m asking for the link between Ridley Scott and his films, and getting back the master_property, its display name, and its expected_type. And now I’m also asking for the reverse of that master_property, so I find out what the property definition is going in the other direction, with its name and expected_type. The “reverse”: null down here is very important: when it comes back true, it tells my application that it should be looking at this area in green, the reverse property, and using that as the information about this link. Make sense?
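An application consuming the Ridley Scott query has to pick the right side of each link: when `reverse` comes back true, the display name and expected type come from the master property’s `reverse_property`. Here is a minimal sketch of that decision. The response dicts are hand-written, illustrative stand-ins shaped like the query, not captured Freebase output; the Ridley Scott ID, the `/film/film/directed_by` master property, and all display names are assumptions.

```python
# Query shape from the talk: Ridley Scott as a film director, plus link metadata.
RIDLEY_QUERY = {
    "id": "/en/ridley_scott",  # assumed ID
    "type": "/film/director",
    "film": [{
        "name": None,
        "link": {
            "reverse": None,
            "master_property": {
                "id": None,
                "name": None,
                "expected_type": None,
                "reverse_property": {"name": None, "expected_type": None},
            },
        },
    }],
}


def describe_link(link: dict) -> tuple:
    """Return (display name, expected type) for the direction the query actually walked."""
    master = link["master_property"]
    # If we traversed the link backwards, describe it with the reverse property.
    side = master["reverse_property"] if link["reverse"] else master
    return side["name"], side["expected_type"]


# Illustrative responses (hand-written, not real API output).
AUSTEN_LANGUAGES_LINK = {
    "reverse": False,
    "master_property": {
        "id": "/people/person/languages",
        "name": "Languages",
        "expected_type": "/language/human_language",
    },
}
SCOTT_FILM_LINK = {
    "reverse": True,
    "master_property": {
        "id": "/film/film/directed_by",  # assumed master of /film/director/film
        "name": "Directed by",
        "expected_type": "/film/director",
        "reverse_property": {"name": "Films directed", "expected_type": "/film/film"},
    },
}

print(describe_link(AUSTEN_LANGUAGES_LINK))  # ('Languages', '/language/human_language')
print(describe_link(SCOTT_FILM_LINK))        # ('Films directed', '/film/film')
```

Without the `reverse` check, the Ridley Scott link would be labeled with the master property’s name and would claim a director at the far end, which is exactly the cheat the talk warns about.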
Cool. I just want to make sure I’m not pulling the wool over your eyes. So now we can formulate meaningful queries and get a lot of information back about the links in those queries. But there’s one thing that comes up when we’re exploring the graph: if I’m just looking at an object, how can I know which queries I should be asking? How do I know what things are connected to it? In fact, this is the mechanism the Freebase client uses to display information about topics and help you navigate around the graph. We’re going to look again at /type/link, at the last set of properties, source, target, and target_value, to understand how we can reflect on these things. The idea is that links have a source (they start somewhere) and a target (they end somewhere else), and sometimes the thing they end on is a literal or some other primitive value; that’s what target_value gives you. So we can formulate a pretty simple query using /type/link: the source is Jane Austen, and we want the property that connects Jane Austen to other things which, on the other side, are topics. This is a nice little reflect query.
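The reflect query starts from /type/link itself rather than from a topic. Below is a sketch of its shape plus a helper that groups result rows by master property. The rows are hand-written to echo the results read out in the talk (Steventon, Winchester, female, English); the place_of_death and gender property IDs and the Winchester, female, and English target IDs are assumptions.

```python
from collections import defaultdict

# Every link whose source is Jane Austen and whose target is a topic.
REFLECT_OUTGOING = [{
    "type": "/type/link",
    "source": {"id": "/en/jane_austen"},  # assumed ID
    "master_property": None,
    "target": {"id": None, "type": "/common/topic"},
}]


def group_by_property(rows):
    """Map each master property to the list of target IDs it points at."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["master_property"]].append(row["target"]["id"])
    return dict(grouped)


# Hand-written rows shaped like the query's results (IDs partly assumed).
ROWS = [
    {"master_property": "/people/person/place_of_birth", "target": {"id": "/en/steventon"}},
    {"master_property": "/people/person/place_of_death", "target": {"id": "/en/winchester"}},
    {"master_property": "/people/person/gender", "target": {"id": "/en/female"}},
    {"master_property": "/people/person/languages", "target": {"id": "/en/english"}},
]

for prop, targets in group_by_property(ROWS).items():
    print(prop, "->", ", ".join(targets))
```

Grouping by master property turns the flat list of links into a property-to-values view, which is roughly what a topic page needs to display.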

We find out that she has a master_property of place of birth with a target of /en/steventon, that she has a place of death, which was Winchester, and that she has a gender, female, and it goes on and on: all of the things connected to Jane Austen. But there’s more to this story that we need to unpack. When you’re exploring Freebase schema (you could go to the client to do this, but since we know how to write MQL queries against schema, we can write our own), you can ask: is there a property on /type/property whose expected type is /type/link? Happily, there is. It’s called “links,” and it holds all of the uses of a property. Now I can say, for the property “place of birth,” give me back all of the links, that is, all of the connections, that use this property. The first one I get back was made by a guy named Robert: its source is Steve Martin, connecting him to Waco; it was made in December 2006, and it’s currently valid, so apparently that’s the right thing to say. So we now have a way of exploring all of the uses of a property. That’s important, because the reflect query we ran for Jane Austen, where Jane Austen was the source and we asked for the target, tells only part of the story. It says Jane Austen is connected to Steventon, to the English language, and to “Pride and Prejudice.” What’s missing is that Henry James is connected to Jane Austen, but that connection goes in the other direction: Jane Austen is the target of that link, not the source. That’s simple enough to fix. We could just say, “Well, we’re going to run the query we ran before, where Jane Austen is
the source and we’re looking for the target.” Then we could run another query that says, “Jane Austen is the target; show me the source.” That would get us Henry James. But that seems inefficient, so let’s take another approach using the machinery we now have under our belt. We start with a very simple query that says: for this object, Jane Austen, give me all of her types, and for every one of those types, go to the schema and tell me all of its properties. We run this and get back all of the properties for all of those types. Now we can use the links property on each property and say, “For all the uses of this property where Jane Austen is the source, find me the targets.” That’s essentially the same as our first reflect query, but we can extend it further and reverse it: “For the master_property of this property, which goes in the other direction, find all of the links where Jane Austen is the target, and give me the source.”
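The full reflection described above walks from Jane Austen’s types to their properties and then to each property’s `links`: once with Jane Austen as the source, and once, through the `master_property`, with Jane Austen as the target. The first half of this sketch writes that nesting down as a query shape (the `optional` directives are added here so that unused properties don’t filter out a type); the second half mimics the result on a toy list of links. The influence property IDs are assumptions, and reverse directions without a known name fall back to the `!` prefix from earlier in the talk.

```python
AUSTEN = "/en/jane_austen"  # assumed ID

# Types -> properties -> links, in both directions.
FULL_REFLECT = {
    "id": AUSTEN,
    "type": [{
        "id": None,
        "properties": [{
            "id": None,
            # Uses of this property that start at Jane Austen.
            "links": [{"source": {"id": AUSTEN}, "target": {"id": None}, "optional": True}],
            # For reverse properties: uses of the master that end at Jane Austen.
            "master_property": {
                "id": None,
                "links": [{"target": {"id": AUSTEN}, "source": {"id": None}, "optional": True}],
                "optional": True,
            },
        }],
    }],
}

# Toy graph: (source, master property, target) triples.
LINKS = [
    (AUSTEN, "/people/person/place_of_birth", "/en/steventon"),
    (AUSTEN, "/people/person/languages", "/en/english"),
    ("/en/henry_james", "/influence/influence_node/influenced_by", AUSTEN),  # assumed IDs
]

# Reverse-property names where this toy schema knows them.
REVERSE_OF = {
    "/influence/influence_node/influenced_by": "/influence/influence_node/influenced",
}


def reflect(obj_id, links):
    """Return (property as seen from obj_id, object at the other end) for every link touching obj_id."""
    seen = []
    for source, prop, target in links:
        if source == obj_id:
            seen.append((prop, target))
        if target == obj_id:
            # Walking this link backwards: use the reverse name, else the bang form.
            seen.append((REVERSE_OF.get(prop, "!" + prop), source))
    return seen


for prop, other in reflect(AUSTEN, LINKS):
    print(prop, "->", other)
```

Reflecting on Jane Austen now picks up Henry James through the reverse direction, and reflecting on Steventon yields `("!/people/person/place_of_birth", "/en/jane_austen")`, mirroring the bang-operator query.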
This gives us full reflection: we now have a way of picking up any object in the graph and knowing everything there is to know about it. Once you have this capability, I think it suggests a new style of programming when you’re working with Freebase data: acting on the meaning you get back from the objects in the graph.

The typical approach, the one we were playing out earlier in this talk, is to start with an object like Jane Austen and say, “Great, we know that she is a person and has a place of birth,” then, “We think she has some languages; give us those languages,” and there’s a whole bunch of other stuff we probably want: “She’s an author; tell us about her books,” and so on. Then we do some optimization: we package these things up, write a bunch of optional clauses, or bundle them into one MQL envelope to make the transport more efficient. But when we take that kind of rote approach, things go a little south when we get another person, like Franz Liszt. He’s not an author; he’s a composer, and we probably want to ask slightly different questions about him. How are we going to do that? If we go back and use our reflection capabilities, we can write a query essentially like this one. We’d probably want to extend it in some interesting ways, but this is the basic skeleton, and it returns everything connected to that object, Franz Liszt, so we learn that he’s a composer. Now the trick is: can your application take the results of this query and respond to them in useful ways? If I want to respond to him as a person, I can write a bunch of functions that handle the person properties coming off of him. If I decide composers are important, I can add some more methods for composers to the system without ever having to change the query.

The important thing to understand is that Freebase is a very live community. The graph is continually changing: new connections are being made to objects, and new types and properties come into existence every day. If you have to go in and modify the queries, and the code that digests them, takes them apart, and finds all the results, that’s a lot of work. It’s much easier to write a straightforward query that grabs all the information it can and then hands it off to your application for processing, so that you get useful results back. That’s what I mean by property-driven programming: I see the property, and I respond to it. My application has become semantically aware; it’s responding to the meaning of these objects in interesting ways.

But that’s just the first part of the story. Metaschema is the idea that there are generalized relationships in Freebase. For instance, if we think about Jane Austen and her relationship to Steventon, we can say that it’s not just her place of birth; in some sense, it’s the origin of Jane Austen. Similarly, for her book “Pride and Prejudice,” the place of first publication property points to London, but we could also think of that as the PlaceOfOrigin of the book. So now we have one relationship that describes two
properties. If we respond to PlaceOfOrigin in some interesting way, we get some leverage: we can group properties together, write a set of methods against them, and get more power out of our application. One of the nice things about metaschema is that it isn’t limited to single-property relationships. In Freebase, when you represent the relationship of an actor to a film, for instance Colin Firth to the BBC miniseries “Pride and Prejudice,” there’s an intermediate node: the performance. That node is necessary because we need to be able to say that on the path between “Pride and Prejudice” and Colin Firth, he played the role of Darcy, and we need a place to hang that property; it comes off of the performance. But colloquially, it would be nice to say that Colin Firth was an actor in “Pride and Prejudice,” rather than saying that he was in this performance, and that this performance was for this production. Metaschema lets us jump over relationships like that, so we can have a HasContributor relationship directly between Colin Firth and the BBC production of “Pride and Prejudice.” We went through all of the properties in the Freebase commons and discovered that about 3,500 of them fall into one of 46 patterns. And of course, what fun would this mapping of 3,500 properties into 46 patterns be if we didn’t represent it in the graph itself, so that you could query it?
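Property-driven programming combined with metaschema can be sketched as a dispatch table: each reflected (property, target) pair is routed to a handler keyed first by its metaschema predicate and then by the raw property ID, and anything unrecognized is skipped rather than breaking the application. The handler registry, the `PREDICATE` mapping, and the place-of-first-publication property ID are hypothetical; the talk says only that such a mapping exists in the graph.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical slice of a metaschema mapping: property ID -> predicate.
PREDICATE: Dict[str, str] = {
    "/people/person/place_of_birth": "PlaceOfOrigin",
    "/book/written_work/place_of_first_publication": "PlaceOfOrigin",  # hypothetical ID
}

HANDLERS: Dict[str, Callable[[str, str], str]] = {}


def handles(key: str):
    """Register a handler for a predicate or for a raw property ID."""
    def register(fn):
        HANDLERS[key] = fn
        return fn
    return register


@handles("PlaceOfOrigin")
def place_of_origin(subject: str, target: str) -> str:
    return f"{subject} comes from {target}"


@handles("/people/person/profession")  # a raw property with no predicate mapping here
def profession(subject: str, target: str) -> str:
    return f"{subject} works as a {target}"


def respond(subject: str, connections: List[Tuple[str, str]]) -> List[str]:
    """Route each (property, target) to a handler; properties with no handler are ignored."""
    out = []
    for prop, target in connections:
        handler = HANDLERS.get(PREDICATE.get(prop, prop))
        if handler is not None:
            out.append(handler(subject, target))
    return out


print(respond("Jane Austen", [("/people/person/place_of_birth", "Steventon"),
                              ("/people/person/languages", "English")]))
print(respond("Pride and Prejudice",
              [("/book/written_work/place_of_first_publication", "London")]))
```

This prints `['Jane Austen comes from Steventon']` and `['Pride and Prejudice comes from London']`: one handler serves both properties. Supporting composers for Franz Liszt would mean registering one more handler; neither the reflection query nor `respond` would change.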
So that mapping of all of these properties into those patterns is actually represented in what we call the metaschema schema. And it’s actually very simple. I won’t go into the details here, but the idea is that you can start with something like a property, /film/director/film, and ask what the relationship is: what is the predicate, the type of relationship that it has? And we can use that little query snippet in our Jane Austen query about what properties are on these types, and we’re going to get back all of the different patterns of the different properties that are on those types. With Jane Austen, this isn’t terribly interesting, right? I mean, what we really know about Jane Austen is that she was an author. That’s great; she’s a very important author and she’s really fun to read, but she’s not that challenging to represent. But when it comes to somebody like Robert Redford, things get a little bit weird, right? He’s an actor, but we represent the fact that he’s a film actor and a TV actor and a stage actor all separately. Yet if our application is responding to meaning, it would be nice to group those things together. So we can play out that same query using Robert Redford, and what we discover is that we get a much smaller set of predicates than the full set of properties he is actually connected by. So we can add this little snippet into our reflection, and now when we look at the connections between Jane Austen and all of the other objects, we get those relationships back as metaschema, as one of these 46 patterns. So the idea here is really to reduce the number of properties that your program has to respond to and, in some sense, needs to understand. We said that Freebase has a really rich vocabulary, and that’s absolutely true, but the question is, would you really like to have to go and map all of those properties into methods, or would you rather say, well, there are these 46 patterns that I’m interested in, and there are some very specialized things that I want to do, perhaps, around people?
Or, you know, whatever my domain of interest is. But the rest of them, I can group into these larger collections and respond to in aggregate. So that was a whirlwind tour through this. What I wanted to do, since we have a little bit of time, is to show you a very bad application, and I would not suggest that this is a framework in any way. It’s just an idea out there that you can look at to try to understand what’s going on. So this not-pretty application is designed for you to clone; I tried to be as bare-bones about things as possible, so that you could actually see what was going on. And so, in this... oops. I can ask for Jane Austen. Oops, apparently it’s hard to see what’s going on over there. There we go. What I have is just two methods in this application, which respond to the results of the reflect query, so this is a very bare-bones template to display things. In fact, all of the juice is in this one function here, reflection. So I go to doreflect, and this is just running the reflect query and then dispatching into a bunch of functions that I’ve defined for the different properties that I get back. I have a very minimal collection of these functions: I have something for common_topic_alias (you saw that with George Washington), and I’m saying, if there’s an image, show us the image, as well. Now that wasn’t too interesting for Jane Austen, so let me show you what happens if we do Ridley Scott. Here, we see the alias and the images coming back. Again, not terribly exciting. But if we wanted to do something more interesting for Ridley Scott, say, I could add a function which interprets the relationship between a film director and his films, so I’ll add this very simple one in. It just goes off and gets the picture of the film at the other end of the link. Go back and
refresh. And now I’m interpreting all of the other... The query hasn’t changed at all; it’s the same information coming back. I’m just dispatching on another property. So this is great: I’ve got more information here about the films that he’s directed. But let’s see, if I do Jane Austen again, just to prove that I’m not doing anything too magical here, this doesn’t help Jane Austen at all, right? She’s not a film director. Okay, so great. Let’s actually go and turn this into a bit of metaschema. Down here, I’ve added a function for ContributedTo, which is one of the properties that I’ll get back. It turns out that this reflect query and this metareflect query are literally the same except for that one clause, but I didn’t want to try to edit it on the fly, since I’m really bad at the keyboard, so I will just change the query that I’m using to metareflect. Now if we look at Jane Austen, we discover that she has contributed to a whole bunch of films. She’s credited as the story writer for these films, not the screenwriter, and we also get a whole bunch of information about books and things like that. And if I go in and look at Ridley Scott again, what I get back is not only the films that he directed: he has also produced a bunch of films and worked as, you know, ancillary personnel on films, so we get a whole lot more information about the films that he’s worked on. And, of course, if we were really clever, we could display the property that was contributing to this metaschema relationship, and things like that. But the idea is that this is a really simple pattern which takes that one query and allows you to do a whole lot of stuff with it, and allows you to incrementally change your application to respond to the data that’s coming back. So this is online. If you go to io2011.freebaseapps.com, all of the queries that I went through today are there, as well as a link to that application, which you can clone, modify, and play with yourself, and there’s information about Freebase documentation. All of the things that I’ve been telling you are in that documentation, and we have a very active developer community; you can find out about the Freebase mailing list at lists.freebase.com. So I’m happy to entertain questions and comments. Happily, I left about as much time as I had hoped for. All right.
[applause]
man: This may be a silly 101 question, but what’s the relationship between the data that’s in Freebase right now and, say, other source data like Wikipedia or census data or stuff like that?
Taylor: Right, so Freebase is a superset of a lot of different data sets. We’re continuously importing Wikipedia, mostly for the topics themselves. I’m not sure what the total article count in Wikipedia is these days, but I believe it’s well under 4 million, so given the 22 million topics in Freebase, it’s a pretty small section. It’s an important section, but it’s a subset. We have other data sets; MusicBrainz, for instance, is brought in for a lot of music data. It’s augmented by a lot of other sources, as well, so Freebase is really, in some sense, the melting pot for all of these different data sets. And one of the important things is that you can think about Freebase as the Rosetta stone for navigating between these different data sets, because all of their internal identifiers are maintained with the Freebase topics. So you can take a Wikipedia article name, come into Freebase, find the topic that it’s associated with, and then jet out to another data source, as well. That’s one very important way of using Freebase.
man: I’m curious how, and if, MQL handles transitive predicates.
Taylor: Ahh.
man: In terms of querying either n hops or infinite hops, within the topic.
Taylor: Yeah, so MQL does not have any transitive operations, so it’s up to you to navigate those links yourself.
man: Okay, thanks.
man: I know Google just recently put a whole lot of U.S. Patent Office data online. Is there any chance of getting this integrated?
Taylor: You know, I think it really depends on the interest of the community, but, yes, people have added patent models before in hopes that we could do that type of thing. I don’t know of any current plans, but I think it’s a pretty interesting idea.
man: So my question centers on using MQL outside of Freebase. Obviously, it’s a very expressive syntax, and it would be very good for other APIs. Do you have any advice on whether it’s feasible to incorporate the same syntax, and what are some good things to read?
Taylor: This has actually come up on the mailing list in the past, and we know of at least two other MQL implementations that are out there. There’s a geo data set and another knowledge base that are using, essentially, MQL syntax. In fact, the geo data set’s documentation just pointed to ours. They had very specific things, but for the actual MQL syntax, they wanted you to look at this one source. And we’d be very interested and excited to see other people adopting it.

I don’t have anything specific to say about the implementation; I’m not in that world. But I know that there are papers that have been published on graphd, the underlying data store, and things like that, so if you asked the hard questions you might have on the mailing list, somebody would probably be interested in talking.
man: Hi. Thanks for your presentation. Are there any uses of Freebase that are not permitted? Basically, is there an open license? Can Freebase be integrated into commercial applications? Are there different content licenses? Do people own what they put into Freebase? Et cetera.
Taylor: Yeah, so that’s a great question. Everything that’s in the Freebase graph is available under a Creative Commons Attribution license, and you’re free to use that however you want in your own applications, commercial or otherwise. Some data, things like the images and some of the article descriptions, are under other open licenses, and you can get that information. You need to make the proper attribution if you’re using things that come out of what we call the blob store. There are also the data dumps, which I didn’t put up there but which are in the documentation; they’re a very popular way of consuming the data en masse if you’re going to process it for different applications. Those are produced on a very regular basis, so you can look through the whole graph, extract the information that you’re interested in, and use it with the proper attributions. Great. Well, I really appreciate all of the questions and comments, and feel free to contact me if you have any further thoughts. Thank you.
[applause]