Build Advanced Search Experiences with a Custom Query Language – Adrien Trauth – JSConf EU 2018

So, hello, everybody. My name is Adrien Trauth, and I’m a frontend engineer. Our topic today is how to build advanced search experiences with a custom query language. First, I will focus on the use case. Then I will explain what a custom query language is, the benefits of introducing one in the search interface, and how to implement it in the code of your advanced search. And finally, the solutions for onboarding your users, to make sure they are aware of it and know how to write their own queries.

So, let’s get a bit more into the use case: why is an advanced search necessary, and how does a custom query language improve the user experience? I’m spending most of my day working on a log management tool, and log management is about allowing the user to search through millions of events and find the right one. This is the search. Since it’s about troubleshooting, you run a lot of searches before finding the information. The search needs to be efficient, because you’re running a lot of searches and using the tool for a very long time. Also, because you need to find the right log that will explain what’s happening inside of your infrastructure, your search has to be precise, or you will not find the needle in the haystack. The logs can be anything from system logs to database events, so the shape of the data is not known. So, you have to make the search flexible enough to handle all kinds of different data.

So, those are our requirements for the search. They’re not really specific to log management, but this is what we will be working with today.

So, let’s see how the common search interfaces match those requirements. First, keyword search, which you find on a lot of websites. It’s nice because it’s simple and you can get started in a few seconds. It’s really efficient: you type a few keywords, and in a few seconds you already have your first results. But the limitation with that search is precision. Because, for example, here, if I’m looking for a blue chair, I have no way to tell the system that I’m not looking for either something that’s blue or a chair. And I’m looking
for a chair that’s specifically blue. So, you have no way to tell the system what your intent is, or to give it context about what you’re searching for.

So, the way a lot of websites solve this is that they introduce an advanced search pattern. When you click on it, you see a lot of different inputs. What this gives you is a lot more precision, because every input has its own intent. If I select the color blue, all the results are blue this time. What you lose with that pattern is efficiency. You went from a search where you could type a few words and get results to a search where you have to click around a lot and find out which inputs are relevant to you. It also makes the search a lot less readable, because if you want to know exactly what you’re searching for, you have to look at every input and the value of every input. And finally, it’s not that flexible. The way you set up the inputs defines the ways users can search for something. Here I put one price range, and if the user wants to search across two price ranges, they cannot do that.

As a developer, I know another way to search for results: query languages. You send some text to, for example, a database or an API, and it returns your results. The first query language that you usually learn about, when you’re a child, is natural language. That’s asking a question in English. It’s very flexible, because you can ask any question in English and it always makes sense to you. That’s your own language. The issue with natural language is that it’s not made for computers. Computers struggle with nuances. And also, the range of questions you will be able to ask is a lot bigger than the range of questions you can answer on your website. If you have Google or Facebook money, it may be fine; you may be able to solve that problem. But for your own tool and website, it’s probably a little bit overkill.

The second query language
that I encountered was SQL.

So, that’s a query language for databases. It’s very powerful because it allows you to query any kind of structured data. Also, it has great support: you have a lot of databases supporting it, and if you want to implement it, you basically send the search through the server to the database. You have nothing more to do. But for us, it’s not really good for our use case, because you have to know how the data is stored to be able to query it, and that’s not something we want our users to care about. It’s a bit too complex, and it’s not what we want here.

One of the systems that tries to solve this is GraphQL. Since it’s an abstraction over the data storage, it’s a bit better, because you don’t have to care about how the data is stored. What’s really nice also is that you can easily define what kind of output you want for your data. But it’s still very much targeted at APIs. So, for example, if you see that query here, it’s really nice if you want to send it over HTTP, or if you write it in your code. But if you want a user to write that query, it’s a bit long, it’s a bit verbose. So, that’s not really what we want here.

So, since we’ve not been able to find anything that matches our requirements, and since I’m a developer, I made my own query language. I found companies that did the same. There’s GQL, which allows you to search through your tasks and issues. Some tools have their own way to query data. And there is another team that made its own query language to query data visualizations. So, we are two different teams finding the same way to solve problems.

So, what are the benefits of having your own custom query language? First, it’s as high level as you want. You are able to decide what you want your users to care about or not. So, for example, if you want your user to be able to define the output format, like in GraphQL, you can include that in your language. But if you want to make it a bit simpler, you just leave that out
and do your own thing. It’s also very specific to your use case. So, you can make sure that every word the user types is relevant to your domain and expresses what the user is looking for, and there’s nothing technical getting in the way. And finally, since it’s high level and use-case specific, it’s readable. The only words in the query are about the domain, not about technical details. So, this seems to be a good solution for our use case here. It’s probably not perfect, but we’ll get into the limitations a bit later.

Now that I know that I want to make my own query language, I have to define it. A language definition is called a grammar. That’s going to be the set of rules that define what a valid search is and the meaning of each word in a valid search. Two things you should care about when writing your grammar: you have to make it flexible enough for the user to write all kinds of different queries, and you have to make sure your most important concepts, the first thing the user is going to look for, are the first thing they have to write, so that they can benefit from your query language very fast.

Let me give you a quick example to give you a better idea of what a grammar is. So, that’s my simplified English grammar. Obviously, we’re not going to get into all the rules of English now. But if I were to define a grammar for English, the first rule is that a sentence is a set of words separated by spaces. If I use that rule, I’m allowed to say, well, I have three words in this sentence: “I like meatballs.” And then I can use a second rule, which says that the first word is the subject, the second is the verb and the third is the object, to say that “I” is my subject, “like” is my verb and “meatballs” is the object. By defining the rules, you can extract the words and the context of the words, and find the meaning of the sentence.

And we can apply that to my example. If I were to write a grammar for my search language, I would be able to define a set of
rules that say, for example, a search query for my furniture would be furniture, separated

And there is the type of the furniture, then brackets, and inside the brackets there’s a list of attributes I care about for this furniture. By using that, I can say, hey, I’m looking for a chair and a couch. The chair is blue, and the couch has the attribute green. And if I want to go even further, I can say, well, actually in my attributes, every attribute that starts with a hashtag will be a color, and every attribute that ends with a currency will be a price. So, that’s how you get into writing your own query language.

Once you have defined your grammar, the next step is actually to implement it in the code of your search. That’s making the computer understand the search queries. So, the goal here is to get from a string to an object that will be easier for my code to use and will contain all the context that was extracted from the string. And once I have an object, it’s very simple: I can use it anywhere, if I want to query databases, if I want to extract information for the user, or anything like that. Once you have an object, you have pretty much done all the job.

Going from the string to the object is actually called parsing. Parsing will say, well, from that query I extract that one word is “chair”. And since I have grammar rules, chair is not a color: it’s the first word, it’s not inside brackets, so it’s the type. That’s what parsing is.

The first thing I think about when I want to extract information from a string is to try to write a regular expression. Regular expressions are really efficient when you are extracting information from emails or telephone numbers. So, when you know the number of things you want to extract from the query, it’s really good. But when you don’t know, your expression gets really complex. And also, it’s hard to get context for the element you extracted. It’s easy to extract, for example, “chair”. But if I want to extract that it’s a furniture type, that’s more complex. And I found this really good quote
on StackOverflow that says: if you write a regular expression, go get a cup of coffee, then come back, and you have no idea what you just wrote, maybe a regular expression is not the tool for you. So, I think that really explains why this is not good for our use case here.

So, the other way to extract information from a string is to generate a parser from the language. From the grammar I just explained in English, I can actually generate a parser. The way to do that is to write the grammar in another grammar. There is actually a way to write grammars, and there are rules to write grammars. The one we’ll be using today is the parsing expression grammar, PEG. I found it to be one of the most readable ones.

How does that work in practice? It seems hard now. That’s a grammar rule. A grammar rule is made of, first, the name of the rule, and then your parsing expression. Your parsing expression defines the input you want to match and the output you want to return from that match. So, here, for our type rule, I’m matching a range of characters repeated any number of times. Then I’m using the match result, which will be an array of characters, to return the full string of the furniture type.

It looks like a regular expression, right? Yes, but what is really nice with this way of writing different rules is that it’s really easy to combine them. So, here, for example, if I want to extract my simple character match, it’s actually just defining a new rule and using that new rule in my furniture type rule. What I think is really good about this is that it’s really easy to reuse some rules in the other ones. And also, it makes your grammar definition a lot more readable. Because here I can read that my furniture type is a set of characters, and I don’t have to care about what a character is. That’s a different rule. So, it allows you to write really complex rules in a very simple way.
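
The slides with the actual grammar are not part of this transcript, so here is a minimal sketch of the same idea in plain JavaScript rather than in PEG syntax. The helper names `char`, `oneOrMore` and `action` are hypothetical and are not the API of any parser generator; they only illustrate a rule as "name, parsing expression, output", and one rule being reused inside another. The sketch assumes a furniture type is made of lowercase letters and requires at least one character.

```javascript
// Hypothetical helpers for illustration; not the API of a real parser generator.
// Every rule is a function (input, pos) -> { value, next } on success, or null.

// Match a single character accepted by `test`.
function char(test) {
  return (input, pos) =>
    pos < input.length && test(input[pos])
      ? { value: input[pos], next: pos + 1 }
      : null;
}

// Match `rule` one or more times and collect the results in an array.
function oneOrMore(rule) {
  return (input, pos) => {
    const values = [];
    let result = rule(input, pos);
    while (result !== null) {
      values.push(result.value);
      pos = result.next;
      result = rule(input, pos);
    }
    return values.length > 0 ? { value: values, next: pos } : null;
  };
}

// Attach an action that turns the raw match result into the rule's output.
function action(rule, fn) {
  return (input, pos) => {
    const result = rule(input, pos);
    return result === null
      ? null
      : { value: fn(result.value), next: result.next };
  };
}

// Rule 1: a character is a lowercase letter (an assumption of this sketch).
const character = char((c) => c >= "a" && c <= "z");

// Rule 2: a furniture type reuses `character`, repeated one or more times.
// The match result is an array of characters; the action joins it into a string.
const furnitureType = action(oneOrMore(character), (chars) => chars.join(""));

console.log(furnitureType("chair[#blue]", 0)); // { value: 'chair', next: 5 }
```

Changing what counts as a character only touches `character`; `furnitureType` still reads as "a set of characters".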

So, another way to combine different rules is to put them in sequence. Here, if I have to define a price, I just need to define first what a number is, and then what a currency is. A number will be an integer, and a currency will be pound, euro, anything you want. From that sequence of characters, my parser will try to match both. If only one of them matches, it’s just going to fail. And once I have matched both those rules, I’m going to be able to use both match results, which I call here amount and currency. I will be able to combine both those match results inside the brackets, which contain valid JavaScript, to define a new match result for price, which is really a combination of the two others.

A final way you can combine different rules is to put them separated by slashes. What that will do is pick one of the options. So, if I define an attribute as a price or a color, what the parser will do here is try to match price. If that succeeds, I’m going to return the same match result as for the price. If that fails, I’m going to go to the next one and match the color, et cetera, et cetera.

Once you have that way of writing different rules and combining them in different ways, you can actually start defining the full grammar for my furniture store. As I explained in English, a furniture will be a type, brackets and any number of attributes. What I think is really amazing here is that this is exactly what I’m reading in the code. That’s valid code. I wrote the full query grammar to make sure it’s something that makes sense. So, here, if you read the first line, it’s defining what a furniture is, and I’m defining it as a furniture type and then a list of attributes. And if I want to see what a list of attributes is, I can just go to the last line and see, well, a list of attributes is just one opening bracket, any number of attributes, and one closing bracket. And so, what I love
about this is that every rule is actually quite simple. You can go from very simple rules to a full query language in a way that’s really testable, really easy to improve on, really easy to extend. It’s really readable and really simple.

Once you have your full query language, with all your rules defined, you just have to run it. That’s going to generate a parser that you will be able to import in your code. So, I’m just running it through the command line; you can also use a webpack loader if you want to make it automatic. Now that I have generated my parser, I’m just importing it, running my queries through it, and I get the exact object I was looking for.

So, I’m able to parse my different queries, and I can use that in my search. I type “chair”, hashtag “blue”, and I’m only going to have a blue chair. I used to have my keyword search that would return either chairs or things that were blue; there was a blue table in the results. Now the query language provides intent. I can say that the color is applied to the chair, and that blue is actually a color.

It’s matching all my requirements. I have the intent, so it’s precise. It’s also efficient, because I just need a few keystrokes to get the results. It’s readable: I have the full search in my query. And it’s flexible: I can have any number of attributes, any number of furniture items. Flexibility is actually up to you when you design your search language. It’s a balance to find.

So, did we solve the search problem forever? Well, not really. Because I think a custom query language comes with one big limitation, and that’s user onboarding. Now you have a language in which a query can be valid or not. Before, your users could type anything and that would always work. But now the first search they type may not be valid, and you won’t be able

to parse it, and they won’t get any results. That’s really frustrating for your users. So, you need to teach them two things before they are able to write their own queries. The first thing is the grammar: what is a valid query? The second is the possible attributes. You want to tell them: you can search for a chair, a couch, anything you want. You need to tell them what values they can put inside of the queries.

The first thing you need to do is add documentation. That’s really important. You always need documentation, and you need to give quick access to it. But that’s not enough, because to make them want to go to the documentation, you need to give them a few valid searches, so they know that, oh, that’s really useful, I want to learn more about that.

One way that’s worked for us is to have a playground, which looks like an advanced search. We are adding back some inputs. But the difference is that this time you are not using the inputs to make queries to the back end. You’re using the inputs to modify the search bar and build a valid search inside your search bar. So, let’s see how that would look. Here I select chair, and that’s adding chair to my query, and then blue. And once they have a valid search, the user will be able to improve upon it by themselves and start typing their own search queries.

That’s really good because it’s teaching by example. You show them what it looks like, and they see, oh, it’s really simple, I can do it myself now. It’s progressive, because you start with the inputs, and when you feel ready, you can start modifying your search. And if you feel even more ready, you can write your own search and hide the playground. You don’t have to use it anymore. And it adapts to the user. If they want to learn more about the language, they can type things. But if they never want to learn it, because they only come to the website once, they can click the inputs and never care about the query language.

It’s not really an advanced search, because you don’t need to use the checkboxes anymore. You can use them or not. You don’t have to show all the attributes; you can put them in the documentation. And since the state is synchronized with the search bar, you don’t need to read the state of every selected input, so it’s still very readable.

And since I have a parser, it’s easy to implement, because I know how to extract information from my query to select the fields. Then, on click, I need to add the value to my query and print it again. That’s kind of easy to implement because the object is simple. So, I just need to push a value inside the object and then define a print function. The print function is great because all the context is in the object, so it’s really easy to go from all the context back to a string. Here, for example, I say a furniture will be printed as the furniture name and the printed value of all its attributes. And for attributes, the attribute name contains a lot of context too. I can say, well, if it’s a color, you print it as a hashtag and the attribute value. We have a way to generate this, but if you’re writing it by yourself, it’s actually fairly straightforward.

A few things you can add. Error reporting: telling the user why their query is not valid.
Syntax highlighting, if you want to make the information stand out. And autocomplete, showing what values they can type: a set of colors, or “chair” when they start typing the query. Autocomplete is really efficient.

So, the benefits of having a custom query language are that it’s efficient, it’s precise, it’s readable and it’s flexible. The drawback is that teaching your users becomes crucial, and you have to care about that a lot more than with a simple keyword search.

Good use cases for a custom query language would be a tool that’s used repeatedly. If you have a tool that people spend a lot of time on, being efficient becomes a lot more important. Also, sharing, because it’s always nicer to share a small search that’s readable than a full object. And if you want to have integrations, like APIs, or to integrate results, it’s nice for developers to be able to type the query, copy and paste it into the HTTP request, and that’s just going to work.

Thank you for listening. If you have any questions... [ Applause ]
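
The grammar, parser and print function described in the talk can be sketched end to end in plain JavaScript. This is a hand-written approximation under stated assumptions, not the speaker's generated parser: the concrete syntax `chair[#blue 20€] couch[green]`, the space separators, the object shape (`type`, `attributes`, `name`) and all function names are hypothetical. It shows a sequence (price), an ordered choice (attribute), the string-to-object parse, and the playground step of pushing a value into the object and printing it back.

```javascript
// Try `regex` exactly at `pos` (sticky match); return the matched text or null.
function matchAt(regex, input, pos) {
  const sticky = new RegExp(regex.source, "y");
  sticky.lastIndex = pos;
  const m = sticky.exec(input);
  return m === null ? null : m[0];
}

// price = amount:number currency:currency  (sequence: both parts must match)
function parsePrice(input, pos) {
  const amount = matchAt(/[0-9]+/, input, pos);
  if (amount === null) return null;
  const currency = matchAt(/[€£$]/, input, pos + amount.length);
  if (currency === null) return null;
  return {
    value: { name: "price", amount: Number(amount), currency },
    next: pos + amount.length + currency.length,
  };
}

// color = "#" followed by lowercase letters
function parseColor(input, pos) {
  const text = matchAt(/#[a-z]+/, input, pos);
  if (text === null) return null;
  return { value: { name: "color", value: text.slice(1) }, next: pos + text.length };
}

// tag = lowercase letters (any other attribute, assumed for this sketch)
function parseTag(input, pos) {
  const text = matchAt(/[a-z]+/, input, pos);
  if (text === null) return null;
  return { value: { name: "tag", value: text }, next: pos + text.length };
}

// attribute = price / color / tag  (ordered choice: the first match wins)
function parseAttribute(input, pos) {
  return parsePrice(input, pos) || parseColor(input, pos) || parseTag(input, pos);
}

// furniture = type "[" attributes separated by spaces "]"
function parseFurniture(input, pos) {
  const type = matchAt(/[a-z]+/, input, pos);
  if (type === null) return null;
  pos += type.length;
  if (input[pos] !== "[") return null;
  pos += 1;
  const attributes = [];
  while (true) {
    // Skip the single space that separates two attributes.
    if (attributes.length > 0 && input[pos] === " ") pos += 1;
    const attribute = parseAttribute(input, pos);
    if (attribute === null) break;
    attributes.push(attribute.value);
    pos = attribute.next;
  }
  if (input[pos] !== "]") return null;
  return { value: { type, attributes }, next: pos + 1 };
}

// query = furniture items separated by spaces; null means "not a valid query".
function parseQuery(input) {
  const items = [];
  let pos = 0;
  while (pos < input.length) {
    if (items.length > 0) {
      if (input[pos] !== " ") return null;
      pos += 1;
    }
    const furniture = parseFurniture(input, pos);
    if (furniture === null) return null;
    items.push(furniture.value);
    pos = furniture.next;
  }
  return items;
}

// Print functions: all the context is in the object, so going back is short.
function printAttribute(attribute) {
  if (attribute.name === "color") return "#" + attribute.value;
  if (attribute.name === "price") return String(attribute.amount) + attribute.currency;
  return attribute.value;
}

function printQuery(items) {
  return items
    .map((f) => f.type + "[" + f.attributes.map(printAttribute).join(" ") + "]")
    .join(" ");
}

// Playground step: parse the search bar, push a value, print it back.
const query = parseQuery("chair[#blue 20€] couch[green]");
query[0].attributes.push({ name: "color", value: "red" });
console.log(printQuery(query)); // chair[#blue 20€ #red] couch[green]
```

A plain keyword search such as `blue chair` makes `parseQuery` return `null` here, which is the onboarding problem the talk describes: the first thing a user types may not be a valid query.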