Azure Custom Vision: How to Train and Identify Unique Designs or Image Content

>> Learn how to infuse AI into your Java application. In this episode of The AI Show, we’ll learn all about the Computer Vision and Custom Vision Cognitive Services APIs. Welcome to this episode of The AI Show, where my colleague Ruth Yakubu is going to show us how to infuse AI into a Java application. My name is Seth Juarez. All right. So, Ruth, how do we get started? >> So, to get started, let me just give a little background on what we developers face. >> Okay. >> There are two types of developers. There’s the one who is sitting in their cube, and the business has requirements that could leverage AI. But they’re like, “Last I checked, I’m not a data scientist. How can I do this?” That’s when you realize that Microsoft has a lot of Cognitive Services that span different industries, which developers can leverage to hit the market real quick. >> Awesome. So, if I have an application and I’m like, hey, I want to put in some AI, but I don’t know what to do, you’re saying Cognitive Services is an answer to that. >> Yeah. Because our researchers have done years and years of research, and tons and tons of data curation, analytics, machine learning, and whatnot, and they provide all of that. >> So, how do I build it into my application? >> Okay. So, for this application, I’m using Java, and one thing that is super exciting for us in the Java community is that we now have Maven SDKs for Cognitive Services. So, you can do all of that with REST calls, but you have SDKs now. So, for the application I want to show you today, I’m going to create a lost and found type of situation where we’re using Computer Vision and also tying into Text Analytics. I’m only tying into Text Analytics so that, okay, you upload something, how can we query and find a result based on the tags that were generated from an image? >> Awesome. Let’s dive in. >> Okay.
So, to start off with, I’m in Eclipse. So, this is a Spring Boot application. We don’t need to go into details unless you want us to go into specific details later. So, what I’m going to do is launch the application. So, what I did was right click. It’s a little bit too late now. But basically, what it’s doing is building the application, and at the end it’s going to launch the Tomcat server. >> Awesome. So, this is like a regular Spring application. It’s a website, MVC style type of thing. >> Exactly. >> Cool. >> Yeah. So, now that we see that the application is up on port 8080, let’s launch the application. So, now, I’m going to open a picture that we’re going to analyze. So, what I’m doing here is calling the Computer Vision API, and the method I called was analyze. So, first, let’s take a look at the image. There’s a car. There’s a parking lot type of situation. There’s a background going on. So, the good thing about Computer Vision is, when we call it, it’s going to return attributes about the image that it finds. So, it’s going to generate different tags, so you see a whole slew of these: car, road, grass, driving, parking, and all of that. So, just eyeballing it, all of those are accurate compared to the image that we just uploaded. >> Awesome. And this isn’t data that you upload. This is something, to be clear, that the service returns to you based on the picture. >> Yeah. >> Okay, cool. >> Yeah. You just upload an image, call the API with the image, and tell it to please analyze this image. Then, all I’m doing is returning the JSON, not even doing anything with it, just to show the users what Computer Vision does. >> Awesome. >> So, let’s get back to the main point that I wanted to show, the power of this and the different ways you can use it. So, think of a lost and found type of scenario, right? You can never anticipate what users are going to lose.

So, I’m guilty of that. I lose a lot of stuff. So, let’s do a very generic one. Kids, I think they specialize in losing things. So, let’s say I’ll upload a teddy bear. It shows the attributes like before. Let’s upload this Mercedes-Benz key. It’s also a key, and let’s see what it returns back. So, for the Mercedes-Benz key, this was kind of interesting. The tags that it was returning are table, sitting, black, piece, top, luggage, desktop, players, suitcase, collar, phone, [inaudible], actually mouse. So, for the human eye, personally, we’re used to how keys looked, especially in medieval times. There’s a certain way a key looks. But nowadays, what do we do with outliers, where keys no longer look the traditional way? Let’s say I’m fresh off a boat from somewhere. I have never seen a Mercedes-Benz key. >> I know I haven’t. >> Yeah. I haven’t seen one either. But yeah, this one, I could easily think it’s a garage door opener. >> Right. >> Because it can pass for a lot of stuff. So, let’s go back to the analytics part. This is another piece that I wanted to highlight for developers. Let’s say you have a whole bunch of text, right? You’re dealing with the lost and found. What are users going to do when they come in to tell you what they lost?
They tend to ramble on and on, but you need to get to the gist of it. The whole point they need to tell us is, “I lost the teddy bear,” or, “I lost a key.” That’s it. But people like me, I’m going into a form like, “I was at the food court,” if I can spell, “and I think I left my toy at the table,” and I click “Submit.” So, the very first thing to look at is what we call “Text Analytics,” and that one also has multiple functions, like sentiment analysis and key phrase extraction. So, in real life, this person would have written a three-paragraph description of what happened and the different possible areas where they think they lost something. But the key thing is, look at the things that it extracted. It got that there was a food court, there was a toy, there was a table. So, what it’s going to do is look in the database, because on the last page, what I was doing was storing all the images for the lost and found. Let’s say each time somebody loses something, you take a photo of it and store it in the database. Later on, when somebody comes in, we no longer need to go manually, aisle after aisle, searching for stuff. You can also use AI to find some of this stuff. When somebody submits that form, let it do a query and see what matches it found. So, the keyword was toy, and it brought back the teddy bear. I uploaded the traditional key, and it didn’t find that, but for some reason it’s probably thinking of the car keys. >> It could be somebody’s toy as well, right? The thing I like about this is that you actually allowed users to upload pictures and ask about those things without having any user intervention at all. >> Yes. >> Using this kind of service, I have two questions. The first question is, what does this look like in code? >> Yeah. >> The second one is, I saw there were some mistakes when the pictures were uploaded. Is there a way to fix that? So, let’s start with the first. Let’s take a look at it.
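The flow just described (extract key phrases from a free-text description, then look for stored images whose tags match) can be sketched roughly as follows. This is not the code from the demo: the `LostItemSearch` class and its method names are hypothetical, the `/text/analytics/v3.0/keyPhrases` path is an assumption about the service version, and an in-memory map stands in for the database of image tags. It also uses the JDK's `java.net.http` client rather than Spring, so the sketch needs no extra dependencies.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

final class LostItemSearch {

    // JSON body for a single-document key phrase request (document shape is an assumption).
    static String keyPhrasesBody(String description) {
        return "{\"documents\":[{\"id\":\"1\",\"language\":\"en\",\"text\":\""
                + escapeJson(description) + "\"}]}";
    }

    // Builds, but does not send, the key phrase extraction request.
    static HttpRequest keyPhrasesRequest(String endpoint, String key, String description) {
        return HttpRequest.newBuilder(URI.create(endpoint + "/text/analytics/v3.0/keyPhrases"))
                .header("Ocp-Apim-Subscription-Key", key)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(keyPhrasesBody(description)))
                .build();
    }

    // Minimal JSON string escaping for text typed into a form field.
    static String escapeJson(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"': sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
            }
        }
        return sb.toString();
    }

    // Returns the stored items that carry at least one tag equal to an extracted key phrase.
    // Comparison is case-insensitive; items come back in the map's iteration order.
    static List<String> findMatches(Map<String, Set<String>> tagsByItem, List<String> keyPhrases) {
        Set<String> wanted = new HashSet<>();
        for (String phrase : keyPhrases) {
            wanted.add(phrase.toLowerCase(Locale.ROOT).trim());
        }
        List<String> matches = new ArrayList<>();
        for (Map.Entry<String, Set<String>> item : tagsByItem.entrySet()) {
            for (String tag : item.getValue()) {
                if (wanted.contains(tag.toLowerCase(Locale.ROOT))) {
                    matches.add(item.getKey());
                    break;
                }
            }
        }
        return matches;
    }
}
```

With the key phrases from the demo ("food court", "toy", "table"), `findMatches` would return any stored image tagged "toy" or "table". Exact tag equality is a deliberately simple matching rule; a real application might prefer a database query or fuzzier matching.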

>> Yes, awesome. So, to start with, this is the Spring application, right? So, when you’re talking to a web type of interface, in Spring you have something called a controller. So, those are REST controllers. A controller takes requests from the web and translates them for your business application. So, I’m depending on some helper classes or services that will actually do the API requests, so I’m calling them in. >> Is the package you import from Computer Vision? >> Say that again? >> Is this a package that you import from the Computer Vision services? >> That’s in the service class. So, we’ll open that real quick. So, let me do a quick drive-by. This is a typical REST application, right? So, you specify an annotation that says, “Okay, I need the request mapping. What’s the method?” In REST, if you’re going to do a GET, you need to specify the path. So, for the scenario where we’re dealing with the text analysis, that’s what I was doing. Here, I’m returning a form, but you can return whatever you want to. I’m using Thymeleaf. >> I see. >> Because I’m not too web-centric, I’m just specifying the name of the HTML file it should call. So, that’s just the display part of it. So, when somebody posts something and enters information, now, as you can see, the user entered a description. So, you’re taking all of that. And now, you take the description and feed it into a service that does some analysis. That’s the API that we called. It returns a JSON body, and you parse the data the way you want to present it. All I’m doing is parsing it and presenting it to the user, and also grabbing the URL, so we’re good on that. >> So the service, the text service, is it like a package you just download in order to make the API calls?
>> Yes. >> Okay. >> So, let me show you this service, because even when we go to the other controller, it looks exactly the same. The only thing is, before I return the JSON, I’m saving it to a database. >> Got it. >> One thing under Spring is the Domain. Well, the Domain is like your database model. Then you get into the Repository. So, which one do you want to look at? >> Let’s look at the Vision Service. >> Okay. Vision Service. In order to use the Vision Service, the very first thing is that you need to provision the service on the Azure portal, and the key thing is, in order to call this API, or if you’re using the SDK from Maven, you need a key. So, once you have the key, you need to specify the key in your header. Then, for your body, you need to pass the parameter in the body, which is going to be the image file. Then the rest is, okay, going to the API definitions that we provide online for each of our AI Cognitive Services, which say, okay, if you want to call this, this is how you call it. Basically, for Java, I’m just setting up all of the parameters that I’m going to be calling with. Finally, once you have all the parameters, you do a build. As for where I’m actually executing this, Spring provides something called a RestTemplate, where you’re passing in that builder, and you need to specify whether the API that you’re using is a POST or a GET. Then the entity is basically the header. Then the last thing is, okay, what’s your output going to look like? Is it JSON output? So, that’s basically what that is. >> So this is what a REST endpoint call looks like in Java? >> Yes. >> Cool. >> So, let me do a quick drive-by. In this situation, I wanted to use the SDK to give you guys an idea.
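As a rough illustration of the REST call described above (key in a header, image bytes in the body, POST to the analyze endpoint, JSON back), here is a minimal sketch. It swaps Spring's RestTemplate for the JDK's built-in `java.net.http.HttpClient` so that it runs without extra dependencies, and it is not the code shown on screen. The class name `VisionAnalyzeCall`, the `v3.2` API version, and the `visualFeatures` values are assumptions; check the API definition for the resource you provisioned.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;

final class VisionAnalyzeCall {

    // Builds the analyze request: subscription key in the header, raw image bytes in the body.
    static HttpRequest buildRequest(String endpoint, String subscriptionKey, byte[] imageBytes) {
        // Assumed path and query; the version segment depends on the service release you target.
        URI uri = URI.create(endpoint + "/vision/v3.2/analyze?visualFeatures=Tags,Description");
        return HttpRequest.newBuilder(uri)
                .header("Ocp-Apim-Subscription-Key", subscriptionKey)
                .header("Content-Type", "application/octet-stream")
                .POST(HttpRequest.BodyPublishers.ofByteArray(imageBytes))
                .build();
    }

    // Sends the image and returns the raw JSON response, as the demo does.
    static String analyze(String endpoint, String subscriptionKey, Path image)
            throws IOException, InterruptedException {
        HttpRequest request = buildRequest(endpoint, subscriptionKey, Files.readAllBytes(image));
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("Analyze call failed with HTTP " + response.statusCode());
        }
        return response.body();
    }
}
```

A caller would pass the endpoint and key from the provisioned Azure resource, for example `VisionAnalyzeCall.analyze(endpoint, key, Path.of("car.jpg"))`, where the file name is only a placeholder.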

For the SDK, if you’re a Java developer, you may have noticed I’m using blob storage when I upload these images. So, a misconception that a lot of people in the Java community have about Microsoft tools or Azure comes from not knowing that we have so many SDKs that you can find in the Maven repository. So, this is an example of a library you can just call, and that’s the blob storage one. >> Cool, so when you upload the picture, you’re just pushing those up to blob storage? >> Yes. >> Smart. >> Before you do anything, and I know, yeah, we’re going to blur out my credentials, the very first thing you do is provide your keys and the unique blob and the container. But notice I’m not necessarily doing REST calls, because now you’re calling objects. This is how simple and clean your application is going to look when you’re using the SDK. >> Got it. >> Okay. >> Cool. You were telling me that there are some pictures that just don’t work. Tell me about those. >> Yes. So, AI is new and up and coming, and that’s why there’s so much buzz about it, and all these companies are improving their algorithms and whatnot. So, for the scenario where you use something like the Computer Vision API and you find some outliers that do not meet your business case, what you can do is use something called the Custom Vision API. >> I see. So, in the case where I upload a picture of something it may have never seen before, because I have crazy stuff in my house, there is a way to actually train it to recognize new things. >> Yes. 
So, for that one, the awesome thing is you can go to Computer Vision... >> Custom Vision. >> I’m glad somebody is paying attention. Custom Vision, then dot AI [customvision.ai]. Then the next thing we need to do is “Log In” to that account. Log-in never took so long. The very first time that you log into the account, you need to agree to the terms of service. I already started a project, but if you haven’t, you click on “Create Project.” Give it a “Name” and a “Description,” and it’s very crucial that, if you know the “Domain” it falls under, you put it in there, because if you’re dealing with a category that’s in the “Domain,” it increases your chances of finding things. >> Got it. >> So, due to time, I created keys. So, let’s take a step back. The Custom Vision service is a classifier. It builds on machine learning classification algorithms, and what classifiers usually do is answer either yes or no. You’re trying to find whether this is this or that. So, I think the amazing thing that the Microsoft AI Research team has provided for us is the ability for a developer to just come in, in a situation where car keys are getting more and more evolved. It is a key, but it’s not the traditional key, and depending on what industry you are in, it could be other things, like a designer handbag, okay? How can it recognize a Louis Vuitton bag versus a Prada bag? Not that I know anything about that, but I just wanted to show you how simple this is. The recommendation is for you to upload at least 50 images, but for my case, all I wanted to do was differentiate,

that is, tell me the difference between a BMW key and a Mercedes key. So, I just gave it those two categories. So, I uploaded different images for the BMW key and a whole bunch of different images for Mercedes keys. They’re not quite up to 50, but you’ll be surprised how, with very little, the model is still very accurate in its predictions. Once you have your images uploaded, and everything is very intuitive, click on “Train.” I already did one in the past, so for the last iteration I did, okay, I uploaded the images. There are two categories, right? When you’re training your model, you need to see how precise it is, and the recall is, okay, the error factor of it: what are the chances that, in this group, there are some that did not match what you’re trying to train? So, pretty straightforward. But looking at this, I’m pretty confident that 90 percent of the time, it’s going to recognize what I’m trying to do. So, the very important thing for a developer to do is this part right here. I need to figure out, when you take it back to your application, how are you going to upload it? Because everything that I did visually to train, you can also do via code. In our case, I’m just going to call the API. I trained it. Everything is good to go. It shows you the API to use, so I went ahead and created a custom service. So, I have the project key and the RESTful API call. >> It’s just like the same thing that we had before, but now with a different service. >> Yes. 
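To make "the same thing, but with a different service" concrete, here is a hedged sketch of a Custom Vision prediction request. It is not the demo's code: the class name `CustomVisionPredictCall` is hypothetical, the prediction URL is taken as a parameter because the Custom Vision portal displays it for a trained iteration, and the `{"Url": ...}` body assumes the variant of the endpoint that accepts an image URL. The one real difference from the Computer Vision call is the `Prediction-Key` header. As in the earlier sketch, the JDK's `java.net.http` client stands in for Spring's RestTemplate.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

final class CustomVisionPredictCall {

    // Builds the prediction request for an image that is reachable by URL.
    // predictionUrl is the URL the Custom Vision portal shows for the trained iteration.
    static HttpRequest buildRequest(String predictionUrl, String predictionKey, String imageUrl) {
        // Escape backslashes and quotes so the URL is safe inside a JSON string.
        String safeUrl = imageUrl.replace("\\", "\\\\").replace("\"", "\\\"");
        String body = "{\"Url\":\"" + safeUrl + "\"}";
        return HttpRequest.newBuilder(URI.create(predictionUrl))
                .header("Prediction-Key", predictionKey) // not Ocp-Apim-Subscription-Key
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }

    // Sends the request and returns the raw JSON with the predicted tags and probabilities.
    static String predict(String predictionUrl, String predictionKey, String imageUrl)
            throws IOException, InterruptedException {
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(buildRequest(predictionUrl, predictionKey, imageUrl),
                        HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("Prediction call failed with HTTP " + response.statusCode());
        }
        return response.body();
    }
}
```

Taking the full prediction URL as input keeps the sketch independent of the API version, which has changed over time.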
The only thing is, they call it the prediction key. For the other APIs, I think it’s Ocp-Apim-Subscription-Key, something like that. So, be aware of that. You’re uploading an image in the URL type of format. You’re calling the REST API. Then nothing else really changes. Let’s get back to the controller and call this custom one, commenting out the other one. So, we’re at the end. Let’s see how good this prediction is. I’m going to restart this Java application. Okay, perfect. So, it’s done. The next thing is, I’m going to go to my trusted Computer Vision page, and now let’s find the same key, upload it, and see what it says. Okay. >> It looks like it did indeed find it to be a Mercedes key. >> Yes. >> Awesome. So, I want to see if I can summarize what you did. You made a lost and found application where you can upload pictures. Someone’s like, “Oh, they lost this. Take a picture,” and that’s all they do. They upload it. The Computer Vision service puts all the tags on there. Someone goes in and types a description, and Text Analytics takes all the extra stuff out and maps it to the things that were found. >> Yes. >> Then there are cases where there are certain things that it won’t understand, where you have to go back and train. Did I get that right? >> Yes. One thing we have to take into consideration is that when you’re using Custom Vision, there’s a limitation. What if you have petabytes of data? In those types of situations, you have to be cognizant: “Okay, I’m going to train this model. There’s a way we need to export the model,

and if it’s too big, we need somewhere that has the system capacity to run it.” Another alternative to bypass all of this is going to deep learning. There are several algorithms out there that you can use, and if you know what you’re doing at this point, you can code all of that and come up with maybe a better solution. I won’t say a better solution. >> More customized. >> But a deeper one. Yes, more customized. Yes. >> Awesome. >> Depending on what your industry is. >> This has been super helpful. I like that you showed it all in Java, to show that it really doesn’t matter what language you use. Thanks so much for today. We were learning all about Computer Vision and Custom Vision, and how you can enrich your apps with AI today. Thanks so much for watching. My name is Seth Juarez. See you next time.