Building Smarter Apps with ML Kit

[MUSIC PLAYING] MOYINOLUWA ADEYEMI: My name is Moyinoluwa Adeyemi, and I'll be talking about how we can leverage ML Kit to build smarter apps on Android. So, a brief history. Machine learning started on traditional computers, but imagine trying to run it on a phone. The modern mobile phones of these days have enough capacity to process as much as traditional computers can. Our phones and tablets are now powerful enough to run software that can learn and interact in real time, and ML is helping us build smart apps. An example is the Google app's prediction of the next apps you are likely to use, and a mobile app which solves the world's biggest problem: finding the perfect emoji. A lot of SDKs exist that help run machine learning on mobile devices, including software development kits from popular companies like Amazon, Intel, and Microsoft. And we even once had the Mobile Vision API created by Google. That was released two years ago, with three core functionalities: to detect faces, to scan barcodes, and to detect text. The Mobile Vision APIs have since evolved into the Firebase ML Kit. According to the official Mobile Vision website, the Mobile Vision API is now part of ML Kit and will no longer be supported, so all new functionality is going to be released for ML Kit. If you have any projects using the Mobile Vision API in production, start making plans to migrate them to ML Kit. ML Kit enables Android developers to build smarter apps, apps utilizing machine learning. So let's dive a bit into what all that is, and how ML Kit can help us achieve our goals. Machine learning, defined very simply, is how we get computers to learn by feeding them a lot of data. That's a simple definition, but it gives us a general idea of what we are up to. We build a model with sample inputs, feed it to the computer, and then we have the ability to improve or retrain that model over time, hence improving 
the accuracy and the learning capacity of the computer. This can be easy or difficult, depending on whether or not you know how to build models, or whether you have enough data to train that model with. And this is exactly where ML Kit comes in. It contains pre-built models for some use cases we are going to see in a couple of slides. If you have no machine learning knowledge, you don't need to worry about building models, or training them, or any of that; all of it is abstracted and hidden from you. You also don't need to worry about optimizing those models to run on mobile devices, because that's also a very huge headache. All of that has been handled by ML Kit. The best part: you can use all the functionality out of the box just by importing it in your Gradle file. Android developers here should know what I'm talking about. OK, so ML Kit reduces the barrier to entry for building smarter apps, which means anyone can get on board. Whether you're an ML beginner who's just fascinated by the idea of smart apps, or you're an advanced ML developer and the basic use cases ML Kit offers don't cover your use case, there's something for everyone here. The features from the Mobile Vision API have been ported over to ML Kit. The APIs have been modified, but at their core it's still the same functionality. The APIs provided out of the box are called base APIs, which means you can use them without the need for a custom model. And apart from those three, we also have two more added to ML Kit: image labeling and landmark recognition. Then there are even more features to be released very soon: smart replies, as seen in Gmail, where you type something and get suggested responses, and high-density face contours, an addition to the face detection API. So unlike the Mobile Vision API, ML Kit offers a lot more functionality by providing access to both on-device and cloud APIs 
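As an aside, the "importing it in your Gradle file" step mentioned above might look like the following. This is a sketch based on the 2018-era Firebase ML Kit SDK; the artifact coordinates are the ones the SDK shipped under at the time, but the version numbers here are illustrative, so check the current documentation before copying them.

```groovy
// app/build.gradle (versions are illustrative for the 2018-era SDK)
dependencies {
    // Base ML Kit vision APIs: text, face, barcode, labeling, landmarks
    implementation 'com.google.firebase:firebase-ml-vision:17.0.0'

    // Extra on-device model, needed only for image labeling
    implementation 'com.google.firebase:firebase-ml-vision-image-label-model:15.0.0'
}
```

Connecting the app to Firebase (registering it in the console and applying the `google-services` plugin) is still a prerequisite before any of these dependencies do anything.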
The on-device APIs process data locally, even when the device is offline. But if you need a higher level of accuracy, then you need to turn to the cloud-based APIs. These have some free quota, and then pricing plans depending on your usage.

And the cloud APIs allow us to take advantage of Google's infrastructure and take all the processing off of the mobile device. So, OK, you are an advanced ML developer. What if you want to do more than detect text, or barcodes, or images? Yes, that's possible. It's possible to include your own custom model using ML Kit. You do this by building [INAUDIBLE] or retraining a model, converting it into a TensorFlow Lite model, and then passing that through the ML Kit API. Firebase takes care of hosting your custom models and serving them to your application. At a Cloud Next 2017 talk about TensorFlow and Android, Yufeng Guo, a developer advocate at Google, highlighted several pain points of developing machine learning on Android with TensorFlow. And it's interesting to see that ML Kit now provides a solution to each of the problems he highlighted in that video. For the first concern, how to include the custom model in the app: you can either bundle it locally, or host it on Firebase, or do both, and this determines whether it's available offline or not. If we choose both options, the app will load the model from Firebase when it's online, and then use the on-device model when it's offline. Next, you may have security concerns. How do you prevent other developers from copying your model, for example? In practice, most models are application-specific, so this is like worrying about another developer disassembling your Android app: how far can they really go? But it's something you should still be aware of. What about conditions to download the model? Is it going to be downloaded only via Wi-Fi? Do you want the device to be charging? 
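Those questions map directly onto a conditions builder in the custom-model SDK. A minimal Kotlin sketch, assuming the 2018-era `firebase-ml-model-interpreter` API (`FirebaseModelDownloadConditions`), might look like:

```kotlin
import com.google.firebase.ml.custom.model.FirebaseModelDownloadConditions

// Only fetch (or update) the hosted model when the device is idle,
// charging, and on Wi-Fi; each requirement is optional.
val conditions = FirebaseModelDownloadConditions.Builder()
    .requireDeviceIdle()
    .requireCharging()
    .requireWifi()
    .build()
```

The same conditions object can be reused for both the initial download and later updates, as the next section describes.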
Firebase provides us with a set of APIs which allow us to configure all the conditions under which this should be done. The available options are when the app is idle, when the device is charging, or when it has Wi-Fi. And then Firebase automatically provides users with the latest version of a model if it's hosted on Firebase. So if you have an update, you don't have to make users download a new version of your app before they get it; once the model is hosted on Firebase, Firebase takes care of automatically dispatching it. You can also use this to have different test users for your models. Firebase handles all of that. Next, we're going to explore the base APIs and see all they offer. One of the base APIs is the text recognition API. It detects text in Latin-based languages, and it works in both images and videos. The API works by segmenting text into blocks, then paragraphs, then words and symbols. A block is a grouped set of text on the same page; if you have a set of text on a page, that's a block. Paragraphs apply as we know them, and words also retain their meaning here, while symbols are the individual characters that make up a word. Text recognition is available both on-device and on the cloud, although if you use the cloud API you get a higher level of accuracy, and it's not entirely free: I think the first 1,000 uses are free, and then you are billed per use after that. All right. Now, I spoke a lot about the cloud APIs and the on-device APIs, and about how the cloud APIs are more accurate than the on-device APIs. So we are going to have a pop quiz to see whether you've been following all along. If you look at the two images, can you tell which of them was detected on-device and which was detected with the Cloud Vision API? AUDIENCE: [INAUDIBLE] MOYINOLUWA ADEYEMI: What is A? 
AUDIENCE: Device. MOYINOLUWA ADEYEMI: A is device. Why? AUDIENCE: [INAUDIBLE] MOYINOLUWA ADEYEMI: There are smart people in this room. So yes, the answer is A because, of course, the on-device API recognizes only Latin characters, so it doesn't know what an ampersand is and just groups it together, since it assumes it's a word

And then, the cloud API is smart enough to know that that's a character, and it separates it from the other two words. OK, that's all about the text recognition API. Now we'll talk about the face detection API. This detects human faces in images and videos, but it doesn't work for animals, as you can see. And note that face detection is not the same as face recognition. If you have a picture, the API knows there are four faces in it, but it doesn't know that this is [? Moy, ?] or this is [? Lochnan, ?] or this is [? Fermin. ?] It can't do that. The face detection API does not have cloud functionality, so all the processing happens on the device. It recognizes not only faces that are fully turned to the camera; faces can be positioned at different angles and the API will still recognize them. It also recognizes specific landmarks, such as the eyes and the nose, and it does that by assigning a number to each of those landmarks. Landmarks are specific points on the face. So if you run this, it might know that a nose is number three, and when you query the result returned, it gives you the number corresponding to the landmark. It detects the left and right ear and the left and right eye, as we can see in the image, and the left, right, and bottom of the mouth. It can also detect expressions, so it's smart enough to know whether someone is smiling or frowning, or the degree to which your eyes are open. So you can tell whether an eye is open or closed. If you have ever wondered how apps like Snapchat are built, they use this kind of API. Since it recognizes landmarks, it does a few calculations to detect where my head is, and then positions the flower crowns based on that. And then, yeah, the API works across skin colors. Naturally, I would care, because we've had cases of racial bias in APIs like this. At least I tried it on us, and it worked on all four of us. That's [? Fermin, ?] 
me. OK, everybody in that picture is in GDG except for me. Yeah. So next, we'll talk about the barcode scanning API, which detects barcodes in images. It works on the device, too; there's no cloud API for this. You can encode different things in barcodes, such as driver's licenses, details that let people connect to your Wi-Fi, or business cards, and parse them with the API. So I randomly saw this badge on Twitter, and I was curious to know what was in it, so I used it as an example. And I was able to get that name below; that was the information in the badge. It detects 1D barcodes. Those are numbers embedded in barcodes usually depicted by black lines of varying lengths with white spaces. It also detects 2D barcodes, so you can use it for QR codes; they look like squares, contain many individual dots, and can hold a wider range of information than a 1D barcode can. And just like the face detection API, it can detect multiple barcodes in one image, and it can detect them even when they're upside down. The image labeling API is the newest one: it was not part of the Mobile Vision API, but it is now part of ML Kit. It detects objects in images. But the interesting thing about this API is that it doesn't just give you the objects; it also has context about the image. This was taken at Lekki Conservation Centre, and if you can see the screen, and I'm not sure everyone can, you can see that it detects a chair, and it is about 0.89 confident of that result. 
It knows there's a bench there, at about 0.86; that's pretty high. It also knows that it's a spot visited for leisure, and it's pretty confident about that too: 0.8 is pretty high. Image labeling is available both on the device and on the cloud. Recognition on the device is free and covers about 400 labels, while recognition done on the cloud has a higher accuracy and covers over 1,000 labels. The image labeling API supports different groups of labels. It can recognize people: babies, girls, boys. It can recognize activities: dancing, running. It can recognize vehicles, animals, plants, and even places. And each result returns a confidence level. So if it's like, I'm very certain this is a house, it's going to return a confidence level of 0.99
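The labeling flow just described follows the same detector pattern as the other base APIs. A minimal Kotlin sketch, assuming the 2018-era `firebase-ml-vision` on-device labeling API (`FirebaseVisionLabelDetector`), might look like:

```kotlin
import android.graphics.Bitmap
import com.google.firebase.ml.vision.FirebaseVision
import com.google.firebase.ml.vision.common.FirebaseVisionImage

// Label everything found in a bitmap and print each label with its confidence.
fun labelImage(bitmap: Bitmap) {
    val image = FirebaseVisionImage.fromBitmap(bitmap)
    val detector = FirebaseVision.getInstance().visionLabelDetector

    detector.detectInImage(image)
        .addOnSuccessListener { labels ->
            for (label in labels) {
                // e.g. a chair at 0.89, a bench at 0.86
                println("${label.label}: ${label.confidence}")
            }
        }
        .addOnFailureListener { e -> println("Labeling failed: $e") }
}
```

Swapping in the cloud detector follows the same shape, just with the cloud variant of the detector and its own options object.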

Then, you are very certain that the item in the picture is indeed a house. Lastly, we have the landmark recognition API, which recognizes well-known landmarks in images. This one works only on the cloud; it's not available on-device. There are 1,000 free uses available per month. And apart from the name of the landmark, we also get the entity ID and the coordinates of that landmark. So, fun fact: if you try this and upload a picture of, say, the San Francisco Golden Gate Bridge, it gives you the results, the coordinates of that bridge, and the confidence level. But of course, I always want to bring things home. So I took pictures of popular places in Nigeria from Google and started passing them in. Image one, two, three, up to 10: no result. Finally, I got this image recognized. That's Zuma Rock in Abuja. So we've talked about what these APIs can do, and very quickly, I'm going to walk us through the steps of how to get started using them in your apps. The first step is to connect the app to Firebase. This is important regardless of whether you are using an on-device API or the cloud API. Then we need to add the ML vision dependency to the build.gradle file. If you're using the image labeling API, you need an additional dependency in your build.gradle file. So that was for both cloud and on-device. This next part is only useful if you're using the on-device APIs: you need to add some metadata to your manifest file. This is because, when users install the app, there's an option to download all the base API models before they are actually needed, so that even if the user is offline, they can still use the ML Kit functionality in the app. If you don't do that, it's going to try to download the model at the point where it is used, and this might not provide a good experience 
if your users happen to be offline at the moment. That's your [? lead-in ?] for on-device. If you are using the cloud APIs, there are other things to consider. The news is, only Blaze-level projects on Firebase can use the Cloud Vision APIs. What that means is that if you're using the cloud APIs, you need to input your credit card details on Firebase, because there's a possibility you might use more than your free quota and get charged, and you need to have your payment details there before that happens. After upgrading to the Blaze plan, we enable the Cloud Vision API on Firebase; we have to do that online, searching for it in the API library. So now that we know the things to look out for when using on-device and cloud-based APIs, let's get to actually building an app. While playing around, I found out that, even though the APIs are different, they follow a specific pattern. Once you can follow that pattern for any of the base APIs you need to use, you're ready to go. I'm going to talk about the face detection API only here, though, because we have five APIs and we can't cover all of them in this time. The first thing to note, for both on-device and cloud, is that the base APIs expose functionality to us with detectors. These can then be configured with all the options the API provides, to give us more specific results. So let's walk through the process of doing that now. Here, we have the FirebaseVisionFaceDetectorOptions, and we're going to populate it. First, we set the mode type. This indicates whether we want accurate or fast mode. Fast mode is usually fast, but the accuracy is lower; accurate mode is slower, but returns a more detailed result. So we talked about the landmarks on the faces, right? 
This is where you configure that. You can detect all the landmarks, or you can detect a specific landmark; there are options for that. But of course, if you're detecting all the landmarks, know that this is going to be slower than detecting just one of them. Then you can set classifications: whether the eyes are open or not, whether the face is smiling or not. You can set the minimum face size, so you tell the API, give me only faces that are larger than this size. You can also track faces. So if you know apps like Snapchat that have stickers following you around when you move your face on the screen, that is also possible with ML Kit
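Putting those options together, a minimal sketch of the whole flow, from options to results, assuming the 2018-era `firebase-ml-vision` face API, might look like:

```kotlin
import android.graphics.Bitmap
import com.google.firebase.ml.vision.FirebaseVision
import com.google.firebase.ml.vision.common.FirebaseVisionImage
import com.google.firebase.ml.vision.face.FirebaseVisionFaceDetectorOptions
import com.google.firebase.ml.vision.face.FirebaseVisionFaceLandmark

fun detectFaces(bitmap: Bitmap) {
    // All the options discussed: mode, landmarks, classifications,
    // minimum face size, and tracking.
    val options = FirebaseVisionFaceDetectorOptions.Builder()
        .setModeType(FirebaseVisionFaceDetectorOptions.ACCURATE_MODE)
        .setLandmarkType(FirebaseVisionFaceDetectorOptions.ALL_LANDMARKS)
        .setClassificationType(FirebaseVisionFaceDetectorOptions.ALL_CLASSIFICATIONS)
        .setMinFaceSize(0.15f)    // ignore faces smaller than 15% of the image
        .setTrackingEnabled(true) // keep a stable ID per face across frames
        .build()

    val detector = FirebaseVision.getInstance().getVisionFaceDetector(options)
    val image = FirebaseVisionImage.fromBitmap(bitmap) // static image, as in the talk

    detector.detectInImage(image)
        .addOnSuccessListener { faces ->
            for (face in faces) {
                val smiling = face.smilingProbability
                val nose = face.getLandmark(FirebaseVisionFaceLandmark.NOSE_BASE)
                println("smiling=$smiling nose=${nose?.position}")
            }
        }
        .addOnFailureListener { e -> println("Detection failed: $e") }
}
```

This is a sketch, not production code: on a real app you would also close the detector when done and handle the nullable landmark and probability values properly.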

And then, after specifying all these options, we call the build method, and our detector options are ready for use. Next, we run the detector. The options we configured in the previous slide, this is where we use them: we get an instance of FirebaseVision, and then pass the options into the getVisionFaceDetector method. Here, for this talk, I will assume we are dealing with static images and not images that move around in the camera. So we'll use FirebaseVisionImage; this API allows us to pass images to it in different formats. Here, we choose the bitmap option, and this returns a FirebaseVisionImage object to us. Next, we call the detect-in-image method, passing in our FirebaseVisionImage. We also attach success and failure listeners: if detection is successful, you get results from the success listener; if not, you hear from the failure listener. And then we retrieve the information. This is where we make use of the results. If detection is successful, we retrieve a collection of faces, and then we can loop through each face and perform whatever actions we want on it. So let's say you had a picture that had three faces; then you are going to have three items in this collection. And this is how to get specific values: on each face you can ask for the smiling probability, get a landmark, or ask for the right-eye-open probability. This is how you use the data from each face. We can also specify what we want to happen if there is an error, and then do something with the error messages. And with these basic steps, we have detected a face with ML Kit and retrieved data from it. While the detectors for the other APIs are quite different, the entire process of getting information is practically the same, and this applies to both cloud and on-device APIs. Finally, we are going to explore the use of a custom model with ML Kit. So this is for 
advanced ML developers who come and say, I want to have a smart app, yes, but I want to do more than detect faces, or text, or barcodes, or landmarks. This section is for you. Like I mentioned before, ML Kit provides advanced ML developers an opportunity to use more than the pre-built models. But this works, too, if you are not a machine learning expert: if you happen to have a custom model that you want to use, you can still use it, because we are expecting that you might not be the one who developed that model. ML Kit works only with TensorFlow Lite models. TensorFlow is an open-source library created by Google, used for machine learning applications such as neural networks, but it is not optimized for mobile devices. TensorFlow Lite, however, is TensorFlow's lightweight solution for mobile and embedded devices. It enables on-device machine learning inference with low latency. It has a small binary size, because it's supposed to work on mobile devices, and it also supports hardware acceleration with the Android Neural Networks API. But note that you don't have to know anything about TensorFlow or TensorFlow Lite in order to include them in your application, because ML Kit abstracts all of that. So even if you want to use custom models and you don't know how to use TensorFlow or TensorFlow Lite, you are still good. Let's walk through the process of including that in our app so we can have the model classify images for us. For this section, we are going to use an already existing model called MobileNet. MobileNets are small models that were designed to meet a number of use cases; in this case, we are using one for image classification. Just as we saw with the base APIs, there are a couple of steps we need to follow. Our first step is to connect to Firebase, as usual. Then we add the dependency to the build.gradle file, as usual. Then, the important 
step is that we need to convert the TensorFlow model into the TensorFlow Lite format, and this is done using the TensorFlow Lite Optimizing Converter. I know that's a handful to say. But you only need to convert if you have a TensorFlow model that's not already in the TensorFlow Lite format, because only models in the TensorFlow Lite format

run with ML Kit. There are a couple of things to keep in mind when hosting custom models on Firebase, if you choose to do that. First, we need to specify that we require an internet connection, since of course the app is going to need to make a network request before the model can be downloaded. If you are familiar with the Firebase dashboard, this is what it looks like: to the left, there's the ML Kit option, and at the top you'll find options to upload your model. Once you do that, you'll see the screen that's currently being displayed, where you can enter a name for your model. The name you use for your model online is what you use to identify it in the app. Now we'll walk through things to note when bundling the model on-device. For this, we need to copy the TensorFlow Lite model, with the .tflite extension, into our assets folder in Android Studio. The file will then be included in the app package and available to ML Kit as an asset. If you're using MobileNet, you also need to have a label list. What that means is, basically, you've trained your model to recognize certain things, and we need the list of things it can recognize. So say you've trained your model to recognize a cat and a dog; in your label list, you need to have two items: a cat and a dog. When you get your results, what you get is the degree of confidence that either a cat or a dog is in the image. So if a cat is not there, you still get data back, but it's a zero; and if it's actually there, you get a value depending on how confident the model is that the object is actually in that image. And then we need to add a line to the build.gradle file, because the custom model will be memory-mapped, and we need to specify that it shouldn't be compressed. So, a quick implementation. When the custom model is 
hosted on Firebase, we can set conditions for when it will be downloaded or updated. If we're working with models hosted on Firebase, we create a FirebaseCloudModelSource object and set the name; remember, the name was set on Firebase when we uploaded the model, so we set the same name here, too. Then we specify download conditions: initial download conditions and update conditions. For models bundled on the device, we create a FirebaseLocalModelSource object and set either a file path or an asset path, depending on how it's bundled with our app. You can choose to download the model from a server when the device is online and save it in your app's storage; that would be the file path. But if you manually added it to the assets folder, then it would be the asset path. Then we use the FirebaseModelManager to register the sources we create. This is done for either one of them, depending on the use case: you can register a local source, register a cloud source, or register both of them if you want. Then we build a FirebaseModelOptions object (these are very long names) with the names of the models hosted on the cloud, on the device, or both, and use it to get an instance of the FirebaseModelInterpreter. If you specify both a cloud model source and a local model source, the interpreter will use the cloud one when the device is online, and the local one when it's offline. We also need to set the format of the inputs and outputs in an array, and include that in the FirebaseModelInputOutputOptions object. So we have a custom model, right? 
And we need to pass data to it, but the data needs to be in a specific format. In this case, we're using MobileNet, which deals with images. We are sending in an image with dimensions 224 by 224, and it's a batch of one; that explains the one at the beginning. Then the dimensions, 224 by 224, and then it has the RGB format, which explains the three at the end. And then, we also specify the output format
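The steps just described, registering sources, building the interpreter, and declaring the input and output shapes, can be sketched in Kotlin roughly as follows. This assumes the 2018-era `firebase-ml-model-interpreter` API, and the model names and file name here ("my-mobilenet", "my-local-mobilenet", "mobilenet_v1.tflite") are hypothetical placeholders:

```kotlin
import com.google.firebase.ml.custom.*
import com.google.firebase.ml.custom.model.FirebaseCloudModelSource
import com.google.firebase.ml.custom.model.FirebaseLocalModelSource
import com.google.firebase.ml.custom.model.FirebaseModelDownloadConditions
import java.nio.ByteBuffer

fun classify(input: ByteBuffer, labelCount: Int) {
    val conditions = FirebaseModelDownloadConditions.Builder().requireWifi().build()

    // "my-mobilenet" must match the name given when uploading to Firebase.
    val cloudSource = FirebaseCloudModelSource.Builder("my-mobilenet")
        .setInitialDownloadConditions(conditions)
        .setUpdatesDownloadConditions(conditions)
        .build()
    FirebaseModelManager.getInstance().registerCloudModelSource(cloudSource)

    // Fallback bundled in the APK's assets folder.
    val localSource = FirebaseLocalModelSource.Builder("my-local-mobilenet")
        .setAssetFilePath("mobilenet_v1.tflite")
        .build()
    FirebaseModelManager.getInstance().registerLocalModelSource(localSource)

    val modelOptions = FirebaseModelOptions.Builder()
        .setCloudModelName("my-mobilenet")
        .setLocalModelName("my-local-mobilenet")
        .build()
    val interpreter = FirebaseModelInterpreter.getInstance(modelOptions)

    // One 224x224 RGB image in; one row of label scores out.
    val ioOptions = FirebaseModelInputOutputOptions.Builder()
        .setInputFormat(0, FirebaseModelDataType.BYTE, intArrayOf(1, 224, 224, 3))
        .setOutputFormat(0, FirebaseModelDataType.BYTE, intArrayOf(1, labelCount))
        .build()

    val inputs = FirebaseModelInputs.Builder().add(input).build()
    interpreter?.run(inputs, ioOptions)
        ?.addOnSuccessListener { result ->
            val scores = result.getOutput<Array<ByteArray>>(0)[0]
            println("got ${scores.size} label scores")
        }
        ?.addOnFailureListener { e -> println("Inference failed: $e") }
}
```

This is a sketch under those assumptions, not a definitive implementation; the exact class names and data types should be checked against the version of the SDK you are using.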

So we're also returning an array of one, and then we have the label list size. It will return the label list size; remember, the label list contains all the different items that the model can recognize. So we take that list, get its size, and return that, too. In this case, we are working with only one image at a time, and we need a way to tell that to ML Kit. The image will be converted to a ByteBuffer and then sent in through the add method of the FirebaseModelInputs builder. Finally, we can run the interpreter after setting the inputs and the input-output format options. The results are returned through the success or failure listener, and this gives us back an array of probabilities based on the label list we entered. This result can be mapped to the labels specified, so that we know what probability each label has for each input. So if you have a list of, maybe, cat and dog, and you get a result array of 0.4 and zero: the 0.4 means it looks like there was something that resembled a cat in the picture, but it's not really sure; and for the dog, the zero means it doesn't think there was a dog in that picture. So finally, we run the interpreter after setting all of that, and we get our results. In this case, I used a crossword puzzle as my input data, and I got back the results. For MobileNet, there was a list of 1,000 items, and I got back a list of 1,000 probabilities. What you do in this case, after getting them, is find a way to map them back to the original label list. And of course, crossword puzzle came back as 1.0. You only get a probability of 1.0 when the model is really, really sure that that's what is in the image. And it's also accurate, because the probabilities returned for the other items in the list were zero. So, yeah, the summary of all I've been saying for 40 minutes is: ML Kit makes it really easy for Android developers who are 
either ML beginners or really advanced ML developers to build smart apps. If there's one thing you take away from this session, this should be it. I didn't come up with this alone; I used a couple of resources. You should check out the talk by Yufeng Guo at Cloud Next '17 about TensorFlow; I watched that. There were also the Google codelabs on TensorFlow, TensorFlow Lite, and ML Kit, and the ML Kit official documentation, which I had to read, like, three times to get things to work. And then, I used a couple of sketches in the slides, which I got for free from that website. Thank you very much. [APPLAUSE] Do we have time for questions, or no? Do we have time for questions? Yes? OK. Please pass the mics to them. Please raise your hand if you have a question. AUDIENCE: Hi, my name is [? Tyo. ?] I want to ask, where can we get models, like custom models, online for ML Kit? Because I feel that there are a lot of models that we can use to make our apps smarter. MOYINOLUWA ADEYEMI: OK. So the thing is, I don't know if there is a site that has, like, OK, here's a list of all the custom models available. AUDIENCE: Maybe a link online, like GitHub for models, something like that? MOYINOLUWA ADEYEMI: OK, so I don't know if there's a link to free models. You can do a check. But I know that there's one that recognizes digits, numbers; I can't remember the name right now. But I'm sure if you do a search on that, you'll find some information

AUDIENCE: And is the learning curve for learning how to use TensorFlow Lite steep? MOYINOLUWA ADEYEMI: Well, it's a bit wonky, because I'm not sure I fully know how to use TensorFlow Lite yet. I know how to input a TensorFlow Lite model into ML Kit, and then use Kotlin and my Android dev knowledge to build smart apps, which is really the point of ML Kit: you really don't need to go through all of that. But the way I look at it is that Android developers might not be required to build the models. It might be that you have your app, and then someone gives you a model, and you just need to ensure that it's in the TensorFlow Lite format. But it's also good if you can learn how to build it yourself. AUDIENCE: OK, awesome. Thank you. MOYINOLUWA ADEYEMI: OK. AUDIENCE: Hello, my name is [? Ebi ?] [? Kuliadualua. ?] In the demo code which you displayed to us, I noticed you used hardcoded values for the dimensions of the image. MOYINOLUWA ADEYEMI: Well, that's because I know I'm working with MobileNet, and I'm working with an image, and I know the kind of image I'm passing it. So it depends; the values vary depending on what kind of model you're using. AUDIENCE: OK. Let's assume I'm trying to send a batch of images to the model. MOYINOLUWA ADEYEMI: A batch of images? AUDIENCE: Like, a batch of images. MOYINOLUWA ADEYEMI: No, this process is only one at a time. I actually tried it, and I got an error. AUDIENCE: Yeah. Thank you. MOYINOLUWA ADEYEMI: But you can build your own model that accepts a batch of images, so it doesn't have to be this way. And I think that's the advantage of the custom model: you can do whatever you want if you know how to build the model. AUDIENCE: Hello. My name is [? Aquem. ?] And I just want to know, how heavy is the library when you [? dimension ?] [? 
and enter it out? ?] MOYINOLUWA ADEYEMI: OK, so I didn't measure it, so I can't really say. But the thing is, it's going to be much smaller than bundling your whole model into the app by yourself. And since it's specific (it's not like you are getting the whole of the Firebase API; it's a specific dependency, firebase dash model dash something, I can't remember the entire name), I don't think it should be too large. I didn't check it, so don't quote me anywhere, but I don't think it should be too large. All right, it looks like that's all the questions I can take. Thank you very much. [APPLAUSE]