Computer Vision Tutorial | Image Processing | Convolutional Neural Networks | Great Learning

The short definition of computer vision is a computer or a machine having sight. To get a little more technical, computer vision is the process of recording and playing back fragments of light. The importance of computer vision isn't just the problems it can solve: it is one of the main technologies that enables the digital world to interact with the physical world. Keeping that importance in mind, we have come up with this computer vision tutorial. Before we start the session, I'd like to inform you folks that we have launched a completely free platform called Great Learning Academy, where you have access to free courses on AI, cloud, and digital marketing. So let's get started.

What we talked about last week was how exactly sampling takes place. So if there is an image, I hope everybody is clear: if I say an image is 16 megapixels, or 2 megapixels, or a 64-bit image, whatever, what does it mean? Where there are more pixels, the array holding the image is bigger: here it is of size 16 mega, and there the size is much less. Now, in your current project, what is the size of the image? 32 by 32, so 32 times 32 is 1,024. It is OK for me to build a neural network that takes 1,024 neurons as input, does all its up-and-down processing, and finally gives me a classifier. Not a problem, good. But what if this were 16 megapixels, so when I multiply the two dimensions it becomes 16 million? Is it possible for you to handle this, yes or no? Can I have a 16-million-input neural network? It becomes a memory problem, a computation problem, a cost factor; a lot of things come into the picture.

So in this case we don't go ahead with a simple MLP or FCNN. What is an MLP? Correct: a multi-layer perceptron, which is nothing but another name for a fully connected neural network, so please don't get confused. I have seen many people say MLP or FCNN, and they mean the same thing. A perceptron is nothing but a neuron: whenever a neuron goes into a hidden layer it is called a perceptron. Don't ask me why, but that is the convention.

Now, this becomes very difficult to compute, so what do we do? We just saw we had a beautiful Python code that works well; then we (especially me) went on to a new piece of software and got a diminished version of it which does nearly the same job. I'll say it's not as effective, but if in five minutes it can achieve 70% of what the full thing does, I'm very happy: why should I waste my time on code? Agreed, everybody? You can imagine the same thing with images: what if I don't want to scan the whole image?

So let's say this is an image, and there is a person standing in it (ignore my drawing). Say there is a flower pot over here, and there is a picture frame, and within the picture there is another image, something like that. If I ask you what information you can derive out of this, you will tell me there are a few spots of information: there is an object which is a frame, an object which is a flower, an object which is a pot, and an object which is a face. That's it; let's say you have got four. So you tag these up. Remember what I mean by tagging? Your current project has got taggings.

Here also we have taggings: I'll tag "object", "person", "object", "object". Now, if you design this nicely, there is a good chance you can detect all of them, and to do that I do not have to scan the other, empty parts of the image. I hope you people are getting it: I don't need to scan all of this. What if I design small filters? What are these filters? They are matrices of weights, a bit like the random weights we have seen before, which, when you multiply them with the whole image, give you a diminished version of the whole thing. So there is a possibility that if I multiply the image with this filter, and then another, I will get a lot of diminished feature maps like this: I can call this a feature map, and this, and this. Now what if I use one of these as an image in itself? It could be around a 4x4 image, for example. What if I could feed it into my fully connected neural network to do a classification, and do the same with each of the others separately? Then I need to run my fully connected neural network four or five times to identify those four or five objects. Agreed? So rather than running the neural network on megapixels of input, I try to find some features so that I am able to classify with them. Is that clear?

This, sorry, is your convolutional neural network, and this is what we are going to do down the line. In any convolutional neural network, wherever I say "filter", you should remember we are talking about convolution. There are a lot of things I will show you down the line, and a convolutional neural network is always backed by a fully connected neural network at the end, so nothing is leaving us: whatever you did in the first session, the first module, will stay with you through this complete convolution module. Perfect.

So now, moving on, and let me not rush, because we have a good amount of time today: these are my filters which I am applying to my image. In the white portion I have put minus one, and in the coloured portion, where one is written, I have put one. When I multiply, I am going to get a diminished version of the image. If you want to see what I mean by diminished: this 3x3 patch, which is 9 numbers, I have diminished to 1 number, 0.33. If you still want to understand how it does that, I'll show you one small GIF: just observe the GIF on the right-hand side, where a 5x5 matrix is diminished to a 3x3 matrix with one convolution. Everybody OK with this? How we got each single number is a very simple formula: multiply element by element and add them up; nothing fancy is needed here.

So this is what we have done. Good. Now, when we did this, we got three matrices like this. After that, whatever matrix we got, we transfer it through an activation function, say ReLU. As soon as it passes through, you know that zeros and anything below zero will be omitted, and whatever is left will be processed further; the negative numbers are moved out of this matrix.
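Here is a minimal numpy sketch of that sliding multiply-and-add plus the ReLU step. The image and filter values are made up for illustration; this shows the idea, not the course's own code:

```python
import numpy as np

def convolve2d_valid(image, kernel):
    """Slide the kernel over the image (stride 1, no padding) and
    multiply-and-sum at each position: a 5x5 image with a 3x3
    kernel diminishes to a 3x3 feature map, as in the GIF."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = image[i:i + kh, j:j + kw]
            out[i, j] = np.sum(patch * kernel)  # nine numbers become one
    return out

image = np.random.randint(0, 256, (5, 5)).astype(float)
kernel = np.array([[-1., 0., 1.],
                   [-1., 0., 1.],
                   [-1., 0., 1.]])   # a made-up filter: -1s and 1s

feature_map = convolve2d_valid(image, kernel)
print(feature_map.shape)            # (3, 3): the diminished version
print(np.maximum(feature_map, 0))   # ReLU: negatives are zeroed out
```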
Moving on, there is a technique called pooling: max pooling or average pooling. Say I am still not happy with this and I want to diminish the image further. Say this is now a 7x7 image, and the size of my pooling window is 2x2. In this case I do max pooling on it: find the maximum number inside each window; obviously here it will be 1. This is called max pooling. The whole 2x2 window is now represented by a 1x1 value, which means I am dividing the size of the whole matrix by four in the 2x2 case. Instead of max pooling you can also do average pooling: take the average of all four values and put it there. Then you pass this window over each and every portion of the matrix.
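And the pooling step in numpy, again with invented values:

```python
import numpy as np

fmap = np.array([[1., 0., 0., 1.],
                 [0., 1., 0., 0.],
                 [0., 0., 1., 0.],
                 [1., 0., 0., 1.]])   # made-up 4x4 feature map

# Split into non-overlapping 2x2 windows: (row block, col block, 2, 2)
windows = fmap.reshape(2, 2, 2, 2).swapaxes(1, 2)

max_pooled = windows.max(axis=(2, 3))    # keep the strongest value per window
avg_pooled = windows.mean(axis=(2, 3))   # or the average of all four
print(max_pooled)   # 2x2 result: each 2x2 block is now one number
print(avg_pooled)
```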

Good, so this is what it is. "Is it always a two-by-two that we have to take?" It can be more than that; we have to define the pooling window ourselves, but usually we take two by two because it is easy to decide. If we take three by three, the matrix is divided by nine, so there is a possibility that we lose a lot of pixels. And we do agree: we have already lost some pixel information here.

Now, why am I happy with this? One question to all of you, since you got the pooling part: what is the use of doing this? We were already happy; we had already diminished the matrix, and our neural network could take it. Why are we still doing this? To prevent overfitting, kind of, yes, correct. OK, I'll give you one very simple example. Say I currently have five batches of students. Now, from these five batches, say there are four candidates who are the top performers: candidate A, candidate B, candidate C, somebody from across the five. If I want to represent how a batch looks on average, or what the strength of my batch is, can I pull out the best of these four (max pooling), or take an average of the four (average pooling)? Correct, yes.

So what does it mean? When these four values come together, they kind of represent a single region. It could be, say, an eyebrow of a person. Let me choose one light colour: say this is the face, imagine this is the face, and these are the eyebrows on it. Now say my scanner, my convolution window, is currently hovering over one of the eyebrows, something like this. If I am doing max pooling on this, what does max pooling do? We just saw: it will take the best feature out of the window. Do I need all the features here, or do I need the best feature? Can I say the best feature here is nothing but the black portion? Let's pick that and pull it out, saying that when I scanned through this, at this particular location the colour of the image is black; note it down. So in the future, when a new image is going through, if at this particular location max pooling pulls out black, that means the images match. Or, let's say this person has got a mole on his face: in the future, when you compare at that particular location, the mole is definitely going to be there, so it is going to understand that yes, it is the same image. We have not dived deep into it yet; I'm just trying to get you to understand that sometimes, when you think we are removing pixels, we are actually taking the best features out of those pixels. Take it that way. All right, any questions on this?

"Conditionally we might lose information, right? And in images black is mostly dominant anyway, so if there is something with less information, but still significant information, we might lose it, right?" Yes, kind of, yes. That's the reason we make sure that the filter we have designed goes through, let me change the slide, yes: I will make sure, first of all, that I have done good sampling, not wasteful sampling; and also I will make sure that my filter goes through almost everything, that it gets multiplied with everything, so I don't lose any of it. So you are kind of right: when we try to diminish something, some loss happens. But on the other

side, when we oversample, when we do a good sampling, we will not be losing any pixels in this case. All right. What do I mean by sampling? I'll give you a very good example; let me open up an image. There are chances that with coarse sampling one pixel could look this big; with very good sampling one pixel might look like this instead; and with even more sampling one pixel could be as small as this dot. If I have good, dense samples like this, it is easy for me to remove a certain part of them, because if you observe, the same colour propagates across almost the complete top part of the image. Agreed? So if I remove five to ten of those pixels, it's fine.

Guys, be very frank with me on this: are you able to visualize it or not? Because if you don't ask questions right now, down the line in the second and third week we are not going to come back to this. Be very clear on that: whatever I'm saying, are you able to understand? "I just have one question, Krishna, about the samples. We might have, say, one lakh samples provided in the data; however, an individual image might not give the correct representation of that data. The nose could be on the left or on the right, and for augmentation, flipping, and all those things, how do we ensure the samples are right in place as required for our analysis? We cannot check all one lakh images if we are provided with that, right?" Correct, correct. For that we will see something called metric learning, I think in almost the last week of this module, where we try to find a distance between two images: how similar they are, how far they are from each other. A very good question, but let's hold on to it. "Sure, but I am clear on this." Perfect, because this is one topic where earlier batches, you know, didn't get how, by multiplying a filter, we are able to get an image.

Good. A second point: I will show you why CNN. If somebody asks you "why exactly CNN", a very simple example: whenever you look at a car, say the front part of the car has two openings like this (one minute, sorry); say this is a BMW, for example; just imagine this BMW. Do you go and check the car, the wiper, the steering, the seats? No. If you want to identify it visually and immediately, from the bonnet down, you look at these grille openings: good to know, this is a BMW. Also, if you look at the badge, you might find the logo. So that alone is definitely enough for me to identify the image. Bring the same concept onto computer vision: you don't need to scan the whole image.

In case you want a more generic example of scanning, I'll give you a very good one. I am currently working on a project for the gated community where I live. We have an app called MyGate, and what MyGate does is help us track vehicles. If somebody is coming into our community, it has to be reported, and a notification is sent to the owner that somebody has come to meet you, do you allow it, and we say yes. So everything is manual: the security guard has to get the name of the vehicle, the type of the vehicle, the name of the person, a load of things. And sometimes visitors go unnoticed, because the person coming from outside doesn't know the process, walks straight in, and security assumes he is part of the community. So what I am working on now is: let's install two cameras here, which will try to snap a picture

of the incoming cars, and as soon as we snap a picture, we should be able to locate where the number plate is. If you remember the current project you are doing, we can easily identify the numbers out of it, and from my central database we get to know whether this car belongs to the community or not. If it is not in the community, the gate, which could be an automated gate, should not open; it opens or closes based on these triggers. So far I am able to identify the plates: if I take a webcam and run it across the cars, yes, I am able to get the number plates. I will show you this shortly, once you understand what we're talking about, probably in the second or third week of the session, where we'll take a webcam and try to identify objects with it. OK, good.

So what I wanted to show you here is that sometimes in computer vision you don't have to identify the whole image; you just have to focus on a certain part of the image, that's all. And how do we do it? CNN will help us do it. In a CNN we don't scan the whole thing; we don't need the whole thing as input. We scan certain parts of it, and sometimes some of those parts are exactly the image we want. Good. So that is one example of CNN.

I'll give you one more example of where this is used nowadays. If you go to the metro cities, there are malls, and in the malls there are parking lots, and right at the entrance the display shows how many slots are empty, on which floor, which level, and what size. Earlier, the way to do this was to put sensors in each slot: if there is a weight on the slot, there is a car; if not, there is no car. Actually, those sensors are very costly, and during maintenance they get damaged; they have to be embedded, and a very sophisticated maintenance process has to be followed. So now what they do is very simple: they put a camera, and if they find some other colour over a slot, that means a car is present, the slot is not empty, and a counter is updated. What is this called? Semantic segmentation; this is your week-3 content. We don't care what the object is; we just see that there is a background colour and there is some different colour on top of it, therefore the slot is occupied. I don't need to know what car it is, what colour, what shape or size; that's not our application. Good. So this is where convolutional networks are used.

Now, coming to the weekly content, since that was a little extra: all we do is take the image, go through convolution, then ReLU, then pooling; again convolution, ReLU, pooling; and if you still want more, you add normalization wherever you wish. After these two layers of convolution, what did we get? A bigger image was converted into a smaller one. What do we do next? Flatten it up and push it onto the fully connected neural network. That's all your CNN is. This is your flattened data; and ignore this: 0.66 should actually be 0.55, it's a screenshot error I did not notice, the same value as over here. So 1, 0.55, 0.4 and so on are flattened completely, everything goes into the network, and you know what happens there and how we classify. That's always your convolution story. And when you bundle all of this together, this is how it looks, linked: image, convolution, pooling, convolution, pooling, flatten, and your fully connected network.
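A minimal sketch of that bundle in Keras. The filter counts and layer sizes below are illustrative guesses of mine, not the course's exact model:

```python
from tensorflow.keras import layers, models

# Illustrative only: a 32x32 RGB input, two rounds of
# convolution + ReLU + max pooling, then flatten and hand over
# to the fully connected classifier at the end.
model = models.Sequential([
    layers.Conv2D(16, (3, 3), activation='relu', input_shape=(32, 32, 3)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(32, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),                       # the flattened data
    layers.Dense(64, activation='relu'),    # the FCN part
    layers.Dense(10, activation='softmax')  # e.g. 10 classes
])
model.summary()
```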
"I have a question: how many convolutions and poolings should we do before we start the flattening?" Yes, so there is a formula we are going to see down the line which will help us with this, but frankly speaking, I'll tell you: what you have to do is first determine what your input here should be, and from there we go down the line. So if you want, say, 64, that means your flattened layer should be 64. And if you have, say, ten features, ten windows here, each window should carry what? If you divide 64 by ten you get 6.4, which is not a good number. So in that case, what you do is make it eight, for example: eight windows of eight values each gives 64, so each one could be 2x4; or you can take 2x2 windows, four values each, enough number of times, sixteen times for 64. I'll show you down the line how to do that.
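To put that arithmetic in one place (the 64 and the window sizes are just the numbers from this discussion):

```python
flattened_target = 64     # the input size the fully connected part expects
per_map = 2 * 2           # a 2x2 feature map carries 4 numbers

print(flattened_target / per_map)   # 16.0 -> sixteen 2x2 maps flatten to 64
print(flattened_target / 10)        # 6.4  -> ten maps won't divide cleanly
print(flattened_target / 8)         # 8.0  -> eight maps of 8 values also works
```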

"Hey Krishna, can you repeat that? I don't think I got it." The question here was: how many times should I do this, and what should the size of each filter be, all that stuff. It completely depends on your input, first of all. So say you are planning a flattened input of 64: if your flattened data is 64, how many filters should you have? That is the question. Take the reverse-engineering approach. Say each filter produces a 2x2 map: how many numbers is that? Four. How many fours do I need for 64? Sixteen, right? Yes or no? Yes, sixteen for 64. In that case I need sixteen of these 2x2 maps. So now go in reverse: if I need sixteen of these, you can work out what the filter and image sizes must be, as I'll show down the line. But this is the main way to think about it.

Also, one hint for all of you: in very few models are we going to build this by hand. We are going to use ready-made models: VGG, AlexNet, GoogLeNet, MobileNet, ResNet, and R-CNN (not RNN, R-CNN). We have ready-made, huge models which can handle millions of parameters, so there are very few places where we are going to do this ourselves, because we don't want to take on the burden unnecessarily: ready-made models by some famous companies already exist, so why not import them and use them? OK, good, perfect.

So, overall, in this whole module: you will be given an image; you should be able to detect objects in the image first of all, and after detection you should be able to identify them. Is it a car, a cat, some other object? Anything, we should be able to do it. To give you an overview, this session, this module, will run around seven to eight weeks, to be very frank, and it has got two projects in it, back to back. The first project comes after your fourth week, where you have to detect an object and create a boundary around it; in the second project you have to classify whatever you just drew a boundary around: what it is, what type of object it is. Good. So by the end of eight weeks, say two months, you should be good at computer vision; you should be able to implement at least the basic level from scratch. Plus, something extra from my side: I will show you simple versions of this. I don't want you to go and write huge code; it's good, nothing wrong if you do, but I'll also show what we do on the industry side, smaller codes using ready-made models. And I will show you something extra which is not covered here, which is called video analytics. That could be a very good application, especially if you are from the manufacturing or construction industry; last time I gave you some examples from my side about the testing industry, where we are still exploring how to use this opportunity. OK, good.

Especially if you are from retail or manufacturing, this will help your work directly. I recently interviewed with one of the startups in, I'd say, manufacturing; they were from a QA team. They've got a very big project from one of the large automobile manufacturers, where they want to pinpoint, on a manufacturing line, which product is defective just by looking at the image from the camera. So there could be a lot of high-definition cameras attached; earlier, the way to do it was to pass the parts through UV, or pass a laser over them to check for deformities, but now, looking at the image itself, they want to predict it. A very sophisticated version of this. Good. So day by day this field is evolving. And I'll tell you one more thing: the timeline for computer vision is, I'd say for you guys, at least six months to get comfortable with it and start applying it in real time.

So you can say this will be a learning curve; please be very patient with it, it is not a simple concept. OK. What did we discuss? Why did we discuss it: because of image size, cost, and all of that, we are doing this. More or less, this is what we saw: you take an image, extract features from it by convolution, flatten them up, put them onto a fully connected network, and get the prediction. So if you observe here, we have a prediction called "car". All right. And please remember one thing: this part is your static, forward-propagation layer, and this part has both forward and backward propagation; that's where the learning happens. So this is the classification layer, and this is your feature extraction.

Where is this processing used? As I said, a lot of possibilities. One, as I said, is object detection in any industry. A second is something like comparison: nowadays governments use this to compare images and find out a lot of things; it is used by Interpol, the FBI and so on, and you have seen in movies how they compare two images and try to find the best match. A third application is in the retail industry: say there is a product displayed on a website with a lot of attributes attached to it; say there is a phone, 2 GB RAM, this and that. Certain attributes need to come from the technical person, but there are certain attributes you don't need to type manually: colour, shape, size, which company it belongs to, lots of things. Those things, which can easily be scanned and filled in, are automated. Some of my friends working with Amazon, Flipkart, and Reliance do exactly that: hardcore computer vision on clothes and other objects, so whatever the system sees, it fills in. For example, for a displayed t-shirt, it can find out it's a v-neck, this colour, this probable size. The final application I'll mention is reinforcement learning: the Google car, Google Glass. That is a combination of computer vision plus reinforcement learning. What is reinforcement learning doing there? It takes an image and learns from it; if something is not in the corpus it should not reject it, it should try to understand what it is. Good.

OK, so these are the applications. Any questions on this? If anybody wants to know anything more about applications specific to your industry, do let me know. "Krishna, have you come across any such usage in terms of HR services or the payroll services industry?" Yes, yes. I will not say purely payroll, but one of my seniors has worked on one domain; I'm not sure how relevant it is to yours. Say there are a lot of invoices to process: from these invoices they try to pull out the numbers they want. Maybe you can do something similar in your area. "So we have some kind of requirement right now in our company, but I don't think it is fully within the capability. It goes like this: we get new clients, and when it comes to implementation, rather than going to clients and asking them questions about how the current system is designed and what components they have, we actually scan sample employee payslips, and from that we can determine which components are there and how they get calculated; the formulas and everything get derived out of that. It was a failure earlier, but if there is something you have seen from the market, or from one of your friends, it could help."

We do one of these projects where we try to pull out certain tags, requirements, out of a document, so it is something similar to your case. Just give me a minute; I'll not show you the whole case study, because it's part of your next-to-next module, NLP, but let me just finish opening this code. Good. If you observe here, I am able to tag certain words, all of this. So in your application, I would say you can use computer vision to identify the text region, text processing to read it, and wherever you find a formula or a process, something that has functions in it, you can use NLP to find out which known formula it is closest to. For that you need to have a very good corpus; a corpus is your backend data; you need to have all possible formulas in it. I hope that's what you were asking. "What I was trying to understand is: is there any way we can have a mixed kind of database, having images as well as the raw data, the kind we were looking at in machine learning?" No, you need to have two separate corpora. If you want to work on this, do me a favour: give me some sample data. I do not expect the whole data from your organization; give me some fake data which is in sync with it. I have done something purely on text, but yes, you can include images and we can combine both of them. What I currently do is one of these implementations where we scan RFPs and try to pull out requirements from the RFP using image processing, sorry, using text processing: which requirement is a functional requirement, which is a non-functional requirement, which is a business requirement; under a functional requirement, are they talking about automation, are they talking about unit testing, whatever it is. It's very easy to pull this out; the only thing is you need a good corpus, and luckily I have one for this, so I can do that. All right, that's one thing. Anyone else? Everything good? All right.

So let's see what is planned for this particular session: we will take an image; we'll see image data structures and types, what they are; convolution filters and how they act on an image; the fully connected neural network; convolution layers; average and max pooling; and forward and backward propagation in a CNN. Good; I think that covers almost everything.

Now let's see the types of images, or what pixels can consist of, basically. We can have RGB (R, G, B) or grayscale. If I give you an example, what do you mean by grayscale? Here the lowest possible value for a pixel is, say, zero, and the highest possible is 255; zero could correspond to pure black and 255 to pure white, and whatever numbers you see in the middle are the different shades you see here. Good, everybody gets this? Yeah, OK. So now, moving on, what about this one? Here also it's the same stuff, but we have three different channels: one called red, one called green, and one called blue (excuse my scribbling). And what are the numbers over here? The same thing: zero could be the darkest green and 255 the lightest, with the different shades of green in between; likewise observe over here, zero could be the lowest blue and 255 the highest. Now you may ask me: fine, then how do I mix and match, how do I get one colour? If it were purely green we would pick it up from here, but what if it is a combination of all three of them? In that case there is a special function which gives us a calculation across all three and returns one value, which corresponds to one of the colour maps stored in our computer. So, if you remember, there is a special type of plotting function called

imshow. In imshow you put in any matrix of your choice: say you create a 64x64 matrix, no problem, you put it in and try to plot it using imshow, and it will definitely give you some colour or other. What that means is there are predefined values stored in the computer (colormaps) which take up these values. OK, so these are the two things we'll deal with.

Moving on: of the images we have seen, what is a 2D matrix and what is a 3D matrix? The 2D matrix is your n x n array, so I will say a grayscale image can be a 2D matrix, and an RGB image will be a 3D matrix; in that case I'd say 32x32x3, which means it is a 3-channel, 32x32 image. And if you want to make it 4D: if you remember, in your model we had 42,000 images of 32x32; that collection is the 4D array, but don't worry about the 4D part; we can ignore the first and the last dimension, since 32x32 is what we are interested in. And please remember, the pixel is the lowest form in which an image can be represented; it all depends on how we sample it. If you want more samples: a 16-megapixel image has more information and is sharper; the same image at 1 megapixel, you can understand how blurred it will be. If you want to see what a very low-sampled image looks like, look at this: each square here represents one pixel. I could have sampled it far more finely, so that each dot represented one pixel; in that case we could easily have seen his eyes, nose, mouth, and beard. Look at the blurriness.

OK, so this is how an image looks on the back end. This is your front end, and this is how we give numbers to the image, where black is zero and anything towards white approaches 255. Now you can check it: look at this value, and look at this value; perfectly in sync. And this is how it looks on our Python side, the backend part of it. What if I want to normalize this, if I'm not happy with 0 to 255? In that case, what we do is divide by the highest value. What will my range be? Can somebody spell it out? If I normalize this, my values will be between 0 and 1. And this is how a digital image is processed on the back end.

Now, if you want to understand how a filter works, very simple: just look at one particular corner of this image being scanned, multiplied, by the filter. This is my filter, and when the filtering finishes, this is what we get out of it: a 3x3 patch becomes one number, so I can say I have divided the whole patch by nine; nine numbers are represented by one. That is my convolution. And how does it happen? If you observe here, very simple: (-1 x 3) + (0 x 0) + (1 x 1), and like that you do it for all the rows and columns; at the end we end up with -3, some number. If you ask me what this number represents, I'd say it's just a pattern. When I pick up one feature out of an image, the answer for that pattern is such that tomorrow, when a similar image comes along and almost all the numbers come out similar down the line, you can say this image and that image match on that feature. OK.
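Putting both of those points, plotting a raw matrix and the multiply-and-sum step, into a few lines of Python; the matrix, patch, and kernel values are all made up:

```python
import numpy as np
import matplotlib.pyplot as plt

# Any matrix plots as an image: values map onto a colormap.
matrix = np.random.randint(0, 256, (64, 64))
plt.imshow(matrix / 255.0, cmap='gray')  # normalize 0..255 down to 0..1
plt.show()

# One convolution step is just element-wise multiply and sum:
patch = np.array([[3, 0, 1],
                  [1, 5, 8],
                  [2, 7, 2]])            # a made-up 3x3 corner of an image
kernel = np.array([[-1, 0, 1],
                   [-1, 0, 1],
                   [-1, 0, 1]])
print(np.sum(patch * kernel))            # nine numbers diminished to one
```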
Moving on: there is a thin line of difference between convolution and correlation. Again, say the filter is some randomly chosen 3x3 matrix. So what is the difference between correlation and convolution? In correlation we multiply, and in convolution we also multiply; the only difference is that in convolution, every time you want to use the filter, you flip it. You don't apply

the same filter orientation every time; you keep flipping it. Now, what do I mean by this? Let me show you an example; let me take you to that place. So say this is one image given to us, and these are the filters I am going to multiply this image with. Look closely at this particular filter; this is how it looks. Now just watch what I do; I'll show you how I play around with these numbers. What I'm doing is making both of the extreme vertical ends minus one (-1, -1, -1), and keeping whatever middle values there are as one (1, 1). Look at the original image; now let me run this, and let's plot it. This is my original image, and I have changed my filter: earlier, if you saw, there was a different filter, it was picking out vertical structure; now did you see this? Just by changing certain values here and there, I changed how the complete image is interpreted. Look at this. Just by changing these filters I am converting this image into this other image. One more thing I'll show you: say I divide the filter by fifty, for example, some random value, and regenerate; and what if I remove this term, for example? The complete image changes again. So please remember: when a filter changes, the values of the pixels inside the output matrix also change, and because of that you will get different images. This is how we play around with images, and this is what we mean by filters. OK, good.

Now, did you people see the implementation of this in the weekly video, or do you want me to do it? You want this image-processing piece? You don't know it? Let's do it; a very simple thing. We have all seen Snapchat, and what else, we have Instagram and Facebook, lots of them. If you just look at them, there will be glasses put on your face, or a cap, or a beard; a lot of things happen, and sometimes you can also change the colour of the image. How do they do this? Very simple: all they have to do is design one filter like this, that's it. This is an example of a simple filter; you can imagine a somewhat bigger filter designed such that it responds to certain patterns and can understand: this is your face, the rest is not. So if I multiply this image with this filter, we are going to get this. And what is the pattern here? If I multiply with this one, I sharpen the image; this is what sharpening is: you are highlighting hidden features. If you want to do edge detection, this is how you do it, and you will see only edges here. OK, so where do you think we need this, guys? Let's think about it: apart from the Snapchat kind of examples, anywhere else you can visualize needing this? OK, perfect: whenever you want to augment an image in your corpus, you can use this. Good. Anything else?
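A sketch of those edge filters with scipy; `data.camera` is just a stock grayscale test image standing in for the slide's picture, and the kernel values mirror the minus-ones-on-the-ends idea:

```python
import numpy as np
from scipy import ndimage
from skimage import data

img = data.camera().astype(float)  # a stock grayscale test image

# -1s on the outer columns, 1s in the middle: picks out vertical
# structure; transpose the kernel and you pick out horizontal instead.
vertical = np.array([[-1., 1., -1.],
                     [-1., 1., -1.],
                     [-1., 1., -1.]])
edges_v = ndimage.convolve(img, vertical)
edges_h = ndimage.convolve(img, vertical.T)

# Correlation is the same multiply-and-sum but without flipping the
# kernel, so correlating with a flipped kernel matches convolution.
also_v = ndimage.correlate(img, vertical[::-1, ::-1])
print(np.allclose(edges_v, also_v))  # True
```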

Another instance: say we are talking about the medical industry. In the medical industry, let's say you have a snapshot, a microscopic image of a person's cells, and you want to detect whether these are cancerous cells or normal cells. In that case, what we do is pass it through one of these filters such that we get edges like this, and from these edges we might come to know how big the cells are; that could be one possibility, and I've seen extensive healthcare use of this nowadays. Even in the reasonably sophisticated hospitals we go to, there is no need to show the X-ray to the doctor first: the machine itself tags it. Say this is an X-ray of some part of the body: it tags itself where the problem is, it auto-detects the problem. How? By this kind of machine learning, this kind of computer vision. And I will say not only computer vision: you can even use KNN and SVM here, besides these things.

Moving on: Lenna. This is a very famous image; when I was studying my engineering, every one of my textbooks had it, and it has been used for years and years. What is this? Take an image, put a filter on it. What is this filter? A sharpening filter. Now, is this filter in sync with our earlier one? Look at this filter, guys: 0, -1, 0; look at our earlier sharpening filter: 0, -1, 0; -1, 5, -1. So wherever you go, this sharpening filter is going to be the same. Now, what if I want to sharpen my image more? What is the possibility; what can you change here? Is anybody able to see the pattern? Good, perfect: just understand the pattern and you will get it. Now take this edge-detection example: look at this, and look at our edge detection from before. Are they in sync? Not at all. What does that mean? It's not always necessary to follow the same kernel. In this edge detection, what we have done is highlight the extreme edges with zeros and ones, and in the middle we change it to minus two. Same thing if you want to sharpen more: we convert the complete third row, the zeros become ones, and the edge entries become minus ones, with the larger value in the middle; in that case you get a sharpened image like this. This one is sharp, this one is sharper; if I wanted to go one more level up, I could have done something like -2, 4, -2 in the rows, which would give a version above the sharpest. OK, perfect. So this is, in and out, the bulk of what we do in image-processing filtering.

Now you may ask: OK, fine, why did we do this? We did this because tomorrow you may need to augment your image, or tomorrow you may want to compare two images. What you do is apply a single filter on both of them, so that you bring them to a single level, and then you either compare them visually or use a formula to understand how far apart the points are. OK, good, perfect. So that was your image processing.
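The two kernels from the slides, applied with scipy; again on a stock image rather than the Lenna scan itself, and the exact display settings are my own choice:

```python
import numpy as np
from scipy import ndimage
from skimage import data
import matplotlib.pyplot as plt

img = data.camera().astype(float)

# The classic sharpening kernel from the slide: 5 in the centre,
# -1 on the four sides. Raising the centre sharpens more aggressively.
sharpen = np.array([[ 0., -1.,  0.],
                    [-1.,  5., -1.],
                    [ 0., -1.,  0.]])

# A common edge-detection variant: 8 in the centre, -1 all around,
# so flat regions sum to zero and only the edges survive.
edges = np.array([[-1., -1., -1.],
                  [-1.,  8., -1.],
                  [-1., -1., -1.]])

plt.imshow(ndimage.convolve(img, sharpen), cmap='gray'); plt.show()
plt.imshow(ndimage.convolve(img, edges), cmap='gray'); plt.show()
```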
Let me give you an example of this. Say this is one image of me, and this is one more image which I have taken recently. You know how it usually is with our ID cards: I have a six-year-old ID card, and if you look at my image from six years ago and my image now, it's like before and after; something has happened to me, because of which that old image matches nothing of what I currently look like. So say the old one is on my ID, and the current one is the photo my organization has taken: I will call this one old and this one current. And there could be one more image, of a person who looks almost similar to me. So I will say this is a positive image and that is a negative image.

Positive in the sense that it is kind of me; negative means I know it is not me. Now, what if I take these three separately through a convolutional neural network? See: I take this person through the network, and I get a set of matrices, agreed? I take this image, I get a set of matrices; I take this image, I get a set of matrices. Now what should I do to find out that these two are the same, while these two are not? I need some pattern, right? I need some function. What, according to you, could be used here? If you observe, these are nothing but numbers, just matrices. So what algorithm should I use? Do you remember KNN? What was KNN based on? Euclidean distance, yes. What if I find the distance between these two matrices, and what if I find the distance between these two? Say the distance between this pair is alpha, and the distance between that pair is beta. Now tell me the relationship between alpha and beta: can I say alpha will always be less than or equal to beta? Does it come to you? OK, fine, let me not confuse you guys. What I was trying to say is: please remember, at the end of the day these are nothing but numbers, and you can use any of your machine-learning classification techniques on them. Here, what I did was use distance as the loss function: we can find the distance and say which two are nearer and which two are farther. So say we decide that for two images to be similar, the distance should be within around 0.5. When I compared these two I got a value of 0.29, the difference is small compared to 0.5, so I can definitely say the distance between them is very small. But say that when I compared this one with this one, the distance came out above my threshold; then I can say that yes, the two images are different. You can use a CNN even at that level, in a very simplistic way; you don't always need to go and do the high-end stuff. This is called metric learning, not a CNN classifier as such. So this is where you will need all these image-processing skills, sometimes to alter the data. I hope everybody is in sync with me. Amit, Shakti, Akash, Harish, Kranthi, Raghu, Vishnu: very good, perfect.
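A toy version of that distance check in numpy; the four-number "feature maps" and the 0.5 threshold are invented for illustration, in the spirit of the discussion:

```python
import numpy as np

# Pretend these are flattened CNN feature maps for three photos:
# an old photo, a recent photo of the same person (positive), and
# a lookalike (negative). All values are made up.
anchor   = np.array([0.9, 0.1, 0.4, 0.7])
positive = np.array([0.8, 0.2, 0.4, 0.6])
negative = np.array([0.1, 0.9, 0.8, 0.2])

alpha = np.linalg.norm(anchor - positive)  # distance to the true match
beta  = np.linalg.norm(anchor - negative)  # distance to the impostor

threshold = 0.5  # an arbitrary similarity cut-off
print(alpha, alpha < threshold)  # small distance -> same person
print(beta,  beta < threshold)   # large distance -> different person
```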

So now let's see this case study, though I'd say it's not really a case study; it's some of the basic concepts which will help you work on this. What do I have? I have numpy, and I have skimage, which has a data module with some stored images and also helps us with some I/O functions. There are some predefined datasets stored inside skimage; one of them is the coins data. So I just call the coins data, push it into a variable, and check the type of it. It comes out as a numpy array, because, if you remember, I just said images are nothing but matrices of numbers. And if you want to see the shape and size of it: 303 by 384, one grayscale image.

This is how it looks. If you want to play around with it: what is image[0, 0]? It holds the number 47, so that pixel is nothing but 47. Also, if you want to print a crop, say from 100 to 150 on the x side and from 25 to 75 on the y side, when you plot that you get a cropped image like this, which is one of the coins. Now, cmap defines how the image is rendered: if you want a grayscale image you put gray here; if you want red, green, or blue you have to specify Reds, Greens, or Blues, as I'll show down the line. Let's do it, one minute. OK, so this is my grayscale image: if I say gray, I get a gray one; if I say Greens, I get a green one. So these are some manipulations with cmap.

Later on there is one more image, data.coffee, and this is how it looks. Look at its dimensions: 400 by 600 by 3, which means it is an RGB, three-channel image. Next, if you want to focus more on, say, the red part of it, what I do is keep one channel and render it with Reds; for example, if I say Greens here, it gives me the image rendered in green. That's it; I'm just changing the colormap, not re-importing the image. If I want to make it greener, to emphasize the green part, I change which channel I keep; if I go toward blue, I say Blues, and it becomes sharper and sharper in that tone. This is how you can play around with these numbers. Let me even try one I've never tried, taking channels one to three; see: just don't alter the middle one. Play around with this; these are some operations you can use.

Next is understanding the filters we have shown. Here is a new image, and in this particular image you want the horizontal bars; this is the same image as before, the only difference is the filter. If you want the edges detected horizontally, the first and last horizontal rows of the filter should be minus ones; the others can be zeros or ones, not a problem. If you want to do the same operation vertically, common sense: the first and last columns should be minus ones. What are we doing, basically? We are multiplying this filter with our image, that's it. Next: what if you want to do edge detection properly? If you go back to our slides, the same filter is used: minus ones all around, with an 8 in the centre instead. So, say I put 16 here, for example: we get back almost the original image. What if I reduce it to 2? It changes again. At 8 it detects all the edges perfectly, and it does not just show black and white; you can easily see there is a 3D, cube-like structure here. Second thing: if you want to blur an image, so if this is my original image and you want to blur it like this, what you have to do is multiply with a matrix of ones divided by 16; and I'd say not even 16, you can take any number here, say 1/5. As you push that divisor higher and higher, the blurriness is going to vary. Good. So this is, by and large, the image-processing part of it: play around with these images, import your own image, and try these filters on it; you will get the hang of it.
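A hedged sketch of those skimage explorations in one place; the shapes printed are from the library's stock images, and the blur kernel is the ones-divided-by-16 idea:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import ndimage
from skimage import data

coins = data.coins()
print(type(coins), coins.shape)      # numpy array, 303 x 384 grayscale
plt.imshow(coins[100:150, 25:75], cmap='gray'); plt.show()  # a cropped coin

coffee = data.coffee()
print(coffee.shape)                  # (400, 600, 3) -> an RGB image
plt.imshow(coffee[:, :, 1], cmap='Greens'); plt.show()  # one channel, green map

# Blurring: average each pixel with its neighbours. Changing the
# divisor changes how strong the smoothing/dimming looks.
blur = np.ones((4, 4)) / 16.0
plt.imshow(ndimage.convolve(coins.astype(float), blur), cmap='gray')
plt.show()
```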

Now, one quick question: sometimes we are using 4x4 filters, and sometimes, here, 3x3 filters. Do not hesitate to use even 8x8; nothing wrong in doing it. The only thing is you have to know what kind of feature you want out of it, that's it. There are many websites where these standard filters are defined, so you can download some of them, create your own values out of them, and play around with your image. If you are not able to find them on the web, do let me know; I'll try to search for them. Good. So do some work on images and understand this.

Now, moving on to something more complex: let's see how to actually do the convolution. Observe the animation happening on the right-hand side. On the left we have three bigger images which we are convolving layer by layer: the first pass was layer 1, now we are doing the layer 2 convolution, and finally you will observe the green-coloured cells; those are the final outputs of your images there. OK, good. Also observe on the right-hand side how RGB comes into it: any colour image is nothing but a combination of the R, G, and B layers, and finally we are convolving them and combining them into one particular value. So here a 4x4 patch is being converted into one particular value: we are shrinking the image. OK, good.

Now, moving on, a simpler variant where you don't need to visit every position. If you observe our previous animation, we were going cell by cell: we did not skip anything; we went to almost all the cells. Did you people see that? Here, we are not going to all the cells; we go to certain focused areas. Now, how do we do that? There is a factor called stride. If you have seen your weekly videos, the professor talks about something called the stride factor. What is stride? Very simple. Say I have this 4x4 image (assume this is a 4x4 image in focus), and say I have a 2x2 filter; this will be my first filter position. If my stride is 1, I immediately move to the next adjacent position. But if I say my stride is 2, then from the initial position I skip ahead two cells to the next one. In this small example it's not possible, because there is no room, but just imagine there was one more column: I would skip two cells and move on to the next position. So these are the cases where you can increase the stride and focus only on certain patches of your image; you don't have to worry about the whole thing. OK, good.
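A tiny sketch of where the windows land for different strides; the width and filter size are arbitrary example numbers:

```python
# Where a 2-wide window lands on a 6-wide row for different strides
# (positions are the left edge of each window):
width, filt = 6, 2
for stride in (1, 2):
    positions = list(range(0, width - filt + 1, stride))
    print(f"stride {stride}: windows start at {positions}")
# stride 1: [0, 1, 2, 3, 4] -> every patch is visited
# stride 2: [0, 2, 4]       -> alternate patches are skipped
```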
So that was your convolution; this is still the basics, and I'm just getting you comfortable with it. Now comes the real part: how do we decide the number of filters, and this size and that size? Please remember, you see two formulas here: one is for the output width and one is for the output height, and as I said, we will work in the reverse direction. OK. So the first thing is: what output do you need from your convolution layer? Say, from this whole image, if I just focus on the tyres and, let's say, this door, that will be enough for me to identify that this is a bus. That part is not a problem; the problem is how to decide what filter size to take. So first, I want to understand what my output should be. And what you do is fix either the filter or the output, whichever way you like; you cannot have both as free choices. Say I fix the filter: for example, I use 2x2 filters, four of them, and my stride is equal to 2, which means I will skip ahead two cells each time; I don't want very fine sampling. For that, what I will do is take the width of my image, which is 32, minus the width of my filter, which I have decided is 2. So, 32 minus 2; let me do it properly.

One minute; I will reduce this pen thickness, otherwise I won't be able to write. Width is 32. What is this width? The width of my image. Minus the width of my filter, which is 2. Plus 2 times the padding. Padding in the sense that sometimes the image is not big enough to scan with our filter; we just saw that when I tried a stride of 2, the window went out of bounds. For that reason, what you do is pad the image back, meaning you add an extra layer of zeros around it. In our case we don't need padding, so this term will be 0: 2 times 0 is 0. Divided by the stride. What is my stride? Stride is 2. If you compute this: 32 minus 2 is 30, 30 divided by 2 is 15, 15 plus 1 is 16. This is how I got my convolution output size.

Now, what I did was freeze the filter and derive the output. What if you freeze the output and want the filter? Very simple: you put 16 over here and leave the filter width unknown. So: 16 = (32 - filter width + 2 x 0) / 2 + 1, with the stride set to 2. Take everything to the other side: 15 = (32 - filter width) / 2, so 30 = 32 - filter width, so the filter width is 2, which is nothing but our answer. So whichever way you do it, I am OK with it. You can check it out with this example: quickly compute it both ways and just check that it is matching. Padding is 0 in this case, so don't worry about the padding factor. Is it matching? Matching, right? Good.

Now, what if I had taken a 3x3 filter; can somebody tell me the output width with a 3x3 filter, instead of the 2x2? OK, let me not confuse you guys; let's use a small example. Say the image is 5 wide, let me put some brackets: 1, 2, 3, 4, 5; and similarly 5 tall: 1, 2, 3, 4, 5. Good. Now say 3x3 is my filter size. Can you tell me the output I will get, what by what, using that formula? I'll give you five minutes; just check it out. It will help you visualize the same thing in your convolutions too, because these are the hyperparameters you have to write: what should the width be, what should the size be. So: 5 minus 3, plus 2 times the padding, which should be 0, divided by the stride, plus 1. With a stride of 2, that's (5 - 3)/2 + 1, and that will be the answer. I want m cross n; you guys can type it out for me in the chat.
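The same formula as a tiny helper, checked against the numbers just worked out:

```python
def conv_output_size(width, filter_width, padding, stride):
    """(W - F + 2P) / S + 1 -- the output-size formula from the slide."""
    return (width - filter_width + 2 * padding) // stride + 1

print(conv_output_size(32, 2, 0, 2))  # (32 - 2)/2 + 1 = 16
print(conv_output_size(5, 3, 0, 2))   # (5 - 3)/2 + 1 = 2
print(conv_output_size(5, 3, 0, 1))   # (5 - 3)/1 + 1 = 3
```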

Okay guys, so: output width = (width of the image − filter width + 2 × padding) / stride + 1, and the output height uses the same formula. Now tell me: you have to decide on the padding and you have to decide on the stride. How will you do it? Should the stride be 1 or 2? What should the padding be? Okay, one answer I got is 2×2. Perfect. Did you see that? A 5×5 image, and we got the output as 2×2. Next, anybody else: change the padding and the stride and check it out. See, guys, what Dhruv said is that he doesn't want any change in the padding and he is keeping stride 2. With stride 2 the 3×3 window lands like this and then like this, so do people agree it's 2×2? Perfect: positions one, two, three, four, because he has taken stride 2. And if the stride were not 2 but 1, what would the answer be? 3×3, exactly, so you will get more features. And what if you took padding, what if there were padding there? It is not always necessary to pad or to take a bigger stride; it all depends on your filter. So you baseline one of them, either the filter or the output, and then derive the other out of it. I will always say baseline the filter; that's the easy way. Is everybody in sync with me? Did everybody get this? Guys, from next week we will not be able to discuss all of this, so when I show something in the code, you should be able to identify what the stride is and what the padding decision is. Anything else; anybody wants me to repeat? No? All right. If padding is 1, then what happens to the actual image? See, padding 1 means putting a ring of zeros around it, so you get one extra white-colored border on the image, that's it. I will say too much padding is not good, but sometimes our images are of some odd size, and to match up with the filter we have to pad; it is like a fake white-colored layer pasted on, unwanted but harmless. Krishna asked whether padding would be used if the image is not square in shape or something; correct, agreed. So here we took 5×5. Now what if it was 5×3? Let's try that out, and guys, remember it's very difficult to visualize all this, so don't strain yourself thinking "I'm not a beginner, I should get it in one shot." Say the columns are 1, 2, 3, 4, 5 and the rows are only 1, 2, 3, so the grid is 5 across and 3 down. My intention here is to use a 3×3 filter. First of all, is it even possible? Just check it out. Let me increase the line thickness: here is the 3×3 window. If my stride is 1, what happens; do I have space for one more 3×3 below it? No. So in that case I have to add an extra padding layer around the whole image. That is why in most codes you will usually see a default padding of 1 or 2.

It's okay, because we are not always sure; sometimes we are not able to visualize all of this, and adding an extra padding layer doesn't matter. Sometimes we don't do it, and if that is a problem it will throw an error anyway; it will notify us, and once it notifies us we either change the size of the filter or change the size of the image, immediately. Good, Vishnu is okay now, got it. Okay, moving on. This slide is very important; when I share it, keep it handy. It's not a big deal, but remember what padding does: it deliberately adds extra numbers around the image, usually zeros, so it does not impact us much when we pool, because, if you remember what pooling does, pooling pulls out the best feature, and a zero is going to be ignored by that anyway. So padding will not affect the result much, but at least it will not stop us from convolving. Next is the pooling layer. We have two types of pooling available. I will not say that's all; you can design your own version of pooling too, but these are the most popular ones: max pooling and average pooling. Average pooling means you take the average of all the values in the window and pool them into one number; max pooling means you take the maximum of the window. Look at the slide: this is the max-pooled image from the original, where each window keeps its largest value, and if you still want to max pool the result further, you can. Same for average pooling: take the average of the first window and check that it's in sync with the slide. All right, what else can we do? Anybody, any idea; can we use some other computation for pooling? Since we're taking max features... never tried it? If you want to pool on some other statistic, you can, but you have to design your own function; it will not be present in the standard libraries. The standard values are these two: max pooling and average pooling.
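Here is a small NumPy sketch of both pooling types on a toy black-and-white window (using 1 = black, as in the discussion below), assuming non-overlapping windows that divide the image evenly:

```python
import numpy as np

def pool2d(image, size=2, mode="max"):
    """Non-overlapping pooling; assumes dimensions divide evenly by `size`."""
    h, w = image.shape
    windows = image.reshape(h // size, size, w // size, size)
    if mode == "max":
        return windows.max(axis=(1, 3))    # best feature in each window
    return windows.mean(axis=(1, 3))       # average of each window

x = np.array([[1., 0., 0., 1.],
              [0., 1., 0., 0.],
              [0., 0., 1., 1.],
              [1., 0., 1., 0.]])
print(pool2d(x, mode="max"))  # [[1. 1.] [1. 1.]]: "is there a black pixel?"
print(pool2d(x, mode="avg"))  # [[0.5 0.25] [0.25 0.75]]: "what fraction is black?"
```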

Good. So that's pooling; after it comes the fully connected layer. Now, we have one question: where do you use max pooling versus average pooling, and how do you make that decision? See: say you have a low-sample image; the sampling here is 2 megapixels, while over there it is 16 megapixels. In the low-sample image, nearby pixels are already almost similar, so averaging adds little. If you feel the sampling of your original image is not that great, do max pooling and accept the best feature. "Are you saying that if the image is high resolution go for average, and if it is low resolution go for max?" Let me show you with an example. Look at this window: the values here are only 0 and 1, black and white. If I have a filter focused on this window, which value is of importance to us? The max: we need to know whether there is a black pixel in there, and max pooling definitely catches it. So the whole window gets represented by a single 1, which says black is present. Now take another example: a window where all the values are 1. If you max pool it, you get 1. If you average pool it, you get the fraction of ones, say 5/9 for a window with five ones out of nine cells; either way we learn that the majority of the features are 1. "I think the other difference is that with average pooling, say where you have four black cells, the differences between values carry more of a range." Good, let's take an example: take this 3×3 window. If you take its average: we have five ones out of nine cells, so 5/9 is our average, correct. Did you get what he is saying, guys? With the average you know exactly how many blacks and how many whites there are; the denominator is the total number of pixels. So in essence you are carrying more information into the next layer. With max you take only the best information forward, and in our case the best information is simply black versus white: we just need the strongest features, not the extra background from the image. Good. So this is one way to represent an image. If you want a slightly simpler version, you can do this, and I feel this image is a better representation, where you pull certain features directly out of the picture and classify it as a car or a truck. This is R-CNN territory, regional convolutional neural networks, which we will see a little later in the course, around week three or four: we don't need to scan everything; we take out certain features and that is enough. Good, we have covered almost everything that was needed. Next session we will start with the implementation of CNN, and then move on to a very approachable topic called transfer learning. Just to give you a clear, quick five-minute idea of next week's videos, don't worry about them: VGG, AlexNet, LeNet, ResNet, all those "nets" were produced by people who ran a huge number of combination trials to find the best arrangements of convolutions. Whatever image you give such a network, it tries to classify it properly. And since these networks have won certain competitions, they are standardized now. What does that mean? Their weights are stored in an HDF5 file. So what you can do is build your code, recreate their architecture, automatically download their weights, and build your own FCN at the end. Whatever you download is only the convolution layers; the last three to four layers you do yourself.
That final fully connected part is what you have to design; that's it, and your network is ready. You don't have to take the pain of building your own CNN from scratch. This is called transfer learning, and that is the topic for next week. See it like this: you do not train the imported part; you build and train your own fully connected network on top. When you run your model, the imported layers are non-trainable, static stuff, and your own layers are the trainable stuff, that's it. Do you think our problem gets solved? Yes, right: you don't have to take the pain of designing a very complicated neural network. Designing networks is good, we get it, but you are getting that part ready-made; what you are supposed to design is the last piece, which people are very good at by now. All right, this is called transfer learning. It's as if one of you designed some code, generated it, and gave it to all of us, and we just change the input data and run it as our own classification model. So don't get confused if the professor shows you AlexNet; yes, AlexNet is very big, but just understand that it is ready-made stuff we are importing and running on our model, so that we don't have to go through the pain of training our own. And usually the parameter counts are something like 6 million, 10 million, 60 million; that's exactly the reason we don't train them; we take them ready-made. "Krishna, can you explain with an example? Say you have a corpus of car makers. You would get a model which can already identify vehicles, because it has learned from a corpus, and then you just add a few more layers on top; if your problem is to identify the maker, you work on that part." Correct, exactly. See, have you heard about frameworks in industry? What's a framework? It's basically a guideline for how to do something. For example, have you heard about A3, the A3 implementations under Lean? It's a framework: say you have a service and you want to improve it; you lay out the current scenario, the problem you are facing, how you intend to solve it, and the timeline. When I hand that framework to you, all you have to do is put in your data, that's it. You can imagine the same thing here: there is a neural network with its own corpus, trained long ago, validated by a lot of data scientists, and similar to what you are doing; say the corpus was a large set of images for object identification. We already have that network, and what these people have done is freeze its weights for us and convert them into an HDF5 file. Now what do I do? I take these weights as my framework and just put my data on top of it, that's it, and use it. I'll give you some of the popular ones I am comfortable using: VGGNet and MobileNet. What is MobileNet? You use it every day: your phone tags your face, and it is not possible to run very heavy computation on a phone, so VGG was simplified into a very lightweight version called MobileNet, and the phone uses those ready weights. You can also use ImageNet-trained models, or ResNet; we will pick one of these down the line. In the next videos you are going to see all this, so the concept should feel very simple by then. Okay, I think I'm done with that part.
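To make the "weights stored in an HDF5 file" idea concrete, here is a minimal Keras sketch. It assumes internet access for the first download, and the file name is just an example:

```python
from tensorflow.keras.applications import VGG16

# Keras downloads the published, competition-winning weights for you
# (an HDF5/checkpoint file) the first time they are requested.
model = VGG16(weights="imagenet")              # architecture + ready-made weights
model.save_weights("vgg16_copy.weights.h5")    # the weights alone, in HDF5 format
model.load_weights("vgg16_copy.weights.h5")    # drop them back into a replica
```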

Before moving on, I just wanted to ask you one simple thing. Let us say I have an image which is 64×64, and over here I have a neural network with 512 inputs, with its own hidden layers, finally converging to one or two classifier outputs; we don't care which. The point is: I am supposed to feed in 512 inputs, and my current input is 64 × 64 × 1, since it's a grayscale image. Can you guys help me decide how many convolutions I need, what the sizes of my filters will be, where the max pooling goes, and so on? Can we do this quickly, so I know you people are in sync and we can go further? Let's start placing the blocks one by one: we'll write the filter size here, the max pooling here, and finally the totals. So, how do we design this? "3×3 filters." Okay, we start with 3×3. And how many do I need; how do you decide on that? Honestly, from my side: for this particular term we have no clue how many we need, so for now keep it empty, that's okay. But at least you have decided on 3×3. If I have 3×3 here, what will my output be? What's the formula? (Width of the image − width of the filter + 2 × padding) / stride + 1. So start putting it in. First of all, stride is 1, I'm not changing it, and I have done zero padding, so: 64 − 3 is 61, 61 divided by 1 is 61, plus 1 gives 62. The output I'm going to get is 62×62. Now, if you look at this and look at the target, it does not make much sense: we have barely reduced anything. To reach the 512 inputs, the product of the final dimensions has to come out around 512, and 64 × 64 is 4096, so this particular transformation is not going to help us. To shrink faster, either I increase the size of my filter, which doesn't feel fair, or I increase my stride. Let us increase the stride to 4 and, to be fair, take a 2×2 filter, no problem: 64 − 2 is 62, 62 divided by 4 is 15.5, take 15, plus 1 gives 16, so we get 16×16. That is a good transformation. So what is the lesson here, guys? This is not an easy task; on purpose I took numbers that don't work out neatly, so that you feel what it's like to have a target in mind and try to convert the input towards it.
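For a sanity check, here is the same arithmetic in code, reusing the output-size formula from earlier. The choice of 2 filters at the end is purely hypothetical, not from the lecture; it just shows one way the flattened size could land exactly on 512:

```python
def conv_out(w, f, p=0, s=1):
    # (W - F + 2P) / S + 1
    return (w - f + 2 * p) // s + 1

print(conv_out(64, 3, s=1))   # 3x3 filter, stride 1 -> 62: barely shrinks
print(conv_out(64, 2, s=4))   # 2x2 filter, stride 4 -> 16: much closer

filters = 2                   # hypothetical filter count
print(16 * 16 * filters)      # flattened size = 512, matching the FC input
```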

And it is not easy to decide how big the filter should be, what the padding will be, what the stride is. It is fine to know the formulas and do convolution on small images, but when you go to larger images, say a 1-megapixel image, and I ask you, "My image is one megapixel; can you design a CNN for me that ends at 512 here?", you can do it, not a problem: you will be able to place the filters one by one, even backpropagate and try to adjust. Nothing wrong with that, but it will take a lot of time. To avoid these confusions we use a concept of, sorry, not machine learning, transfer learning. Let's try to see what we do in transfer learning. Can any of you name one machine learning library which is the best, which has almost all the algorithms and functions we use? I'll give the example of SVC, or KNN, decision tree, random forest: which library do we use? scikit-learn, right. Now what exactly is scikit-learn? Can't we program SVC ourselves with a few loops? We easily could. But why don't we? Because we already have a function: somebody has written it, somebody has coded it in a generic way such that we can input our own data and get a customized model serving our purpose. So we just call it. When we call that particular thing, we are getting the infrastructure ready-made from them; the only thing we are doing is putting in our data. Now keep that in mind. There are certain readily available convolution networks; I'll show you down the line which ones. Say there is one convolution network that is readily available, and the weights of it have been stored in an HDF5 (.h5) file; all of the weights have been pickled. I am not saying the entire network, just the weights: the filter values and everything we saw in the example above. Now what do I do? I ask, or request, that model to import its weights into my model. So you create an exact replica of the architecture, pick up the weights directly, and drop them into your infrastructure. Once you drop them in, what is remaining? You've got the convolution weights; now you just have to build your FCNN, the fully connected neural network, at the end. That we can easily do; not a big deal. This is what we do on the industry side: we do not write many CNNs from scratch, because even if we successfully build our own CNN, training it and correcting it will take a lot of time, to be very frank. So I will say that in the majority of your industry implementations you will be using transfer learning. But in this certification, at least for the next three weeks or so, we will be hand-writing some CNNs, and after that we'll start picking up CNNs of our choice. And that example we just saw, we will see it in the code; let's see how we can solve it there. Any questions on transfer learning, the concept part of it? I will show you how we do it, but this is the overall flow: we directly take the weights and use them. Any questions? No? Perfect.
Good. So now let's bring up a PPT. Today I will use one PPT where a lot of networks are mentioned. At first glance it will be confusing, so I don't expect you guys to master these networks in one session. What we are going to do is pick up one network each week. In today's session I will pick up VGG, which is the most important one; probably next session I can show you something on AlexNet or LeNet, or give you something to do on LeNet, and after that we can go to the complex networks like ResNet, the ImageNet winners, and all those. And that's not all; there are many more available. Okay, so let's start. Today's agenda: we will begin with CNN architectures.

We just took an introduction on what a CNN is; CPU vs. GPU we already know, so I'm not going into much detail; and then the most important part, transfer learning. Now, coming to the architecture of a CNN, let's try to push our formula in here, whatever we just derived. Harish, I think this solves our problem, except, I will say, it does not solve the problem of how many filters: what is the math behind selecting the number of filters? I'm still working on this, guys; give me a week's time and I'll try to find a paper or some logic behind it. I will also request the professor who teaches this part to share some papers with us, because so far, to be frank, I have not found many good references that say "this is the math behind selecting these filter counts." Maybe, as you said, they decided the final fully connected layer first and then worked the filter counts backwards; I also think that, but there should be some logic, because you can't just put a random number there. See, if you look here, the slide says 28×28 with 6 filters, starting from a 32×32 input. If I multiply 28 × 28 and stack it 6 times, I'm not sure we get the same count back, to be very frank. What is happening, as per our understanding of CNNs, is that one image is getting converted into many features; that's what the 6 shows. I used to read it as feature maps: there are six feature maps available, meaning I can pick any one of them, feed it into my fully connected network, and try to extract some information from it, if possible. That is how I used to take it, but now I feel that's not enough; I also need to know how they chose the count. Apart from that, for today's session let's set that aside; the rest you can verify directly: 32 − 5 is 27, stride is 1, padding is 0, plus 1 gives 28. And if you continue to the pooling layer: (28 − 2)/2 + 1 = 14. So our formula holds up pretty well on this network; at least 50% of the design is explained. If you ask me how they came up with this exact network, I will say a lot of mix-and-match trials happened; it's not a one-shot job where they decided this and got the best results, because they are working with about 60,000 parameters here. Now, what are these networks and how can we use them? Look at the highlighted area I am drawing; this is what I am interested in. If I have a ready-made, open-source network that can identify handwritten digits (remember your neural network project with the 7s and 5s; I know those were not all handwritten, but we can correlate), then rather than writing KNN, rather than writing my own neural network, I will ask for these LeNet weights to be dropped onto my network, design my own fully connected neural network depending on how many outputs I have, and take it forward.
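If you want to replicate it, a rough LeNet-5-style sketch in Keras looks like this; treat it as an approximation, since the original paper used slightly different subsampling and output layers:

```python
from tensorflow.keras import layers, models

# Approximate LeNet-5 layout: 32x32x1 in, 10-way softmax out.
lenet = models.Sequential([
    layers.Conv2D(6, (5, 5), activation="tanh", input_shape=(32, 32, 1)),  # -> 28x28x6
    layers.AveragePooling2D((2, 2)),                                       # -> 14x14x6
    layers.Conv2D(16, (5, 5), activation="tanh"),                          # -> 10x10x16
    layers.AveragePooling2D((2, 2)),                                       # -> 5x5x16
    layers.Flatten(),                                                      # -> 400
    layers.Dense(120, activation="tanh"),
    layers.Dense(84, activation="tanh"),
    layers.Dense(10, activation="softmax"),  # digits 0-9
])
lenet.summary()  # roughly 60k parameters, matching the slide
```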
Those are the features of this network. Now, if you expect to use these ready-made weights for a face recognition system, it will not work, to be very sure. Why? Because, please remember, this is what is important: tomorrow, when you pick up one of these networks, you should know what it solves. This one only works for handwritten digits, nothing else, if you are importing the weights. If instead you are replicating the network, the same architecture, giving faces as input and training it, then check the output: if you get a good output, you can say that yes, it works for faces too. But importing weights works only for the handwritten digits, because the weights were trained only on that data. So, to summarize: if you use it ready-made, apply it only to handwritten digits; but if you replicate the architecture and retrain it, you can give it whatever inputs you want. Perfect.

Yes. So, in today's code we are going to use VGG; we will import VGG and work with it. What you guys can then do is this: you have the LeNet network in front of you; pick up your older data, the MNIST or SVHN data from your previous project, try to replicate this network, and see what you get. Simple. How do you replicate it? You use the same values, and at the end, when you connect the fully connected neural network, you multiply the three dimensions of the final feature maps; that will be your input size, as simple as that. Or, if you don't want to flatten everything in one shot, you can feed the feature maps in batches, or pick them up one by one. Try it yourself. It is good to play around with this particular network because it's a very basic one; if you are good at this, you will be good at the next one too. Now look at the features: it was built for handwritten digits and published in 1998, so I would call it an outdated algorithm, but I have used it in multiple tasks where I had to identify simple digits, and it works perfectly well, no problem at all. It has about 60,000 parameters; if you are running on Colab, that is not a very big network, but on a local Jupyter setup, yes, I'd say it is on the heavier side. The dimensions of the image decrease as you go deeper while the number of channels increases; or rather than "channels," which gets confused with grayscale versus RGB, call them feature maps: each 5×5 square you see carries some kind of feature. The activation functions used in the paper were sigmoid and tanh, but nowadays you don't always have to use those; you can replace both with ReLU, no problem, it will work well. All right. Some pointers on this network: it's a 10-way neural network classifier; it works properly on 0-to-9 handwritten digit data; it is tolerant to various transformations like rotations and scaling, so if you do augmentation it's still going to work well, and our SVHN data should work pretty well with it. It was used by banks to recognize handwritten numbers and digitize cheques, and at a basic level, yes, they still do that: if there is a cheque with numbers written on it, the amount, the date, a signature somewhere, and you want to identify the cheque number or the date, you can easily put the cheque under a scanner, pick up the data, and process it. Very simple, and you don't need heavy hardware for it. And it is a four-layer network: if you count here, one, two, three, four weight layers. Now, if you have understood this, one quick question for all of you: if I put Adam as my optimizer, is Adam going to act on this part or on that part? Take some time, think about how backpropagation is going to work: is it going to update only the fully connected layers, or is it going to update everything?
"Everything, I think." Perfect. In the first case, if you are using the whole network as it is, yes, backprop is going to work on both parts, and I will show you how backprop works on this; you people know how backprop works. Now, leading on from that, another question: what if I import the weights into my network, build my own fully connected neural network on top, and go further? In that case, is backprop going to work on both of them? Let me repeat myself: I have imported the weights from the LeNet model into my network, and after that I am building my own fully connected neural network here, something like this. Now, if I put in Adam, is Adam going to update the imported part, or only my part? "I think it will work only for the part which we have customized, not for the imported part; I feel so, not sure." No, that is the perfect answer. Why? Because if I backpropagate and change those weights, there is no point in importing them: we imported the weights precisely because they were standard, already-learned weights. So if we do this, we have to make sure we freeze the imported part, so those weights are not editable, while our own side stays editable. In week three or four let me show you how. This is a very generic question: when you go for an interview and look at the most important deep learning, convolution, or CV questions, this is one of the things they ask, so keep this concept in mind.
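A minimal sketch of that freezing step in Keras, assuming tf.keras with a VGG16 base; the 10-class head is hypothetical:

```python
from tensorflow.keras.applications import VGG16
from tensorflow.keras import layers, models

base = VGG16(weights="imagenet", include_top=False, input_shape=(64, 64, 3))
for layer in base.layers:
    layer.trainable = False   # frozen: backprop will not edit the imported weights

model = models.Sequential([
    base,
    layers.Flatten(),
    layers.Dense(10, activation="softmax"),  # our own trainable head
])
model.compile(optimizer="adam", loss="categorical_crossentropy")
model.summary()  # note the large "Non-trainable params" count: the imported part
```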

Keeping this in mind, let's move to something more complex and make one more model out of this: AlexNet. AlexNet is an extension of LeNet, you could say. Why do we need it? The earlier network could identify the digits zero to nine. If I want to identify objects, or faces, or anything beyond a digit, I need a bigger architecture. What they have done is simply add more layers of convolution and pooling, convolution and pooling, and that's it; there is no other fundamental difference, though the FCNN part also differs because the number of classes changes. The goal for this model was the ImageNet challenge, which classifies images into 1000 classes, and it has 60 million parameters compared to LeNet's 60 thousand. When the count increased from 60K to 60M, the capacity to classify images increased with it: if your target falls within those 1000 classes, you can use the ready-made weights from this network in yours. It uses the ReLU activation function by default, and the paper was essentially about convincing computer vision researchers that deep learning works, and this is how they did it. If you look at the tabular structure and count the layers in my diagram (give me one minute, something was wrong in my drawing; there are more convolutions than I first marked), you find five convolution layers plus the fully connected layers at the end. So this is the extension; please remember the parameter count, because it is a very heavy network. Now, what made AlexNet successful? "The architecture"? I don't fully believe that, because anything you mix and match long enough will produce something. The listed factors are: overlapping max pooling; if you observe, they do not pool after every single convolution; sometimes one convolution follows another before any pooling, which is the point: you don't always have to max pool right after a convolution. The ReLU function was used universally, and it is very simple. Then dropouts, cropping, data augmentation, and inference-time augmentation,
which, okay, let us take up with a case study another day and revisit. More or less, if you look at the important part, this is what the network solves; the rest is not a big deal. What you need to remember is the max pooling point (you don't have to pool after every convolution) and the architecture itself; these are the important success factors of AlexNet. Coming on to VGG now.

Increase the number of pooling and convolution layers still further and you will end up with this particular network. Let us confirm the input sizes: AlexNet takes 227×227×3, and here VGG takes 224×224×3. So again, what you do is increase the number of convolution layers; modify AlexNet and you get something called VGG16. It's not the latest version; there is now VGG19 as well, but both are roughly equal, so for me, if an implementation with VGG16 gives me good accuracy, I don't go to 19; anything with 19 is merely heavier than 16. All the rest remains the same idea as AlexNet, no changes. So what are these networks? There were certain challenges, competitions, in which these groups participated and won, and if you ask me how they arrived at the designs, a lot of it is mix and match, to be very frank. Some of the important numbers: from 60 million parameters we are now jumping to 138 million; it takes a total of about 96 MB of memory per image for the forward propagation alone, and most of the memory is consumed in the earlier layers. If you split the network into two pieces, the convolution part and the FCNN, most of the memory goes to the convolution side; that is the heavy part, and the FC side is the lighter part. The number of filters increases from 64 to 128 to 256 to 512, with the 256 and 512 blocks repeated twice, so they replicated some of the blocks in the network. You don't have to memorize all of this: whenever you want to use it, come back to this slide and replicate the network; you have the numbers ready, and you just put them into the Keras model.add(...) calls. If anybody is interested in the paper, it is attached; I'd call it a heavy read, so first get comfortable with all this and then go to the paper. As mentioned there, there are two versions, 16 and 19, but more or less they do the same job; use whichever you want. Good.
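You can verify the 138-million figure yourself with a quick sketch (weights=None skips the download, since only the architecture matters for counting):

```python
from tensorflow.keras.applications import VGG16

vgg16 = VGG16(weights=None)                           # original 1000-class setup
print(f"VGG16 parameters: {vgg16.count_params():,}")  # roughly 138 million
```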
Now comes a little problem, and with it a rather different network: we call it ResNet. My question to all of you is: what if I want to use this heavy network we just discussed (it's a very good network, agreed) on a handheld device, and within a fraction of a second of giving it an image it should return an output, or put a box around my face and say "yes, identified"? Will your device be able to hold that up? Will your mobile, for example, be able to run something this heavy? As someone said, there is another architecture called MobileNet; yes, MobileNet is an alternative for exactly that, but let's not go there; let's think about what we could change here. Actually, my example isn't quite right; let me set up another one. Guys, let's not think about the computation side. Say I currently have one image that needs to be given as input, and within some time limit T I am supposed to get the classification output, along with some tagging. As you saw, the number of layers is large: one layer, another layer, a third, so many layers, and the data flows through them in a strictly sequential manner; so far we have not broken that sequential flow at all, correct? Now, what if I want to increase the speed of this network; what should I do, in your view? Say the current time is T, and I want a new time Tnew that is far smaller; it should be faster. What should I do? I'll take five suggestions from you guys.

"Are you talking about some external software that does this?" No, no, let's not go there, Dhruv. Let's say I make some changes within my current network and get the speed from there. The network is very good, perfect, we've got that, and I don't want to lose it; I'm just not happy with the time. "Maybe reduce the hidden layers? That might reduce the time." Simplify the input, exactly; but how do we do that? "Distribute it; parallel processing, something like that?" Okay, fine, that could be a possibility, so let me write it down: I could have a parallel processor, or a Hadoop environment where I do parallel processing. Fine, but then we are not changing the architecture, right? We are distributing the data: I would split my data into many parts and distribute them in parallel across multiple networks. What can I do inside my network? Let me give an example; I think then you guys will be able to come up with something. Guys, suppose this is my computer screen that you are able to see, and I show you half of an image, like this, so you cannot see the remaining part. As soon as you see that half, what do you assume the shape to be? Everybody will say "a circle." Or, if I am scrolling an image across my screen and you can see only half of it, you can already describe what it is: you assume he is showing some kind of square, and as soon as the rest enters the frame, everybody confirms: yes, it's a square. Keep that analogy in your mind. What if I can give the network some early input: take the output of one layer and, besides passing it to the next layer, also throw it further ahead? The output goes to the next layer, which takes some time to process; meanwhile, in parallel, I throw the same output a few layers forward. That later part of the network starts producing outputs early. So while the image is still being processed here, a version of the signal has already reached there. And there are cases where one of the feature maps, quite early, already looks like a cat; if I hand that feature map to the next-to-next block, we might already identify "this looks like a cat, possibly a cat" before all of the shrinking finishes. Are you able to get it, Kaushik? This is similar to what you said, but we are not splitting the data in parallel; I am forwarding the data: skipping one layer and passing it further. Able to get it, everyone: Amit, Dhruv, Shakti, Aakash, Harish? Good. This is what ResNet does.
Now, the only challenge, and nobody challenged me on this, is the following. Say there is a convolution layer, max pooling, convolution, max pooling, and so on, many of them. By default there is a sequential order, and the data follows it until it reaches the fully connected network. But if I take an output from one node and hand it to a later node, don't you think the dimensions are going to mismatch? Agreed? Say the output here is 28×28×3, and by the time we reach there, the expected input is 16×16×6; don't you think that is going to mismatch, yes or no? Because in convolution we reduce the size at every step, the output of this layer is going to mismatch with the next one, and the one after that.

So what do we do to solve this? We keep the same setup for all of them: we design a network with the same filter sizes and the same activation functions throughout. This is how it looks, and honestly it looks messy; at first glance it makes no sense as such, but try to understand: we are taking an early output, skipping one convolution, and feeding it further ahead. It is a very, very deep network, as the name suggests, and that brings in the vanishing and exploding gradient problems. What do we mean by vanishing and exploding gradients? Let me set it up. You have all cleared the statistics module, yes, the first module? Can somebody name three distributions we covered? Normal, Poisson, binomial, okay. And what statistical tests did we do? Paired t-test, z-test, chi-square goodness of fit. Fine, okay, you people are ruining my example, but it still helps me explain. See what happens: your data passes through many, many layers; it enters here, goes through lots of convolution and pooling layers, whatever they are, and finally comes out here, where you compute a loss and the gradient ∂L/∂w₁ for a weight w₁. What do you do? You immediately go back and change the weights; if you are still not happy after the next pass, you go back and change them again. The weights near the output end can at least be controlled this way, but the weights in the earlier part of a very deep network receive almost no gradient: their updates essentially vanish away, and they end up contributing very little to the network, to be very frank. This is called the problem of the vanishing gradient. We are going to revisit this concept in the NLP module: there we have a network called the RNN, the recurrent neural network, which is deep in the same way and faces the same issue; to solve it we use an LSTM, a long short-term memory network, which carries a memory where this information is stored down the line. Now, our ResNet model is also a very deep network, and whenever you see the words "deep network," always remember that vanishing and exploding gradients are going to be a big issue: vanishing gradients for the early layers, and exploding gradients for the layers very near the output, where lots of changes pile up and the gradients can go pretty high. So that is ResNet. The important points: this is the architecture of ResNet-34; there are many versions, ResNet-34, ResNet-50, ResNet-101, ResNet-152; the concept never changes, only the number of layers you see here differs.
And the convolutions are all 3×3 "same" convolutions, with no change from block to block, so we keep a uniformity, a deliberately simple design of the network. We don't want to play around there, because we want the early output to be usable by any of these convolutions, so we keep everything as uniform and simple as possible. And if you observe, there are no big fully connected blocks in the middle; that's about it.
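A minimal sketch of one such uniform residual block in the Keras functional API: an identity shortcut with 3×3 "same" convolutions so the shapes line up for the addition. The 32×32×64 input shape is just an example:

```python
from tensorflow.keras import layers, Input, Model

def residual_block(x, filters=64):
    """Two 3x3 'same' convolutions plus a shortcut that skips past them."""
    shortcut = x                                   # early output, forwarded ahead
    y = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    y = layers.Add()([shortcut, y])                # shapes match thanks to 'same'
    return layers.Activation("relu")(y)

inp = Input(shape=(32, 32, 64))
out = residual_block(residual_block(inp))          # stack two skip blocks
Model(inp, out).summary()
```

The shortcut also gives the gradient a direct path back to earlier layers, which is exactly how ResNet eases the vanishing-gradient issue described above.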

One thing to remember is the dotted lines in this diagram; I'm not sure if you are able to see them. When the input and output of a block trace each other, same shape, you can switch some of these connections on or off. They are optional connections: it is not necessary to take every single one and throw it forward. If you are designing the network so that you bypass only one block, you can treat some of the others as dropped, saying we don't need them; those are the dotted ones. I will implement one of these ResNets; it's not in our course, not in our standard format, but I don't want to leave it out. I will implement a ResNet and show you once you are comfortable with the first three networks. We'll do it in two steps: first we will design it ourselves, then we will directly import the weights and use them. Perfect. So this is the expanded, zoomed version of ResNet-34; the 34 represents the total depth, how deep the network is. Now, how do you decide how many layers a network should have? Very simple. If you have fewer than five layers or so, it's a very easy network, not a deep network at all. If you have around ten, including initialization and batch normalization, I'd say it's still an easy one, a little more complex, a little deeper than five. If you go to 30 or 100, those are the real deep networks, and in these, if you are not skipping connections, the training time is going to be very long; so if you are planning 100 layers, or even 30, please make sure you skip connections. More or less, ResNets and LSTMs are the examples in this range. Anything above a thousand layers I have never seen in my life, so we have no idea how that behaves. If you want to look at a network that is very deep, the Inception network from Google is an example: look at it and you see lots of connections, lots of skipping happening. The Inception network consists of concatenated blocks of Inception modules; the name Inception was taken from the movie meme. Sometimes a max-pooling block is used even before the Inception module, to reduce the dimension of the input. So please remember: it is not always necessary to take the image as it is; the same dimensionality-reduction tricks we used before, we can use here too. Now, have I shown you at any time how to reduce dimensions in image processing? No. Have I shown you anything on autoencoders? No, right? I'll show you today. If you remember, in machine learning we used to do SVD, PCA and t-SNE; have you heard about t-SNE, guys? All right, I will show you one more dimensionality-reduction technique in machine learning today. But apart from that, if you want to do the same thing inside a neural network, how should we do it? The first way is nothing but your convolution itself; convolution is essentially shrinking the data. But if you look at convolution as pure data reduction, there is a problem. Let's see how. Say you have a 64×64 image here, and after convolution, before flattening, you end up with some data, I won't call it an image, that is 5×5.
So we have managed to shrink it. Now, can I use this 5×5 for future processing? If I give it to one of our networks, say LeNet or AlexNet, will it be able to identify that this is part of a larger image like the original? Can I do something like this, which I'd call dimensionality reduction; is it even possible? Think about it. "Yes, we can do that." Okay, we got one yes. Anybody else think it's not a good idea? Is it a good idea, is it possible? Okay, fine; from my side, you guys can try this out.

If you put a fully connected network at the end, yes, it is possible, but then it just becomes your CNN again. What if I want to send this shrunken data to some other networks? The other networks will not have the decoding side, right? Remember, we have done some encoding here: we used filters to encode this data. Have we given anyone the decoder? It is like a cipher; I am not sure if you have heard of this. Say you want to communicate something secretly: you apply your cipher, a filter you multiply with everything, and it creates new numbers that look like junk data. To decode, you have to divide the whole thing by the cipher again, and then you get the original back. Our filters here are like that cipher, and are we providing the cipher along with the data, so that tomorrow someone can recreate this image? We are not. So: what if I want to reduce the size of an image in such a way that I can also recreate it, replicate it back? In that case, if the original was 64×64, I cannot replicate the whole thing back; I cannot recover the full sampling. But if I reduce it to, say, 32×32 at 50% of the size, and the digit 7 was in the image, I will still be able to vaguely represent that 7 when I reconstruct. This is called an autoencoder, and I will show you one implementation today. At a very high level, this is how it looks, and inside there are two parts: one called the encoder and one called the decoder. Anybody here from an electronics and communication background, or at least the DSP part of it? Check that your camera is on. You would have done encoders and decoders; or from computer science, cryptology, you can call it a related study. Correlate those concepts: we do something to an image to shrink it, and later on we recreate it and pass it to our network. Now, why do we do that? Very simple: I have reduced my dimension and still retained the data. And what are the encoder and decoder? Very simply, they can be fully connected neural networks whose outputs are smaller than the original input. How many layers you want, how you design them, depends on you: first of all you decide the encoding size, how far down you want to compress, and then you design the two mirrored networks and keep training them until the reconstruction is almost similar to the input. This is called autoencoding. It's not a very popular compression method, because it is lossy: if I take an HD image and shrink it like this, I am definitely going to lose some data; everybody agrees you cannot completely recreate it, no matter how good the network is. So: this is one simple neural network, this is another one, and you train the weights back and forth until the two ends are in sync. That is an autoencoder; I'll show you one implementation down the line.
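A minimal fully connected sketch of the encode/decode idea, assuming flattened 64×64 grayscale inputs; the 512-unit bottleneck is an arbitrary choice:

```python
from tensorflow.keras import layers, Input, Model

# Encoder squeezes the 64x64 image down; decoder tries to rebuild it.
inp = Input(shape=(64 * 64,))
code = layers.Dense(512, activation="relu")(inp)         # the shrunk representation
out = layers.Dense(64 * 64, activation="sigmoid")(code)  # lossy reconstruction
autoencoder = Model(inp, out)
autoencoder.compile(optimizer="adam", loss="mse")        # output should match input
# autoencoder.fit(x, x, epochs=10)  # note: the targets are the inputs themselves
```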
Now, back to what I was asking: in places, what Google has done is max pooling, and you can also think about max pooling as dimensionality reduction. Say you have an image of my favorite size, 32×32, and I want to replicate it at 16×16: I max pool by 2, 32 divided by 2 is 16, very simple, and it's done; or you can use the autoencoder approach I just showed you, either is okay. So this is your GoogLeNet. We will take this particular network at the end of the module, so that you are very comfortable by then and don't get confused, because just by looking at it, it's horrible. To work with it, I'm not even sure it's fully open source; I'm still checking whether we can download it. If not, what I will do is a very good, in-depth theoretical review of it: what is written, how they arrived at it, and what the applications of GoogLeNet are. All right.

Now, if you compare the most popular ready-made networks available, you will find them on this chart. The most stable one is VGG; down the line, I think in almost all of my implementations you will find me using VGG, because it is very simple to work with. Look at the accuracy too: VGG is not spectacular, it covers around 65 to 75 percent, but from the parameter point of view it shows one of the highest counts available, and more parameters generally means more capacity to learn. AlexNet and GoogLeNet lie somewhere here; ResNet-152 is over there: 152 in the sense of more deep layers, hence more parameters, compared to ResNet-18, 34, 50 and 101; different versions exist. That is how to read this particular graph. I would say that if you have decided to transition your career into computer vision, you need to know some of the most popular ones: VGG, AlexNet and ResNet-152. If you know these three, you can say yes, I know computer vision, because apart from AlexNet, which is the basic one, these are the top performers here; also try to correlate them with the number of parameters. Good. All right, now coming to the concept of transfer learning: after too much theory, it comes down to this very simple diagram. I have a source task, some generic task, say; I run a lot of hardware on it and distill it into knowledge. Now I have my target task here, say a real-time application: I take my data, import the source's learnings into my learning system, and classify my stuff. That's it; you no longer have to do the classical, traditional training from scratch in your model: you use ready-made learnings, you transfer the learning. And if you look at the learning curve, to be very frank, it is obviously better: in the majority of my implementations I take ready-made networks and don't take the pain of building CNNs from scratch. Then you may ask, why are we learning CNNs at all? You will understand the use of it after week four, when we enter R-CNN, Fast R-CNN, YOLO, SSD, all the complex algorithms; at that point we have to use CNNs, we have to build parts of them, so the concept of building a CNN matters. Enough theory. What do we have today? A very simple dataset, already split into test and train; inside both we have the same five folders, and each folder contains flowers belonging to that class. Our job is to classify the flowers from the testing part: we train using the training data and use the test chunk to classify, as simple as that. The upper block of code is for the people who want to use Colab, so I will start from here. This is my Keras section; I think you guys are comfortable with the Sequential part by now: Conv2D, then max pooling comes into it, flattening, then Dense and Dropout. Now, I'm not sure which version you are on: with TensorFlow 2.0 the imports look like this, from tensorflow.keras, and then everything follows; you may have to make some modifications, and you might get some errors if you have the latest version. Let me go directly to the classifier.
I am going directly to the classifier. If you look at this, I have 32 filters of size 3 x 3, and an input shape of 64 x 64 x 3. Very simple: padding is 0, stride is 1. Just put it in the formula: 64 minus 3 is 61, divided by the stride, which is 1 as usual, plus 1 — so the answer is going to be 62.
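That arithmetic is the standard convolution output-size formula; as a small sketch in Python:

    def conv_output_size(n_in, kernel, padding=0, stride=1):
        """Spatial output size of a convolution (or pooling) layer."""
        return (n_in - kernel + 2 * padding) // stride + 1

    print(conv_output_size(64, 3))   # (64 - 3)/1 + 1 = 62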

Then the next layer is another 3 x 3 convolution with 32 filters, and tracking the sizes layer by layer this way gets tedious. So rather than taking that approach, I suggest this one: 64 x 64 is what? 4,096 pixels. When you pass that through a 2 x 2 max pooling function, both width and height are halved, so the pixel count drops to a quarter: 4,096 becomes 1,024. Max pool again and it drops by the same factor again. And if you look at the input to our fully connected network, the convolutional part ends at 512 values after flattening. So take this approach when checking the problem. As for me, I am not changing the filter sizes; it is working fine. This is not an optimal solution; I will create some problem statements and give them to you, and you can play around with the layers as much as you like, but make sure that at the end everything matches the input of your dense part.

Now, if you look at it: on the first layer we have 32 filters of 3 x 3 — this is my input, standard stuff, no changes here. The second one is also standard, the third one standard, same thing. This gives the 512, and once you enter the fully connected part I am adding some dropout and a dense layer, which could be, for example, a quarter of the flattened size, with ReLU as the activation function. And finally, if you observe, I have one, two, three, four, five classes, so obviously my output layer has five neurons. So now I think we are good; this is a simple CNN that I have designed, and I am moving on.

I wanted to show you a little extra here, where we are calling Adam and passing it into our compile step. Apart from the learning rate, there are arguments we have not used so far. What are these, and how do you check them? One way is to go to the Keras documentation, go to Optimizers, and you will get the list of all optimizers available: SGD, RMSprop, Adagrad, Adadelta, AMSGrad, Adam. Adam is the most common one, SGD is also very common, and if you have a deep network you can use RMSprop; I will show you down the line how to use these.

Now, what are these extra arguments? They are the hyperparameters defined there. The first one is the decay factor. Does everybody get what decay I am talking about? If you remember gradient descent, the learning rate is made to decay downwards over time. The standard values, if you do not initialise them yourself, are the defaults shown here, so I am not playing around with them; I am just showing you that if tomorrow you increase your learning rate, your decay schedule becomes steeper, so keep a close check on both if you play with the learning rate. Also, if you observe, there is a beta 1 and a beta 2; both of them are decay rates, but for the moment estimates inside Adam: beta 1 is the decay rate for the first moment estimate, the running mean of the gradients coming from backprop, and beta 2 is the decay rate for the second moment estimate. Usually we do not play around with these; even on the web you will not find many
implementations that have tuned them, but I just wanted to show you these hyperparameters. The last one is epsilon: it is a tiny constant added in the denominator of the update so that we never divide by zero and the update never blows up to infinity. By default we leave it alone and do not worry about it.
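Putting the architecture and the optimizer together, a minimal sketch of the kind of model being described (the exact filter counts, dropout rate and dense width here are assumptions; the five output classes and the 64 x 64 x 3 input come from the session):

    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout
    from tensorflow.keras.optimizers import Adam

    model = Sequential([
        Conv2D(32, (3, 3), activation="relu", input_shape=(64, 64, 3)),
        MaxPooling2D((2, 2)),
        Conv2D(32, (3, 3), activation="relu"),
        MaxPooling2D((2, 2)),
        Conv2D(32, (3, 3), activation="relu"),
        MaxPooling2D((2, 2)),
        Flatten(),
        Dropout(0.5),                       # assumed rate
        Dense(128, activation="relu"),      # assumed width
        Dense(5, activation="softmax"),     # five flower classes
    ])

    # beta_1 / beta_2 are the decay rates of Adam's first and second moment
    # estimates; epsilon guards against division by zero. Older Keras versions
    # also accepted a decay= argument for learning-rate decay.
    opt = Adam(learning_rate=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-7)
    model.compile(optimizer=opt, loss="categorical_crossentropy",
                  metrics=["accuracy"])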

So these are some of the hyperparameters. As usual we are using categorical cross-entropy as the loss, because I have five outputs, and the metric is nothing but accuracy.

Next: we have not done any data manipulation, meaning we have not cleaned our data, so what I am doing now is augmenting it. If you do not want to write a separate function for augmentation and want it done as part of the pipeline, you import ImageDataGenerator. This function gives you a lot of parameters; some of the important ones are here: rescaling the image, rotating the image, zooming the image, and flipping — horizontal flip, vertical flip, all those things. This is the way you can create augmented images if you are not happy with the size of your data, and the same setup works for your test data.

Next, you load your data from the directory and clip it to a target size of 64 x 64, because you have designed a CNN whose input is 64 x 64 x 3, where 3 is the number of channels; so match it up. Then comes the total number of images you want to pick up in one run, the batch size, which depends on your computer: if you have high-end RAM, I would say increase it. Now, can somebody tell me why the batch size matters? What difference does it make whether I specify it or not? What do you mean by batch size? How much data the processor picks up in one go — correct. So why take 32 here, why not a larger number? Please remember, if you want to speed things up: if you run 20 epochs, and in each epoch your computer keeps 32 images ready to be processed at a time, what if I give the machine more flexibility and double the batch? Just observe your epochs: the total time taken for 20 epochs with batch size 32 versus batch size 64, and let me know what you find; it may also simply push the memory load up. So it is important to select this sensibly. What I usually do is divide a candidate number by 2 and settle on some friendly number — I know there is no rigorous logic to that — but yes, high-end computers can take more, while my computer will crash if I take 64 or 128 here. Moving on, the same applies for your test generator.

Finally, you fit your data using the generator, so that you get the augmented images, saying this is my training set, this is my validation data, and for epochs I want to run 20. You are also seeing something new today called steps per epoch. What does it mean? In one epoch, I want that particular epoch to run this many steps. So what is 3,823? It is the total number of training images we have, divided by how many are picked up at a time — 32, from the batch size. I am saying: even if the division is not an integer, convert it to an integer, and give me that many steps inside each epoch. So either you can have more epochs, or you can have internal steps within each epoch.
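A minimal sketch of that pipeline, assuming directory names train/ and test/ and the specific augmentation values, which are illustrative (ImageDataGenerator, flow_from_directory and their arguments are standard Keras):

    from tensorflow.keras.preprocessing.image import ImageDataGenerator

    train_gen = ImageDataGenerator(
        rescale=1.0 / 255,       # normalise pixel values
        rotation_range=20,       # random rotations (assumed value)
        zoom_range=0.2,          # random zoom (assumed value)
        horizontal_flip=True,
    )
    test_gen = ImageDataGenerator(rescale=1.0 / 255)

    training_set = train_gen.flow_from_directory(
        "train/", target_size=(64, 64), batch_size=32, class_mode="categorical")
    validation_set = test_gen.flow_from_directory(
        "test/", target_size=(64, 64), batch_size=32, class_mode="categorical")

    # 3,823 training images picked up 32 at a time -> steps per epoch
    model.fit(training_set,
              validation_data=validation_set,
              epochs=20,
              steps_per_epoch=int(3823 / 32))   # 119 steps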
Whatever suits you — I give you both options; so far you would have seen only the plain epochs. I would say this is the better option, because we get to know that within each epoch we are running 119 steps again and again, so probably within 4 to 5 epochs we will get decent accuracy; the accuracy change per epoch is quite high, so this works out better than sitting through 40, 50 or 100 epochs and waiting a long time for that. Each epoch itself will take longer, definitely, because inside each epoch we are running that many steps, but the accuracy side of it will be pretty good. If you scroll down, I stopped at 20 epochs; after a certain number of epochs, you observe it is not improving further and the accuracy even starts drifting down. There could be multiple reasons for that.

The first reason could be the way we have chosen the steps, though that is the only way to do it here. The second could be that we have done too much augmentation, so the network is not able to find the patterns. The third could be our learning rate and optimizer. And the most important reason could be the CNN layers we designed: I chose to make four layers, loosely following the reference architecture, but what if I do not want that and I want to increase my accuracy? I give that to you as homework — by next Friday, Saturday at the latest, send me your inputs on how you can improve this. You will not need to do this on the industry side, because we have ready-made networks, but do play around with these numbers, and also check the blog that was shared; it will help you formulate it. Delete one layer, add one layer; I have taken a max pool size of 2 x 2, and you can take 4 x 4, no problem, or 3 x 3, no problem; I have taken standard filters, so play around with the filters and see what happens. The only caution: do not run 20 epochs, or you will waste your time staring at your monitor; run fewer epochs and watch how the accuracy moves. So here we ended up with training accuracy around 80 percent and loss 0.49, against a validation loss of 0.90 and validation accuracy of 64 percent. I will still say it is not a very good model; that gap between training and validation suggests it is overfitting.

Good, so that is one implementation, one CNN. Now, a slightly different way of coding this, to give you a hint of how you can save your model. First of all: every time you kill your kernel and come back, you have to rerun the model. If you do not want to do that, use these two options: you can save just the weights, so that you can load them into the next network whenever you need them, or you can save the whole network as it is. That is one way. Another way is what is called joblib, a library that does this, and there is also something called pickling; if you want to see all three, I will show you down the line if you are interested. Usually in computer vision we save the model or weights this way, while joblib and pickle are more common for classical machine learning; in machine learning there are no layer weights to store, but here you have the option to store either the classifier itself or the weights, whichever you prefer. So what I have done is saved them, and I do not have to retrain every time I open my code; down the line I simply call back my model and my weights, both together.

Once I do that, I read some random flower, or a set of flowers. Next I change the dimensions, basically reshaping them and normalising the values, specify the dimensions of the image I am going to pick up, and finally I run my classifier and predict on that particular image. If you look at the output: this is the size of my image, this is the image after expanding the dimensions, and this is the set of softmax outputs, one probability per class. Somebody tell me, which one of these is the classified answer? Daisy, exactly — the one with the highest probability is the answer, and in this case the flower was identified as a daisy.
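A hedged sketch of that save / load / single-image prediction flow (the file names and flower path are made up; save, load_model, save_weights and the expand_dims reshaping are standard Keras/NumPy):

    import numpy as np
    from tensorflow.keras.models import load_model
    from tensorflow.keras.preprocessing import image

    # option 1: save just the weights; option 2: save the whole network as it is
    model.save_weights("flowers_weights.h5")
    model.save("flowers_model.h5")

    # later, in a fresh kernel: no need to retrain
    model = load_model("flowers_model.h5")

    # read one random flower and match the CNN input of 64 x 64 x 3
    img = image.load_img("test/daisy/some_flower.jpg", target_size=(64, 64))
    x = image.img_to_array(img) / 255.0        # normalise
    x = np.expand_dims(x, axis=0)              # add the batch dimension

    probs = model.predict(x)[0]                # softmax output, one value per class
    print(probs.argmax())                      # highest probability = predicted class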
Anyway, we got the answer directly there. Now, whatever image we picked was a single one; what if I want to run the model across the whole testing data and generate the matrix? Very simple: I get all of my test data from the path, set the hyperparameters, and run the model on it. When you do that, you get a confusion matrix like this, and if you look at it, the rows and columns are the flowers we have been talking about. And if you prefer numbers to the matrix, there is precision, recall, F1 and support. Precision we know, recall we know; what is F1? F1 is a balance between the two — you could loosely call it an average of both (technically the harmonic mean). And what is support? Support says how many images of this class are present in the whole test set, the frequency of it — something like the value_counts() you used to run on the target column to check the class balance.
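A minimal sketch of generating those numbers with scikit-learn (assuming y_true and y_pred arrays of class indices already exist; confusion_matrix and classification_report are standard sklearn):

    from sklearn.metrics import confusion_matrix, classification_report

    labels = ["daisy", "dandelion", "rose", "sunflower", "tulip"]

    print(confusion_matrix(y_true, y_pred))             # rows = actual, columns = predicted
    print(classification_report(y_true, y_pred,
                                target_names=labels))   # precision, recall, f1, support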

In this case it is a decent representation, but if you observe the precision and recall, they are not that great. The class that gets classified the best is this one, sunflower. Reading the matrix is simple: 57 daisies have been identified properly, and of the remaining images that are actually daisies, 12 are identified as dandelion, 13 as rose, 3 as sunflower and 5 as tulip. If you want insights out of this, you can find the flowers that are visually close to each other: in this case you can say daisy, rose and dandelion look almost similar, because the misclassification is highest among them, and if you look at the last three — rose, sunflower and tulip — the misclassification among them is also heavy, with flowers mostly being confused with tulip and rose. This is how you read it. And if you want to see the correct classifications, look at the diagonal elements; they are the correctly classified counts.

Good, so this is one simple CNN example. It is also one way of doing transfer learning, as I showed you, except the learning was not from any of the well-known networks; it was our own network. Now let us try one more dataset where we can do that.

A question on the last example: "we are not doing padding — is that what padding equals zero means?" Yes, that is what it means; I kept everything very standard, and I want you to do all the mix-and-match and try to create your own version. Another question: "do you mean that when I shut down my kernel and restart, I can get the same model back? Do I have to rerun the entire program, or just the load-model part?" Just load the model, that's it. But if you are loading saved weights onto a new model, you have to make sure the infrastructure we created here is the same: your new network should look like this one, with at least these many layers to put the weights onto. It is like training a machine learning model on four variables: when the model is ready and you predict on only three variables, it is going to crash, because one variable is missing. So you should match this up, that's it. But yes, you are perfectly right, you can do this in any of your other code, not only in this notebook; you can import it somewhere else as well. The only thing you must make sure of is that you point to the correct file; what I usually do is keep it inside the project folder.

A question from Amit: he wants to understand why we sometimes save and load an h5 file and sometimes images, and how to account for the fact that this dataset was loaded differently from what we have seen before. I think this is only your second or third session, so the HDF terminology may be new. What is the HDF format? You can think of it as a zipped container consisting of a lot of matrices. In the dataset case, each matrix represents an image, and each image has a target variable attached to it: in your previous project, if you remember, the target column said which digit it was, and the sampled image itself was stored as a matrix. That is HDF for data. In our case, when we store a model in h5 format, the file instead gives a mapping: this is layer number one and these are the weights of layer number one, this is layer number two and these are its weights, and so on. So whenever I import it, Python interprets the mapping and directly starts loading those weights into our current infrastructure. Got it, Amit? If you want to see it in detail, give me some time and I will make a specific case study on the different ways we can import and map data: we have zip format, tar format, HDF format, HDF4, h5, and of course jpg and PNG — multiple ways to do it. Down the line, in one of the case studies, we have even done embedding, meaning we directly imported the images and tagged them ourselves, our own format. If you want one specific case study on all the input formats, we can do that too, no problem, let me know. ("Yes, sure, that would be great.") Perfect. One more thing: the project has been submitted by all of you, right? I think the scores are out as well; if anybody wants to discuss the project solution, let me know and we will do it. I will share the solution anyway, but I am not sure whether I have already discussed it with you, so please remind me, I might forget.

We will do autoencoders last. Coming to this particular dataset: we do not have enough time to go through the whole thing, so we will revisit it; it is a huge case study. For now, I will show you how to import pre-trained weights and how to load them onto a network. So, think of a dataset with photographs of different breeds of dogs, the same way we had flowers, only now we have breeds. We build our own convolutional neural network, following my usual pattern: I start with, say, 32 filters of 5 x 5, and my input size is exactly the size of my images, which is 128 x 128 x 3. Next I have 64 filters, 4 x 4 each, and I keep the standard activation function the same, so I have one, two, three, four layers. After that I build my dense layer, very simple, and the total number of unique outputs we have is 120 — I am sorry, I said a smaller number earlier, but the number of unique breeds is 120. So this is one example of a CNN that we are building.
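A minimal sketch of that second CNN under the stated shapes (the 5 x 5 and 4 x 4 kernels, the 128 x 128 x 3 input and the 120 outputs come from the session; the third and fourth blocks and the dense width are assumptions):

    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

    dog_model = Sequential([
        Conv2D(32, (5, 5), activation="relu", input_shape=(128, 128, 3)),
        MaxPooling2D((2, 2)),
        Conv2D(64, (4, 4), activation="relu"),
        MaxPooling2D((2, 2)),
        Conv2D(64, (3, 3), activation="relu"),     # assumed block
        MaxPooling2D((2, 2)),
        Conv2D(64, (3, 3), activation="relu"),     # assumed block
        MaxPooling2D((2, 2)),
        Flatten(),
        Dense(256, activation="relu"),             # assumed width
        Dense(120, activation="softmax"),          # 120 dog breeds
    ])
    dog_model.compile(optimizer="adam", loss="categorical_crossentropy",
                      metrics=["accuracy"])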

Now, when I run this CNN for, say, one epoch, and in that epoch I take 8,177 steps, look at my training and validation accuracy: I get around 0.008, that is 0.8 percent, and validation around 0.011, about 1.1 percent. Definitely an underfit model. Why did we get this? Because of whatever stunt we pulled here by hand-crafting our own CNN. Now, if we are stuck in this situation and do not have time to play around with many variations, what we can do is use the standard VGG16 weights. The only thing is there is a file we have to store: either you download it from here and give the reference, or I put it in my local directory and give a reference like this, whichever you wish. So I am saying this is my new base model — I call it base — and these are nothing but the VGG16 weights, and here you define the location of the h5 file from where you get them. If you observe the format of this weights file and the format of the file we saved earlier, they are exactly the same; this format exists precisely so that weights can be stored and reused by someone else.

Next, I reinitialise my X and Y, the training data and the target column. In this code there is one new thing I wanted to show you: TQDM. TQDM is a way of generating a status bar so you can see the current progress, because sometimes when the epochs are running we are not sure how much time they will take; we do have a timer, but if you want a whole progress bar, TQDM will do it.

Next, we take the images one by one, and once we get them, we pass the images through our base model. What is the base model? The model we defined from the VGG16 weights. I am indirectly saying: I do not want to push these images through my own network, where forward and backward passes would happen; for now, just pass my inputs through the pre-trained weights, as simple as that. Once we do that, we build our fully connected neural network, the back-end part, the final part, then I compile it and finally we run it.

Did you get how I froze the layers? If you remember, we are not supposed to backpropagate through them. If I had put these weights inside the model along with model.fit, what would happen? It would do backpropagation and change all of them, and I do not want that. So right at the start I run my inputs through the pre-trained weights: your input is here, these are your weights, you multiply them through and produce one feature matrix. Now you use this as your X, your training input, for the final model. And when you run that model, the last accuracy we got is around 98.39 percent — a far better model than the one before.

Are we clear, guys? Any confusion about the freezing part? ("So freezing means multiplying the current input with the weights?") Exactly. I take my images, my X training data, multiply them through whatever pre-trained weights I have, and make a bigger feature matrix; then I use that as the training data for the fully connected network, which alone does the forward and backward propagation.
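A hedged sketch of this feature-extraction style of transfer learning (VGG16 with include_top=False and a frozen forward pass via predict is the standard Keras pattern the session describes; x_train, y_train and the head sizes are assumptions):

    from tensorflow.keras.applications import VGG16
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Flatten, Dense

    # base model: VGG16 convolutional stack with ImageNet weights, no classifier head
    base = VGG16(weights="imagenet", include_top=False, input_shape=(128, 128, 3))

    # "freezing" here = a one-off forward pass through the pre-trained weights,
    # so backprop can never touch them
    features_train = base.predict(x_train)   # x_train: (n, 128, 128, 3), scaled
    features_test = base.predict(x_test)

    # small fully connected head trained only on the extracted features
    head = Sequential([
        Flatten(input_shape=features_train.shape[1:]),
        Dense(256, activation="relu"),        # assumed width
        Dense(120, activation="softmax"),     # 120 dog breeds
    ])
    head.compile(optimizer="adam", loss="categorical_crossentropy",
                 metrics=["accuracy"])
    head.fit(features_train, y_train, epochs=5, batch_size=32,
             validation_data=(features_test, y_test))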
Now, even if backpropagation happens, do you think it will change anything in that feature matrix? Nothing. It will only change the fully connected network, because the FCN does not know this came out of another neural network; the pre-trained weights are nothing but matrices we multiplied through, very simple. It takes some time — give it a week — to get comfortable with this way of coding, so take your time and keep exploring. I will share both notebooks with you. What you should do is take the earlier code I showed you, and instead of using my hand-made model, use this VGG16 base and see if you can improve on it. And from the head onwards it is universal code; it will work with any model, no problem at all. Good, everyone clear? Let me go through the names: Harish,

Akash, Amit, Dhruv, Kaushik, Raghu — are you all clear on this? ("This one, you imported through OpenCV?") Yes, we will come to that; the way we are getting the images we will cover when we redo this case study in detail. It sits somewhere between week 2 and week 3 and would be very heavy if I went through it right away, but I wanted to give you the idea of transfer learning, because this is how we mostly do it; I do not waste my time building everything, I directly pull the pre-trained weights. The only drawbacks of transfer learning are these: first, you are completely dependent on their weights — if tomorrow they remove the files from the internet or drop support, we are helpless and the model goes down; second, if the weights were not trained on your kind of images — say you want to build a network on space imagery — you need a network pre-trained on something similar, and if there is none, you will not be able to use this approach. Those are the only two issues; otherwise everything works. Perfect. We will redo these two case studies, with live coding if possible, so you get hands-on with the complete flow: what the standard flow of a CNN is, what we do and should do at each part.

Now let us do something on autoencoders. Remember the dimensionality reduction technique I stopped to talk about? It is a very simple piece of code on the same MNIST handwritten-digit data: this is the original image and this is the undersampled — or rather, dimensionally reduced — image. How do we do it? Get the libraries, get the data, reshape the data, normalise the data, and decide how far down you want to come from this level of input; say I want to come down to 64 — how much you reduce the dimension to is up to you. Once that is decided, you do all the resizing and reshaping on the image part, and then we call our model. If you observe, we have an input, then some encoding layers, then some decoding layers, and then the output: this is my input, these are my encoding layers, these are my decoding layers, and this is my reconstructed output. Notice that I am not writing model.add multiple times; I am chaining normal Dense layers in the functional style, and I think the rest is self-explanatory. So this is an alternative way of creating a Keras dense model — four lines are enough. After that we compile, which wires up the backprop, and fit the model. While fitting, we increase the batch size, set some epochs, and set shuffle equal to True — shuffle in the sense that the order is randomised each epoch, and you can add augmentation on top if you want — and we do a test/train split here, basically 80/20, so the validation set is nothing but our testing data. When I run it — and if you look for the accuracy, sorry, there is no accuracy here, because we did not set accuracy as the metric; I only wanted to watch the loss — you can observe that the loss is not that great between train and validation.
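A minimal sketch of that four-line functional-API autoencoder (the 784 -> 64 -> 784 sizes follow the session's MNIST example; the epoch count is an assumption):

    from tensorflow.keras.datasets import mnist
    from tensorflow.keras.layers import Input, Dense
    from tensorflow.keras.models import Model

    (x_train, _), (x_test, _) = mnist.load_data()
    x_train = x_train.reshape(-1, 784).astype("float32") / 255.0   # reshape + normalise

    inp = Input(shape=(784,))
    encoded = Dense(64, activation="relu")(inp)          # encoding: come down to 64
    decoded = Dense(784, activation="sigmoid")(encoded)  # decoding: back to 784
    autoencoder = Model(inp, decoded)

    autoencoder.compile(optimizer="adam", loss="binary_crossentropy")
    autoencoder.fit(x_train, x_train,                    # input and target are the same image
                    epochs=10, batch_size=256,
                    shuffle=True, validation_split=0.2)  # the 80/20 split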
All right, now how do we look at the result and judge whether what we have done is good or bad? We call the autoencoder's predict on our testing data, and then we plot the original images against the reconstructed ones. The only piece you are expected to already know is how a matrix gets converted back into an image, and you can even specify what shape and size you want; here I have 28 x 28. This is the original image and this is the dimensionally reduced reconstruction. Now, where is this useful? Say you are building an implementation where a security guard runs a device over some cars and gets the number plate automatically, or identifies the make of the car from the logo, or

gets the colour of the car. In none of these do you need an HD image, so you do not have to spend a lot of time and money on compute hardware for it: reduce the image, then compute. That could be a very simple application. So this is your autoencoder.

There is one more dimensionality reduction technique; I am not sure whether you have heard about t-SNE. No? Okay — it is outside our course syllabus, but I feel it is important, and in deep learning you will need something of this kind; this is what we do in industry and I do not want you to miss it.

First, let me show you why we need t-SNE, with some transfer learning on the NLP side along the way; I have shown this extra bit to some of my batches. If you look at computer vision, we saw today what transfer learning can be done with: VGG, ImageNet weights, AlexNet and so on. The same thing exists in NLP, where encodings are already available ready-made. The first embedding you see here is GloVe, another is ELMo, another is word2vec, which was developed by Google. These are standard embeddings: we directly import them and start using them, and that is what we mean by transfer learning in text.

Again, an example. What is this case study about? We have a huge text corpus available, and I want to predict something: I will say, today's prince will be tomorrow's question mark. Obviously you know that a prince will be a king tomorrow. What if I swap in princess, or queen — today's queen will be tomorrow's what? We don't know. This is used when you want to do prediction on text. The best example I can give you is LinkedIn: when you are messaging on LinkedIn, you get suggested replies — if Harish sends me a message, it first offers tokens like "Hi Harish", "Good morning Harish", "Sure", and depending on what was written it may even predict "Sure, we'll meet tomorrow" or "Sure, I will do that". How do we get these predictions? We use something called an RNN, a recurrent neural network, a different kind of network which we will cover in NLP anyway. And an RNN often sits on top of GloVe: GloVe in NLP is the ready-made network, the way VGG was in CV. It stands for Global Vectors, and it was developed at Stanford University.

Now what I will do is show you the nearest encodings to "king". Remember what a corpus is — I showed you this with the chatbot — a corpus is a huge body of data available to train your model. What if I say: give me the top five words which are nearest to "king"? If you observe: queen, monarch, prince, kingdom and reign. From our corpus it is able to find these five predictions, and depending on how many you want, I could share the top three, something like a recommendation system for text.
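A hedged sketch of querying pre-trained GloVe vectors and flattening them to 2-D, assuming the gensim downloader's hosted "glove-wiki-gigaword-100" model (the word list is illustrative; most_similar and sklearn's TSNE are real APIs):

    import numpy as np
    import gensim.downloader as api
    from sklearn.manifold import TSNE

    glove = api.load("glove-wiki-gigaword-100")   # pre-trained GloVe embeddings

    # top 5 nearest encodings to "king"
    print(glove.most_similar("king", topn=5))

    # project a handful of 100-d word vectors down to 2-D for plotting
    words = ["king", "queen", "university", "college",
             "hospital", "health", "car", "road"]
    vectors = np.array([glove[w] for w in words])
    coords = TSNE(n_components=2, perplexity=3, random_state=0).fit_transform(vectors)
    for w, (x, y) in zip(words, coords):
        print(f"{w}: ({x:.2f}, {y:.2f})")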
Now, why am I showing you this? Because all of that was back-end output; I am sure that from the raw numbers alone you would not have understood what the data is or what I am talking about. What if, instead, I represent the tokens from my corpus visually? Are you able to visualise this? Not perfectly, but you can see that there are some words which are tightly packed, meaning they are very near to each other,

and there are some words which are farther apart yet still related. If you observe "university" and "college", "education", "hospital" and "health" — you can see these words are tagged together. Now, if you ask me how they got tagged together, that is again a different topic; GloVe does it for me. But how did I represent this — how did I convert a high-dimensional embedding into a 2-D picture like this? We used t-SNE there, so let me go to t-SNE.

What is t-SNE? Put very simply, it is a non-linear dimensionality reduction technique; so far we have seen a linear dimensionality reduction technique, which is PCA. Let us say there is 2-D data with four points, one, two, three, four; call them A, B, C and D. Before I give the answer, let me ask you: what similarity measures have you seen so far? For this case let me say distance is our similarity measure. For convenience, say each point is at a unit distance from its neighbours, and the diagonal pairs are at a distance of square root of 2 from each other — agreed? Pythagoras' theorem.

Now, how would I reduce this to 1-D? Let me show you. I pick a point of reference, say A, and place A here on the line. Next to A we have B at a unit distance, and D at a unit distance. So, keeping the distances in mind, I can replicate A, B and D on a 1-D line without changing their pairwise distances; right now distance is our similarity measure, and I have preserved the distances between A, B and D. Everybody agrees? Now what about C — where should I place it? What if I place it here, since the distance from D to C is one and from A to C is square root of 2? But then B and C end up far apart, whereas B and C should also be at distance one. What if I place C on the other side? Now B is satisfied, but D becomes the problem. A is satisfied either way, but D and B keep creating an issue for C. So this configuration has an inherent problem: you cannot preserve all pairwise distances when you squash dimensions, and managing that tension — favouring the local neighbourhoods — is exactly what t-SNE does probabilistically.

This brings us to the end of this computer vision tutorial. Before you sign off, a reminder that we have launched a completely free learning platform called Great Learning Academy, with access to free courses such as AI, Cloud and digital marketing. Thank you very much for attending this session, and happy learning.