Welcome back, Parimi’s Nation. Last time we learnt about Secret Sharing and Artificial Intelligence. Now let’s dive into Machine Learning.
Today we shall learn what machine learning is, how its algorithms are classified, how supervised and unsupervised algorithms work, and see a mind map of the available algorithms.
What is Machine Learning?
In order to understand this, we need to know what Artificial intelligence is.
Most people see AI as science fiction, where robots have taken over the world and enslaved us.
However, it doesn’t have to be like that. Think of it as putting a more human face on technology. It’s a technology that enables people to accomplish more by collaborating with smart software.
It’s a technology that can understand our kind of knowledge and learn from the vast amount of data that is available in the modern world.
Now that we have understood what AI actually is, let’s begin with machine learning. The question is: how do we build an AI system?
Machine learning provides the foundation for AI. So what is it?
Well, as the name suggests, machine learning is a technique in which we train a software model using data. The model learns from the training cases, and then we can use the trained model to make predictions for new data cases. The key to this is to understand that, fundamentally, computers are very good at one thing: performing calculations.
To have a computer make intelligent predictions from the data, we just need a way to train it to perform the correct calculations.
We start with a data set that contains historical records, often called cases or observations. And each observation includes numeric features that quantify a characteristic of the item we’re working with. Let’s call that X. In general, we also have some value that we’re trying to predict, which we’ll call Y. And we use our training cases to train a machine learning model so that it can calculate a value for Y from the features in X. So in very simplistic terms, we’re creating a function that operates on a set of features, X, to produce predictions, Y.
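The idea of learning a function f that maps features X to predictions Y can be sketched in a few lines of Python. This is a minimal illustration with made-up numbers, using a straight-line model fitted by least squares; it is not the post's actual code.

```python
import numpy as np

# Historical observations: a single feature X and the label Y we want
# to predict (hypothetical data).
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 4.2, 5.9, 8.1, 9.8])

# "Training" means finding a function f such that f(X) is close to Y.
# For a straight-line model Y = a*X + b, least squares gives us a and b.
a, b = np.polyfit(X, Y, deg=1)

def f(x):
    """The trained model: calculates a predicted Y for a new feature value x."""
    return a * x + b

prediction = f(6.0)  # prediction for a new, unseen case
```

Real models use many features and more flexible functions, but the shape of the process is the same: fit parameters from training cases, then apply the resulting function to new observations.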
There are two broad kinds of machine learning: supervised and unsupervised.
In supervised learning scenarios, we start with the observations that include known values for the variable we want to predict. And we call these labels. Now, because we started with data that includes the label we’re trying to predict, we can train the model using only some of the data and withhold the rest of the data for evaluating model performance.
We then use a machine learning algorithm to train a model that fits the features to the known label. And because we started with the known label value, we can validate the model by comparing the value predicted by the function, to the actual label value that we knew.
Then, when we’re happy that the model works, we can use it with new observations for which the label is unknown, and generate new predicted values.
Unsupervised learning is different from supervised learning, in that this time we don’t have known label values in the training dataset. We train the model by finding similarities between the observations. After the model is trained, each new observation is assigned to the cluster of observations with the most similar characteristics.
Supervised Learning techniques:
When we need to predict a numeric value, like an amount of money or a temperature or the number of calories, then what we use is a supervised learning technique called regression. We need our algorithm to learn the function that operates on particular task features to give us a result. Now, of course, a sample of only one isn’t likely to give us a function that generalizes well. So, what we do, is gather the same sort of data from lots of diverse participants and train our model based on this larger set of data. After we’ve trained the model and we have a generalized function that can be used to calculate our label Y, we can then plot the values of Y, calculated for specific features of X values on a chart like this.
And of course, we can interpolate any new values of X to predict an unknown Y. Now, because we started with data that includes the label we are trying to predict, we can train the model using only some of the data and withhold the rest for evaluating model performance. Then we can use the model to predict f(x) for the evaluation data, and compare the predictions, or scored labels, to the actual labels that we know to be true. The differences between the predicted and actual labels are what we call the residuals.
And they can tell us something about the level of error in the model. Now there are a few ways we can measure the error in the model, and these include root-mean-square error or RMSE and mean absolute error. Now both of these are absolute measures of error in the model. Of course, absolute values can vary wildly depending on what you’re predicting. So you might want to evaluate the model using relative metrics to indicate a more general level of error as a relative value between 0 and 1. Relative absolute error and relative squared error produce a metric where the closer to 0 the error, the better the model. And the coefficient of determination, which we sometimes call R squared, is another relative metric, but this time a value closer to 1 indicates a good fit for the model.
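The metrics above are straightforward to compute from the residuals. Here is a short sketch with hypothetical actual and predicted values, showing RMSE, MAE, relative absolute error, relative squared error and R squared (the relative metrics compare the model against a naive baseline that always predicts the mean).

```python
import numpy as np

# Hypothetical actual labels and model predictions for the evaluation set.
actual = np.array([10.0, 12.0, 15.0, 11.0, 13.0])
predicted = np.array([9.0, 12.5, 14.0, 11.5, 13.5])

residuals = actual - predicted

# Absolute measures of error.
rmse = np.sqrt(np.mean(residuals ** 2))   # root-mean-square error
mae = np.mean(np.abs(residuals))          # mean absolute error

# Relative measures: error compared to always predicting the mean.
rae = np.sum(np.abs(residuals)) / np.sum(np.abs(actual - actual.mean()))
rse = np.sum(residuals ** 2) / np.sum((actual - actual.mean()) ** 2)

# Coefficient of determination: the closer to 1, the better the fit.
r2 = 1 - rse
```

For RAE and RSE, values closer to 0 mean a better model; R squared flips this so that values closer to 1 indicate a good fit, exactly as described above.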
Now it’s time to look at another kind of supervised learning, classification. Classification is a technique that we can use to predict which class, or category, something belongs to. The simplest variant of this is binary classification where we predict whether an entity belongs to one of two classes. It’s often used to determine if something is true or false about the entity. More generally, a binary classifier is a function that can be applied to features X to produce a Y value of 1 or 0.
Now, the function won’t actually calculate an absolute value of 1 or 0; it will calculate a value between 0 and 1, and we’ll use a threshold value to decide whether the result should be counted as a 1 or a 0. When you use the model to predict values, the resulting value is classed as a 1 or a 0 depending on which side of the threshold line it falls. Because classification is a supervised learning technique, we withhold some of the data to validate the model using known labels.
Cases where the model predicts a 1 for a test observation whose actual label value is 1 are considered true positives, and similarly, cases where the model predicts 0 and the actual label is 0 are true negatives. If the model predicts 1 but the actual label is 0, that’s a false positive, and if the model predicts 0 but the actual value is 1, that’s a false negative. Now, the choice of threshold determines how predictions are assigned to classes. In some cases, a predicted value might be very close to the threshold but is still misclassified. You can move the threshold to control how the predicted values are classified.
These counts are often shown in what’s called a confusion matrix, which provides the basis for calculating performance metrics for the classifier. The simplest metric is accuracy, which is just the number of correctly classified cases divided by the total number of cases. Consider a case where there are five true positives and four true negatives, along with two false positives and no false negatives. That gives us nine correct predictions out of a total of 11, which is an accuracy of 0.82, or 82%.
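The worked example above can be reproduced in a few lines. This sketch builds the four confusion-matrix counts from hypothetical label lists arranged to give five true positives, four true negatives, two false positives and no false negatives.

```python
# Hypothetical evaluation labels: 5 TP, 4 TN, 2 FP, 0 FN (11 cases in total).
actual    = [1] * 5 + [0] * 4 + [0] * 2
predicted = [1] * 5 + [0] * 4 + [1] * 2

# The four cells of the confusion matrix.
tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)

# Accuracy: correct predictions divided by total cases.
accuracy = (tp + tn) / (tp + tn + fp + fn)
print(f"TP={tp} TN={tn} FP={fp} FN={fn} accuracy={accuracy:.2f}")  # accuracy=0.82
```

Nine correct cases out of eleven gives 0.818…, which rounds to the 82% quoted above.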
Unsupervised learning Techniques:
With unsupervised learning techniques, you don’t have a known label with which to train the model. But you can still use an algorithm that finds similarities between data observations in order to group them into clusters, categorizing them based on similar characteristics.
There are a number of ways we can create a clustering model.
One of the most popular clustering techniques is k-means clustering. Now the key to understanding k-means is to remember that our data consists of rows of data, and each row has multiple features.
Now if we assume that each feature is a numeric value, then we can plot them as coordinates. Multiple features would be plotted in n-dimensional space. We then decide how many clusters we want to create, which we call k. And we plot k points at random locations that represent the centre points of our clusters. Suppose that k is three, so we’re creating three clusters.
Next, we identify which of the three centroids each point is closest to, and assign the points to clusters accordingly. Then we move each centroid to the true centre of the points in its cluster, and reallocate the points to clusters based on their nearest centroid. We just repeat that process until we have nicely separated clusters.
So what do I mean by nicely separated? Well, we want a set of clusters that separate the data to the greatest extent possible. To measure this, we can compare the average distance between the cluster centres with the average distance between the points in a cluster and their centre.
Clusters that maximize this ratio have the greatest separation. We can also use the ratio of the average distance between clusters to the maximum distance between the points and the centroid within a cluster.
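The assign-then-recentre loop described above is short enough to write out directly. This is a minimal NumPy sketch of k-means on made-up 2-D data, not a production implementation (no convergence test, and empty clusters simply keep their old centroid).

```python
import numpy as np

def k_means(points, k, iterations=20, seed=0):
    """Minimal k-means: assign each point to its nearest centroid, move
    each centroid to the true centre of its cluster, and repeat."""
    rng = np.random.default_rng(seed)
    # Plot k centroids at random locations (here: k random data points).
    centroids = points[rng.choice(len(points), k, replace=False)]
    for _ in range(iterations):
        # Distance from every point to every centroid.
        distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)          # nearest-centroid assignment
        for j in range(k):
            if np.any(labels == j):                # recentre non-empty clusters
                centroids[j] = points[labels == j].mean(axis=0)
    return labels, centroids

# Three well-separated blobs in 2-D (hypothetical data), so k is three.
rng = np.random.default_rng(1)
blob = lambda cx, cy: rng.normal([cx, cy], 0.2, size=(30, 2))
points = np.vstack([blob(0, 0), blob(5, 5), blob(0, 5)])
labels, centroids = k_means(points, k=3)
```

Each row of `points` is one observation plotted as coordinates; with more than two features the same code works in n-dimensional space, since the distance calculation is over the last axis.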
Well here is a quick mind map of all the machine learning algorithms:
Traffic sign recognition using TensorFlow:
TensorFlow is an open-source library for machine learning applications. With it, you develop a model as a data flow graph.
This is about building a model that can identify the street signs.
- Python 3.5
- TensorFlow 0.12
Finding the Training data:
The Belgian traffic sign dataset is available on the internet and is free to download.
Exploring the Dataset
Knowing the data well will save a lot of time later. The images are in the .ppm format. Most tools don’t support it, but luckily the scikit-image library can read it.
Observe that the aspect ratios of the images are not the same. So if the model only takes images of a fixed size, then preprocessing is needed: all the images are converted into fixed-size images.
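The preprocessing step is just a resize of every image to one common shape. The original post uses scikit-image for this (`skimage.transform.resize`); below is a dependency-free nearest-neighbour sketch with hypothetical image sizes, to show what "converted into fixed-size images" means.

```python
import numpy as np

def resize_nearest(image, size=(32, 32)):
    """Nearest-neighbour resize to a fixed size, so images with different
    aspect ratios all become uniform model inputs."""
    h, w = image.shape[:2]
    rows = np.arange(size[0]) * h // size[0]   # source row for each output row
    cols = np.arange(size[1]) * w // size[1]   # source column for each output column
    return image[rows][:, cols]

# Hypothetical sign images with different aspect ratios.
img_a = np.zeros((128, 96, 3), dtype=np.uint8)
img_b = np.zeros((60, 90, 3), dtype=np.uint8)

# After preprocessing, every image has the same shape and can be batched.
batch = np.stack([resize_nearest(img) for img in (img_a, img_b)])
print(batch.shape)  # (2, 32, 32, 3)
```

Note that resizing without cropping distorts non-square images slightly; for classification this is usually acceptable, and it keeps the pipeline simple.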
Minimum viable model:
There are 62 neurons, and each neuron takes the RGB values of all the pixels as input. This is a fully connected layer, because each neuron connects to every input value.
The job of TensorFlow is to encapsulate the model in a data flow graph. The graph consists of operations such as Add, Multiply and Reshape, which are performed on multidimensional arrays (tensors).
We use the familiar equation Y = aX + b to calculate the outputs. The outputs of the fully connected layer are logits; a logit is the inverse of the logistic function.
Now the loss is calculated using cross-entropy, as it is well suited to classification tasks. After that, a suitable optimizer algorithm is chosen, the model is trained, and the loss is minimized to produce an accurate result.
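To make the fully connected layer and the cross-entropy loss concrete, here is a NumPy sketch of the forward pass for a single image. The 32×32 input size and the random parameter values are assumptions for illustration; in the actual TensorFlow model, `a` and `b` would be trainable variables in the data flow graph.

```python
import numpy as np

rng = np.random.default_rng(0)

num_classes = 62           # one output neuron per traffic-sign class
num_pixels = 32 * 32 * 3   # flattened RGB input (assuming 32x32 images)

# Hypothetical parameters of the fully connected layer, Y = aX + b.
a = rng.normal(0, 0.01, size=(num_pixels, num_classes))
b = np.zeros(num_classes)

def fully_connected(X):
    """Each of the 62 neurons sees every input pixel value; the raw
    outputs of Y = aX + b are the logits."""
    return X @ a + b

def cross_entropy(logits, label):
    """Softmax turns logits into class probabilities; cross-entropy
    measures how far they are from the true label."""
    shifted = logits - logits.max()            # subtract max for numerical stability
    probs = np.exp(shifted) / np.exp(shifted).sum()
    return -np.log(probs[label])

X = rng.random(num_pixels)                     # one flattened, normalized image
loss = cross_entropy(fully_connected(X), label=3)
```

Training then amounts to nudging `a` and `b` so that this loss, averaged over the training images, gets smaller, which is exactly what the chosen optimizer does inside TensorFlow.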
For a complete walkthrough, just visit Waleed Abdulla’s post on it.
However, machine learning is only one of the ways to achieve artificial intelligence. Machine learning offers solutions to complicated tasks and enables systems to deal with large-scale domains.
These techniques are widely used to solve real-world problems by storing, manipulating, extracting and retrieving data from large sources. Supervised machine learning techniques have been widely adopted; however, they prove to be very expensive when implemented over a wide range of data, because obtaining large labelled data sets involves a significant amount of effort and cost. Thus, active learning provides a way to reduce labelling costs by labelling only the most useful instances for learning.
A research article on “Machine Learning” made possible with the guidance and help of,
Head of Department (Information Technology),
SreeNidhi Institute of Science and Technology,
Researched and Published by,
Hemanth Bandari, IT-F4, 15311A12K1, SNIST
*Not for reproduction