4. Logistic Regression

Logistic regression is one of the most widely used machine learning algorithms for binary classification problems. It estimates the probability that an event occurs, based on previously observed data.

It is a statistical method used to estimate discrete values from a set of independent variables. It predicts the probability of an event by fitting data to a logistic function, and it allows one to say how much the presence of a risk factor increases the odds of a given outcome.

In logistic regression, the output is in the form of probabilities of the default class (unlike linear regression, where the output is directly produced). As it is a probability, the output lies in the range of 0-1.

The output (y-value) is generated by transforming the \(x\) value with the logistic function \(h(x)=\frac{1}{1+e^{-x}}\). A threshold is then applied to force this probability into a binary classification.

The logistic regression model computes a weighted sum of the input variables, similar to linear regression, but it runs the result through a special non-linear function, the logistic or sigmoid function, to produce the output \(y\). After thresholding, the final output is binary, in the form of 0/1 or -1/1.

Source: quantinsti.com

The sigmoid/logistic function is given by the following equation: \(y=\frac{1}{1+e^{-x}}\)

As you can see in the graph, it is an S-shaped curve that gets closer to 1 as the value of input variable increases above 0 and gets closer to 0 as the input variable decreases below 0. The output of the sigmoid function is 0.5 when the input variable is 0.

Thus, if the output is more than 0.5, we can classify the outcome as 1 (or positive) and if it is less than 0.5, we can classify it as 0 (or negative).
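
The sigmoid and the 0.5 threshold described above can be sketched in a few lines of Python. This is a minimal illustration, not a full model: the names `sigmoid` and `classify` are chosen here for the example, and the input is assumed to be the already-computed weighted sum.

```python
import math

def sigmoid(x):
    """Logistic function: maps any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def classify(x, threshold=0.5):
    """Return 1 (positive) if the sigmoid output exceeds the threshold, else 0."""
    return 1 if sigmoid(x) > threshold else 0

print(sigmoid(0))      # 0.5
print(classify(2.0))   # 1
print(classify(-2.0))  # 0
```

Note that `math.exp` can overflow for very large negative inputs, so a production implementation would normally guard against that.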

The goal of logistic regression is to use the training data to find the coefficient values that minimize the error between the predicted outcome and the actual outcome. These coefficients are estimated using Maximum Likelihood Estimation.

Maximum Likelihood Estimation is a general approach to estimating parameters in statistical models. You can maximize the likelihood using different methods, such as a numerical optimization algorithm.

Newton’s Method is such an algorithm and can be used to find the maximum (or minimum) of many different functions, including the likelihood function. Instead of Newton’s Method, you could also use Gradient Descent.
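
As a rough sketch of the gradient-based alternative, the code below fits a one-variable logistic regression by gradient ascent on the log-likelihood (equivalent to gradient descent on its negative). The function name `fit_logistic`, the learning rate, the number of epochs, and the toy data are all assumptions made for this example.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, epochs=5000):
    """Fit P(y=1|x) = sigmoid(b0 + b1*x) by gradient ascent on the log-likelihood."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradient of the mean log-likelihood: average of (y - p) and (y - p) * x.
        residuals = [y - sigmoid(b0 + b1 * x) for x, y in zip(xs, ys)]
        b0 += lr * sum(residuals) / n
        b1 += lr * sum(r * x for r, x in zip(residuals, xs)) / n
    return b0, b1

# Hypothetical toy data: hours studied vs. passed (1) or failed (0).
hours = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
passed = [0, 0, 0, 1, 0, 1, 1, 1]

b0, b1 = fit_logistic(hours, passed)
print(sigmoid(b0 + b1 * 4.0) > 0.5)  # a long study time is predicted as "pass"
```

The toy classes overlap slightly on purpose: with perfectly separable data the maximum likelihood coefficients grow without bound.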

Example: In predicting whether an event will occur or not, the event occurring is classified as 1. In predicting whether a person will be sick or not, the sick instances are denoted as 1. Logistic regression is named after the transformation function used in it, the logistic function \(h(x)=\frac{1}{1+e^{-x}}\), which is an S-shaped curve.

In general, this machine learning algorithm can be used in real-world applications such as credit scoring, measuring the success rates of marketing campaigns, and predicting the revenues of a certain product.

5. K-Nearest Neighbors

KNN is a very simple and often very effective machine learning algorithm. It is a non-parametric, lazy-learning algorithm, which means that there is no explicit training phase before classification.

Its purpose is to use a database in which the data points are separated into several classes to predict the classification of a new sample point. The k-nearest neighbors algorithm uses the entire dataset as the training set, rather than splitting the dataset into a training set and a test set.

KNN can require a lot of memory or space to store all of the data, but it only performs a calculation (or learns) when a prediction is needed, just in time. You can also update and curate your training instances over time to keep predictions accurate.

In the above example, the K-Nearest Neighbors process assigns the new data point to the red category.

The K-Nearest Neighbors algorithm estimates how likely a data point is to be a member of one group or another. It essentially looks at the data points around a single data point to determine which group it most likely belongs to.

For example, if one point is on a grid and the algorithm is trying to determine what group that data point is in (Group A or Group B, for example) it would look at the data points near it to see what group the majority of the points are in.

In the KNN machine learning algorithm, predictions are made for a new data point by searching through the entire training set for the K most similar instances (the neighbors) and summarizing the output variable for those K instances.
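
A minimal sketch of that search, assuming Euclidean distance as the similarity measure and a majority vote as the summary. The function name `knn_predict` and the Group A / Group B points are made up for illustration.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """train is a list of (point, label) pairs; return the majority label
    among the k training points closest to the query."""
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    labels = [label for _, label in by_distance[:k]]
    return Counter(labels).most_common(1)[0][0]

# Hypothetical points on a grid, in two groups.
points = [((1.0, 1.0), "A"), ((1.5, 2.0), "A"), ((2.0, 1.5), "A"),
          ((6.0, 6.0), "B"), ((6.5, 7.0), "B"), ((7.0, 6.5), "B")]

print(knn_predict(points, (2.0, 2.0)))  # A
print(knn_predict(points, (6.0, 6.5)))  # B
```

Sorting the whole training set on every query is the simplest approach; it also shows why KNN does all of its work at prediction time.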

KNN has various applications. It is often used in search applications where you are looking for similar items; that is, when your task is some form of "find items similar to this one". You would call this a k-NN search.

6. Learning Vector Quantization

In computer science, learning vector quantization (LVQ) is a supervised neural network that uses a competitive (winner-take-all) learning strategy.

It is related to other supervised neural networks such as the Perceptron and the Back-propagation algorithm. LVQ is an artificial neural network algorithm that allows you to choose how many training instances to hang onto and learns exactly what those instances should look like.

It is also related to other competitive learning neural networks such as the Self-Organizing Map, a similar algorithm for unsupervised learning with the addition of connections between the neurons.

Additionally, LVQ is a baseline technique that was defined with a few variants (LVQ1, LVQ2, LVQ2.1, LVQ3, OLVQ1, and OLVQ3), as well as many third-party extensions and refinements too numerous to list.

The representation for LVQ is a collection of codebook vectors. These are selected randomly in the beginning and adapted to best summarize the training dataset over a number of iterations of the learning algorithm.

Once learned, the codebook vectors can be used to make predictions. The most similar neighbor (best matching codebook vector) is found by calculating the distance between each codebook vector and the new data instance.

The class value (or the real value, in the case of regression) of the best matching unit is then returned as the prediction. Best results are achieved if you rescale your data to have the same range, such as between 0 and 1.
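
The training and prediction steps above can be sketched with the basic LVQ1 update rule. This is an illustrative sketch under stated assumptions: the function names, learning rate schedule, and toy data are invented for the example, and the codebook vectors are seeded from one sample per class rather than randomly so that the result is reproducible.

```python
import math

def nearest(codebooks, x):
    """Index of the best matching unit: the codebook vector closest to x."""
    return min(range(len(codebooks)), key=lambda i: math.dist(codebooks[i][0], x))

def train_lvq1(data, initial, lr=0.3, epochs=20):
    """LVQ1: pull the best matching unit toward samples of its own class
    and push it away from samples of a different class."""
    codebooks = [(list(vec), label) for vec, label in initial]
    for epoch in range(epochs):
        rate = lr * (1.0 - epoch / epochs)  # linearly decaying learning rate
        for x, label in data:
            vec, cb_label = codebooks[nearest(codebooks, x)]
            sign = 1.0 if cb_label == label else -1.0
            for j in range(len(vec)):
                vec[j] += sign * rate * (x[j] - vec[j])
    return codebooks

def lvq_predict(codebooks, x):
    """Return the class of the best matching codebook vector."""
    return codebooks[nearest(codebooks, x)][1]

# Hypothetical data, already rescaled to the range 0 to 1.
data = [((0.10, 0.20), "red"), ((0.20, 0.10), "red"), ((0.15, 0.25), "red"),
        ((0.80, 0.90), "blue"), ((0.90, 0.80), "blue"), ((0.85, 0.75), "blue")]
initial = [((0.10, 0.20), "red"), ((0.80, 0.90), "blue")]

codebooks = train_lvq1(data, initial)
print(lvq_predict(codebooks, (0.2, 0.2)))  # red
print(lvq_predict(codebooks, (0.9, 0.9)))  # blue
```

After training, only the two codebook vectors need to be stored, not the six training points.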

Learning vector quantization has been applied to tasks such as localization of myocardial infarction, fault diagnosis of power transformers, and classification of breast lesions.


Suppose there are three classes {red, blue, green}. An LVQ with two neurons per color adjusts the weight vectors of its neurons so that they become typical red, blue, and green reference (codebook) vectors. When the input vector x has only two elements, these vectors can be shown on a 2D plot.

If you discover that KNN gives good results on your dataset, try using LVQ to reduce the memory requirements of storing the entire training dataset.

7. Support Vector Machines

Support vector machines are supervised machine learning algorithms that are widely used for classification.

The objective of the support vector machine algorithm is to find a hyperplane in N-dimensional space (N – the number of features) that distinctly classifies the data points.

In this algorithm, we plot each data item as a point in N-dimensional space, with the value of each feature being the value of a particular coordinate. Then, we perform classification by finding the hyperplane that best differentiates the two classes.

In SVM, a hyperplane is selected to best separate the points in the input variable space by their class, either class 0 or class 1. The loss function that helps maximize the margin is hinge loss.

In two dimensions, you can visualize this as a line. Let us assume that all of our input points can be completely separated by this line. The SVM learning algorithm finds the coefficients that result in the best separation of the classes by the hyperplane.
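
One simple way to find such coefficients is subgradient descent on the regularized hinge loss. The sketch below is only an illustration of a linear SVM, not a full solver: the function names, learning rate, regularization strength, and toy points are assumptions, and the two classes are encoded as -1 and +1 (instead of 0 and 1) because the hinge loss is usually written that way.

```python
def train_linear_svm(points, labels, lr=0.01, reg=0.01, epochs=1000):
    """Minimize (reg / 2) * ||w||^2 + hinge loss by subgradient descent.
    Labels must be -1 or +1."""
    w = [0.0] * len(points[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(points, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:
                # Inside the margin or misclassified: the hinge loss is active.
                w = [wi + lr * (y * xi - reg * wi) for wi, xi in zip(w, x)]
                b += lr * y
            else:
                # Correct with enough margin: only the regularizer contributes.
                w = [wi - lr * reg * wi for wi in w]
    return w, b

def svm_predict(w, b, x):
    """Return +1 or -1 depending on which side of the hyperplane x lies."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1

# Hypothetical, linearly separable 2D points.
points = [(1, 1), (2, 1), (1, 2), (5, 5), (6, 5), (5, 6)]
labels = [-1, -1, -1, 1, 1, 1]

w, b = train_linear_svm(points, labels)
print(svm_predict(w, b, (1.5, 1.5)), svm_predict(w, b, (5.5, 5.5)))  # -1 1
```

In two dimensions, the learned `w` and `b` define the separating line `w[0]*x + w[1]*y + b = 0`.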

The basic concept behind support vector machines is that of decision planes that define decision boundaries. A decision plane is one that separates sets of objects having different class memberships.

As for its pros, it is one of the more accurate machine learning algorithms. Support vector machines also work well on smaller, cleaner datasets, and they can be more efficient because the decision function uses only a subset of the training points (the support vectors).

As for its cons, it is not well suited to larger datasets because the training time can be high, and it is also less effective on noisier datasets with overlapping classes.

SVM machine learning algorithms are often used for face detection, where they classify parts of the image as face or non-face so that a square boundary can be drawn around the face. They are also used for text and hypertext categorization, classification of images, bioinformatics, etc.

8. Apriori

The Apriori machine learning algorithm is an unsupervised algorithm that is frequently used to sort information into categories. The sorted information is very helpful in any data management process; it also ensures that data users are apprised of new information and can make sense of the data that they are working with.

The Apriori algorithm generates association rules from a given data set and works with a “bottom-up” approach, where frequent subsets are extended one item at a time and the algorithm terminates when no further extension can be made.

This machine-learning algorithm is used in a transactional database to mine frequent itemsets and then generate association rules.

It is popularly used in market basket analysis, where one checks for combinations of products that frequently co-occur in the database.

The Apriori algorithm works on two basic principles: first, if an itemset occurs frequently, then all subsets of that itemset also occur frequently; second, if an itemset occurs infrequently, then all of its supersets also occur infrequently.

The association rule for ‘if a person purchases item X, then they purchase item Y’ is usually written as X -> Y.

For example, if a person purchases milk and sugar, then they are likely to purchase coffee powder. This could be written in the form of an association rule as {milk, sugar} -> coffee powder. Association rules are kept only if they cross the thresholds for support and confidence.

The Support measure helps prune the number of candidate itemsets to be considered during frequent itemset generation. This support measure is guided by the Apriori principle.

The Apriori principle states that if an itemset is frequent, then all of its subsets must also be frequent.
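
The frequent-itemset search and the pruning it relies on can be sketched as follows. This is a compact illustration rather than an optimized implementation: the function name `apriori`, the baskets, and the 0.4 support threshold are assumptions made for the example.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset with support >= min_support."""
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions that contain every item in the itemset.
        return sum(itemset <= t for t in transactions) / n

    singles = {frozenset([item]) for t in transactions for item in t}
    current = {s for s in singles if support(s) >= min_support}
    frequent = {}
    k = 1
    while current:
        frequent.update((s, support(s)) for s in current)
        k += 1
        # Join step: extend frequent (k-1)-itemsets by one item.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step (Apriori principle): every (k-1)-subset must be frequent.
        current = {c for c in candidates
                   if all(frozenset(sub) in frequent for sub in combinations(c, k - 1))
                   and support(c) >= min_support}
    return frequent

# Hypothetical market baskets.
baskets = [{"milk", "sugar", "coffee"}, {"milk", "sugar", "coffee"},
           {"milk", "sugar"}, {"milk", "bread"}, {"bread", "butter"}]

frequent = apriori(baskets, min_support=0.4)
pair = frozenset({"milk", "sugar"})
triple = frozenset({"milk", "sugar", "coffee"})
# Confidence of {milk, sugar} -> coffee = support(triple) / support(pair).
confidence = frequent[triple] / frequent[pair]
print(round(confidence, 3))  # 0.667
```

Here {milk, bread} appears in only one of five baskets, so it is dropped and no larger itemset containing it is ever counted.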

The Apriori machine learning algorithm works by recognizing a particular characteristic of a data set and noting how frequently that characteristic pops up throughout the set. The characteristics that are frequent can then be analyzed and placed into pairs.

This process helps to point out more relationships between relevant data points. Other forms of data can be pruned and placed into their own categories.

The definition of “frequent” is inherently relative and only makes sense in context.

Therefore, the idea is implemented in the Apriori algorithm through a pre-arranged threshold determined by either the operator or the algorithm. A “frequent” data characteristic is one that occurs at or above that pre-arranged threshold, known as the minimum support.

Analysis can detect more and more relations throughout the body of data until the algorithm has exhausted all of the possibilities.

Apriori helps the customers buy their items with ease, and enhances the sales performance of the departmental store.

This algorithm has utility in the field of healthcare as it can help in detecting adverse drug reactions (ADR) by producing association rules to indicate the combination of medications and patient characteristics that could lead to ADRs.

9. Boosting with AdaBoost

AdaBoost is a boosting algorithm that is mostly used when a large amount of data must be handled in order to make predictions with high accuracy.

Boosting algorithms such as AdaBoost are powerful, flexible, and can be interpreted nicely with some tricks. AdaBoost is an ensemble technique that attempts to create a strong classifier from a number of weak classifiers.

This is done by building a model from the training data, then creating a second model that attempts to correct the errors from the first model. Models are added until the training set is predicted perfectly or a maximum number of models is added.
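
The sketch below illustrates that loop with AdaBoost on one-dimensional data, using decision stumps (single-threshold rules) as the weak classifiers. The function names, the number of rounds, and the toy data are assumptions for this example, and labels are encoded as -1 and +1.

```python
import math

def best_stump(xs, ys, weights):
    """Return (weighted error, threshold, polarity) of the best stump.
    A stump predicts `polarity` when x >= threshold and `-polarity` otherwise."""
    best = None
    for threshold in sorted(set(xs)):
        for polarity in (1, -1):
            preds = [polarity if x >= threshold else -polarity for x in xs]
            err = sum(w for w, p, y in zip(weights, preds, ys) if p != y)
            if best is None or err < best[0]:
                best = (err, threshold, polarity)
    return best

def adaboost(xs, ys, rounds=3):
    n = len(xs)
    weights = [1.0 / n] * n
    model = []  # list of (alpha, threshold, polarity)
    for _ in range(rounds):
        err, threshold, polarity = best_stump(xs, ys, weights)
        err = max(err, 1e-10)  # avoid division by zero for a perfect stump
        if err >= 0.5:
            break  # the weak learner is no better than chance
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, threshold, polarity))
        # Raise the weights of misclassified points, lower the rest, renormalize.
        preds = [polarity if x >= threshold else -polarity for x in xs]
        weights = [w * math.exp(-alpha * y * p) for w, y, p in zip(weights, ys, preds)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return model

def boosted_predict(model, x):
    score = sum(alpha * (polarity if x >= threshold else -polarity)
                for alpha, threshold, polarity in model)
    return 1 if score >= 0 else -1

# Hypothetical data that no single stump can classify perfectly.
xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]

model = adaboost(xs, ys, rounds=3)
print([boosted_predict(model, x) for x in xs] == ys)  # True
```

Each stump alone misclassifies at least three points here, but the weighted vote of three stumps fits the training set exactly.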

In short, it combines multiple weak or average predictors to build a strong predictor. These boosting algorithms often work well in data science competitions such as Kaggle, AV Hackathon, and CrowdAnalytix.

AdaBoost was the first really successful boosting algorithm developed for binary classification.

It is the best starting point for understanding boosting. Modern boosting methods build on AdaBoost, most notably stochastic gradient boosting machines.

Boosted algorithms are used where we have plenty of data to make a prediction and we seek exceptionally high predictive power. Boosting is used primarily for reducing bias, and often also variance, in supervised learning.

10. Random Forest

We are now at the end of our tour of machine learning algorithms, and the last algorithm that we are going to see is Random Forest.

The Random Forest machine learning algorithm is easy to use, powerful, and very flexible. It is a type of ensemble machine learning algorithm based on Bootstrap Aggregation, or bagging.

The Random Forest algorithm can be used for both classification and regression problems. It is a tweak on bagging in which the decision trees are built so that, rather than always selecting optimal split points, suboptimal splits are made by introducing randomness.

As the name suggests, this machine learning algorithm creates a forest and makes it random in some way.

The forest that it builds is an ensemble of decision trees, as discussed previously, and most of the time it is trained with the “bagging” method. The basic concept behind the bagging method is that a combination of learning models improves the overall result.

If you get good results with an algorithm with high variance (like decision trees), you can often get better results by bagging that algorithm.

For classifying a new object based on attributes, each tree gives a classification and we say the tree “votes” for that class. The forest chooses the classification having the most votes (over all the trees in the forest).

Each tree of the forest is planted and grown as follows. If the number of cases in the training set is N, then a sample of N cases is taken at random, but with replacement. This sample will be the training set for growing the tree.

If there are M input variables, then a number m<<M is specified such that at each node, m variables are selected at random out of the M, and the best split on these m is used to split the node.

The value of m is held constant during the forest growing.

Each tree is grown to the largest extent possible. There is no pruning.
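
The growing procedure above (bootstrap sample of N cases, m random variables per node, no pruning) and the voting step can be sketched as follows. This is a simplified illustration: the function names, the Gini impurity criterion, the seed, and the toy data are assumptions, and a node falls back to a majority-vote leaf if none of its m sampled variables can split it.

```python
import random
from collections import Counter

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def grow_tree(rows, labels, m, rng):
    """Grow an unpruned tree; each node tries only m randomly chosen variables."""
    if len(set(labels)) == 1:
        return labels[0]  # pure leaf
    best = None
    for f in rng.sample(range(len(rows[0])), m):
        for threshold in sorted({r[f] for r in rows}):
            left = [i for i, r in enumerate(rows) if r[f] < threshold]
            right = [i for i, r in enumerate(rows) if r[f] >= threshold]
            if not left or not right:
                continue
            score = (len(left) * gini([labels[i] for i in left]) +
                     len(right) * gini([labels[i] for i in right]))
            if best is None or score < best[0]:
                best = (score, f, threshold, left, right)
    if best is None:
        return Counter(labels).most_common(1)[0][0]  # no usable split
    _, f, threshold, left, right = best
    return (f, threshold,
            grow_tree([rows[i] for i in left], [labels[i] for i in left], m, rng),
            grow_tree([rows[i] for i in right], [labels[i] for i in right], m, rng))

def tree_predict(node, row):
    while isinstance(node, tuple):  # internal nodes are tuples, leaves are labels
        f, threshold, left, right = node
        node = left if row[f] < threshold else right
    return node

def random_forest(rows, labels, n_trees=25, m=1, seed=0):
    rng = random.Random(seed)
    n = len(rows)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # N cases, with replacement
        forest.append(grow_tree([rows[i] for i in idx], [labels[i] for i in idx], m, rng))
    return forest

def forest_predict(forest, row):
    """Each tree votes for a class; the class with the most votes wins."""
    return Counter(tree_predict(t, row) for t in forest).most_common(1)[0][0]

# Hypothetical data with M = 2 input variables and m = 1.
rows = [(1, 1), (2, 1), (1, 2), (2, 2), (8, 8), (9, 8), (8, 9), (9, 9)]
labels = ["low"] * 4 + ["high"] * 4

forest = random_forest(rows, labels)
print(forest_predict(forest, (0.5, 1.0)), forest_predict(forest, (9.5, 9.5)))
```

Because class labels are stored directly as leaves, this sketch assumes labels are not tuples (strings or integers work).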

The Random Forest algorithm is used in a wide variety of applications; industries that use it heavily include banking, medicine, the stock market, and e-commerce.

An advantage of Random Forest machine learning algorithms is that they are much less prone to overfitting than a single decision tree when used for classification, although overfitting is still possible. Also, the same random forest algorithm can be used for both classification and regression tasks.


In the end, the only thing I want to say is that machine learning is a huge field and the above machine learning algorithms are only a few of them. The choice of algorithm mostly depends on what kind of project you are working on. Keep exploring, keep learning, and make this world a better place to live.

Source: Techgrabyte
