If you get on a plane that can carry 100 or more passengers at a time, there is a 99% chance that one of the passengers shares the same birthday as yours. If you get into a bus with a capacity of 50 passengers, you have a 97% chance of finding someone who shares your […]

# Category: Artificial Intelligence

### Understanding Correlations and Correlation Matrix

Correlation is the measure of how two or more variables are related to one another, also referred to as linear dependence. An increase in demand for a product increases its price, also called the demand curve, traffic on roads at certain intervals of time of the day, the amount of rain correlates with grass fires, […]

### Computing the discrete Fréchet distance using dynamic programming

Definition The Fréchet distance is usually explained using the analogy of a man walking his dog. A man is walking a dog on a leash: the man can move on one curve, the dog on the other; both may vary their speed, but backtracking is not allowed. What is the length of the shortest leash […]

### All Tesseract OCR options

This is for my reference and this might come in handy for others too. All Tesseract options CLI Examples Command Example Notes tesseract sample_images/image2.jpg stdout To print the output to standard output tesseract sample_images/image2.jpg sample_images/output By default the output will be named outbase.txt. tesseract sample_images/image2.jpg sample_images/output -l eng -l is for language. English is default […]

### Segmenting lines in handwritten documents using A* Path planning algorithm

In this article, I will explain a widely used method for segmenting handwritten documents into individual lines. Below is a sample output from my algorithm. The below flowchart outlines the different steps involved in the segmentation process. The explained method will only work with non-skewed documents. To de-skew the document, you can refer to my […]

### Deskewing scanned documents using horizontal projections

An important pre-processing step in any OCR tool or algorithm is to deskew the scanned document first. Take a look at the below sample scanned image, its tilted by a small angle. In this article I will explain a method to deskew these types of documents using their horizontal projections. The final outcome of deskewing […]

### Mathematics of Principal component analysis

Principal component analysis is a method used to reduce the number of dimensions in a dataset without losing much information. It’s used in many fields such as face recognition and image compression and is a common technique for finding patterns in data and also in the visualization of higher-dimensional data. PCA is all about geometrically […]

### K-Means on Iris Dataset

Read my previous post to understand how K-Means algorithm works. In this post I will try to run the K-Means on Iris dataset to classify our 3 classes of flowers, Iris setosa, Iris versicolor, Iris virginica (our classess) using the flowers sepal-length, sepal-width, petal-length and petal-width (our features) Getting data: describe the data: Converting the class […]

### Mathematics behind K-Mean Clustering algorithm

K-Means is one of the simplest unsupervised clustering algorithm which is used to cluster our data into K number of clusters. The algorithm iteratively assigns the data points to one of the K clusters based on how near the point is to the cluster centroid. The result of K-Means algorithm is: K number of cluster […]

### Understanding Support vector Machines using Python

Support Vector machines (SVM) can be used for both classification as well as regression tasks but they are mostly used in classification applications. Some of the real world applications include Face detection, Handwriting detection, Document categorisation, SPAM Filtering, image classification and protein remote homology detection. For many researchers, SVM is the first best choice for […]