In this article, I will explain a widely used method for segmenting handwritten documents into individual lines.
An important pre-processing step in any OCR tool or algorithm is to deskew the scanned document first.
Principal component analysis is a method used to reduce the number of dimensions in a dataset without losing much information.
K-Means is one of the simplest unsupervised clustering algorithm which is used to cluster our data into K number of clusters.
Support Vector machines (SVM) can be used for both classification as well as regression tasks but they are mostly used in classification applications.
Polynomial regression is a process of finding a polynomial function that takes the form f( x ) = c0 + c1 x + c2 x2 ⋯ cn xn where n is the degree of the polynomial and c is a set of coefficients.
Simple linear regression is a statistical method you can use to study relationships between two continuous (quantitative) variables: independent variable (x) – also referred to as predictor or explanatory variable dependant variable (y) – also referred to as response or outcome The goal of any regression model is to predict the value of y (dependant variable) based on the value of x (independent variable).
Like I always believed, there is a way to represent each and every natural language sentence using mathematical notations.