{"id":735,"date":"2018-06-30T16:28:18","date_gmt":"2018-06-30T16:28:18","guid":{"rendered":"http:\/\/muthu.co\/?p=735"},"modified":"2021-05-24T02:23:50","modified_gmt":"2021-05-24T02:23:50","slug":"understanding-support-vector-machines-using-python","status":"publish","type":"post","link":"http:\/\/write.muthu.co\/understanding-support-vector-machines-using-python\/","title":{"rendered":"Understanding Support vector Machines using Python"},"content":{"rendered":"\n

Support Vector Machines (SVMs) can be used for both classification and regression tasks, but they are mostly used for classification. Real-world applications include face detection, handwriting recognition, document categorisation, spam filtering, image classification and protein remote homology detection. For many researchers, SVM is the first choice for a classification task because it performs well on linearly separable as well as non-linear datasets.

Take a look at the image below: there are multiple lines dividing the two sets of points. SVM helps us find the one marked B, because it is the widest divider between the two sets.

\"\"<\/a><\/figure><\/div>\n\n\n\n

Support Vector Machines are based on the idea of finding the separating hyperplane with the widest margin between the classes. The advantage of an SVM over algorithms such as K-Means or Naive Bayes is that it finds the decision boundary that maximizes the distance to the nearest data points of every class. An SVM doesn't merely find a decision boundary; it finds the optimal one. Take a look at the illustration below to understand this better:

\"\"<\/a><\/figure><\/div>\n\n\n\n

In the illustration above, you can see two sets of dots, blue and black, divided by a single line called the hyperplane. The margin is the widest gap separating the two sets. As marked in the diagram, a few data points touch the margin lines; these are called the support vectors, which is where the algorithm gets its name.
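For reference (this is the standard textbook formulation, not something specific to this post), maximizing the margin is usually written as the optimisation problem below, where the margin width works out to 2 / ‖w‖:

\[
\min_{w,\,b} \; \tfrac{1}{2}\lVert w \rVert^{2}
\quad \text{subject to} \quad
y_i \,(w \cdot x_i + b) \ge 1, \qquad i = 1, \dots, n
\]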

But not all datasets can be divided by a straight line. Take a look at the dataset below:

\"\"<\/a><\/figure><\/div>\n\n\n\n

This is what we call a non-linear (not linearly separable) dataset, which is what we usually get in real-world applications. We have to transform such data into a form that can be separated linearly, and we do that using functions called kernels.

Transforming linear data using a linear kernel:

\"\"<\/a><\/figure><\/div>\n\n\n\n

Transforming data using a higher-order polynomial or Gaussian kernel:

\"\"<\/a><\/figure><\/div>\n\n\n\n

The maths behind finding the hyperplane has been nicely explained here, so I will skip the mathematics and jump right to the implementation.

With the basics in place, let's first try SVM on a random dataset which is linearly separable, and understand the various parts of this machine learning algorithm.

# importing scikit-learn's make_blobs to generate a toy dataset
from sklearn.datasets import make_blobs
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# X holds the two features, y holds the two class labels (0 and 1)
X, y = make_blobs(n_samples=50, centers=2, random_state=0, cluster_std=0.40)
# plotting our dataset
plt.scatter(X[:, 0], X[:, 1], c=y, s=50, cmap='spring')
df = pd.DataFrame(dict(feature1=X[:, 0], feature2=X[:, 1], category=y))

Our dataset generated from the above code looks like:

\"\"<\/a><\/figure><\/div>\n\n\n\n
\"\"<\/a><\/figure><\/div>\n\n\n\n

Here "category" holds the two groups, represented by 0 and 1, for each feature1/feature2 combination. When plotted, we get the scatter figure above; our job with SVM is to find a plane which divides these two groups. Feeding the dataset into our SVM:

# split the dataset into 80% training and 20% test data
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(df[['feature1', 'feature2']], df['category'], test_size=0.20)

# the SVM classification of the dataset
from sklearn.svm import SVC
svclassifier = SVC(kernel='linear')
svclassifier.fit(X_train, y_train)
y_pred = svclassifier.predict(X_test)

See how simple it is to import the SVM algorithm from sklearn and build our prediction model. Let's evaluate our model using the classification report:

from sklearn.metrics import classification_report
# evaluating our model's prediction accuracy
print(classification_report(y_test, y_pred))
\"\"<\/a><\/figure><\/div>\n\n\n\n

The precision is pretty good: it is 100% accurate on our test data. Now, let's find the hyperplane and the support vectors and plot them too.

# the separating hyperplane w . x + b = 0, rewritten as a line y = a*x + c
w = svclassifier.coef_[0]
a = -w[0] / w[1]
xx = np.linspace(-5, 5)
yy = a * xx - (svclassifier.intercept_[0]) / w[1]

# margin lines passing through two of the support vectors
b = svclassifier.support_vectors_[0]
yy_down = a * xx + (b[1] - a * b[0])
b = svclassifier.support_vectors_[-1]
yy_up = a * xx + (b[1] - a * b[0])

# plot the line, the points, and the nearest vectors to the plane
plt.plot(xx, yy, 'k-')
plt.plot(xx, yy_down, 'k--')
plt.plot(xx, yy_up, 'k--')

plt.scatter(svclassifier.support_vectors_[:, 0], svclassifier.support_vectors_[:, 1],
            s=80, facecolors='none', edgecolors='k')  # edgecolors so the hollow circles are visible
\"\"<\/a><\/figure><\/div>\n\n\n\n

In the plot above, you can see the dataset divided by the optimal line, the hyperplane, with the support vectors sitting on the margin boundaries.
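The margin lines above are reconstructed from two individual support vectors, which works on this clean toy set. A more general sketch (my own variant, not from the original post) is to draw the level sets of the classifier's decision_function, which also handles cases where the classes overlap:

# Alternative margin plot using decision_function
# (assumes svclassifier, X_train and y_train from the code above).
import numpy as np
import matplotlib.pyplot as plt

xv = np.linspace(X_train['feature1'].min() - 1, X_train['feature1'].max() + 1, 200)
yv = np.linspace(X_train['feature2'].min() - 1, X_train['feature2'].max() + 1, 200)
XX, YY = np.meshgrid(xv, yv)
Z = svclassifier.decision_function(np.c_[XX.ravel(), YY.ravel()]).reshape(XX.shape)

plt.scatter(X_train['feature1'], X_train['feature2'], c=y_train, s=50, cmap='spring')
# decision boundary at level 0, margins at -1 and +1
plt.contour(XX, YY, Z, levels=[-1, 0, 1], colors='k', linestyles=['--', '-', '--'])
plt.show()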

Now, let's try SVM on a real-world dataset. For our analysis we will use the famous Iris dataset, which consists of 50 samples from each of three species of Iris (Iris setosa, Iris virginica and Iris versicolor). Four features were measured from each sample: the length and the width of the sepals and petals, in centimetres. Based on the combination of these four features we will build an SVM model to distinguish the species from each other.

\"\"<\/a><\/figure><\/div>\n\n\n\n

The dataset looks like:

\"\"<\/a><\/figure><\/div>\n\n\n\n

First attempt: Polynomial kernel, degree 8

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"

# the imported dataset does not have the required column names, so let's add them
colnames = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'Class']
irisdata = pd.read_csv(url, names=colnames)
X = irisdata.drop('Class', axis=1)  # X contains all the features
y = irisdata['Class']               # y contains the categories

# split the dataset into 80% training and 20% test data
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20)

# train the SVC classifier with a polynomial kernel
from sklearn.svm import SVC
from sklearn.metrics import classification_report, confusion_matrix
svclassifier = SVC(kernel='poly', degree=8)
svclassifier.fit(X_train, y_train)
y_pred = svclassifier.predict(X_test)

# evaluating our model's prediction accuracy
df_cm = confusion_matrix(y_test, y_pred)
print(classification_report(y_test, y_pred))

# seaborn is used for a better looking confusion matrix
import seaborn as sn
sn.heatmap(df_cm, annot=True, annot_kws={"size": 16})
\"\"<\/a><\/figure><\/div>\n\n\n\n
\"\"<\/a><\/figure><\/div>\n\n\n\n

The polynomial kernel classifies about 87% of the test flowers correctly.
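A side note of my own (not from the original post): degree 8 is unusually high for a polynomial kernel, and polynomial kernels are sensitive to the scale of the features. A lower degree combined with standardisation often does at least as well; for example, reusing the Iris split from above:

# Sketch: lower-degree polynomial kernel with feature scaling
# (assumes X_train, X_test, y_train, y_test from the Iris split above).
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

poly_clf = make_pipeline(StandardScaler(), SVC(kernel='poly', degree=3))
poly_clf.fit(X_train, y_train)
print("degree-3 poly accuracy:", poly_clf.score(X_test, y_test))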

Second attempt: Gaussian kernel

There isn't much to change in our program except changing the kernel parameter to 'rbf'.

svclassifier = SVC(kernel='rbf')

Our classification report now looks like:

\"\"<\/a><\/figure><\/div>\n\n\n\n

Confusion matrix:

\"\"<\/a><\/figure><\/div>\n\n\n\n

A Gaussian kernel is about 97% accurate in classification.

Comparing the two, the Gaussian kernel came closer to a 100% prediction rate than the polynomial kernel, so it performed slightly better here. In general, you have to test the different kernels to identify the one that performs best for your data; a sketch of how to automate that comparison follows.
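Here is a hedged sketch of my own showing how that kernel comparison could be automated with cross-validation, reusing the Iris training split from above; the grid values are illustrative, not tuned recommendations:

# Sketch: compare kernels (and a few C/gamma/degree values) by cross-validation.
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

param_grid = [
    {'kernel': ['linear'], 'C': [0.1, 1, 10]},
    {'kernel': ['poly'], 'degree': [2, 3, 4], 'C': [0.1, 1, 10]},
    {'kernel': ['rbf'], 'gamma': ['scale', 0.1, 1], 'C': [0.1, 1, 10]},
]
search = GridSearchCV(SVC(), param_grid, cv=5)
search.fit(X_train, y_train)

print("best parameters:", search.best_params_)
print("test accuracy:", search.score(X_test, y_test))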
