{"id":630,"date":"2018-05-23T07:22:41","date_gmt":"2018-05-23T07:22:41","guid":{"rendered":"http:\/\/muthu.co\/?p=630"},"modified":"2021-05-24T03:48:20","modified_gmt":"2021-05-24T03:48:20","slug":"math-behind-linear-regression-and-python-code","status":"publish","type":"post","link":"http:\/\/write.muthu.co\/math-behind-linear-regression-and-python-code\/","title":{"rendered":"Math behind Linear Regression with Python code"},"content":{"rendered":"\n

Simple linear regression<\/strong> is a statistical method you can use to study relationships between two continuous (quantitative) variables:<\/p>\n\n\n\n

  1. independent variable (x) – also referred to as predictor or explanatory variable<\/li>
  2. dependent variable (y) – also referred to as response or outcome<\/li><\/ol>\n\n\n\n

    The goal of any regression model is to predict the value of y (dependent variable) based on the value of x (independent variable). In the case of linear regression, we use past observations of x and y (which we call our training data) to find a linear equation of the form y = A + Bx, and then use this equation to make predictions.<\/p>\n\n\n\n

    Let's get started with a simple example and build our linear regression model with it. Suppose you visit a pizza restaurant whose menu looks somewhat like this:<\/p>\n\n\n\n

    \"\"<\/a><\/figure><\/div>\n\n\n\n

    You would like to know the approximate price of a pizza that is 12 inches in diameter. It's not on the menu, but a linear regression model can estimate it. The above table is called our training set, <\/strong>because this is what we use to model the relationship between the diameter and the price of a pizza. The training set, when plotted on a graph, looks like this:<\/p>\n\n\n\n

    \"\"<\/a><\/figure><\/div>\n\n\n\n

    Note that the observed (x<\/em>, y<\/em>) data points fall roughly along a line, but not exactly on one. We will use these data points to find the equation of the straight line that fits the observations most closely; this line is our linear model of the relationship between diameter and price.<\/p>\n\n\n\n

    What is a Linear Regression Equation?<\/h4>\n\n\n\n
    \n
    \n

    A linear regression equation takes the same form as the equation of a line and is often written in the following general form: y = A + Bx<\/strong><\/em><\/p>\n<\/div>\n<\/div>\n\n\n\n

    \n
    \n

    Here \u2018x\u2019 is the independent variable (the diameter) and \u2018y\u2019 is the dependent variable (the predicted price). The constants \u2018A\u2019 and \u2018B\u2019 are the y-axis intercept and the slope of the line, respectively. To find the equation of the line, we use the formula below to compute A and B.<\/p>\n

    \"\"<\/a><\/figure>

    <\/p>\n
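    For readers without the formula image, here is a sketch of the closed-form least-squares expressions, written to match the Python program later in this article. Here n is the number of training points and each sum runs over all of them; the summation notation is an assumption about how the original figure was typeset.

```latex
\[
A = \frac{\sum y \,\sum x^{2} - \sum x \,\sum xy}{n \sum x^{2} - \left(\sum x\right)^{2}},
\qquad
B = \frac{n \sum xy - \sum x \,\sum y}{n \sum x^{2} - \left(\sum x\right)^{2}}
\]
```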

    Let's do the math in an Excel sheet for now and find the values of A and B.<\/p>\n<\/div>\n<\/div>\n\n\n\n

    \"\"<\/a><\/figure><\/div>\n\n\n\n

    Our line equation becomes y = 1.965517241 + 0.9762931034x. <\/strong>So, the approximate price of a pizza with a diameter of 12 inches<\/strong> is around $13.68 <\/strong>(rounded off).<\/p>\n\n\n\n
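    As a check on these numbers, the arithmetic can be worked by hand. This sketch assumes the five training points used in the Python program further down (diameters 6, 8, 10, 14, 18 and prices 7, 9, 13, 17.5, 18):

```latex
\begin{align*}
n &= 5, \quad \sum x = 56, \quad \sum y = 64.5, \quad \sum xy = 813, \quad \sum x^{2} = 720 \\
A &= \frac{64.5 \cdot 720 - 56 \cdot 813}{5 \cdot 720 - 56^{2}} = \frac{912}{464} \approx 1.9655 \\
B &= \frac{5 \cdot 813 - 56 \cdot 64.5}{5 \cdot 720 - 56^{2}} = \frac{453}{464} \approx 0.9763 \\
y(12) &= A + 12B \approx 1.9655 + 11.7155 \approx 13.68
\end{align*}
```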

    The plot of the above equation on our previous graph now looks something like this:<\/p>\n\n\n\n

    \"\"<\/a><\/figure><\/div>\n\n\n\n

    The blue line represents our linear regression model line and the dots the training data set.<\/p>\n\n\n\n

    Using training data to learn the parameter values that produce the best-fitting model, meaning the line that minimizes the sum of squared differences between the observed and predicted values of y, is called ordinary least squares or linear least squares.<\/strong> We will discuss evaluating the fit of a model with cost functions in more detail in our next article.<\/p>\n\n\n\n
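    To make the "least squares" idea concrete, here is a minimal sketch in plain Python. It computes the sum of squared residuals for the fitted line and checks that nudging the intercept or slope only makes that error larger. The helper name `rss` and the nudge sizes are illustrative choices, not part of the original article.

```python
# Training data from this article: pizza diameter (inches) -> price ($)
xs = [6, 8, 10, 14, 18]
ys = [7, 9, 13, 17.5, 18]


def rss(a, b):
    """Residual sum of squares for the line y = a + b*x (illustrative helper)."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Intercept and slope reported in this article
A, B = 1.965517241, 0.9762931034

fitted = rss(A, B)

# Error for lines whose intercept and/or slope is shifted slightly
nudged = [
    rss(A + da, B + db)
    for da in (-0.1, 0, 0.1)
    for db in (-0.01, 0, 0.01)
    if (da, db) != (0, 0)
]

print("RSS of fitted line:", round(fitted, 4))
print("Every nudged line is worse:", all(fitted < r for r in nudged))
```

    The fitted line is not error-free; it is simply the line for which this total squared error is smallest.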

    Python program showing the actual mathematics of Linear Regression:<\/p>\n\n\n\n

    import numpy as np\n\n# Training data set: pizza diameters (inches) and prices ($)\nx = [6, 8, 10, 14, 18]\ny = [7, 9, 13, 17.5, 18]\n\n# Sums needed by the least-squares formulas: sum(x), sum(y), sum(x*y), sum(x*x)\nsum_x = np.sum(x)\nsum_y = np.sum(y)\nsum_xy = np.sum(np.multiply(x, y))\nsum_xx = np.sum(np.multiply(x, x))\n\n# Intercept A and slope B of the line Y = A + BX\nnumber_of_records = len(x)\ndenominator = number_of_records*sum_xx - sum_x*sum_x\nA = (sum_y*sum_xx - sum_x*sum_xy)\/denominator\nB = (number_of_records*sum_xy - sum_x*sum_y)\/denominator\n\n# Predict the price of a 12-inch pizza: y = A + BX\ndiameter = 12\nprice = A + B*diameter\n\nprint(price)  # prints approximately 13.68\n<\/code><\/pre>\n\n\n\n

    Python program using the scikit-learn machine learning library:<\/p>\n\n\n\n

    from sklearn.linear_model import LinearRegression\n# Training data\nX = [[6], [8], [10], [14],   [18]]\ny = [[7], [9], [13], [17.5], [18]]\n\n# Create and fit the model\nmodel = LinearRegression()\nmodel.fit(X, y)\n\n# predict() expects a 2D array of samples; y is 2D here, so the result is 2D as well\nprint('A 12\" pizza should cost: $%.2f' % model.predict([[12]])[0][0])\n# A 12\" pizza should cost: $13.68<\/code><\/pre>\n\n\n\n
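    As a cross-check, the fitted scikit-learn model exposes its parameters through the `intercept_` and `coef_` attributes, which should correspond to A and B from the hand calculation. The sketch below assumes scikit-learn is installed and, unlike the program above, passes y as a flat list so that the intercept comes back as a scalar.

```python
from sklearn.linear_model import LinearRegression

# Same training data as above; y is 1D here (an assumption made for simpler output)
X = [[6], [8], [10], [14], [18]]
y = [7, 9, 13, 17.5, 18]

model = LinearRegression()
model.fit(X, y)

intercept = float(model.intercept_)  # corresponds to A
slope = float(model.coef_[0])        # corresponds to B

print("A = %.4f, B = %.4f" % (intercept, slope))
```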

    Some real-world uses of linear regression:<\/p>\n\n\n\n

    1. The number of calories burnt versus the number of miles you run.<\/li>
    2. The amount of electricity that is needed to run a house based on the size of the house.<\/li>
    3. The relationship between height and weight: as height increases, you’d expect weight to increase, but not perfectly.<\/li>
    4. Amount of alcohol consumed and the amount of alcohol in your bloodstream.<\/li><\/ol>\n","protected":false},"excerpt":{"rendered":"

      Simple linear regression is a statistical method you can use to study relationships between two continuous (quantitative) variables: independent variable (x) – also referred to as predictor or explanatory variable dependent variable (y) – also referred to as response or outcome The goal of any regression model is to predict the value of y (dependent variable) based on the […]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[24,6,32,31],"tags":[46,54,58,62],"_links":{"self":[{"href":"http:\/\/write.muthu.co\/wp-json\/wp\/v2\/posts\/630"}],"collection":[{"href":"http:\/\/write.muthu.co\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/write.muthu.co\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/write.muthu.co\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/write.muthu.co\/wp-json\/wp\/v2\/comments?post=630"}],"version-history":[{"count":2,"href":"http:\/\/write.muthu.co\/wp-json\/wp\/v2\/posts\/630\/revisions"}],"predecessor-version":[{"id":1898,"href":"http:\/\/write.muthu.co\/wp-json\/wp\/v2\/posts\/630\/revisions\/1898"}],"wp:attachment":[{"href":"http:\/\/write.muthu.co\/wp-json\/wp\/v2\/media?parent=630"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/write.muthu.co\/wp-json\/wp\/v2\/categories?post=630"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/write.muthu.co\/wp-json\/wp\/v2\/tags?post=630"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}