here<\/a>.<\/p>\n\n\n\nimport pandas as pd\n\n# Importing the dataset\ndataset = pd.read_csv('fram1.csv')\n# Drop the few rows where BMI is missing\ndataset = dataset[~dataset['BMI'].isnull()]\nX = dataset.iloc[:, [0, 8, 3, 6]].values  # select SEX, BMI, AGE, CURSMOKE\ny = dataset.iloc[:, 4].values  # select SYSBP<\/code><\/pre>\n\n\n\nWhile building the model I found that a few rows have an empty BMI value, so I modified the program to drop those rows, as you can see above.<\/p>\n\n\n\n
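The null-row filter above can also be written with pandas' `dropna`, and the columns can be picked by name instead of by position. The snippet below is only a minimal sketch: the small `sample` frame and its values are made up for illustration and stand in for `fram1.csv`, and only the column names are taken from the comments in the code above.

```python
import pandas as pd

# Tiny made-up frame standing in for fram1.csv; the values are illustrative only.
sample = pd.DataFrame({
    "SEX": [1, 2, 1, 2],
    "AGE": [39, 46, 48, 61],
    "SYSBP": [106.0, 121.0, 127.5, 150.0],
    "CURSMOKE": [0, 0, 1, 1],
    "BMI": [26.97, None, 25.34, 28.58],
})

# Drop only the rows where BMI is missing; rows with a BMI value are kept as-is.
cleaned = sample.dropna(subset=["BMI"])

# Selecting by column name avoids depending on the column order in the file.
X = cleaned[["SEX", "BMI", "AGE", "CURSMOKE"]].values
y = cleaned["SYSBP"].values

print(len(sample), "rows before,", len(cleaned), "rows after")
```

Selecting by name is a design choice rather than a requirement: it keeps working if the CSV columns are reordered, whereas `iloc` with fixed positions does not.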
2. Encoding the categorical data – <\/strong>In the dataset, the SEX field uses the values 1 and 2 to indicate male and female. We encode it with a label encoder, which gives us 0 and 1 for male and female.<\/p>\n\n\n\n# Encoding categorical data\nfrom sklearn.preprocessing import LabelEncoder\n\nlabelencoder = LabelEncoder()\nX[:, 0] = labelencoder.fit_transform(X[:, 0])<\/code><\/pre>\n\n\n\n3. Splitting the dataset into training and test set<\/strong> – Holding out a test set is an important part of model building because it lets us assess the model’s predictive capability on data it has not seen. We evaluate the model using R-squared, which measures how well the model predicts the observed values of the response variable. In Python we can simply use the score<\/em><\/strong> method to calculate R-squared. An R-squared score of 1 indicates that the response variable can be predicted without any error using the model. An R-squared score of one half indicates that the model explains half of the variance in the response variable.<\/p>\n\n\n\n# Splitting the dataset into the Training set and Test set\nfrom sklearn.model_selection import train_test_split\nX_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)\n<\/code><\/pre>\n\n\n\n4. Fitting the multiple regression to the training set<\/strong><\/p>\n\n\n\n# Fitting Multiple Linear Regression to the Training set\nfrom sklearn.linear_model import LinearRegression\nregressor = LinearRegression()\nregressor.fit(X_train, y_train)\n\n# Predicting on the full dataset (these predictions are plotted in step 5)\ny_pred = regressor.predict(X)\n\n# R-squared on the held-out test set, then the fitted coefficients\nprint('R-squared: %.2f' % regressor.score(X_test, y_test))\nprint(regressor.coef_)<\/code><\/pre>\n\n\n\nOutputs:<\/p>\n\n\n\n
R-squared: 0.26<\/li> Regression Coef: 2.51226344, 1.55951204, 0.92496226, -0.16286499<\/li><\/ul>\n\n\n\n5. Plotting the output on a scatter chart.<\/strong><\/p>\n\n\n\nimport matplotlib.pyplot as plt\n\nfig, ax = plt.subplots()\nax.set_xticks([18.5, 24.9, 29.9], minor=False)  # important values of BMI\nax.set_yticks([120, 130, 140, 180], minor=False)  # important values of SysBP\nax.xaxis.grid(True, which='major', linewidth=0.5, color='red')\nax.yaxis.grid(True, which='major', linewidth=0.5, color='blue')\n\n# Predicted systolic blood pressure against BMI (column 1 of X)\nplt.scatter(X[:,1], y_pred, marker='.')\nplt.ylabel(\"Systolic blood pressure\")\nplt.xlabel(\"BMI\")\nplt.show()<\/code><\/pre>\n\n\n\nIn the plot above, the grid lines mark the boundaries between the categories of BMI and systolic blood pressure.<\/p>\n\n\n\n
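The R-squared value printed in step 4 is one minus the ratio of the residual sum of squares to the total sum of squares. To make that concrete, here is a minimal pure-Python sketch; the helper name `r_squared` and the blood pressure numbers are made up for illustration and are not taken from the Framingham data.

```python
def r_squared(y_true, y_pred):
    """Return 1 - SS_res / SS_tot for two equal-length sequences."""
    mean_y = sum(y_true) / len(y_true)
    # Residual sum of squares: how far the predictions are from the observations.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: how far the observations are from their own mean.
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


observed = [110.0, 120.0, 130.0, 140.0]  # illustrative values only

perfect = r_squared(observed, observed)                        # predictions match exactly
baseline = r_squared(observed, [125.0] * 4)                    # always predict the mean
partial = r_squared(observed, [115.0, 120.0, 130.0, 135.0])    # small errors at the ends

print(perfect, baseline, round(partial, 2))
```

A model that always predicts the mean scores 0, and a perfect model scores 1, which is why a score of 0.26 means the four features explain only about a quarter of the variance in systolic blood pressure on the test set.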
The entire program is given below:<\/p>\n\n\n\n
# Multiple Linear Regression\n# Importing the libraries\nimport numpy as np\nimport matplotlib.pyplot as plt\nimport pandas as pd\n\n# Importing the dataset\ndataset = pd.read_csv('fram1.csv')\ndataset = dataset[~dataset['BMI'].isnull()]  # drop the few rows where BMI is missing\nX = dataset.iloc[:, [0, 8, 3, 6]].values  # select SEX, BMI, AGE, CURSMOKE\ny = dataset.iloc[:, 4].values  # select SYSBP\n\n# Encoding categorical data\nfrom sklearn.preprocessing import LabelEncoder\nlabelencoder = LabelEncoder()\nX[:, 0] = labelencoder.fit_transform(X[:, 0])\n\n# Splitting the dataset into the Training set and Test set\nfrom sklearn.model_selection import train_test_split\nX_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)\n\n# Fitting Multiple Linear Regression to the Training set\nfrom sklearn.linear_model import LinearRegression\nregressor = LinearRegression()\nregressor.fit(X_train, y_train)\n\n# Predicting on the full dataset (these predictions are plotted below)\ny_pred = regressor.predict(X)\n\n# R-squared on the held-out test set, then the fitted coefficients\nprint('R-squared: %.2f' % regressor.score(X_test, y_test))\nprint(regressor.coef_)\n\n\nfig, ax = plt.subplots()\nax.set_yticks([18.5, 24.9, 29.9], minor=False)  # important values of BMI\nax.set_xticks([120, 130, 140, 180], minor=False)  # important values of SysBP\nax.yaxis.grid(True, which='major', linewidth=0.5, color='red')\nax.xaxis.grid(True, which='major', linewidth=0.5, color='blue')\n\n# BMI (column 1 of X) against predicted systolic blood pressure\nplt.scatter(y_pred, X[:,1], marker='.')\nplt.xlabel(\"Systolic blood pressure\")\nplt.ylabel(\"BMI\")\n\nplt.show()<\/code><\/pre>\n\n\n\nWe get the two plots below from our regression models. They show how age and BMI affect systolic blood pressure.<\/p>\n\n\n\n
<\/a>Growth of systolic blood pressure with respect to BMI. Blood pressure is higher for people in the higher BMI categories (overweight or obese).<\/figcaption><\/figure><\/div>\n\n\n\n <\/a>Growth of systolic blood pressure with respect to a person’s age. Blood pressure is higher in adults who are older than 50.<\/figcaption><\/figure><\/div>\n","protected":false},"excerpt":{"rendered":"Previously we built a simple linear regression model using a single explanatory variable to predict the price of pizza from its diameter. But in the real world the price of pizza cannot be entirely derived from the diameter of its base alone. It also depends on the toppings, which means there are many more […]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[37,32],"tags":[49,58],"_links":{"self":[{"href":"http:\/\/write.muthu.co\/wp-json\/wp\/v2\/posts\/661"}],"collection":[{"href":"http:\/\/write.muthu.co\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/write.muthu.co\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/write.muthu.co\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/write.muthu.co\/wp-json\/wp\/v2\/comments?post=661"}],"version-history":[{"count":3,"href":"http:\/\/write.muthu.co\/wp-json\/wp\/v2\/posts\/661\/revisions"}],"predecessor-version":[{"id":1896,"href":"http:\/\/write.muthu.co\/wp-json\/wp\/v2\/posts\/661\/revisions\/1896"}],"wp:attachment":[{"href":"http:\/\/write.muthu.co\/wp-json\/wp\/v2\/media?parent=661"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/write.muthu.co\/wp-json\/wp\/v2\/categories?post=661"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/write.muthu.co\/wp-json\/wp\/v2\/tags?post=661"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}