{"id":661,"date":"2018-06-03T15:27:06","date_gmt":"2018-06-03T15:27:06","guid":{"rendered":"http:\/\/muthu.co\/?p=661"},"modified":"2021-05-24T03:47:04","modified_gmt":"2021-05-24T03:47:04","slug":"multiple-linear-regression-with-python-on-framingham-heart-study-data","status":"publish","type":"post","link":"http:\/\/write.muthu.co\/multiple-linear-regression-with-python-on-framingham-heart-study-data\/","title":{"rendered":"Multiple Linear Regression with Python on Framingham Heart Study data"},"content":{"rendered":"\n

Previously we built a simple linear regression model <\/a> using a single explanatory variable to predict the price of a pizza from its diameter. In the real world, however, the price of a pizza cannot be derived from the diameter of its base alone. It also depends on the toppings, which means there are many more independent variables to use in the prediction equation. A regression that uses multiple explanatory variables is called multiple linear regression. A simple linear regression uses a single explanatory variable with a single coefficient, whereas a multiple linear regression uses one coefficient for each explanatory variable but still has a single dependent variable.<\/p>\n\n\n\n

The equation of multiple linear regression looks like:<\/p>\n\n\n\n

y = β₀ + β₁x₁ + β₂x₂ + … + βₙxₙ<\/a><\/figure><\/div>\n\n\n\n

where x1<\/sub>,x2<\/sub>……xn <\/sub><\/em>represent our explanatory variables and y <\/em>is our dependent variable. The mathematics of multiple linear regression involves fairly involved matrix operations; I will keep a more detailed explanation for another post.<\/p>\n\n\n\n
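
To make the equation concrete, here is a minimal sketch that fits one coefficient per explanatory variable, plus an intercept, using ordinary least squares in NumPy. The data, the "true" coefficients and the helper name `predict` are made up for illustration; this is not the model built later in the post.

```python
import numpy as np

# Made-up data purely for illustration: 6 observations, 2 explanatory variables.
X = np.array([
    [1.0, 2.0],
    [2.0, 1.0],
    [3.0, 4.0],
    [4.0, 3.0],
    [5.0, 6.0],
    [6.0, 5.0],
])

# Hypothetical intercept and coefficients used to generate y without any noise.
true_intercept = 1.0
true_coefs = np.array([2.0, 3.0])
y = true_intercept + X @ true_coefs

# Prepend a column of ones so the intercept is estimated together with the coefficients.
X_design = np.column_stack([np.ones(len(X)), X])

# Ordinary least squares: find beta that minimises ||X_design @ beta - y||^2.
beta, *_ = np.linalg.lstsq(X_design, y, rcond=None)
intercept, coefs = beta[0], beta[1:]


def predict(x_row):
    """Apply y = intercept + coef_1 * x_1 + ... + coef_n * x_n to one observation."""
    return intercept + np.dot(coefs, x_row)


print("intercept:", intercept)
print("coefficients:", coefs)
print("prediction for x = [7, 8]:", predict(np.array([7.0, 8.0])))
```

Because the toy data contains no noise, the fitted values should match the coefficients used to generate it; with real measurements the fit is only an approximation.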

In this post we will focus on building a multiple linear regression model using the Framingham Heart Study data<\/b><\/a>, which comes from a long-term, ongoing cardiovascular cohort study of residents of the town of Framingham, Massachusetts. The study began in 1948 with 5,209 adult subjects from Framingham and is now on its third generation of participants. Much of the now-common knowledge concerning heart disease, such as the effects of diet, exercise, and common medications such as aspirin, is based on this study. One of the important observations from the Framingham Heart Study was the strong correlation between BMI and blood pressure. People with a higher BMI are at a higher risk of cardiovascular disease. Blood pressure also tends to rise with age. Let's see if we can derive these observations using our regression model.<\/p>\n\n\n\n

First, let's download the Framingham data from here<\/a>. This dataset has a long list of columns, from which we will pick SYSBP, BMI, AGE, SEX <\/em><\/strong>and BPMEDS<\/em><\/strong> to build our model.<\/p>\n\n\n\n
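
As a rough sketch of this column selection, the snippet below uses pandas to keep only the five columns named above. The file location is not given here, so a tiny in-memory CSV with made-up values stands in for the download; the `EXTRA` column, the row values and the choice to drop incomplete rows are all assumptions for illustration.

```python
import io

import pandas as pd

# Tiny in-memory stand-in for the downloaded CSV. All values are made up, and
# EXTRA only represents the many other columns present in the real file.
csv_text = """SYSBP,BMI,AGE,SEX,BPMEDS,EXTRA
106.0,26.97,39,1,0,a
121.0,28.73,46,2,0,b
127.5,25.34,48,1,,c
150.0,28.58,61,2,1,d
"""

columns = ["SYSBP", "BMI", "AGE", "SEX", "BPMEDS"]

# With the real download, io.StringIO(csv_text) would be replaced by the file path.
df = pd.read_csv(io.StringIO(csv_text), usecols=columns)

# Drop rows with a missing value in any selected column (one simple option among several).
df = df.dropna()

# SYSBP is the value to predict; the remaining four columns are the explanatory variables.
y = df["SYSBP"]
X = df[["BMI", "AGE", "SEX", "BPMEDS"]]

print(X.shape, y.shape)
```

Passing `usecols` keeps the unused columns from being loaded at all, which is convenient when the file has a long list of them.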

SYSBP<\/strong><\/em> – Systolic blood pressure is what we are trying to predict, so it becomes our dependent variable Y.<\/em> The table below shows the different categories of BP with their standard values.<\/p>\n\n\n\n

[Table image: blood pressure categories]<\/a>
Source: www.heart.org<\/figcaption><\/figure><\/div>\n\n\n\n

BMI, AGE, SEX<\/strong> and BPMEDS<\/strong><\/em> (an indicator for antihypertensive medication use) will be our four explanatory variables X. The standard values of BMI are as follows:<\/em><\/p>\n\n\n\n