Note: the .Rmd template file will not upload to the site. Please message me if you need it for the assignment and I can send it over.

Lab Assignment 4: Linear Regression
GOV S350K
Instructor: Philip Moniz
Due by 11:00pm on 8/5

In this activity, you will explore a dataset on former UT students who started as freshmen in the year 2000. The dataset (in .csv format) can be downloaded from Canvas. Start by downloading this data file and the .Rmd template file to your computer, saving them both in the same folder (ideally one set up for data analysis for this course). Then start RStudio not by clicking the application icon, but by double-clicking the .Rmd template. This should open RStudio with the working directory for R automatically set to the location of the .Rmd file (which should also be the location of the dataset).

Below are brief descriptions of the variables we will use in the data:

SAT.V: score on SAT verbal
SAT.Q: score on SAT quantitative
SAT.C: overall (total) SAT score
School: which school at UT the student graduated from
GPA: student's grade point average at graduation
Status: whether student graduated ("G" indicates the student graduated, which is the value for every observation in this dataset)

Question 1: Loading the Dataset

Open RStudio and load the dataset using the read.csv command. You will have to give the dataset a name in R. I'd suggest calling it UT. Then use the head command to print the first several rows in the dataset.

Question 2: Descriptive Statistics and Histograms

Calculate the mean and standard deviation of the variables SAT.V and GPA. Then make histograms of both of these variables.

Question 3: Examining Variation by School

Make a boxplot of SAT.V by School. Because there are many values of School, you might not see all the labels on the x axis. Add the argument cex.axis=.6 (preceded by a comma) to the command right before the closing parenthesis to shrink the labels a bit.
You still might not see all of them in RStudio, but they should show up in the knitted document. Briefly comment on which schools had the highest SAT scores.

Question 4: Plotting SAT score vs GPA

Make a scatterplot of GPA (on the vertical axis) against verbal SAT score (on the horizontal axis) and briefly comment on what you see.

Question 5: Predicting GPA with SAT Scores

Estimate a linear regression predicting GPA with verbal SAT scores, saving the regression object under a name, and then print a summary.

Question 6: Interpreting Regression Coefficients

Looking at the regression results from the question above, describe what the coefficient estimate tells you about the relationship between verbal SAT score and GPA. Is this a large or small relationship? (Keep in mind that you need to consider the scale of both variables here, so look back at the histograms you made above. For example, what does the coefficient estimate for SAT.V tell you about the expected change in GPA when verbal SAT score increases by 100? That is a decent, but not gigantic, change in this independent variable.) Also be sure to comment on the statistical significance of the estimate and what that tells you.

Hint: Remember that R often prints very small or very large numbers in scientific notation. For example, 2.083e-03 is 2.083 × 10^-3 = 0.002083.

Question 7: Predictions

What GPA would the regression results above predict for someone who came to UT with a verbal SAT score of 800? What about 700, 600, and 500? (Hint: you should be able to just look at the estimates from the regression above and do the calculations yourself in R without using any special functions, just +, -, *, etc. Also, 1.234e+00 is just 1.234 × 10^0 = 1.234 × 1 = 1.234.)

Question 8: R-squared

Look at the R-squared from the regression above (which R calls Multiple R-squared; you can ignore Adjusted R-squared). What does this tell you about how much of the variation in GPA is explainable by SAT scores?
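
The arithmetic described in the hints for Questions 6 and 7 can be sketched roughly as follows. This is only an illustration: the two numbers are the notation examples from the hints, used here as placeholders and not as confirmed regression estimates, and the names intercept, slope, sat_v, change_per_100, and predicted_gpa are made up for this sketch. Substitute the estimates that your own summary output reports.

```r
# Placeholder numbers copied from the notation examples in the hints.
# They are NOT confirmed regression estimates; replace them with the
# values printed in the summary of your own regression object.
intercept <- 1.234e+00   # scientific notation for 1.234
slope     <- 2.083e-03   # scientific notation for 0.002083

# R treats scientific notation and ordinary decimals as the same number.
same_number <- isTRUE(all.equal(slope, 0.002083))

# Expected change in GPA for a 100-point increase in verbal SAT score.
change_per_100 <- slope * 100

# Predicted GPA = intercept + slope * SAT.V, using only + and *.
sat_v <- c(800, 700, 600, 500)
predicted_gpa <- intercept + slope * sat_v

print(same_number)
print(change_per_100)
print(data.frame(SAT.V = sat_v, predicted_GPA = predicted_gpa))
```

Because sat_v is a vector, R applies the same formula to all four scores at once, so no special prediction function is needed.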