Full instructions are in the attached documents.
Part I: Scenario: You are a policy analyst working for the Federal Government. You have been tasked with identifying whether a gap is likely to exist in the future between the salaries of male and female employees in the industry. In addition, you will evaluate the ethical rules applicable to the IT industry that this gap might violate. You will use logical reasoning to list, step by step, the possible unethical results if the gap does exist. You are using a public dataset available from one of the largest online job boards. You have decided to use a multiple linear regression model for prediction. Locate the combined dataset.
Part II: You will compose a Python script and a one- or two-page report. Please note that if you are using PyCharm, you will need to save all your plots as .png files and submit them among the deliverables; if you are using Jupyter, you don’t need to do that. The requirements are as follows:
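As a rough illustration of the modeling step named in the scenario, the sketch below fits a multiple linear regression by solving the normal equations in pure Python. Everything in it is an assumption made for illustration: the predictor names (`years_experience`, `is_female`), the helper names `solve` and `fit_ols`, and the toy salary figures, which are generated from a known formula and are not drawn from the job-board dataset. An actual submission would more likely load the real data and use a statistics library.

```python
# Toy multiple linear regression via the normal equations (pure Python).
# All data below is made up for illustration; it is NOT the job-board dataset.

def solve(a, b):
    """Solve the linear system a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_ols(rows, y):
    """Return [intercept, b1, b2, ...] that minimize the sum of squared errors."""
    X = [[1.0] + list(r) for r in rows]  # prepend a column of ones for the intercept
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)

# Hypothetical predictors: (years_experience, is_female)
rows = [(1, 0), (3, 1), (5, 0), (7, 1), (9, 0), (2, 1), (4, 0), (6, 1)]
# Hypothetical salaries built exactly from 50000 + 2000*years - 3000*is_female,
# so the fit should recover those three numbers.
y = [50000 + 2000 * yrs - 3000 * f for yrs, f in rows]

intercept, b_years, b_female = fit_ols(rows, y)
print(round(intercept), round(b_years), round(b_female))
```

In a model of this shape, the coefficient on the gender indicator is the estimated salary difference between female and male employees with the other predictors held fixed, which is the quantity the scenario asks about.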
Part 1 – Data Tables
Using the spreadsheet file that you created in Module 1, identify the food items for which portion amounts are recorded as greater than ‘1’ in your food log. In a new sheet, create one-variable data tables for these food items. Data tables should calculate the amounts of calories relevant to the number of portions. The input values for the number of portions should run between ‘1’ and the value recorded in your food log. For example, if the number of portions for some food item has been recorded as ‘5’ in your food log, then input values in a data table would be integers from 1 to 5.
For each previously selected food item, create a two-variable data table that calculates the amounts of calories based on various portion sizes and the number of portions. As in the previous step, the maximum number of portions should agree with the corresponding value of the item in your food log.
Note: Refer to the Food Table sheet from Module 1 for alternative portion sizes for considered food items.
Part 2 – Scenarios
Using Scenario Manager, create a set of scenarios for projecting the results of calculations in your food log table.
Note: Save the initial values as “Original values” scenario.
Next, in a new sheet, generate a scenario summary report.
Part 3 – Summary Report
In a document, using 3-4 sentences, explain the role of one- and two-variable data tables in a data analysis process as applied to your model. Briefly describe your approach for building scenarios. Clarify whether you chose to eliminate certain food items from your menu or to reduce the portion sizes, and why. In 3-4 sentences, discuss whether a scenario summary report contains all information necessary for making an adequate decision.
Submit both the Excel workbook and a Word document for this activity.
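The calculation behind the one- and two-variable data tables in Part 1 can be sketched outside Excel as follows. This is only an illustration of the arithmetic, not a substitute for the workbook: the food item, its energy density, the logged portion count, and the alternative portion sizes are all hypothetical placeholders, and the function name `calories` is invented for this example.

```python
# Hypothetical food item; every value here is a placeholder, not taken from a real food log.
CALORIES_PER_100G = 52             # assumed energy density (kcal per 100 g)
LOGGED_PORTIONS = 5                # assumed number of portions recorded in the food log
DEFAULT_PORTION_G = 100            # assumed default portion size in grams
ALT_PORTION_SIZES_G = [50, 100, 150]  # assumed alternative portion sizes in grams

def calories(portion_g, n_portions):
    """Calories for n_portions portions of portion_g grams each."""
    return CALORIES_PER_100G * portion_g * n_portions / 100

# One-variable table: vary the number of portions from 1 up to the logged value.
one_var = {n: calories(DEFAULT_PORTION_G, n) for n in range(1, LOGGED_PORTIONS + 1)}

# Two-variable table: vary both the portion size and the number of portions.
two_var = {
    size: {n: calories(size, n) for n in range(1, LOGGED_PORTIONS + 1)}
    for size in ALT_PORTION_SIZES_G
}

for n, kcal in one_var.items():
    print(n, kcal)
for size, row in two_var.items():
    print(size, [row[n] for n in sorted(row)])
```

The one-variable table has a single input (number of portions) feeding one formula, while the two-variable table evaluates the same formula over a grid of two inputs, which mirrors what the Excel data tables compute.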
Weight: 12% of course grade
Instructions
Your assignment for this unit is to complete the following five activities. Your completed scholarly activity should be at least two pages in length, double-spaced.
Conduct a search on the Internet and/or CSU Online Library, and identify the functions and purpose of the following operating systems (OS): Microsoft Windows, Mac OS X, Linux, iOS, BlackBerry OS, and Android. Identify the advantages and disadvantages of each.
Describe the file management system on your desktop or laptop computer. In addition, either provide a screenshot or create a table of your file structure to include files, folders, and subfolders.
Explain the process and programs used to back up both Windows and the Mac OS.
Explain the function and importance of each of the following system utilities: disk cleanup utility, disk checking utility, and disk defragmenter utility.
Discuss three common computer problems and the actions you would take to troubleshoot the problems.
You are required to use at least two outside sources, one of which must come from the CSU Online Library. All sources used, including the textbook, must be referenced; paraphrased and quoted material must have accompanying APA citations.
Resources
The following resource(s) may help you with this assignment: Citation Guide; CSU Online Library Research Guide.
Link to Dropbox with assignment details: https://www.dropbox.com/scl/fo/48pvkkmcgxxew3z4xmaut/h?dl=0&rlkey=acn7tqgzv1fixd0ydd9sjysxf
You should submit a knitted HTML report. To make it, open the R Markdown template “data-writeup.Rmd”, add your writeup and R code, then knit the file to produce the HTML report.
The R Markdown template can be found at: data-writeup.Rmd
A list of possibly useful datasets can be found at: GOV 350K Example Data Source.pdf
Lab Assignment 4: Linear Regression
GOV S350K
Instructor Philip Moniz
Due by 11:00pm on 8/5
In this activity, you will explore a dataset on former UT students who started as freshmen in the year 2000. The data can be downloaded from Canvas. There you will find a dataset (in .csv format). You should start by downloading this data file as well as the .Rmd template file to your computer, saving them both in the same folder (ideally one set up for data analysis for this course). You should then start RStudio not by clicking the application icon, but instead by double-clicking the .Rmd template, which should open RStudio with the working directory for R automatically set to the location of the .Rmd file (which should also be the same location as the dataset).
Below are brief descriptions of the variables we will use in the data:
SAT.V: score on SAT verbal
SAT.Q: score on SAT quantitative
SAT.C: overall (total) SAT score
School: which school at UT the student graduated from
GPA: student’s grade point average at graduation
Status: whether student graduated (“G” indicates student graduated, which is the value for every observation in this dataset)
Question 1: Loading the Dataset
Open RStudio and load the dataset using the read.csv command. You will have to give the dataset a name in R. I’d suggest calling it UT. Then use the head command to print the first several rows in the dataset.
Question 2: Descriptive Statistics and Histograms
Calculate the mean and standard deviation of the variables SAT.V and GPA. Then make histograms of both of these variables.
Question 3: Examining Variation by School
Make a boxplot of SAT.V by School. Because there are many values of School you might not see all the labels on the x axis. Add the argument , cex.axis=.6 to the command right before the closing parenthesis to shrink the labels a bit. You still might not see all of them in RStudio, but they should show up in the knitted document. Briefly comment on which schools had the highest SAT scores.
Question 4: Plotting SAT score vs GPA
Make a scatterplot of GPA (on the vertical axis) against verbal SAT score (on the horizontal axis) and briefly comment on what you see.
Question 5: Predicting GPA with SAT Scores
Estimate a linear regression predicting GPA with verbal SAT scores, saving the regression object under a name, and then print a summary.
Question 6: Interpreting Regression Coefficients
Looking at the regression results from the question above, describe what the coefficient estimate tells you about the relationship between verbal SAT score and GPA. Is this a large or small relationship? (Keep in mind that you need to consider the scale of both variables here, so look back at the histograms you made above. For example, what does the coefficient estimate for SAT.V tell you is the expected change in GPA when verbal SAT score increases by 100, which is a decent, but not gigantic, change in this independent variable?) Also be sure to comment on the statistical significance of the estimate and what that tells you. Hint: Remember that for very small or very large numbers, R often prints them in scientific notation. For example, 2.083e-03 is 2.083 × 10^-3 = 0.002083.
Question 7: Predictions
What would the regression results above predict would be the GPA of someone who came to UT with a verbal SAT score of 800? What about 700, 600, and 500? (Hint: you should be able to just look at the estimates from the regression above and do the calculations yourself in R without using any special functions, just +, -, *, etc. Also, 1.234e+00 is just 1.234 × 10^0 = 1.234 × 1 = 1.234.)
Question 8: R-squared
Look at the R-squared from the regression above (which R calls Multiple R-squared; you can ignore Adjusted R-squared). What does this tell you about how much of the variation in GPA is explainable by SAT scores?
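The arithmetic behind Questions 6 and 7 can be sketched as below. The lab itself is done in R; Python is used here only to illustrate the calculation, and the same +, -, * operations carry over directly. The intercept and slope are placeholders borrowed from the scientific-notation hints above purely as example numbers. They are not the estimates from the UT regression, so substitute the values from your own regression summary. The function name `predict_gpa` is hypothetical.

```python
# Placeholder estimates: these reuse the numbers from the notation hints only
# as examples. They are NOT the actual regression output for the UT data.
intercept = float("1.234e+00")  # scientific notation for 1.234
slope = float("2.083e-03")      # scientific notation for 0.002083

def predict_gpa(sat_verbal):
    """Predicted GPA from a simple linear regression: intercept + slope * SAT.V."""
    return intercept + slope * sat_verbal

# Question 7 style predictions for several verbal SAT scores.
predictions = {score: predict_gpa(score) for score in (800, 700, 600, 500)}
for score, gpa in predictions.items():
    print(score, round(gpa, 4))

# Question 6 style interpretation: expected change in GPA for a 100-point
# increase in verbal SAT score is simply 100 times the slope.
change_per_100 = slope * 100
print(round(change_per_100, 4))
```

Because the model is linear, the difference between any two predictions 100 points apart equals `change_per_100`, which is why judging the size of the coefficient requires keeping the scale of both variables in mind.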