Categories
R

Week 5 Lab Assignment
Tree-based Models
Tree-based models are useful for classification and regression problems where one has a set of predictor variables X and a single response Y.
Why do we use tree-based models?
They are often easier to interpret and discuss than linear models. They also need little data preparation: we do not have to worry about missing values, variable transformations, or recoding categorical variables.
In this RLab assignment, we will use the rpart package in R. See Chapter 9 of the HOML book for the details.
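As a rough illustration of the rpart workflow mentioned above, the sketch below fits one classification tree and one regression tree. It assumes the built-in iris and mtcars data sets as stand-ins, since the lab's own data is not shown here; the object names are made up for the example.

```r
library(rpart)  # recursive partitioning trees

# Classification tree: predict a categorical response (Species).
# iris is a built-in stand-in data set, not the lab's data.
class_tree <- rpart(Species ~ ., data = iris, method = "class")
print(class_tree)  # text view of the splits

# Predicted class labels on the same data, and the resulting accuracy
class_pred <- predict(class_tree, newdata = iris, type = "class")
train_accuracy <- mean(class_pred == iris$Species)

# Regression tree: predict a numeric response (mpg) from two predictors.
# mtcars is also a built-in stand-in data set.
reg_tree <- rpart(mpg ~ wt + hp, data = mtcars, method = "anova")
reg_pred <- predict(reg_tree, newdata = mtcars)
```

The accuracy here is measured on the training data, so it is optimistic; a held-out test set would give a fairer estimate.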

Categories
R

# Exercise #1: List the Nine_neighbors
# your code here
Nine_balance<-arrange(names, income)[
Exercise 2
Use the knn function in the class package to predict labels in the test data with knn when k=5. Use set.seed(4230) and name the p
# Exercise #3: knn results when k=10
# your code here
# Exercise #4: Performance measure
# your code here
Exercise 5:
Write a function to find the optimal k (the k value that minimizes the classification error) and call it optimal_k. In other words, at which value of k does k_class_error take its minimum value?
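One possible shape for the Exercise 5 function is sketched below. It assumes the built-in iris data and a 70/30 split as stand-ins for the lab's training and test sets, which are not shown in the post. The helper name find_optimal_k and the range 1:20 are hypothetical choices, not part of the assignment.

```r
library(class)  # provides knn()

# Hypothetical helper: returns the k with the lowest test classification error
find_optimal_k <- function(train_x, test_x, train_y, test_y, k_values = 1:20) {
  k_class_error <- sapply(k_values, function(k) {
    pred <- knn(train = train_x, test = test_x, cl = train_y, k = k)
    mean(pred != test_y)  # share of misclassified test rows
  })
  # which.min() returns the first minimum, so ties go to the smallest k
  list(optimal_k = k_values[which.min(k_class_error)],
       k_class_error = k_class_error)
}

# Stand-in data: iris with a random 70/30 split (not the lab's data)
set.seed(4230)
idx <- sample(nrow(iris), size = round(0.7 * nrow(iris)))
result <- find_optimal_k(
  train_x = iris[idx, 1:4], test_x = iris[-idx, 1:4],
  train_y = iris$Species[idx], test_y = iris$Species[-idx]
)
optimal_k <- result$optimal_k
```

Because knn() breaks ties at random, calling set.seed() before the search keeps the result reproducible.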

Categories
R

This project has to be done in R Markdown.
The dataset, rubric, and project information are attached.
Please follow the project rubric.
Dataset found on Kaggle: Swiss banknote counterfeit detection (https://www.kaggle.com/datasets/chrizzles/swiss-banknote-conterfeit-detection)
Please answer only parts a and c.
a) PCA analysis group: This group will use PCA to reduce the dimension of the data and, at the same time, look for potential correlations between different predictors.
c) Hypothesis testing group 2: This group will use appropriate statistical techniques to infer the population means of each predictor variable for counterfeit and real bills.
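Parts a and c could be approached along the lines of the sketch below. The data frame is synthetic and its column names and values are hypothetical placeholders, so the real Kaggle file would need to be read in and its actual columns substituted. Welch's two-sample t-test is one reasonable choice for comparing group means, not necessarily the one the rubric expects.

```r
set.seed(1)

# Synthetic stand-in data: column names and values are hypothetical,
# chosen only so the example runs without the Kaggle file
n <- 100
notes <- data.frame(
  counterfeit = rep(c(0, 1), each = n),
  Length   = rnorm(2 * n, mean = 215, sd = 0.4),
  Diagonal = c(rnorm(n, 141.5, 0.5), rnorm(n, 139.5, 0.5)),
  Bottom   = c(rnorm(n, 8.3, 0.6), rnorm(n, 10.5, 1.0))
)
predictors <- notes[, c("Length", "Diagonal", "Bottom")]

# (a) PCA on centered, standardized predictors
pca <- prcomp(predictors, center = TRUE, scale. = TRUE)
summary(pca)      # proportion of variance explained per component
cor(predictors)   # pairwise correlations between predictors

# (c) Welch two-sample t-test per predictor: real vs. counterfeit means
tests <- lapply(predictors, function(x) t.test(x ~ notes$counterfeit))
p_values <- sapply(tests, function(res) res$p.value)
conf_ints <- sapply(tests, function(res) res$conf.int)  # 95% CI for the mean difference
```

With several predictors tested at once, adjusting the p-values, for example with p.adjust(p_values, method = "holm"), guards against false positives.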

Categories
R

The problem set has 26 multiple-choice questions; the file is attached below. Please put all the answers in one MS Word file. Some of the questions require R. Please submit a text file with all the R commands that you use to answer those questions. After you accept this post, I will send you the data by email, because I am not able to upload it here.
The questions cover:
1. Measurement error and missing values.
2. Proxy Variables.
3. Computer Problem on Heteroskedasticity.
And other topics.
Please check the file for all other details.
Please take this question only if you are good at R and econometrics.
Two files will be submitted: 1. An MS Word file with all 26 answers. 2. A text file with all R commands used.
Let me know if you have any questions. Thanks!

Categories
R

Are there any other questions that you would have tackled if you had had more time?

This is a group project. My part is the Results and Conclusion sections. The topic is the mental effect of Covid on students. The following is the rubric for the conclusion and results.
Conclusions and interesting future research questions. Summarize your results and state the conclusions of your analysis.
Are there any new questions that are open with your analysis? Are there any other questions that you would have tackled if you had had more time?
Note: I have also attached a picture of a sample showing how the conclusion should look and a picture of the rubric. Please read the rubric and the sample picture.

Categories
R

Explain what this tells you about your model.

Assignment #3 – Intro to modeling
Please note that these assignments are meant to build on concepts from before so please keep that in mind when solving the problems set forth.
Your submission for this will be one file that has answers to all of the parts below in HTML format.
Part 1 – Get set up for the exam.
Navigate to Module 1 – Resources, download all exam files, and ensure that everything works prior to Exam 1 (you’ll find this all on the study guide). There isn’t anything to submit for part 1.
Part 2 – Regression and Predictions
Before getting started, please be sure to download the “vehicles” data-set.
1. Partition your data so that 60% is training, and 40% is testing data. Create a data-frame called “training_vehicles” (this will have 60% of your original vehicles data) and another called “testing_vehicles” (this will have 40% of your original vehicles data).
2. Create a linear regression model to predict the MPG of a vehicle. Explain if your model is statistically significant and tell me the proportion of variation that is accounted for in the linear model. You can use simple linear regression or multiple linear regression (the choice is yours).
3. Create a visual that shows the residuals of your linear model. Explain what this tells you about your model.
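The three steps of Part 2 might look like the sketch below. Since the vehicles data-set is not available here, it assumes the built-in mtcars data as a stand-in, and the choice of wt and hp as predictors is arbitrary.

```r
set.seed(123)

# Stand-in for the vehicles data: built-in mtcars (an assumption)
vehicles <- mtcars

# 1. 60/40 partition by random row indices
n <- nrow(vehicles)
train_idx <- sample(n, size = round(0.6 * n))
training_vehicles <- vehicles[train_idx, ]
testing_vehicles  <- vehicles[-train_idx, ]

# 2. Multiple linear regression for MPG, fitted on the training data
mpg_model <- lm(mpg ~ wt + hp, data = training_vehicles)
model_summary <- summary(mpg_model)
r_squared <- model_summary$r.squared  # proportion of variation explained
f <- model_summary$fstatistic
overall_p <- unname(pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE))

# 3. Residuals vs. fitted values; a patternless cloud around zero is the ideal
plot(fitted(mpg_model), resid(mpg_model),
     xlab = "Fitted MPG", ylab = "Residual", main = "Residuals vs. fitted")
abline(h = 0, lty = 2)

# Out-of-sample check on the 40% test set
predictions <- predict(mpg_model, newdata = testing_vehicles)
test_rmse <- sqrt(mean((testing_vehicles$mpg - predictions)^2))
```

A curve or funnel shape in the residual plot would suggest a missing nonlinear term or non-constant variance, which is the kind of observation step 3 asks for.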
Part 3 – Regression Continued (Putting it all together)
Documentation for factors: https://r4ds.had.co.nz/factors.html (and we’ve covered it in class).
Documentation for how to use the recode() function to accomplish the conversions below (this is an assessment regarding reading documentation pages as I’ve shown in class): https://dplyr.tidyverse.org/reference/recode.html
Data for questions: dcbikeshare-1.csv
Metadata for questions: https://archive.ics.uci.edu/ml/datasets/Bike+Sharing+Dataset
Hint: mutate(season = factor(season, levels = c("spring", "summer", "fall", "winter"))) ## This is how you set "spring" as the baseline after you recode your data. The baseline is the first position: in this case spring is at position 1, summer at position 2, fall at position 3, and winter at position 4.
Question 1: Convert the season variable to be a factor with meaningful names per the metadata. Set spring as the baseline level (the first level).
Question 2: Convert the binary variables holiday and working day to be factors with levels no (0) and yes (1), with no as the baseline level (the first level).
Question 3: Convert the yr variable to be a factor with levels 2011 and 2012, with 2011 as the baseline level (the first level).
Question 4: Convert the weathersit variable to a factor with levels 1 – clear, 2 – mist, 3 – light precipitation, 4 – heavy precipitation. Set clear as the baseline. Note: in the data-set, there are no instances of “4”, but you can still write the code for it.
Question 6: Fit a linear model predicting the total bike rentals from daily temperature. Explain the performance of this model in 3-4 sentences.
Question 7: Fit a linear model predicting total daily bike rentals from season, year, whether the day is holiday or not, whether the day is a working day or not, the weather category, temperature, feeling temperature, humidity, and windspeed. What is the adjusted R-squared value of this model?
Note: Everyone is to submit this individually, whether or not you worked with your group. If you worked with your group, I expect your submission to be the same as your group members’.
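The factor conversions in Questions 1 to 4 can be sketched in base R as below. This uses factor() with levels and labels as an alternative to the dplyr recode() plus mutate() route that the assignment points to. The data frame is a tiny synthetic stand-in for dcbikeshare-1.csv, and the numeric codes assumed for season (1 = spring through 4 = winter) should be checked against the metadata.

```r
# Tiny synthetic stand-in for dcbikeshare-1.csv; all values are made up
bikes <- data.frame(
  season     = c(1, 2, 3, 4, 1, 2),
  holiday    = c(0, 0, 1, 0, 0, 0),
  workingday = c(1, 1, 0, 1, 0, 1),
  yr         = c(0, 0, 0, 1, 1, 1),
  weathersit = c(1, 2, 3, 1, 2, 1)
)

# factor(levels = , labels = ) recodes and sets the level order in one step;
# the first label is the baseline level.
# Assumed coding: 1 = spring, 2 = summer, 3 = fall, 4 = winter
bikes$season <- factor(bikes$season, levels = 1:4,
                       labels = c("spring", "summer", "fall", "winter"))
bikes$holiday    <- factor(bikes$holiday, levels = c(0, 1), labels = c("no", "yes"))
bikes$workingday <- factor(bikes$workingday, levels = c(0, 1), labels = c("no", "yes"))
bikes$yr         <- factor(bikes$yr, levels = c(0, 1), labels = c("2011", "2012"))

# Level 4 is declared even though no row uses it
bikes$weathersit <- factor(bikes$weathersit, levels = 1:4,
                           labels = c("clear", "mist", "light precipitation",
                                      "heavy precipitation"))

levels(bikes$season)      # baseline is listed first
levels(bikes$weathersit)  # includes the unused fourth level
```

For Questions 6 and 7, once a model has been fitted with lm(), summary(fit)$adj.r.squared returns its adjusted R-squared, where fit is whatever name the model object was given.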
