Correlation and regression

Part 1. Title and Introduction

Create a nice title for your report.

Write a very informative introduction.

(a) Start with a brief history of correlation and linear regression.

(b) Using your own words, explain to your audience what you understand about the coefficients of correlation and determination.

(c) Explain the importance of simple linear regression analysis in an industry of your choice. Clearly identify the industry and use real, practical examples.

*Note: Remember to use at least 2 different academic references for each topic.

For each topic, write a substantial, well-informed paragraph.

Part 2. Analysis section

Task 1, describe the mpg data set (part of ggplot2)

Using your own words, describe the data set.

Run summarytools::descr() and present its output.

Run psych::describe() and present its output.

Make sure the data are presented clearly, especially the output of describe(); you may want to transpose the table (swap its rows and columns) by wrapping it in t().

Observe and analyze the two tables. Also, run ?mpg in your RStudio console and find additional sources on the internet. Then, using your own words, describe the data set.
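As a base-R sketch of what t() does to a statistics table, the block below builds a small summary with one row per statistic and then transposes it to one row per variable. It assumes the built-in mtcars data as a stand-in for mpg (which needs ggplot2), with disp and cyl in place of displ and cyl; it is not a substitute for descr() or describe().

```r
# Stand-in data: mtcars ships with base R (datasets package).
# In the report, use ggplot2::mpg and its columns displ and cyl.
vars <- mtcars[, c("disp", "cyl")]

# sapply() over the columns gives one row per statistic, one column per variable
stats <- sapply(vars, function(v) c(
  n      = length(v),
  mean   = mean(v),
  sd     = sd(v),
  median = median(v),
  min    = min(v),
  max    = max(v)
))

# t() swaps the orientation: one row per variable, one column per statistic
stats_t <- t(stats)
print(round(stats_t, 2))
```

The same idea applies to the data frame returned by psych::describe(): wrapping it in t() turns a wide table into a tall one that fits better on an HTML page.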

Task 2

Go to the scatter plots section of my website (https://rpubs.com/Dee_Chiluiza/scatterplot) and observe section 4.1.

Following the instructions in that section, replicate the table in your report, this time presenting the statistics of displacement per number of cylinders in the mpg data set.

Make substantial observations of the data, including a clear explanation of what displacement is.
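One possible base-R way to compute grouped statistics is sketched below; it does not reproduce the exact layout of the table in section 4.1, and it again assumes mtcars as a stand-in (there, disp is in cubic inches, while mpg's displ is in litres). The helper name group_stats is hypothetical.

```r
# Stand-in data: mtcars (base R). With ggplot2::mpg, use mpg$displ and mpg$cyl.
disp <- mtcars$disp
cyl  <- mtcars$cyl

# Hypothetical helper: summary statistics for one group of values
group_stats <- function(v) c(n = length(v), mean = mean(v), sd = sd(v),
                             median = median(v), min = min(v), max = max(v))

# split() makes one vector of displacements per cylinder count;
# sapply() applies the helper to each, t() gives one row per cylinder count
by_cyl <- t(sapply(split(disp, cyl), group_stats))
print(round(by_cyl, 2))
```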

Task 3, correlation and determination coefficients

a) Present the coefficient of correlation between displacement and cylinders.

b) Present the coefficient of determination between displacement and cylinders.

c) Explain the meaning of their values.

You must use the formula presented in the textbook and translate it into R code, as you have already learned to do.
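For reference, the computational form of Pearson's correlation coefficient that appears in many introductory statistics textbooks (Bluman's among them) is shown below, where x and y are the paired values and n is the number of pairs; check it against the exact version in your course book. In simple linear regression, the coefficient of determination is the square of r.

```latex
r = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}
         {\sqrt{\left[n\sum x^{2} - \left(\sum x\right)^{2}\right]
                \left[n\sum y^{2} - \left(\sum y\right)^{2}\right]}},
\qquad
R^{2} = r^{2}
```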

Start this task by preparing all your code in a single R chunk, and present only the requested information.

Present the table you prepared to calculate the correlation (as we did in class), showing only its first few observations (rows); do not present the whole table.
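A minimal sketch of the working table and the formula above, assuming mtcars as a stand-in (for the report, x would be mpg$cyl and y would be mpg$displ). The column names xy, x2 and y2 are my own choice, not necessarily the ones used in class; cor() appears only as a cross-check, not as the required method.

```r
# Stand-in data: mtcars (base R)
x <- mtcars$cyl   # cylinders
y <- mtcars$disp  # displacement

# Working table with the products and squares the formula needs
corr_table <- data.frame(x = x, y = y, xy = x * y, x2 = x^2, y2 = y^2)
head(corr_table, 5)  # show only the first few rows

n     <- nrow(corr_table)
sum_x <- sum(corr_table$x)
sum_y <- sum(corr_table$y)

# Numerator and denominator of the textbook formula
num <- n * sum(corr_table$xy) - sum_x * sum_y
den <- sqrt((n * sum(corr_table$x2) - sum_x^2) *
            (n * sum(corr_table$y2) - sum_y^2))

r  <- num / den  # coefficient of correlation
r2 <- r^2        # coefficient of determination

# Cross-check against base R's Pearson correlation
all.equal(r, cor(x, y))
```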

Task 4, learn the DescTools library

In a separate R Markdown file or from your console, run the following code. Do not present the outputs; this part is for your own learning:

Install the package “DescTools”, then load it with library(DescTools).

Read about this library and its utility in data analysis.

Run code 1 below, then code 2, and compare them (you are still working in your RStudio console, not in R Markdown).

Code 1: DescTools::Desc(mpg)

Code 2: DescTools::Desc(mpg$manufacturer)

Code 3: Similar to code 2, change the variable to model, displ, year, etc.

Carefully observe and analyze the outputs. What information are you obtaining?

For this task, present the following:

a) Using your own words, describe the DescTools library and its usefulness for your work. Use at least 2 references.

b) Present the output of Desc() for displacement and for cylinders.

c) Describe the information presented for both variables and make meaningful observations.

Task 5, linear regression

a) Prepare all code in a single R chunk. This task is similar to what we did in class.

b) Obtain the linear regression formula between displacement (dependent) and cylinders (independent) using the corresponding R code (create an object to store the model).

c) Create an object to store the summary of the linear regression.

d) Create two objects to extract the intercept and the slope from the summary of the linear regression.

e) Using inline R code, present the linear regression formula.
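A minimal sketch of steps b) to d), assuming mtcars as a stand-in (for mpg the call would be lm(displ ~ cyl, data = mpg)). It treats displacement as the response and cylinders as the predictor, the direction that Tasks 6, 7 and 10 use. The object names model, model_summary, intercept and slope are illustrative.

```r
# Stand-in data: mtcars (base R). Response: disp; predictor: cyl.
model <- lm(disp ~ cyl, data = mtcars)
model_summary <- summary(model)

# coef() on a summary.lm object returns the coefficient matrix
# (Estimate, Std. Error, t value, Pr(>|t|)); take the Estimate column
intercept <- coef(model_summary)["(Intercept)", "Estimate"]
slope     <- coef(model_summary)["cyl", "Estimate"]

cat(sprintf("disp = %.2f + %.2f * cyl\n", intercept, slope))
```

In the R Markdown text, the formula for step e) could then be written with inline code such as: disp = `r round(intercept, 2)` + `r round(slope, 2)` × cyl.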

Task 6, scatter plot

a) Create a scatter plot to study the relationship between displacement (dependent) and cylinders (independent).

1. Change the shape of the data points using the pch argument (check the available values with ?pch in the console).

2. Add the regression line using color code “#99004C”, lty = 1, and lwd = 1.

3. Add horizontal lines showing the mean and median of the dependent variable, using a different color for each line.

b) Write some observations about the figure.
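A base-graphics sketch of items 1 to 3, assuming mtcars as a stand-in (for mpg, plot mpg$cyl against mpg$displ). The pch value and the mean and median colors are arbitrary choices; only the regression-line settings come from the task.

```r
# Stand-in data: mtcars (base R); disp is in cubic inches
model <- lm(disp ~ cyl, data = mtcars)

plot(mtcars$cyl, mtcars$disp,
     pch  = 17,  # filled triangles; see ?pch for other shapes
     xlab = "Cylinders", ylab = "Displacement (cu. in.)",
     main = "Displacement vs. cylinders")
abline(model, col = "#99004C", lty = 1, lwd = 1)              # regression line
abline(h = mean(mtcars$disp),   col = "blue",      lty = 2)   # mean of y
abline(h = median(mtcars$disp), col = "darkgreen", lty = 3)   # median of y
legend("topleft", legend = c("Regression line", "Mean", "Median"),
       col = c("#99004C", "blue", "darkgreen"), lty = c(1, 2, 3))
```

abline() accepts an lm object directly and draws the fitted line, so the intercept and slope do not need to be passed by hand.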

Task 7

Using the linear regression formula, present a table with the predicted values for displacement and the residuals.

Present only the first 20 observations (Do not present the whole table).
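A minimal sketch, assuming mtcars as a stand-in, that applies the regression formula directly (predicted = intercept + slope × cylinders; residual = observed minus predicted) rather than calling fitted(); the column names are illustrative.

```r
# Stand-in data: mtcars (base R)
model     <- lm(disp ~ cyl, data = mtcars)
intercept <- unname(coef(model)[1])
slope     <- unname(coef(model)[2])

predicted <- intercept + slope * mtcars$cyl  # y-hat = b0 + b1 * x

pred_table <- data.frame(
  cylinders    = mtcars$cyl,
  displacement = mtcars$disp,
  predicted    = predicted,
  residual     = mtcars$disp - predicted     # y - y-hat
)
head(pred_table, 20)  # first 20 observations only
```

fitted(model) and resid(model) return the same values and can serve as a check on the manual calculation.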

Task 8, frequency and percentage table

a) Prepare a table to present the frequency of cars by number of cylinders.

b) Add new calculated fields to present cumulative frequencies, percentages and cumulative percentages.

c) Present the table using knitr::kable() and at least one kableExtra function.

d) Remember to make observations about your results.
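A base-R sketch of a) and b), assuming mtcars as a stand-in (for mpg, use table(mpg$cyl)); the column names are illustrative.

```r
# Stand-in data: mtcars (base R)
freq <- table(mtcars$cyl)

freq_table <- data.frame(
  cylinders = as.integer(names(freq)),
  frequency = as.vector(freq)
)
freq_table$cum_frequency <- cumsum(freq_table$frequency)
freq_table$percent       <- 100 * freq_table$frequency / sum(freq_table$frequency)
freq_table$cum_percent   <- cumsum(freq_table$percent)

print(freq_table)
```

For c), assuming knitr and kableExtra are installed, the data frame could then be rendered with kableExtra::kable_styling(knitr::kable(freq_table, digits = 2)).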

Task 9

Using your imagination, present several graphs to display the information in the table from Task 8.

Explain your reasons for choosing the graphs and make observations of the results from Tasks 8 and 9.
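As one starting point, not a prescribed answer, the sketch below puts a bar chart of counts next to a pie chart of percentages, assuming mtcars as a stand-in. Bar charts make the groups easier to compare precisely; the pie chart only conveys rough shares of the whole.

```r
# Stand-in data: mtcars (base R)
freq <- table(mtcars$cyl)
pct  <- 100 * prop.table(freq)

op <- par(mfrow = c(1, 2))  # two panels side by side
barplot(freq, col = "steelblue", xlab = "Cylinders",
        ylab = "Number of cars", main = "Frequency")
pie(as.vector(pct),
    labels = paste0(names(pct), " cyl: ", round(pct, 1), "%"),
    main = "Percentage")
par(op)  # restore the previous layout
```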

Task 10

Using the information prepared above, predict the displacement of a car with 2 cylinders and of a car with 10 cylinders.
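A minimal sketch using predict() on a fitted model, assuming mtcars as a stand-in. Neither 2 nor 10 cylinders occurs in mtcars (4, 6, 8) or in mpg (4, 5, 6, 8), so both values are extrapolations; it is worth checking whether the 2-cylinder prediction is physically plausible, since a steep slope can push it near or below zero.

```r
# Stand-in data: mtcars (base R)
model    <- lm(disp ~ cyl, data = mtcars)
new_cars <- data.frame(cyl = c(2, 10))

# Equivalent to intercept + slope * cyl for each new row
predicted <- predict(model, newdata = new_cars)
data.frame(new_cars, predicted_disp = round(predicted, 1))
```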

Part 3. Conclusions and Bibliography

Write a very informative conclusions section, following the instructions and recommendations given by your instructor.

Make sure to include an overall observation of the whole project, explain the meaning of the results you obtained in relation to the direction of the project, and describe any new skills you gained.

Present a bibliography section with the references you used on your report.

Technically speaking, if you do not mention any references in the main text of your report, it is as if you did not use any, even if you add a list at the end. Cite references in the main text of your report in one of two ways: either use the first author's last name and year, e.g., (Bluman, 2017), and list the references in the bibliography section in alphabetical order, or use numbers in order of appearance and list the references in the bibliography section in that numerical order.

Part 4. Appendix

Add an appendix title that refers to the Rmd document you are attaching to your report.

Format & Guidelines

For this week's assignment, you must submit two files:

Submit your HTML report containing all your findings along with important statistical issues.

Submit the original Rmd file you used to produce your report.

Important: Your report must be well organized.

R Chunks: All R code must be shown in the HTML report.

Turnitin: Your report will be reviewed using Turnitin. Make sure your score is below 25%.

Please remember: your report is very important. Make it look professional, keep it as short as possible while including all the relevant information, tell me what you learned, and, using deep critical thinking, provide examples of practical applications.