Identify an important business problem, find one or more relevant datasets, generate insightful
visualizations of the data, fit a range of models to the data to produce your best
predictions/forecasts, and make and justify recommendations to a decision maker related to this
problem. (Tableau and R required.)
Section 1: The Problem (10%)
• Discuss the problem you are addressing.
• What are the questions and business/management decisions your analysis is trying to address?
• Describe your problem’s decision maker and what is important for them to know from your data analysis.
• Discuss the source of your data. Questions to consider include:
– Where did you find this data?
– How reliable or uncertain is this data?
– How old is the data?
– Is the data recorded at given dates or times?
• Identify and justify your choice of target attribute(s) and explain how this/these should be derived, if not
already available.
Section 2: Understand the Data (30%)
• Discuss the nature and size of the dataset(s) you are using.
• Discuss the data attributes that are relevant to your problem. Exactly what does the data represent and, if
relevant, how was it derived? How is it distributed? What type of data is it?
• Explore and discuss whether any of the data attributes you have focused on are closely correlated with
other attributes – either positively or negatively.
• Include at least 3 different types of Tableau visualisations (e.g. map, scatter plot, bar chart, pie chart, box-and-whisker plot) to support your discussions.
• Include at least 3 R-generated plots or aggregation tables to support your discussions.
• Include the R-code you used in the code appendix.
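As a rough illustration of the kind of R output Section 2 asks for, the sketch below produces a correlation matrix, an aggregation table and a box-and-whisker plot. It is only a sketch under stated assumptions: the built-in mtcars dataset and the columns mpg, wt, hp, disp and cyl are stand-ins for your own data and attributes, not part of this brief.

```r
# Illustrative only: mtcars stands in for your own dataset.
data <- mtcars

# Pairwise correlations between a few numeric attributes
# (values near +1 or -1 indicate closely correlated attributes).
cors <- cor(data[, c("mpg", "wt", "hp", "disp")])
print(round(cors, 2))

# Aggregation table: mean of the target (mpg) per group (cyl).
agg <- aggregate(mpg ~ cyl, data = data, FUN = mean)
print(agg)

# Box-and-whisker plot of the target by group.
boxplot(mpg ~ cyl, data = data,
        xlab = "Cylinders", ylab = "Miles per gallon",
        main = "mpg by number of cylinders")
```

Whatever plots and tables you produce, each one should be followed by a short discussion of what it tells you about your problem.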
Section 3: Prepare the Data (10%)
• If required, explain how you have derived your chosen target attribute(s) in Tableau and in R.
• Discuss and justify what other steps you may have taken to prepare your data, including, where relevant:
removing attributes from consideration, adding further “derived” attributes (e.g. dates), imputing
“reasonable” values for missing data, and standardizing data values.
• Prepare suitable separate “Training”, “Validation” (if required) and “Testing” subsets of the dataset.
• Include any R-code you used to prepare your data in the code appendix.
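One possible way to prepare the “Training”, “Validation” and “Testing” subsets in base R is sketched below. The mtcars dataset, the 60/20/20 proportions and the seed value are assumptions chosen for illustration only; a random split like this is also not appropriate for time-ordered data, where the subsets should normally respect the date order.

```r
# Illustrative only: mtcars stands in for your own dataset.
set.seed(42)                       # make the random split reproducible
data <- mtcars
n <- nrow(data)

# Shuffle the row indices, then cut them into 60% / 20% / 20% blocks.
idx <- sample(n)
n_train <- floor(0.6 * n)
n_valid <- floor(0.2 * n)

train <- data[idx[1:n_train], ]
valid <- data[idx[(n_train + 1):(n_train + n_valid)], ]
test  <- data[idx[(n_train + n_valid + 1):n], ]

# Sanity check: every row lands in exactly one subset.
stopifnot(nrow(train) + nrow(valid) + nrow(test) == n)
```

Fit models on the training subset, tune them on the validation subset, and report final error rates on the testing subset only.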
Section 4: Generate and Test Prediction Models (40%)
• Select and justify at least 3 different prediction model types that are likely to best help with your stated
problem objectives.
• Configure your models (e.g. select attributes and/or other model parameters) that you expect will best
deliver relevant insights and/or provide the lowest error rates, justifying your decisions.
• Run these models, discussing the model outputs and drawing, where possible, insights related to your
problem.
• Prepare and discuss at least 1 ensemble model, combining two or more of your prediction models.
• Select a proper scoring rule to measure the accuracy of your models. Determine and comment on the
best generalised error rate across your individual prediction models and your ensemble model(s).
• Discuss what steps you may have taken to improve your individual models.
• Include any R code you used to generate and test your models in the code appendix.
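The ensemble and error-rate steps above can be sketched as follows. This is a minimal example under stated assumptions, not a prescribed method: mtcars, the two linear models, the simple averaging ensemble and the use of RMSE as the accuracy measure are all illustrative choices, and the helper rmse() is defined here rather than taken from any package.

```r
# Illustrative only: mtcars stands in for your own dataset.
set.seed(42)
data <- mtcars
idx <- sample(nrow(data))
train <- data[idx[1:22], ]         # rows used to fit the models
test  <- data[idx[23:32], ]        # held-out rows used to score them

# Hypothetical helper: root mean squared error of a set of predictions.
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))

# Two simple prediction models using different attributes.
m1 <- lm(mpg ~ wt, data = train)
m2 <- lm(mpg ~ hp + disp, data = train)

p1 <- predict(m1, newdata = test)
p2 <- predict(m2, newdata = test)

# Ensemble: unweighted average of the two models' predictions.
p_ens <- (p1 + p2) / 2

# Compare held-out error across the individual models and the ensemble.
errors <- c(model_1  = rmse(test$mpg, p1),
            model_2  = rmse(test$mpg, p2),
            ensemble = rmse(test$mpg, p_ens))
print(round(errors, 3))
```

An averaged ensemble can never score a worse RMSE than the worst of its component models, but it is not guaranteed to beat the best one, so the comparison on held-out data is what matters.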
Section 5: Problem Conclusions and Recommendations (10%)
• Combining the results from your various analysis steps, draw conclusions about the particular problem
and questions stated at the beginning.
• What recommendations would you now make to your problem’s decision maker and why?
• Which are the most important variables for the decision maker to look at?
Marking Criteria
Marks will be awarded for:
• Using Tableau and R in a way that is relevant and appropriately justified, and that is ideally different from
that presented in the lectures and other module materials.
• Discussing meaningful insights after each analysis task.
• Ensuring your analysis flows, with each step building on the last.
• Structuring your report and analysis so as to follow the standard stages of a data science project.
• The correctness and quality of your code, visualisations and conclusions.
• Employing a wide range of the concepts and methods covered in this module.
• Identifying a novel and significant problem.
• Proposing a compelling solution/recommendation and generating important business or policy
insights.
• Writing a clear and compelling report.
Submission Requirement
You are required to submit 3 files for this assignment:
1. A PDF file containing your fully completed assignment, including an appendix containing all your R-based analysis.
2. A runnable code file containing all your R-based analysis. This file can be submitted either as an R script
file (.R file) or as an R-based Jupyter Notebook file (.ipynb file).
3. A data file if it is not too large to upload on Moodle. If it is too large, please submit to this part of the
assignment drop box a PDF document providing links to the original datasets.
Only the first PDF file will be marked. The additional code file and data file are only provided to ensure your
code works as you have claimed it should.
