The following learning outcomes will be assessed:
1. Critically select and apply key machine learning and statistical techniques for data analytics
projects across the whole data science lifecycle on modern data science platforms and with
data science programming languages.
2. Appropriately characterize the types of data; perform the pre-processing, transformation,
fusion and analysis of a wide range of data types; and visualize and report the results of the
analysis.
The first part of the report is simply text describing the introduction, the application area and data
to be used, and the machine learning algorithms to be used.
For the practical implementation part of the report, I expect to see screenshots of your code in the
RStudio script editor, screenshots of key outputs and screenshots of important diagrams, along with
text that describes what I am seeing and identifies any salient points. The presentation of your
practical work should follow the same format I have used for the Activities in R over the last seven
weeks. Use the Snipping Tool in Windows, or a similar tool, to capture screenshots of selected areas.
Finally, write up your work in a 1,500-word (+/- 10%) report.
The report should include the following headings:
Report – (40 marks)
Introduction (10 marks)
Application area and data (10 marks)
Machine learning algorithms (10 marks)
Conclusion, structure of report, including references (10 marks)
Practical Implementation – (60 marks)
Pre-processing on real or simulated data (10 marks)
R Programming content and your function (20 marks)
Display of data/results (20 marks)
Source code listing (10 marks)
State the R packages you have used and any source code you have taken from others. Also, place a full
R source listing at the back of the report; it does not count towards the word count, but DO NOT
exceed the 15-page limit.
You may refer to your course handouts and to any other books, journals, online resources, etc.
1. Introduction
Your introduction should include a summary of the main points that you will discuss in your report.
It should outline the area your data comes from and what you hope to achieve. Your introduction
should be about 150 words in length.
2. Data used
The purpose of this section is to show that you understand the types of data and the pre-processing
you will use. Describe the types of variables present, such as integers, dates and strings. Provide
literature and examples associated with your data set. This section should be approximately 150
words.
3. Machine learning methods used
In this section you should identify the machine learning methods that you will apply to the UCI
data and state the criteria you will use to measure the success of those methods. This section
should be approximately 150 words.
4. Practical: Pre-processing of data
In this section you should explain how the data was read in, what pre-processing (if any) was
carried out and why. Include screenshots of your code alongside your written explanation. This
section should be no more than 150 words in length.
5. Practical: R Programming content
In this section you should include screenshots of your code alongside your written explanation. The
R programming content can include building and testing your machine learning models, and perhaps a
comparison of several models. I would also like to see an R function written by you. The source code
should be neat and tidy, with comments where necessary to explain the main actions of your code.
This section should be no more than 300 words in length.
6. Practical: Display of data/results
In this section you should use screenshots of key R output, important diagrams and any other results
relating to your machine learning models, along with text describing the outputs. It should be no
more than 300 words in length.
7. Source code listing
This includes all of your R code, including the library commands. I expect to be able to load the
libraries you have used, then copy, paste and run your analysis.
8. Conclusion
In this section you should summarise your experimental results and findings. This section should
be approximately 150 words.
9. References and look and feel of report
References should follow the Harvard standard (they are not included in the word count, and there
should be between 5 and 10 of them). References should be valid and appropriate. The formatting of
the report should be neat and tidy. Diagrams should be easy to read and accompanied by good
descriptive text; use a sensible number, no more than 6-7 diagrams. The report must be no more than
15 pages in total, including source code listings; put the source code listing in font size 10.
The word counts for the sections are advisory only and are based on the marks allocated.
Your report should be spell-checked and contain references. You must use the Harvard style of
referencing, both for citations within the text and for your reference list.