The following learning outcomes will be assessed:
1. Critically select and apply key machine learning and statistical techniques for data analytics projects across the whole data science lifecycle on modern data science platforms and with data science programming languages.
2. Appropriately characterize the types of data; perform the pre-processing, transformation, fusion and analysis of a wide range of data types; and visualize and report the results of the analysis of various types of data.
You are required to submit your work within the bounds of the University Infringement of Assessment Regulations (see Programme Guide). Plagiarism, paraphrasing and downloading large amounts of information from external sources will not be tolerated and will be dealt with severely. You should make full use of any source material, but this would normally be an occasional sentence and/or paragraph (referenced) followed by your own critical analysis/evaluation. You will receive no marks for work that is not your own. Your work may be subject to checks for originality, which can include the use of an electronic plagiarism detection service.
Where you are asked to submit an individual piece of work, the work must be entirely your own. The safety of your assessments is your responsibility. You must not permit another student access to your work.
Where referencing is required, unless otherwise stated, the Harvard referencing system must be used (see your Programme Guide).
Please ensure that you retain a duplicate of your assignment. We are required to send samples of student work to the external examiners for moderation purposes. A duplicate will also safeguard you in the unlikely event that your work goes astray.
Submission Date and Time: As advised on Canvas
Submission Location: Via Canvas
THIS ASSIGNMENT REQUIRES R CODING AND A SHORT REPORT (65% of module marks)
Your task is to conduct data analysis on a given data set from the UCI Machine Learning Repository. To help you in this task, please look over our past RStudio activities, where we loaded in data, pre-processed it, trained machine learning algorithms on the data and plotted the results.
The first part of the report is simply text describing the introduction, the application area and data to be used, and the machine learning algorithms to be used.
For the practical implementation part of the report, I expect to see screenshots of your code in the RStudio script editor, screenshots of key outputs and screenshots of important diagrams, along with text that describes what I’m seeing and identifies any salient points. The presentation of your practical work should be identical to the way I’ve presented the Activities in R over the last seven weeks. Use the Snipping Tool in Windows, or a similar tool, to grab screenshots of selected areas.
Finally, write up your work in a 1,500-word (+/- 10%) report.
The report should include the following headings:
Report – (40 marks)
Introduction (10 marks)
Application area and data (10 marks)
Machine learning algorithms (10 marks)
Conclusion, structure of report and references (10 marks)
Practical Implementation – (60 marks)
Pre-processing on real or simulated data (10 marks)
R Programming content and your function (20 marks)
Display of data/results (20 marks)
Source code listing (10 marks)
State the R packages you have used and any source code you have used from others. Also, place a full R source listing at the back of the report; it will not add to the word count, but DO NOT go over the page limit of 15 pages.
You can refer to any of your course handouts, any other books, journals, online resources etc.
1. Introduction
Your introduction should include a summary of the main points that you will discuss in your report. It should also outline the area your data is from and what you hope to achieve. Your introduction should be about 150 words in length.
2. Data used
The purpose of this section is to show that you understand the types of data and the pre-processing you will use. Identify the types of variables present, such as integers, dates and strings. Provide literature and examples associated with your data set. This section should be approximately 150 words.
3. Machine learning methods used
In this section you should identify the machine learning methods that you will apply to the UCI data and state the criteria that will be used to measure the success of those methods. This section should be approximately 150 words.
4. Practical: Pre-processing of data
In this section you should discuss how the data was read in, what pre-processing (if any) took place and why you did it. Show me screenshots of code alongside your written explanation. This section should be no more than 150 words in length.
5. Practical: R Programming content
In this section you should show me screenshots of code alongside your written explanation. The R programming content can include building your machine learning models, testing the models and, perhaps, comparing and contrasting several models. I would also like to see an R function written by you. The source code should be neat and tidy, with comments where necessary to explain the main actions of your code. This section should be no more than 300 words in length.
6. Practical: Display of data/results
In this section you should use screenshots of key R output, important diagrams and anything else relating to your machine learning models, along with text descriptions of the outputs. It should be no more than 300 words in length.
7. Source code listing
This should include all your R code, including the library commands. I expect to be able to load the libraries you have used, then copy, paste and run your analysis.
8. Conclusion
In this section you should summarise your experimental results and findings. This section should be approximately 150 words.
9. References and look and feel of report
These should be to Harvard standards (not included in the word count, but there should be between 5 and 10 references). References should be valid and appropriate. The formatting of the report should be neat and tidy. Diagrams should be easy to read and accompanied by good descriptive text; use a sensible number of diagrams, no more than 6-7. The whole report, including the source code listing, must be no more than 15 pages in total; put the source code listing in font size 10.
The word counts for the sections are advisory only and are based on the marks allocated.
Your report should be spell-checked and contain references. You must use the Harvard style of referencing, both for citations within the text and for your reference list. It is important that you read thoroughly the information on the cover sheet regarding the university assessment regulations, including those regarding plagiarism and collusion. Assignment hand-in requirements are specified on the front cover sheet. The approximate time you should spend on this assignment is 30-50 hours. Your assignment must be handed in before the time specified. Your work will be assessed according to the University’s Postgraduate Generic Assessment Criteria, which are provided on the following pages.