School of Computing

Module Coordinator
Other lecturers
Date Issued
Code COSREP / M26538
Title Applied Machine Learning and Data Mining

Schedule and Deliverables
Item: Coursework
Value: 60%
Format: One report file (.pdf) and a single .zip file containing the Python source code (upload it to your GitHub repository)
Deadline:
Late deadline:
ECF deadline:

Notes and Advice
• The Extenuating Circumstances procedure is there to support you if you have had any circumstances (problems) that have been serious or significant enough to prevent you from attending, completing or submitting an assessment on time.
• ASDAC is available to any student who discloses a disability or requires additional support for their academic studies, and offers a good set of resources on the ASDAC Moodle site.
• The University takes plagiarism seriously. Please ensure you adhere to the plagiarism guidelines and watch the video on plagiarism.
• Any material included in your coursework should be fully cited and referenced in APA 7 format. Detailed advice on referencing is available from the library.
• Any material that does not meet the format or submission guidelines, or that is submitted after the deadline, could be subject to a cap on your overall result or to disqualification.
• If you need additional assistance, you can ask your personal tutor, student engagement officer, academic tutor or your lecturers.

First Submission - Supervised Learning

Task I: Classification using Python

Download the following datasets, which reflect different machine learning and data mining applications.

Medical:
1. Medical Data https://www.kaggle.com/dansbecker/hospital-readmissions
2. Heart attack prediction https://www.kaggle.com/imnikhilanand/heart-attack-prediction

Finance:
1. Banking https://www.kaggle.com/janiobachmann/bank-marketing-dataset
2. Loan prediction https://www.kaggle.com/ninzaami/loan-predication

Earth and Nature:
1. Mushroom Classification https://www.kaggle.com/uciml/mushroom-classification
2. Weather https://www.kaggle.com/jsphyg/weather-dataset-rattle-package/download

Retail:
1. Online shoppers’ intention https://kaggle.com/roshansharma/online-shoppers-intention
2. Ecommerce data https://www.kaggle.com/carrie1/ecommerce-data

In addition, select an application of your choice and search for two different datasets for it using https://www.kaggle.com/datasets.

Task:
You are required to apply the following classification techniques to all the datasets using Python.
1. Decision tree
2. K-NN (with K taking each value from 1 up to the number of class labels in the dataset)
3. Naive Bayes
4. An algorithm of your choice
Once you have applied the algorithms to all the datasets, complete the following tasks:
• Compare the performance of the applied techniques in terms of accuracy.
• Analyse the results with regard to the dataset properties.
• You can use exploratory data analysis techniques (visualisation) to explore the datasets and analyse the results.
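The classification comparison described above could be organised along the following lines. This is a minimal sketch, not a prescribed solution: it assumes scikit-learn is installed, uses the built-in Iris data as a stand-in for the Kaggle datasets (which must be downloaded and preprocessed separately), picks a random forest as the "algorithm of your choice", and standardises features before K-NN. The function name `compare_classifiers`, the 70/30 split and the random seed are illustrative choices, not part of the brief.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier


def compare_classifiers(X, y, test_size=0.3, random_state=0):
    """Return {model name: test accuracy} for the techniques listed in the task."""
    # Hold out part of the data; stratify so class proportions are preserved.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, stratify=y, random_state=random_state
    )
    n_classes = len(np.unique(y))

    models = {
        "Decision tree": DecisionTreeClassifier(random_state=random_state),
        "Naive Bayes (Gaussian)": GaussianNB(),
        # Assumption: random forest stands in for "an algorithm of your choice".
        "Random forest (own choice)": RandomForestClassifier(random_state=random_state),
    }
    # K-NN with K = 1 .. number of class labels, as the task specifies.
    # Assumption: features are standardised first, since K-NN is distance based.
    for k in range(1, n_classes + 1):
        models[f"K-NN (K={k})"] = make_pipeline(
            StandardScaler(), KNeighborsClassifier(n_neighbors=k)
        )

    results = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        results[name] = accuracy_score(y_test, model.predict(X_test))
    return results


if __name__ == "__main__":
    X, y = load_iris(return_X_y=True)  # stand-in dataset only
    for name, acc in sorted(compare_classifiers(X, y).items(), key=lambda kv: -kv[1]):
        print(f"{name:28s} accuracy = {acc:.3f}")
```

Running the same function on each dataset gives one row of accuracies per dataset, which can then be tabulated and read against properties such as the number of samples, the number of features and the class balance.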

Task II: Regression using Python

Download the following datasets, which reflect different machine learning and data mining applications.
Social networks:
1. Facebook metrics https://archive.ics.uci.edu/ml/datasets/Facebook+metrics
Medical:
1. Fertility https://archive.ics.uci.edu/ml/datasets/Fertility
In addition, select an application of your choice and search for a dataset for it using https://archive.ics.uci.edu/ml/index.php.
Task:
You are required to apply the following techniques to all the datasets using Python:
1. Linear regression
2. An algorithm of your choice
Once you have applied the algorithms to all the datasets, complete the following tasks:
• Compare the performance of the applied techniques.
• Analyse the results with regard to the dataset properties.
• You can use exploratory data analysis techniques (visualisation) to explore the datasets and analyse the results.
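The regression comparison can be sketched in the same way. The brief does not name a performance measure for regression, so RMSE and R² are used here as an assumption. The built-in diabetes data stands in for the UCI datasets, a random forest regressor is an arbitrary "algorithm of your choice", and the function name `compare_regressors` is illustrative.

```python
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split


def compare_regressors(X, y, test_size=0.3, random_state=0):
    """Return {model name: {"RMSE": ..., "R2": ...}} measured on a held-out test set."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=random_state
    )
    models = {
        "Linear regression": LinearRegression(),
        # Assumption: random forest stands in for "an algorithm of your choice".
        "Random forest (own choice)": RandomForestRegressor(random_state=random_state),
    }
    results = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        pred = model.predict(X_test)
        results[name] = {
            # Root mean squared error, in the units of the target variable.
            "RMSE": mean_squared_error(y_test, pred) ** 0.5,
            # Coefficient of determination; 1.0 is a perfect fit.
            "R2": r2_score(y_test, pred),
        }
    return results


if __name__ == "__main__":
    X, y = load_diabetes(return_X_y=True)  # stand-in dataset only
    for name, scores in compare_regressors(X, y).items():
        print(f"{name:28s} RMSE = {scores['RMSE']:.2f}  R2 = {scores['R2']:.3f}")
```

Reporting both measures is a design choice: RMSE shows the typical size of the error in the target's own units, while R² allows comparison across datasets whose targets are on different scales.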
The deliverables for this component of the coursework are:
• A report documenting Task I and Task II in no more than 1500 words, excluding figures and tables. Your report must cover the following areas:
   • A short summary of the machine learning applications and datasets you used, and a justification of the chosen datasets.
   • A detailed analysis of your results when comparing the different classification techniques.
   • A detailed analysis of your results when comparing the different regression techniques.
The submission is online through Moodle (the submission details will be available on Moodle).
Please ensure that your coursework is anonymous. Your NAME must not appear anywhere on the coursework or the cover sheet. Please use your ID only.
This component of your coursework contributes 60% of the total mark. The marking criteria for this component (a breakdown of marks out of 100%) are as follows:
20% Justification of choice and the datasets used
20% Appropriate use of tables and figures when reporting the results
30% Analysis of the results of the experiments you have conducted
10% Conclusion with recommendations on how to match a dataset to a technique
10% Organisation, language style and clarity
10% Python Code
