School of Computing
Module Coordinator
Other lecturers
Date Issued
Code COSREP / M26538
Title Applied Machine Learning and Data Mining
Schedule and Deliverables
Item: Coursework
Value: 60%
Format: One report file (.pdf) and a single .zip file containing the Python source code (upload it to your GitHub repository)
Deadline:
Late deadline:
ECF deadline:
Notes and Advice
The Extenuating Circumstances procedure is there to support you if you have experienced circumstances (problems) serious or significant enough to prevent you from attending, completing or submitting an assessment on time.
ASDAC supports any student who discloses a disability or requires additional support for their academic studies; a good set of resources is available on the ASDAC Moodle site.
The University takes plagiarism seriously. Please ensure that you adhere to the plagiarism guidelines and watch the video on plagiarism.
Any material included in your coursework should be fully cited and referenced in APA 7 format. Detailed advice on referencing is available from the library.
Any material submitted that does not meet the format or submission guidelines, or that falls outside the submission deadline, could be subject to a cap on your overall result or to disqualification.
If you need additional assistance, you can ask your personal tutor, student engagement officer, academic tutor or lecturers.
First Submission: Supervised Learning
Task I: Classification using Python
Download the following datasets, which reflect different machine learning and data mining applications.
Medical:
1. Medical Data https://www.kaggle.com/dansbecker/hospital-readmissions
2. Heart attack prediction https://www.kaggle.com/imnikhilanand/heart-attack-prediction
Finance:
1. Banking https://www.kaggle.com/janiobachmann/bank-marketing-dataset
2. Loan prediction https://www.kaggle.com/ninzaami/loan-predication
Earth and Nature:
1. Mushroom Classification https://www.kaggle.com/uciml/mushroom-classification
2. Weather https://www.kaggle.com/jsphyg/weather-dataset-rattle-package/download
Retail:
1. Online shoppers’ intention https://kaggle.com/roshansharma/online-shoppers-intention
2. Ecommerce data https://www.kaggle.com/carrie1/ecommerce-data
In addition, select an application of your choice and search for two different datasets for it using https://www.kaggle.com/datasets.
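Most of the Kaggle datasets above are CSV files that mix numeric and categorical columns, while scikit-learn's classifiers expect numeric input, so some encoding is usually needed before Task I. The sketch below is one possible approach, assuming pandas is installed; the helper name `load_and_encode`, the choice to drop rows with missing values, and the file name in the comment are illustrative assumptions rather than requirements of this brief.

```python
import pandas as pd


def load_and_encode(df: pd.DataFrame, target: str):
    """Split a DataFrame into a numeric feature matrix X and a label vector y.

    Object/category columns are one-hot encoded with pandas.get_dummies;
    numeric columns are kept unchanged. Rows with missing values are dropped
    for simplicity (an assumption: imputation may suit your data better).
    """
    df = df.dropna()
    y = df[target]
    X = pd.get_dummies(df.drop(columns=[target]), dtype=float)
    return X, y


if __name__ == "__main__":
    # Tiny stand-in for a downloaded file, e.g. df = pd.read_csv("mushrooms.csv")
    # (the file name is hypothetical; use whatever you downloaded).
    df = pd.DataFrame({
        "cap-color": ["n", "y", "w", "n"],
        "odor": ["p", "a", "l", "p"],
        "class": ["p", "e", "e", "p"],
    })
    X, y = load_and_encode(df, target="class")
    print(X.shape, list(y))
```

`pd.get_dummies` is the simplest option for a one-off analysis; if your test split may contain category values that never appear in training, scikit-learn's `OneHotEncoder(handle_unknown="ignore")` inside a `Pipeline` is the more robust alternative.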
Task:
You are required to apply the following classification techniques to all the datasets using Python.
1. Decision tree
2. K-NN (with K taking each value from 1 up to the number of class labels in the dataset).
3. Naive Bayes
4. An algorithm of your choice.
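A minimal sketch of running the four techniques on one dataset and recording test-set accuracy is shown below. It assumes scikit-learn is installed and uses the bundled Iris data (three classes) only as a stand-in for the coursework datasets, so that it runs without a download; the function name `compare_classifiers`, the 70/30 stratified split and the random forest as the "algorithm of your choice" are all illustrative assumptions.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier


def compare_classifiers(X, y, test_size=0.3, random_state=42):
    """Fit each classifier on a training split and return test accuracies."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=random_state, stratify=y
    )
    models = {
        "Decision tree": DecisionTreeClassifier(random_state=random_state),
        "Naive Bayes": GaussianNB(),
        # Example "algorithm of your choice"; any other classifier could go here.
        "Random forest": RandomForestClassifier(random_state=random_state),
    }
    # K-NN with K = 1, 2, ..., number of class labels in the dataset.
    n_classes = len(set(y))
    for k in range(1, n_classes + 1):
        models[f"{k}-NN"] = KNeighborsClassifier(n_neighbors=k)

    results = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        results[name] = accuracy_score(y_test, model.predict(X_test))
    return results


if __name__ == "__main__":
    X, y = load_iris(return_X_y=True)  # stand-in for one of the coursework datasets
    for name, acc in compare_classifiers(X, y).items():
        print(f"{name:15s} {acc:.3f}")
```

Two design points are worth checking against your own data: K-NN is distance-based, so features on very different scales (for example, an age in years next to a monetary balance) usually benefit from `StandardScaler`; and `GaussianNB` assumes continuous, roughly normal features, so `BernoulliNB` or `CategoricalNB` may fit one-hot or categorical inputs better.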
Once you have applied the algorithms to all the datasets, you are required to complete the following tasks:
Compare the performance of the applied techniques in terms of accuracy.
Analyse the results with regard to the dataset properties.
You can use exploratory data analysis techniques (e.g. visualisation) to explore the datasets and analyse the results.
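When analysing accuracy against dataset properties, it helps to tabulate a few of those properties for each dataset. The helper below is a hypothetical example (its name and the chosen properties are assumptions, not a prescribed list) and assumes NumPy is available. The majority-class rate is particularly useful: on an imbalanced dataset a classifier can reach high accuracy simply by always predicting the most common label, so accuracy is best read relative to that baseline.

```python
from collections import Counter

import numpy as np


def dataset_profile(X, y):
    """Summarise properties that often explain differences in classifier accuracy.

    X is a 2-D array-like (n_samples x n_features), y the class labels.
    majority_class_rate is the accuracy of always predicting the most
    common label, a floor that any useful classifier should beat.
    """
    counts = Counter(y)
    n_samples, n_features = np.shape(X)
    return {
        "n_samples": n_samples,
        "n_features": n_features,
        "n_classes": len(counts),
        "majority_class_rate": max(counts.values()) / n_samples,
        "class_counts": dict(counts),
    }


if __name__ == "__main__":
    X = [[5.1, 3.5], [4.9, 3.0], [6.2, 2.9], [5.9, 3.0]]
    y = ["yes", "no", "no", "no"]
    print(dataset_profile(X, y))
```

Other properties, such as the proportion of categorical features or of missing values, can be added to the returned dictionary in the same way.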
Task II: Regression using Python
Download the following datasets, which reflect different machine learning and data mining applications.
Social networks:
1. Facebook metrics https://archive.ics.uci.edu/ml/datasets/Facebook+metrics
Medical:
1. Fertility: https://archive.ics.uci.edu/ml/datasets/Fertility
In addition, select an application of your choice and search for a dataset for it using https://archive.ics.uci.edu/ml/index.php.
Task:
You are required to apply the following techniques to all the datasets using Python:
1. Linear Regression.
2. An algorithm of your choice.
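A comparable sketch for Task II is given below. It assumes scikit-learn is installed and uses the bundled diabetes regression data as a stand-in for the Facebook metrics and Fertility datasets; the random forest regressor is only an example of "an algorithm of your choice", and the function name `compare_regressors` and 5-fold cross-validation are illustrative assumptions.

```python
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_validate


def compare_regressors(X, y, cv=5, random_state=42):
    """Return mean cross-validated MAE, RMSE and R^2 for each regressor."""
    models = {
        "Linear regression": LinearRegression(),
        # Example "algorithm of your choice".
        "Random forest": RandomForestRegressor(random_state=random_state),
    }
    scoring = {
        "mae": "neg_mean_absolute_error",
        "rmse": "neg_root_mean_squared_error",
        "r2": "r2",
    }
    results = {}
    for name, model in models.items():
        scores = cross_validate(model, X, y, cv=cv, scoring=scoring)
        results[name] = {
            # scikit-learn treats higher scores as better, so error metrics
            # are returned negated; flip the sign back here.
            "MAE": -scores["test_mae"].mean(),
            "RMSE": -scores["test_rmse"].mean(),
            "R2": scores["test_r2"].mean(),
        }
    return results


if __name__ == "__main__":
    X, y = load_diabetes(return_X_y=True)  # stand-in dataset bundled with scikit-learn
    for name, metrics in compare_regressors(X, y).items():
        print(name, {k: round(v, 3) for k, v in metrics.items()})
```

Cross-validation averages over several train/test splits, which gives a more stable comparison than a single split on small datasets such as Fertility (100 instances).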
Once you have applied the algorithms to all the datasets, you are required to complete the following tasks:
Compare the performance of the applied techniques.
Analyse the results with regard to the dataset properties.
You can use exploratory data analysis techniques (e.g. visualisation) to explore the datasets and analyse the results.
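Unlike classification, regression performance is not measured by accuracy. Three commonly reported metrics are defined below, with $y_i$ the observed value, $\hat{y}_i$ the model's prediction and $\bar{y}$ the mean of the $n$ observed values:

```latex
\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} \lvert y_i - \hat{y}_i \rvert, \qquad
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}, \qquad
R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
```

Lower MAE and RMSE are better, and RMSE penalises large errors more heavily than MAE. An $R^2$ of 1 indicates a perfect fit, 0 means the model does no better than always predicting $\bar{y}$, and negative values mean it does worse.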
The deliverables for this component of the coursework are:
A report documenting Task I and Task II in no more than 1500 words, excluding figures and tables. Your report must cover the following areas:
A short summary of the machine learning applications and datasets you used and a justification of the chosen datasets.
A detailed analysis of your results when comparing the different classification techniques.
A detailed analysis of your results when comparing the different regression techniques.
The submission is online through Moodle (the submission details will be available on Moodle).
Please ensure that your coursework is anonymous. Your NAME must not appear anywhere on the coursework or the cover sheet; please use your student ID only.
This component of your coursework contributes 60% of the total mark. The marking criteria for this component (as percentages of the component mark, totalling 100%) are as follows:
20% Justification of the choices made and the datasets used
20% Appropriate use of tables and figures when reporting the results
30% Analysis of the results of the experiments you have conducted
10% Conclusion with recommendations on how to match a dataset to a technique
10% Organisation, language style and clarity
10% Python Code