# COS80023 - Task 7: Machine Learning

### Overview

Consult this week’s lecture, and any other materials you can find, to discuss supervised learning, unsupervised learning and how to assess their outcomes. Then have a first taste of R.

Purpose

Understand the basic principles of supervised and unsupervised learning and how to assess the outcome of classification.

Learn to use R as a tool for data analysis. R is versatile and offers many packages for very different types of data analysis. Knowing the basics will let you analyse data more efficiently than with simple tools like Excel.

Time

This task should be completed in your 7th tutorial or the week after and submitted to Canvas for feedback. It should be discussed and signed off in tutorial 8 or 9. This task should take no more than 1 hour to complete.

Resources

• Any other material you find

Feedback

Next

Get started on module 8.

Pass Task 7 — Submission Details and Assessment Criteria

Write down the questions and answers in a text, LaTeX or Word document and upload it to Canvas as a PDF. Your tutor will give online feedback and discuss the tasks with you in the lab when they are complete.

1. What is the difference between supervised and unsupervised learning? When would you use them? Give an example. Use no more than 15 sentences.
2. What is overfitting, and how do you detect it? Use no more than 10 sentences.
3. How does cross-validation help with overfitting? Explain the principle of cross-validation. Use no more than 15 sentences.
4. What different aspects of classification quality do sensitivity, specificity and accuracy measure? Why do we need several measures for classification quality? Use no more than 10 sentences.
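
As a rough illustration of the three measures named in question 4, the sketch below computes them from the four counts of a confusion matrix for a binary classifier. The counts are invented for illustration and do not come from any dataset used in this task.

```r
# Hypothetical confusion-matrix counts for a binary classifier (made-up numbers)
tp <- 40  # true positives: positive cases predicted as positive
fn <- 10  # false negatives: positive cases predicted as negative
tn <- 45  # true negatives: negative cases predicted as negative
fp <- 5   # false positives: negative cases predicted as positive

sensitivity <- tp / (tp + fn)                # share of actual positives that were found
specificity <- tn / (tn + fp)                # share of actual negatives that were found
accuracy    <- (tp + tn) / (tp + fn + tn + fp)  # share of all cases classified correctly

print(c(sensitivity = sensitivity, specificity = specificity, accuracy = accuracy))
```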

### Exercise 1

Run the following commands in your RStudio (type the commands and press enter after every command):

x <- c(22,4,52,8,9,62,3,3,4)

x

mean(x)

sd(x)

plot(x)

hist(x)

Question 1. What do these commands do?

Question 2. What are standard deviation and histograms?
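
To check your understanding of question 2, the following sketch recomputes the standard deviation by hand and inspects the bins behind a histogram without drawing it. It assumes the vector from Exercise 1; the variable names `manual_sd` and `h` are only illustrative.

```r
# Same vector as in Exercise 1
x <- c(22, 4, 52, 8, 9, 62, 3, 3, 4)

# Sample standard deviation by hand: square root of the summed squared
# deviations from the mean, divided by (n - 1). This is what sd() computes.
n <- length(x)
manual_sd <- sqrt(sum((x - mean(x))^2) / (n - 1))
print(manual_sd)
print(sd(x))

# hist() with plot = FALSE returns the bin boundaries and the number of
# values that fall into each bin instead of drawing the chart.
h <- hist(x, plot = FALSE)
print(h$breaks)
print(h$counts)
```

Comparing `manual_sd` with `sd(x)` shows that the two agree, and `h$counts` lists the bar heights that `hist(x)` would draw.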

### Exercise 2

Think of ten of your friends. Use R to determine their average age and the standard deviation of the ages. Plot the ages as well.

### Exercise 3

Find the irisdata.csv file in Canvas. Load it to your file system. Run this command in RStudio:
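
A minimal sketch of such a loading step, assuming the file is an ordinary comma-separated file with a header row. The helper name `load_iris` is hypothetical; `file.choose()` and `read.csv()` are standard R functions.

```r
# Hypothetical helper: when called without an argument, file.choose()
# opens a file dialog; read.csv() then parses the chosen file.
load_iris <- function(path = file.choose()) {
  read.csv(path, header = TRUE)
}

# Only ask for a file when R is running interactively (e.g. in RStudio)
if (interactive()) {
  iris.data <- load_iris()
}
```

In RStudio the same effect can be had in one line with `iris.data <- read.csv(file.choose())`.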

This should open a file dialog where you can search for your irisdata.csv file. After loading the file, have a look at the content of the iris.data variable:

iris.data

This dataset is a very old example dataset for classification. It is an annotated dataset – the answer (the actual class of the specimen) is given in the last column (species).

Run the following commands and study the outcome:

names(iris.data)

iris.data\$sepal_length

iris.data\$species

iris.data[1,1]

iris.data[1:3,1]

iris.data[1:3,]

class <- iris.data[1,5]

class

Now you have studied how to choose headings and values from a table, and how to assign parts of a table to a new variable.
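
The same selection patterns can be tried on any table. The sketch below uses a small made-up data frame (`pets` and its columns are hypothetical, not part of the iris data) to recap them.

```r
# Hypothetical toy table, not the iris data
pets <- data.frame(
  name = c("Rex", "Tom", "Kiwi"),
  age  = c(3, 5, 1),
  kind = c("dog", "cat", "bird")
)

print(names(pets))   # column headings
print(pets$age)      # one column, selected by name
print(pets[2, 1])    # a single cell: row 2, column 1
print(pets[1:2, ])   # rows 1 to 2, all columns

ages <- pets[, 2]    # assign one column to a new variable
second <- pets[2, ]  # assign one row to a new variable
```

The general form is `table[rows, columns]`; leaving either side of the comma empty selects everything along that dimension.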

Question 3. How do you create a new variable iris.subset and assign it the iris.data table without the annotation (without the species column)? Document the command you use.

### Exercise 4

Create a scatterplot of two columns of the Iris dataset:

plot(iris.data\$sepal_length ~ iris.data\$petal_length)

Question 4. Do you think there is a correlation between these columns? How strong do you think it is? Is it positive or negative?

Question 5. Try to plot other pairs of columns of the Iris dataset. Which pairs do you think have the strongest (positive or negative) correlation?

Question 6. Run the command

cor(iris.subset)

(having ensured your answer to question 3 was correct). Study the output – does this support what you concluded in question 5?
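
To see what kind of output to expect, here is a small sketch of `cor()` on a made-up numeric table (the data frame `measurements` and its values are invented for illustration). Note that `cor()` requires every column to be numeric, which is why the species column has to be removed first.

```r
# Hypothetical numeric table, not the iris data
measurements <- data.frame(
  height = c(150, 160, 170, 180, 190),
  weight = c(52, 58, 69, 77, 90),
  shoe   = c(44, 41, 43, 40, 42)
)

# cor() returns a square matrix with one row and one column per variable.
# Each entry is the correlation between two columns, between -1 and 1.
corr <- cor(measurements)
print(corr)
```

The diagonal is always 1, because every column correlates perfectly with itself, and the matrix is symmetric, so each pair appears twice.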
