COS80023 - Task 7: Machine Learning


Consult this week’s lecture, and any other materials you can find, to discuss supervised and unsupervised learning and how to assess their outcomes. Have a first taste of R.


Understand the basic principles of supervised and unsupervised learning and how to assess the outcome of classification.

Learn to use R as a tool for data analysis. R is versatile and offers many packages for very different types of data analysis. Knowing the basics will let you analyse data more efficiently than with simple tools like Excel.


Answer the questions below.


This task should be completed in your 7th tutorial or the week after and submitted to Canvas for feedback. It should be discussed and signed off in tutorial 8 or 9. This task should take no more than 1 hour to complete.


=Blended&adaptor=Local%20Search%20Engine&tab=combined&query=any,contains,r%20for%20beginners&offset=0

  • Any other material you find


Discuss your answers with the tutorial instructor.


Get started on module 8.

Pass Task 7 — Submission Details and Assessment Criteria

Write down the questions and answers in a text, LaTeX or Word document and upload it to Canvas as a PDF. Your tutor will give online feedback and discuss the tasks with you in the lab when they are complete.

Subtask 7.1

  1. What is the difference between supervised and unsupervised learning? When would you use them? Give an example. Use no more than 15 sentences.
  2. What is overfitting, and how do you detect it? Use no more than 10 sentences.
  3. How does cross-validation help with overfitting? Explain the principle of cross-validation. Use no more than 15 sentences.
  4. What different aspects of classification quality do sensitivity, specificity and accuracy measure? Why do we need several measures for classification quality? Use no more than 10 sentences.

Subtask 7.2

Exercise 1

Run the following commands in RStudio (type each command and press Enter after it):

x <- c(22, 4, 52, 8, 9, 62, 3, 3, 4)
x
mean(x)
sd(x)
plot(x)
hist(x)

Question 1. What do these commands do?

Question 2. What are standard deviation and histograms?

Exercise 2

Think of ten of your friends. Use R to determine their average age and the standard deviation of the ages. Plot the ages as well.

Document the outcome in your answer sheet with a screenshot.

Exercise 3

Find the irisdata.csv file in Canvas and download it to your file system. Run this command in RStudio, storing the result in a variable (here called iris):

iris <- read.csv(file.choose())

This should open a file dialog where you can search for your irisdata.csv file. After loading the file, have a look at the content of the variable:

iris

This is a classic example dataset for classification. It is an annotated dataset: the answer (the actual class of each specimen) is given in the last column (species).

Run the following commands and study the outcome:

names(iris)
iris$sepal_length
iris$species
iris[1,1]
iris[1:3,1]
iris[1:3,]

class <- iris[1,5]
class

Now you have studied how to choose headings and values from a table, and how to assign parts of a table to a new variable.
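
As a reminder of how these selections work, here is a minimal sketch on a small made-up table. The data frame pets, its columns and the variable first.rows are hypothetical names chosen only for illustration; they are not part of the Iris data.

```r
# Hypothetical toy table, invented for illustration (not the Iris data)
pets <- data.frame(
  name   = c("Rex", "Tom", "Kiwi"),
  weight = c(30.5, 4.2, 0.1),
  kind   = c("dog", "cat", "bird"),
  stringsAsFactors = FALSE
)

names(pets)     # the column headings
pets$weight     # one whole column, selected by its heading
pets[2, 1]      # a single value: row 2, column 1
pets[1:2, 1]    # rows 1 to 2 of column 1
pets[1:2, ]     # rows 1 to 2, all columns

# Assign part of the table to a new variable
first.rows <- pets[1:2, ]
first.rows
```

The pattern is always table[rows, columns]; leaving one side of the comma empty selects everything along that dimension.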

Question 3. How do you create a new variable iris.subset and assign it the table without the annotation (without the species column)? Document the command you use.

Exercise 4

Create a scatterplot of two columns of the Iris dataset (iris below is the variable holding the loaded table):

plot(iris$sepal_length ~ iris$petal_length)

Question 4. Do you think there is a correlation between these columns? How strong do you think it is? Is it positive or negative?

Question 5. Try to plot other pairs of columns of the Iris dataset. Which pairs do you think have the strongest (positive or negative) correlation?

Question 6. Run the command


(having ensured your answer to question 3 was correct). Study the output – does this support what you concluded in question 5?
