Consult this week’s lecture, and any other materials you can find, to discuss supervised and unsupervised learning and how to assess their outcomes. Get a first taste of R.
Understand the basic principles of supervised and unsupervised learning and how to assess the outcome of classification.
Learn to use R as a tool for data analysis. R is versatile and offers many packages for very different types of data analysis. Knowing the basics will let you analyse data more efficiently than with simple tools like Excel.
Answer the questions below.
This task should be completed in your 7th tutorial or the week after and submitted to Canvas for feedback. It should be discussed and signed off in tutorial 8 or 9. This task should take no more than 1 hour to complete.
Discuss your answers with the tutorial instructor.
Get started on module 8.
Pass Task 7 — Submission Details and Assessment Criteria
Write down the questions and answers in a text, LaTeX or Word document and upload it to Canvas as a PDF. Your tutor will give online feedback and discuss the tasks with you in the lab when they are complete.
Run the following commands in RStudio (type each command and press Enter after it):
x <- c(22, 4, 52, 8, 9, 62, 3, 3, 4)
x
mean(x)
sd(x)
plot(x)
hist(x)
Question 1. What do these commands do?
Question 2. What are standard deviation and histograms?
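As a rough illustration of what sd() reports, the sketch below recomputes the sample standard deviation of the same vector x step by step and compares it with the built-in result. The names deviations and manual_sd are made up for this example. It relies on sd() using the n - 1 denominator, as described in the R documentation.

```r
x <- c(22, 4, 52, 8, 9, 62, 3, 3, 4)

# Distance of each value from the mean
deviations <- x - mean(x)

# Sample standard deviation: square root of the summed squared
# deviations divided by (n - 1)
manual_sd <- sqrt(sum(deviations^2) / (length(x) - 1))

manual_sd
sd(x)

# TRUE if both values agree up to floating-point tolerance
isTRUE(all.equal(manual_sd, sd(x)))
```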
Think of ten of your friends. Use R to determine their average age and the standard deviation of the ages. Plot the ages as well.
Document the outcome in your answer sheet with a screenshot.
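If a screenshot is inconvenient, a plot can also be written straight to a file. The following is a minimal sketch using the example vector x from above rather than real ages. The file name histogram.pdf is a hypothetical choice, and the file is written to the current working directory (see getwd()).

```r
x <- c(22, 4, 52, 8, 9, 62, 3, 3, 4)

# Hypothetical output file in the current working directory
out_file <- "histogram.pdf"

pdf(out_file)                      # open a PDF graphics device
hist(x, main = "Histogram of x")   # draw into the file instead of the Plots pane
dev.off()                          # close the device so the file is written

file.exists(out_file)
```

The same pattern should work with png() in place of pdf() when an image file is easier to paste into the answer sheet.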
Find the irisdata.csv file in Canvas and download it to your computer. Run this command in RStudio:
iris.data <- read.csv(file.choose())
This should open a file dialog where you can search for your irisdata.csv file. After loading the file, have a look at the content of the iris.data variable:
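A few standard functions are commonly used to inspect a data frame. The sketch below uses R's built-in iris data frame as a stand-in, stored under the hypothetical name demo. This is an assumption for illustration: its column names (for example Sepal.Length) differ from those in irisdata.csv.

```r
# Stand-in data: R's built-in iris data frame (assumption; its column
# names differ from those in irisdata.csv)
demo <- iris

head(demo)     # first six rows
str(demo)      # column types and a preview of the values
summary(demo)  # per-column summary statistics
dim(demo)      # number of rows and columns
```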
This dataset is a very old example dataset for classification. It is an annotated dataset – the answer (the actual class of the specimen) is given in the last column (species).
Run the following commands and study the outcome:
names(iris.data)
iris.data$sepal_length
iris.data$species
class <- iris.data[1,5]
class
Now you have studied how to choose headings and values from a table, and how to assign parts of a table to a new variable.
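The same selection patterns can be sketched on R's built-in iris data frame, used here as a stand-in under the hypothetical name demo (its column names differ from those in irisdata.csv). The variable name first_species is also made up for this example.

```r
demo <- iris

# One column by name, first five values
demo$Species[1:5]

# Rows 1 to 3 of two columns chosen by name
demo[1:3, c("Sepal.Length", "Species")]

# A single cell: row 1, column 5
first_species <- demo[1, 5]
first_species
```

The name first_species is used rather than class because class() is also a base R function. Reusing that name works, but it can be confusing to read.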
Question 3. How do you create a new variable iris.subset and assign it the iris.data table without the annotation (without the species column)? Document the command you use.
Create a scatterplot of two columns of the Iris dataset:
plot(iris.data$sepal_length ~ iris.data$petal_length)
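In the formula notation y ~ x, the column on the left of the tilde goes on the vertical axis and the column on the right on the horizontal axis. The sketch below shows an equivalent way to write such a plot with a data argument and explicit axis labels. It uses R's built-in iris data frame as a stand-in, so the column names are assumptions that differ from irisdata.csv, and the name f is hypothetical.

```r
# Formula: Sepal.Length on the y axis, Petal.Length on the x axis
f <- Sepal.Length ~ Petal.Length

# The data argument tells plot() where to look up the column names
plot(f, data = iris,
     xlab = "Petal length (cm)",
     ylab = "Sepal length (cm)",
     main = "Sepal length against petal length")
```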
Question 4. Do you think there is a correlation between these columns? How strong do you think it is? Is it positive or negative?
Question 5. Try to plot other pairs of columns of the Iris dataset. Which pairs do you think have the strongest (positive or negative) correlation?
Question 6. Run the command
(having ensured your answer to question 3 was correct). Study the output – does this support what you concluded in question 5?
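As general background for judging correlations numerically, R's cor() function returns Pearson correlation coefficients between -1 and 1. The sketch below uses the built-in mtcars data frame. The dataset and the choice of columns are assumptions for illustration and are not part of this task. The names r and cor_matrix are hypothetical.

```r
# Built-in mtcars data: fuel economy (mpg) and weight (wt) of 32 cars
r <- cor(mtcars$mpg, mtcars$wt)
r   # negative: heavier cars tend to have lower mpg

# cor() on a data frame with only numeric columns returns a matrix
# of pairwise coefficients
cor_matrix <- cor(mtcars[, c("mpg", "wt", "hp")])
round(cor_matrix, 2)
```

Values near 1 or -1 indicate a strong linear relationship, and values near 0 indicate a weak one.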