Tutorial 2 - The Basics in R

Introduction

Make sure you have access to or have installed R and RStudio prior to beginning this tutorial. There are details on how to do this correctly in tutorial 1. If following along on your own version of RStudio, this would be the time to open up a new script (and/or project) file, and save it in a logical place.

What are functions?

In tutorial 1, there were vague mentions of functions without any real explanation as to what they are. A function is a set of instructions for transforming a set of inputs into a set of outputs. The inputs are called ‘arguments’, and they can be anything (e.g., numbers, letters, files). The outputs are called ‘results’ or ‘values’, and these can also be anything (e.g., numbers, files, plots). Functions are typically used to store lengthy operations within a computer program, which have to be repeated several times on different input values.
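As a small illustration of inputs and outputs, the sketch below calls three functions that come with base R. The particular functions and values are chosen here purely for illustration; any function would show the same pattern of arguments going in and a result coming out.

```r
# nchar() takes a string as its argument and returns its number of characters
nchar("elephant")
#> [1] 8

# round() takes two arguments: the number to round and how many digits to keep
round(3.14159, digits = 2)
#> [1] 3.14

# sum() accepts several arguments and returns their total
sum(1, 2, 3)
#> [1] 6
```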

In tutorial 1, you also learned about packages like dplyr. These are essentially containers for a series of functions. In R, there are a few packages which are loaded by default that contain the base functionality of the programming language. We will talk about some of those functions here.

Creating Simple Vectors in R

In order to do just about anything in R (aside from general arithmetic), you need some data to work with. We can create something called a vector, which is a collection of things (in this case, numbers or strings of characters), using the c() function. The name c may not seem very informative, but it stands for concatenate: to link things together in a chain.

Let’s have a go.

# try running these lines first 
c(1, 2, 3, 6)

c("elephant", "fish", "lion")

# try and make your own vector of numbers 

# then your own vector of strings 

# what happens if you combine numbers and strings?

If you run these lines of code in RStudio, the results are printed in the console but are not saved anywhere. If you wish to access a vector for use later on, you can assign it a name using the assignment arrow, <-, and it will be stored in your R session environment (shown in the Environment pane in RStudio).

Let’s practice assigning object names here first.

# let's use our numerical vector from earlier 
vec1 <- c(1, 2, 3, 6)

You may notice that when this is run, there is no output. This is because assignment only stores the object; to view it, you need to tell R that this is what you want to do. For smaller objects, like this vector, you simply need to run the name of the object you want to see. This retrieves the object from your environment and prints it to the console for you. We will look at how to deal with larger objects in a later tutorial.

vec1 <- c(1, 2, 3, 6)
vec1
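Once a vector has a name, that name can be used wherever the vector itself could be. The short sketch below reuses vec1 inside another call to c(); the name vec2 is made up for this example.

```r
vec1 <- c(1, 2, 3, 6)

# a stored vector can be used as an input to another function call
vec2 <- c(vec1, 10)
vec2
#> [1]  1  2  3  6 10

# print() does the same thing as running the name on its own
print(vec1)
#> [1] 1 2 3 6
```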

It’s your turn!

Try assigning our vector of strings a name, along with any vectors you created, and then print these by calling the name.

Creating Numeric Vectors using `seq()`

The c() function in the previous section was a pretty simple one. All the inputs to the function simply got joined together to form the output. Next we will look at another function, seq(), which is used to form a sequence of numbers.

When learning a new function, the best thing to do is to have a look at the help file.

help(seq)

Here, we can see what inputs the function requires, and examples of how to use them. If using only the from = and to = arguments, e.g. seq(from = 0, to = 25), it will provide a vector of numbers from 0 to 25, increasing by 1 each step.

We will look at a couple of examples of how to use the from = and to = arguments in conjunction with other inputs, namely the by = and/or the length.out = arguments. The others you can investigate on your own.
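A shorter version of that call makes the default step of 1 easy to see. This is only a sketch; the object name short_seq is made up for the example.

```r
# from = and to = only: the sequence increases by 1 each step
short_seq <- seq(from = 0, to = 5)
short_seq
#> [1] 0 1 2 3 4 5

# the colon operator is a common shorthand for the same sequence
0:5
#> [1] 0 1 2 3 4 5
```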

Using the `by =` argument

The by = argument specifies what the increment between values in the sequence should be. This must be numeric, but it is not restricted to integer values. Let’s see what this looks like in practice.

# integer increment
seq(from = 0, to = 6, by = 2)

# decimal increment
seq(from = 0, to = 1, by = 0.1)

# we could even assign this to an object in the environment 
myFirstSequence <- seq(from = 0, to = 6, by = 2)
myFirstSequence

# Try making some increment-specified sequences of your own!

Using the `length.out =` argument

The length.out = argument indicates the length you would like the vector to be, that is, the number of elements you’d like it to have, and seq() will set the increment for you based on this. For example, if we want to create a vector containing 20 equally spaced values between 0 and 1, we would run the following:

seq(from = 0, to = 1, length.out = 20)
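To see how the increment is worked out, it can help to try a smaller case by hand. With equally spaced values that include both end points, the step is (to - from) / (length.out - 1). The sketch below uses 5 values instead of 20; the name five_values is made up for the example.

```r
# five equally spaced values from 0 to 1
five_values <- seq(from = 0, to = 1, length.out = 5)
five_values
#> [1] 0.00 0.25 0.50 0.75 1.00

# the increment chosen: (to - from) / (length.out - 1)
(1 - 0) / (5 - 1)
#> [1] 0.25
```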

Exercise

There are some other arguments you can use inside the seq() function, and some variations of the function as well, such as seq_len(). Have another look at the help file and practise using some of these other variations.

help(seq)

Operating on vectors

In the previous lesson we learned that R is essentially just a calculator with some extra functionality, but we only looked at operations between single numbers. Now that we have learned about vectors, we can investigate operations on these as well.

Let’s take a look at some examples…

# let's say we have two numeric vectors 
v <- c(5, 7.5, 4)
t <- c(9, 4, 3)

# we can add them 
v + t

# we can subtract them
v - t

# or multiply 
v*t 

# divide
v/t

# raise to any power, etc 
v^2

What you may notice here is that the result is the same size as the inputs, and the operations only occur between matching positions in the vectors, i.e. \(v + t = [5+9, \, 7.5+4, \, 4+3]\). These are known as element-wise operations, and are the default in R.

But what happens if the vectors are different lengths? Let’s have a look at this.

# let's say we have two numeric vectors, but this time v has length 4 
v <- c(5, 7.5, 4, 8)
t <- c(9, 4, 3)

# let's try adding these, what do you notice?  
v + t

The result is a vector of length 4, but the 4th element may not make a lot of sense. This is due to a feature of R known as recycling. If one vector is shorter than the other, its elements are reused from the start until the result is the length of the longer vector. In this instance, our result is \(v + t = [5+9, \, 7.5 + 4, \, 4 + 3, \, 8 + 9]\). R also prints a warning here, because the longer length (4) is not a multiple of the shorter length (3).
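Recycling is easier to follow when the longer length is an exact multiple of the shorter one, in which case R recycles without any message. The sketch below uses made-up vectors long_vec and short_vec to show this, along with the most common case of all: a single number, which is a vector of length 1.

```r
long_vec <- c(1, 2, 3, 4)
short_vec <- c(10, 20)

# short_vec is reused as 10, 20, 10, 20
long_vec + short_vec
#> [1] 11 22 13 24

# a single number is recycled across every element
long_vec * 2
#> [1] 2 4 6 8
```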

Exercise

It’s now your turn to practice and investigate this feature.

# create some numeric vectors of the same length 


# Now try operating on them like we did above 




# Next, create two vectors of different lengths 


# Experiment with the recycling property

Boolean Operators

The operators we have played with so far have all been arithmetic operators, but these are not the only kinds that exist within R. Another very important class of operators is known as Boolean operators, and these are primarily used as a means of comparison between objects. There are two classes of Boolean operators: relational operators and logical operators. These kinds of operator will always return TRUE or FALSE values.

Relational operators are the usual mathematical comparison tools such as < (less than), > (greater than), <= (less than or equal), >= (greater than or equal), == (equal or equivalent), and != (not equal).

Logical operators are those grounded in computer logic: & or && (AND), | or || (OR), and ! (NOT). AND (& and &&) returns TRUE if and only if both sides of the expression are TRUE, and FALSE otherwise. The OR (| or ||) operator, on the other hand, returns FALSE if and only if both sides of the expression are FALSE, and TRUE otherwise. The difference between the single and double operators is in what is compared. The single versions (& and |) compare vectors element-wise. The double versions (&& and ||) are intended for single TRUE or FALSE values: older versions of R looked only at the first element of each vector, while R 4.3.0 and later raise an error if either side has more than one element. The NOT (!) operator returns the opposite truth value to its input.
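The behaviour described above can be sketched with two small logical vectors. The names x and y are made up for this example, and the double operators are only applied to single values here.

```r
x <- c(TRUE, FALSE, TRUE)
y <- c(TRUE, TRUE, FALSE)

# element-wise AND: TRUE only where both are TRUE
x & y
#> [1]  TRUE FALSE FALSE

# element-wise OR: FALSE only where both are FALSE
x | y
#> [1] TRUE TRUE TRUE

# NOT flips each value
!x
#> [1] FALSE  TRUE FALSE

# the double operators work on single values
TRUE && FALSE
#> [1] FALSE
TRUE || FALSE
#> [1] TRUE
```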

All of the relational operators can be used on just about any object in the R environment, but you just need to be careful with how you use them. For example, many things boil down to individual values, or vectors of individual values, so we will start here.

Individual values (numbers and/or strings) can be compared directly, and the result will always be a single TRUE or FALSE value. For vectors, the operators work element-wise, just like the arithmetic operators, so you will see a vector of TRUE or FALSE values equal in length to the longest vector.
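As a rough illustration of the element-wise behaviour for vectors, the sketch below compares a made-up vector called heights against a single value and against another vector of the same length.

```r
heights <- c(150, 172, 168, 181)  # made-up values for illustration

# the single value 170 is compared with every element
heights > 170
#> [1] FALSE  TRUE FALSE  TRUE

# two vectors of the same length are compared position by position
heights == c(150, 170, 168, 180)
#> [1]  TRUE FALSE  TRUE FALSE

# the same works for strings
c("elephant", "fish") == "fish"
#> [1] FALSE  TRUE
```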

Exercise - Relational Operators

Let’s take a look at some relational operators in practice.

# individual values 
5 > 4
65 <= 65
6 + 8 == 2 + 4 
# which is the same as 
(6 + 8) == (2 + 4)
# we can compare strings as well!
"elephant" != "zebra"

## Try some for yourself...

Exercise - Logical Operators

Here I have created a few objects - a mixture of numeric, string and vector objects. Take some time to investigate various combinations of these with the logical operators listed above. Make sure you really spend some time here. Specifically note what happens when comparing anything to c.

Take note that when numerical values are used with logical operators, any non-zero number is treated as TRUE, and zero is treated as FALSE.

# numeric
a <- 9
# logical
b <- TRUE
# mixed vec 
c <- c(1, 4, TRUE, 'bird')
# numeric vec
d <- c(8, 6, 0.5, 5)
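Before experimenting with the objects above, one way to check how R treats numbers under logical operators is to try a few values directly. In this sketch, zero acts as FALSE and every non-zero number acts as TRUE, including values below 1 and negative values.

```r
# zero behaves like FALSE
0 & TRUE
#> [1] FALSE

# non-zero numbers behave like TRUE
0.5 & TRUE
#> [1] TRUE
-3 | FALSE
#> [1] TRUE

# NOT applied to zero gives TRUE
!0
#> [1] TRUE
```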

Important Base R Functions

Now that we have covered some basic ways to generate and compare data in R, let’s take a look at some built-in functions that we can use to extract more info from it.

The Boxplot Summaries

Suppose you have a vector of values that we have simulated and called nums.

nums
#>  [1] 55 51 59 65 60 61 52 60 54 63 55 48 52 57 47 56 51 51 50 62 59 51 59 47 51
#> [26] 54 51 61 50 55

There are some things we might like to know about our vector nums that aren’t clear just by looking at it. These might include, but are certainly not limited to, the length of the vector, the average (mean or median), other boxplot features such as the range and inter-quartile range, the minimum or maximum (if only interested in one or the other) and many others. We will take a look at some of these.

Many of these are computed by using the name of the quantity you want as the function name; R is very convenient like that (sometimes!). For example, to compute the mean you use mean(), and for the median you use median().

For the minimum and maximum, we use shortened names, min() and max(). And for the range, we simply use range().
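To show the pattern without giving away the exercise on nums, here is a sketch on a tiny made-up vector called scores, small enough to check each answer by hand.

```r
scores <- c(2, 4, 4, 10)  # made-up values for illustration

length(scores)  # number of elements
#> [1] 4
mean(scores)    # (2 + 4 + 4 + 10) / 4
#> [1] 5
median(scores)  # middle of the sorted values
#> [1] 4
min(scores)
#> [1] 2
max(scores)
#> [1] 10
range(scores)   # minimum and maximum together
#> [1]  2 10
```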

The interquartile range has a couple of options. We can use the built-in IQR() function, or we can compute the lower and upper quartiles manually using the quantile() function and take the difference between them. An example of both follows.

# to use the built in IQR() we just need to input our vector 
IQR(nums)

# quantile requires a little more 
# we input the vector, and also the probability value we want
quantile(nums, probs = 0.25) # gives the value below which 25% of data falls

# let's name it lower 
lower <- quantile(nums, probs = 0.25) 

# we can compute the upper quartile (75%) in the same way 
upper <- quantile(nums, probs = 0.75)

# the IQR is then the difference between the two 
upper - lower

# the result is a named vector; we can remove the name by writing 
as.numeric(upper - lower)

Now that you have seen examples of how to use two separate functions (namely the IQR() and quantile() functions), you can try finding the mean(), median(), min(), max() and range() of our vector, nums, on your own. Remember, if you need assistance you can always use the help() function.

nums

# mean and median 



# min and max 



# and range

There are some other important built-in functions that don’t quite fit into the above category. These are functions for the standard deviation, sd(), summation, sum(), square root, sqrt(), and the exponential function, exp().

These work similarly to others we have seen here. Have a look at the help file for each function, and then try them out below, using nums as your vector where applicable.
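As a starting point, the sketch below tries these functions on a small made-up vector called values. Note that sd() computes the sample standard deviation, which divides by the number of elements minus one; the last lines check this by writing the formula out by hand.

```r
values <- c(4, 9, 16)  # made-up values for illustration

sum(values)
#> [1] 29

# sqrt() works element-wise on a vector
sqrt(values)
#> [1] 2 3 4

# exp(x) is e raised to the power x
exp(0)
#> [1] 1

# sd() matches the sample standard deviation formula written out by hand
manual_sd <- sqrt(sum((values - mean(values))^2) / (length(values) - 1))
all.equal(sd(values), manual_sd)
#> [1] TRUE
```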

Custom Functions in R with `function()`

function() allows a new function to be defined in R. This is especially helpful when you want to run something many times with different values. Each function needs three things: a unique name, variables separated by commas, and what we want the function to return. Each function will need to be set up in the following form.

# a template: replace each name with your own
UniqueName <- function(variable1, variable2, ...) {
  return(what_we_want_the_function_to_return)
}

Within this framework we can create many functions. It is important to remember that a call to a function can be used in place of the value it returns. For example, if the function outputs a number, a call to it can be placed anywhere a number can go in our code; if the function outputs a vector, a call to it can be placed anywhere a vector can go.
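The idea that a function call can stand in for its result can be sketched with a very small function. AddOne is a hypothetical name invented for this illustration.

```r
# a hypothetical function that returns its input plus one
AddOne <- function(x) {
  return(x + 1)
}

# AddOne(4) returns a number, so it can be used in arithmetic
AddOne(4) * 2
#> [1] 10

# or as the input to another function
sqrt(AddOne(8))
#> [1] 3

# or as an element when building a vector
c(AddOne(1), AddOne(2))
#> [1] 2 3
```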

Example 1

Let’s create a function to square each value we give it. First we need a unique name; let’s go with ‘Squared’. We need a variable; let’s go with ‘x’. Finally, we need to know what the function is returning; in our case we want it to return our variable squared.

# UniqueName <- function(variable1) {
Squared <- function(x) {
  # return(our variable squared)
  return(x * x)
} # Don't forget to close the bracket

Once this code is run in RStudio, our function will become available in our global environment (shown in the Environment pane at the top right of RStudio). Here in this tutorial, though, all you have to go by is the lack of any error message to know that it’s worked.

To call a function we need to use its unique name, followed by the values it requires in brackets. So, in order to use the ‘Squared’ function we just created to determine \(5^2\), \(6^2\), and \(10^2\), we run the following:

# When x = 5
Squared(5)
# When x = 6
Squared(6)
# When x = 10
Squared(10)
# Try a couple yourself!
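Because multiplication in R is element-wise, a function written this way should also accept a whole vector. The sketch below repeats the definition of Squared so that it can be run on its own.

```r
Squared <- function(x) {
  return(x * x)
}

# each element of the vector is squared
Squared(c(1, 2, 3))
#> [1] 1 4 9

# the input can itself be the result of another function
Squared(seq(from = 0, to = 1, by = 0.5))
#> [1] 0.00 0.25 1.00
```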

Example 2

Let’s now create a slightly more complex function: one that will work out the quadratic formula for us so we don’t have to do it manually. Our unique name for this function can be QuadraticForm, our variables will be a, b, and c, and we would like our function to return x in the quadratic formula, where \[ x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a} \] The quadratic formula yields two different values for x due to the \(\pm\) used. There are several ways we can approach this. The first is to split the quadratic formula into two separate formulas, one where plus is used (xplus) and one where minus is used (xminus). Each of these can be worked out between the {} brackets, before the return() in our function, as seen below. To return two pieces of information we simply need to combine them into a vector, which we have already practiced.

# UniqueName <- function(variables) {
QuadraticForm <- function(a, b, c) {
  # Working out any outputs is done here
  xplus <- (-b + sqrt(b^2 - 4 * a * c)) / (2 * a)
  xminus <- (-b - sqrt(b^2 - 4 * a * c)) / (2 * a)
  # return(create a vector to return multiple bits of information)
  return(c(xplus, xminus))
} # Don't forget to close the bracket

This function will return both x values for the quadratic equation, one for plus and one for minus. Calling this function happens in the same way as in the previous example, as seen below.

# When a = -1, b = 0, c = 1
QuadraticForm(-1, 0, 1)
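One way to check the function is to substitute the returned values back into \(ax^2 + bx + c\) and confirm that the result is zero. The sketch below repeats the definition so that it can be run on its own, and uses a second set of coefficients chosen for illustration, \(x^2 - 3x + 2\). It also shows a limitation of this simple version: when \(b^2 - 4ac\) is negative, sqrt() cannot return a real number.

```r
QuadraticForm <- function(a, b, c) {
  xplus <- (-b + sqrt(b^2 - 4 * a * c)) / (2 * a)
  xminus <- (-b - sqrt(b^2 - 4 * a * c)) / (2 * a)
  return(c(xplus, xminus))
}

# x^2 - 3x + 2 = 0 has solutions x = 2 and x = 1
roots <- QuadraticForm(1, -3, 2)
roots
#> [1] 2 1

# substituting the solutions back in should give zero for both
1 * roots^2 + (-3) * roots + 2
#> [1] 0 0

# x^2 + 1 = 0 has no real solutions: this returns NaN values with a warning
QuadraticForm(1, 0, 1)
```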

Test your knowledge

You can use the R window below to help find the answers to quiz questions.

Quiz
