Tutorial 2 - The Basics in R
Introduction
Make sure you have access to or have installed R and RStudio prior to beginning this tutorial. There are details on how to do this correctly in tutorial 1. If following along on your own version of RStudio, this would be the time to open up a new script (and/or project) file, and save it in a logical place.
What are functions?
In tutorial 1, there were vague mentions of functions without any real explanation as to what they are. A function is a set of instructions for transforming a set of inputs into a set of outputs. The inputs are called ‘arguments’, and they can be anything (e.g., numbers, letters, files). The outputs are called ‘results’ or ‘values’, and these can also be anything (e.g., numbers, files, plots). Functions are typically used to store lengthy operations within a computer program, which have to be repeated several times on different input values.
In tutorial 1, you also learned about packages like
dplyr. These are essentially containers for a series of
functions. In R, there are a few packages which are loaded by default
that contain the base functionality of the programming language. We will
talk about some of those functions here.
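To make the idea of arguments and results concrete, here is a small sketch using two functions from base R, sqrt() and round(). They are chosen purely as illustrations; the object names root and rounded are made up for this example.

```r
# sqrt() takes one argument (a number) and returns its square root
root <- sqrt(16)
root

# round() takes two arguments: the number to round and
# how many decimal places to keep (digits)
rounded <- round(3.14159, digits = 2)
rounded
```

In both cases the values inside the parentheses are the arguments, and whatever the function hands back is the result.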
Creating Simple Vectors in R
In order to do just about anything in R (aside from general
arithmetic), you need some data to work with. We can create something
called a vector, which is a collection of values (here, numbers
or strings of characters), using the c() function. The
name c may not seem very informative, but it stands
for concatenate: to link things together in a chain.
Let’s have a go.
# try running these lines first
c(1, 2, 3, 6)
c("elephant", "fish", "lion")
# try to make your own vector of numbers
# then make your own vector of strings
# what happens if you combine numbers and strings in one vector?
If run in RStudio, these lines of code would print their results in the
console, but the outputs would not be saved anywhere. If you wish to use a
vector later on, you can assign it a name using the assignment arrow,
<-, and it will be stored in your R session
environment (shown in the Environment pane in RStudio).
Let’s practise assigning object names here first.
# let's use our numerical vector from earlier
vec1 <- c(1, 2, 3, 6)
You may notice that when this is run, there is no output. This is because assignment only stores the object; to view it, you need to ask R to print it. For smaller objects, like this vector, you simply need to run the name of the object you want to see. This retrieves the object from your environment and prints it to the console for you. We will look at how to deal with larger objects in a later tutorial.
vec1 <- c(1, 2, 3, 6)
vec1
It’s your turn!
Try assigning our vector of strings a name, along with any vectors you created, and then print these by calling the name.
Creating Numeric Vectors using seq()
The c() function in the previous section was a pretty
simple one. All the inputs to the function simply got joined together
to form the output. Next we will look at another function,
seq(), which is used to form a sequence of numbers.
When learning a new function, the best thing to do is to have a look at the help file.
help(seq)
Here, we can see what inputs the function requires, and examples of
how to use them. If using only the from = and
to = arguments, e.g. seq(from = 0, to = 25),
it will provide a vector of numbers from 0 to 25, increasing by 1 each
step.
We will look at a couple of examples of how to use the
from = and to = arguments in conjunction with
other inputs, namely the by = and/or the
length.out = arguments. The others you can investigate on
your own.
Using the by = argument
The by = argument specifies what the increment between
values in the sequence should be. This must be numeric, but it is not
restricted to integer values. Let’s see what this looks like in
practice.
# integer increment
seq(from = 0, to = 6, by = 2)
# decimal increment
seq(from = 0, to = 1, by = 0.1)
# we could even assign this to an object in the environment
myFirstSequence <- seq(from = 0, to = 6, by = 2)
myFirstSequence
# Try making some increment-specified sequences of your own!
Using the length.out = argument
The length.out = argument indicates the length you would
like the vector to be, that is, the number of elements you’d like it to
have, and R will set the increment for you based on this. For example,
if we want to create a vector containing 20 evenly spaced values between 0 and 1, we
would run the following:
seq(from = 0, to = 1, length.out = 20)
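With length.out =, the increment R chooses works out to (to - from) / (length.out - 1). A shorter sequence makes this easier to check by eye; the sketch below uses 5 values instead of 20, and the object names five_values and step are made up for this example.

```r
# five evenly spaced values between 0 and 1
five_values <- seq(from = 0, to = 1, length.out = 5)
five_values

# the implied increment is (to - from) / (length.out - 1)
step <- (1 - 0) / (5 - 1)
step

# the same sequence written with by = instead of length.out =
seq(from = 0, to = 1, by = step)
```

Both calls give 0, 0.25, 0.5, 0.75 and 1, so which argument to use depends on whether you know the step size or the number of values you want.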
Exercise
There are some other arguments you can use inside the
seq() function, and some variations of the function as
well, such as seq_len(). Have another look at the help file
and practise using some of these other variations.
help(seq)
Operating on vectors
In the previous lesson we learned that R is essentially a calculator with some extra functionality, but we only looked at operations between single numbers. Now that we have learned about vectors, we can investigate operations on these as well.
Let’s take a look at some examples…
# let's say we have two numeric vectors
v <- c(5, 7.5, 4)
t <- c(9, 4, 3)
# we can add them
v + t
# we can subtract them
v - t
# or multiply
v*t
# divide
v/t
# raise to any power, etc
v^2
What you may notice here is that the vector does not change size, and the operations only occur between matching indices in the vector i.e. \(v + t = [5+9, \, 7.5+4, \, 4+3]\). These are known as element-wise operations, and are the default in R.
But what happens if the vectors are different lengths? Let’s have a look at this.
# let's say we have two numeric vectors, but this time v has length 4
v <- c(5, 7.5, 4, 8)
t <- c(9, 4, 3)
# let's try adding these, what do you notice?
v + t
The result is a vector of length 4, but the 4th element may not make a lot of sense. This is due to a feature of R known as recycling. If one vector is shorter than the other, its elements get reused from the start until the result is the length of the longer vector. In this instance, our result is \(v + t = [5+9, \, 7.5 + 4, \, 4 + 3, \, 8 + 9]\). R also prints a warning here, because the length of the longer vector (4) is not a multiple of the length of the shorter one (3).
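Recycling is easier to follow when the longer length is an exact multiple of the shorter one, in which case R recycles silently. The sketch below uses made-up vectors (long_vec and short_vec) to show this, along with the most common case of all: operating on a vector with a single number.

```r
# the shorter vector has length 2, which divides evenly into 4
long_vec <- c(1, 2, 3, 4)
short_vec <- c(10, 20)

# short_vec is reused as 10, 20, 10, 20
recycled_sum <- long_vec + short_vec
recycled_sum

# a single number is a vector of length 1, so it is recycled too
scaled <- long_vec * 2
scaled
```

The first result is 11, 22, 13, 24 and the second is 2, 4, 6, 8.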
Exercise
It’s now your turn to practice and investigate this feature.
# create some numeric vectors of the same length
# Now try operating on them like we did above
# Next, create two vectors of different lengths
# Experiment with the recycling property
Boolean Operators
The operators we have played with so far have all been arithmetic
operators, but these are not the only kinds that exist within R. Another
very important class of operators are known as Boolean operators, and
these are primarily used as a means of comparison between objects. There
are two classes of Boolean operators: relational operators and logical
operators. These kinds of operators return
TRUE or FALSE values.
Relational operators are the usual mathematical comparison tools such
as < (less than), > (greater than),
<= (less than or equal), >= (greater
than or equal), == (equal or equivalent), and
!= (not equal).
Logical operators are those which are grounded in computer logic,
such as & or && (AND),
| or || (OR), and ! (NOT). AND
(& and &&) will return a
TRUE value if and only if both sides of the expression are
TRUE, and FALSE otherwise. On the other hand, the OR
(| or ||) operator will return
FALSE if and only if both sides of the expression are
FALSE, and TRUE otherwise. The difference
between the single and double operators is in what is compared. The single
versions (i.e. & and |) compare
vectors element-wise, whereas the double versions (&& and
||) are intended for single values: in older versions of R they
used only the first element of each vector, and from R 4.3.0 onwards they
raise an error if given a vector with more than one element. The
NOT (!) operator returns the opposite truth value to the
input.
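The behaviour of the logical operators described above can be sketched with two small logical vectors. The names x and y are made up for this example, and the vectors are chosen so that every combination of TRUE and FALSE appears once.

```r
# two logical vectors of the same length
x <- c(TRUE, TRUE, FALSE, FALSE)
y <- c(TRUE, FALSE, TRUE, FALSE)

# element-wise AND: TRUE only where both sides are TRUE
and_result <- x & y
and_result

# element-wise OR: FALSE only where both sides are FALSE
or_result <- x | y
or_result

# NOT flips every value
not_result <- !x
not_result

# the double forms are meant for single TRUE/FALSE values
single_and <- TRUE && FALSE
single_or <- TRUE || FALSE
single_and
single_or
```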
All of the relational operators can be used on just about any object in the R environment, but you just need to be careful with how you use them. For example, many things boil down to individual values, or vectors of individual values, so we will start here.
Individual values (numbers and/or strings) can be compared directly,
and the result will always be a single TRUE or
FALSE value. For vectors, the operators work element-wise,
just like the arithmetic operators, so you will see a vector of
TRUE or FALSE values equal in length to the
longest vector.
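As a minimal sketch of the element-wise behaviour on vectors, the example below compares a vector with a single value, with another vector of the same length, and with a string. All object names and values are made up for illustration.

```r
# a vector compared with a single number: 5 is recycled
scores <- c(4, 9, 7, 2)
above_five <- scores > 5
above_five

# two vectors of the same length are compared position by position
guesses <- c(4, 8, 7, 3)
matches <- scores == guesses
matches

# the same idea works for strings
animals <- c("elephant", "fish", "lion")
is_fish <- animals == "fish"
is_fish
```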
Exercise - Relational Operators
Let’s take a look at some relational operators in practice.
# individual values
5 > 4
65 <= 65
6 + 8 == 2 + 4
# which is the same as
(6 + 8) == (2 + 4)
# we can compare strings as well!
"elephant" != "zebra"
## Try some for yourself...
Exercise - Logical Operators
Here I have created a few objects: a mixture of numeric, logical and
vector objects. Take some time to investigate various combinations of
these with the logical operators listed above. Make sure you really
spend some time here. Specifically note what happens when comparing
anything to c.
Take note that when numerical values are used with logical
operators, zero is considered to be FALSE and any non-zero number
(including values between 0 and 1, and negative values) is considered to be TRUE.
# numeric
a <- 9
# logical
b <- TRUE
# mixed vec
c <- c(1, 4, TRUE, 'bird')
# numeric vec
d <- c(8, 6, 0.5, 5)
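Before experimenting with the objects above, it may help to see how R converts numbers to logical values: zero is treated as FALSE and any other number as TRUE. The sketch below uses separate, made-up object names so that it does not overwrite a, b, c or d.

```r
# zero is treated as FALSE
zero_and_true <- 0 & TRUE
zero_and_true

# any non-zero number is treated as TRUE, even one smaller than 1
half_and_true <- 0.5 & TRUE
half_and_true

# as.logical() shows the conversion directly
as_logical_values <- as.logical(c(0, 0.5, 9, -3))
as_logical_values
```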
Important Base R Functions
Now that we have covered some basic ways to generate and compare data in R, let’s take a look at some built-in functions that we can use to extract more info from it.
The Boxplot Summaries
Suppose you have a sequence of values, that we have simulated and
called nums.
nums
#> [1] 55 51 59 65 60 61 52 60 54 63 55 48 52 57 47 56 51 51 50 62 59 51 59 47 51
#> [26] 54 51 61 50 55
There are some things we might like to know about our vector
nums that aren’t clear just by looking at it. These might
include, but are certainly not limited to, the length of the vector, the
average (mean or median), other boxplot features such as the range and
inter-quartile range, the minimum or maximum (if only interested in one
or the other) and many others. We will take a look at some of these.
Many of these are computed by using the name of what you
want as the function name; R is very convenient like that (sometimes!).
For example, to compute the mean, you use mean() and for
the median, you use median().
For the minimum and maximum, we use shortened names,
min() and max(). And for the range, we simply
use range(), which returns the minimum and maximum together as a vector of two values.
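Here is a minimal sketch of these summary functions on a tiny made-up vector (called small here, so that nums is left for you to explore). The values are chosen so that each result is easy to check by hand.

```r
small <- c(2, 4, 4, 10)

small_mean <- mean(small)      # (2 + 4 + 4 + 10) / 4
small_mean

small_median <- median(small)  # middle of the sorted values
small_median

small_range <- range(small)    # minimum and maximum together
small_range

# the width of the range is the maximum minus the minimum
small_spread <- max(small) - min(small)
small_spread
```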
There are a couple of ways to compute the interquartile range. We can use the
inbuilt IQR() function, or we can compute the lower and upper quartiles
manually using the quantile() function and take their
difference. See below for an example.
# to use the built in IQR() we just need to input our vector
IQR(nums)
# quantile requires a little more
# we input the vector, and also the probability value we want
quantile(nums, probs = 0.25) # gives the value below which 25% of data falls
# lets name it lower
lower <- quantile(nums, probs = 0.25)
# we can compute the upper quartile (75%) in the same way
upper <- quantile(nums, probs = 0.75)
# the IQR is then the difference between the two
upper - lower
# the result is a named vector; we can remove the name by writing
as.numeric(upper - lower)
Now that you have seen examples of how to use two separate functions (namely
the IQR() and quantile() functions), you can
try finding the mean(), median(),
min(), max() and range() of our
vector, nums, on your own. Remember, if you need assistance
you can always use the help() function.
nums
# mean and median
# min and max
# and range
There are some other important built-in functions that don’t quite
fit into the above category. These are functions for the standard
deviation, sd(), summation, sum(), square
root, sqrt(), and the exponential function,
exp().
These work similarly to others we have seen here. Have a look at the help file for each function, and then try them out below, using nums as your vector where applicable.
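As a sketch of how these functions fit together, the sample standard deviation returned by sd() can be rebuilt from sum(), sqrt(), mean() and length(). The vector values and the other object names below are made up for this example.

```r
values <- c(2, 4, 4, 10)

# sum() adds up all the elements
total <- sum(values)
total

# sd() divides the summed squared deviations by (n - 1), then takes the root
n <- length(values)
manual_sd <- sqrt(sum((values - mean(values))^2) / (n - 1))
builtin_sd <- sd(values)
manual_sd
builtin_sd

# exp(x) computes e to the power x, so exp(0) is 1
exp_zero <- exp(0)
exp_zero
```

The two standard deviations agree, which is a handy way to convince yourself of what sd() is doing.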
Custom Functions in R with function()
function() allows a new function to be defined in R.
This is especially helpful when you want to run something many
times with different values. Each function needs three things: a unique
name, variables (arguments) separated by commas, and what we want the function to
return. Each function is set up in the following form.
# template only: replace the placeholder names with your own
UniqueName <- function(variable1, variable2, ...){
  return(what_we_want_the_function_to_return)
}
Within this framework we can create many functions. It is important to remember that a call to a function can be used anywhere its output could be used. For example, if a function returns a number, a call to it can be placed anywhere a number can appear in our code; if it returns a vector, a call to it can be placed anywhere a vector can appear.
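As a small sketch of this idea, the hypothetical function AddOne below returns a number, so a call to it can sit inside arithmetic or inside another function call. The function and the object names ten and counts are invented for this illustration.

```r
# a hypothetical function that adds 1 to its input
AddOne <- function(x){
  return(x + 1)
}

# AddOne(4) returns 5, so it can be used wherever a number can
ten <- AddOne(4) * 2
ten

# it can also be used as an argument to another function
counts <- seq(from = 1, to = AddOne(4))
counts
```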
Example 1
Let’s create a function to square each value we give it. First we need a unique name; let’s go with ‘Squared’. Next we need a variable; let’s go with ‘x’. Finally, we need to decide what the function returns; in our case, we want it to return our variable squared.
# Unique Name <- function('variable1'){
Squared <- function(x){
# return('our variable squared')
return(x*x)
} # Don't forget to close the bracket
Once this code is run in RStudio, our function will appear in the global environment (the Environment pane at the top right of RStudio). Here in this tutorial, however, the only sign that it has worked is the absence of an error message.
To call a function, we use its unique name followed, in parentheses, by the values for the variables it requires. So, to use the ‘Squared’ function we just created to determine \(5^2\), \(6^2\), and \(10^2\), we run:
# When x = 5
Squared(5)
# When x = 6
Squared(6)
# When x = 10
Squared(10)
# Try a couple yourself!
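Because multiplication in R is element-wise, the same function also works on a whole vector, squaring each value in turn. The sketch below repeats the definition of Squared so that it runs on its own; the object name squares is made up.

```r
Squared <- function(x){
  return(x*x)
}

# each element of the vector is multiplied by itself
squares <- Squared(c(1, 2, 3, 6))
squares
```

This returns 1, 4, 9 and 36, one result per input value.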
Example 2
Let’s now create a slightly more complex function. Let’s create a
function that will work out the quadratic formula for us so we don’t
have to do it manually. Our unique name for this function can be
QuadraticForm, our variables will be a,
b, and c, and we would like our function to
return x in the quadratic formula, \[ x
= \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}. \] The quadratic formula
gives two different values for x due to the \(\pm\) used. There are several ways we can
approach this. The first is to split the quadratic formula into two
separate formulas, one where plus is used (xplus) and one
where minus is used (xminus). Each of these can be worked out
between the {} brackets and before the
return() in our function, as seen below. To return two pieces
of information we simply need to combine them into a vector, which we have
already practised.
# Unique Name <- function('variables'){
QuadraticForm <- function(a,b,c){
# Working out any outputs is done here
  xplus <- (-b + sqrt(b^2 - 4 * a * c))/(2 * a)
  xminus <- (-b - sqrt(b^2 - 4 * a * c))/(2 * a)
#return(create a vector to return multiple bits of information)
return(c(xplus,xminus))
}# Don't forget to close the bracket
This function will return both x values for the quadratic equation, one for plus and one for minus. Calling this function works in the same way as in the previous example, as seen below.
# When a = -1, b = 0, c = 1
QuadraticForm(-1, 0, 1)
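One way to check the function is to substitute the returned values back into \(ax^2 + bx + c\), which should give zero for both. The sketch below repeats the definition so that it runs on its own, and uses the made-up example \(x^2 - 3x + 2 = (x - 1)(x - 2)\), whose solutions are 1 and 2.

```r
QuadraticForm <- function(a, b, c){
  xplus <- (-b + sqrt(b^2 - 4 * a * c))/(2 * a)
  xminus <- (-b - sqrt(b^2 - 4 * a * c))/(2 * a)
  return(c(xplus, xminus))
}

# When a = 1, b = -3, c = 2
roots <- QuadraticForm(1, -3, 2)
roots

# substituting each root back in should give 0
check <- 1 * roots^2 - 3 * roots + 2
check
```

Note that if \(b^2 - 4ac\) is negative, sqrt() returns NaN with a warning, so this version of the function only handles equations with real solutions.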
Test your knowledge
You can use the R window below to help find the answers to quiz questions.