
Installing R and RStudio
To get started with R, download the latest version of R for your operating system from CRAN. Once this is installed, download and install RStudio. RStudio will automatically detect where R is installed and set itself up so that you can do all your R computation in RStudio.
R and RStudio are free, open source and cross-platform, so whether you’re using Windows, macOS or a Linux distribution, you should be able to install and use both pieces of software.
What are all these windows for?
When you open RStudio, you will see a part of the screen with tabs labelled
- Files - files in the current working directory
- Plots - plots you make
- Packages - packages available to you to load
- Help - the help browser
- Viewer - for viewing local web content, such as interactive HTML output
You’ll see another section with
- Environment - everything you’ve defined/stored is shown here
- History - every command you’ve run in the current R session is shown here
Yet another section is labelled “Console”; this is where commands get run.
And the final section is a script editor that looks and behaves like a text editor (you’ll learn more about this in your first computer lab).
R as a Calculator
R is essentially just a really powerful calculator, kind of like those you may have been required to purchase for high school maths, only better and with a much higher limit on what it can achieve.
Doing computation in R involves writing code. We can type directly into the Console and run code and it will give us an answer like a calculator.
Example
We can test what this might look like right here in this tutorial. In the window below, you will see a series of expressions (simple arithmetic, mostly) typed out.
By clicking “Run Code”, you will see R give you an answer for each line.
1 + 1
4 * 5
(2+5)^2 # the ^ symbol raises the number in front to a power, here 2
Here, we had three lines of code, and so we ended up with three answers. You’ll also notice the use of the # symbol… This is a comment. Comments are not executed as code, but rather exist to give the user some extra information. It is good coding practice to include comments wherever convenient.
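As a further small sketch of the calculator idea, the lines below show how brackets change the order in which R evaluates an expression. The particular numbers are chosen only for illustration.

```r
# R follows the usual order of operations: powers, then
# multiplication, then addition and subtraction
2 + 5^2      # the power is applied first, giving 27
(2 + 5)^2    # the brackets are evaluated first, giving 49
10 - 3 * 2   # the multiplication happens before the subtraction, giving 4
```

Each line is a separate expression, so R prints one answer per line, and everything after a # is ignored.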
Exercise
Now you try!
See if you can work out how to divide 50 by 4, then compute \(14 \times (4+6)^2\). Don’t forget to hit “Run Code” to see your answers.
# How do you divide in programs like Excel or Sheets?
What we have seen here in the exercises is exactly how you run code inside the console in RStudio. Take some time now to open your recently installed RStudio and practice writing code inside the console. In this tutorial there is a “Run Code” button, but in the RStudio console you simply hit “Enter” on your keyboard to run your code.
Script Files
In RStudio, you cannot actually save the code you enter into the console. If you want to save a copy of the code that you write, you need to write your code into a script file. There are a few ways of doing this.
The first is using the icon in the top left of your RStudio window that looks like a piece of paper with a green plus sign on it. You click this and then select R script. Another option is to open a new script via the File menu. And finally, you can use a keyboard shortcut. On macOS this is Cmd + Shift + N, and on Windows it is Ctrl + Shift + N.
Code can be run directly from the script file with Cmd + Enter (macOS) or Ctrl + Enter (Windows), and it can be saved in the usual way, i.e. File > Save As.
Setting Your Working Directory
The Working Directory is the folder in which R looks for any files you are going to try to load in (e.g., .csv files). Unless you set your working directory appropriately, R does not know where to find the files.
Setting your working directory can be done by going to Session -> Set Working Directory -> Choose Directory, or by simply pressing Ctrl+Shift+H (on both Windows and Mac). This opens a window that allows you to navigate to and select the folder your data files and script are saved in.
You can check your current working directory with the getwd() function:
getwd() #This function does not require any arguments, it can be typed as is
It is best to test this one on your own machine. You should set your working directory at the beginning of any new R session.
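The working directory can also be changed with code, using the base R function setwd(). The sketch below is only an illustration: tempdir() stands in for your own project folder (an assumption made so the example runs on any machine), and the name old_dir is hypothetical.

```r
# Remember the current working directory so we can return to it later
old_dir <- getwd()

# Change the working directory; tempdir() is a stand-in for the folder
# that holds your own data files and scripts
setwd(tempdir())

# Confirm the change, then list the files R can now see in that folder
getwd()
list.files()

# Switch back to where we started
setwd(old_dir)
```

In your own work you would replace tempdir() with the path to your folder, written in quotes.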
R Projects
In RStudio there is such a thing as an .Rproj file. If you are working on multiple projects requiring R (e.g. multiple assessment items) it may be a good idea to use R Projects to keep track of your work.
You can create an R Project via the File menu, and it will generate an .Rproj file in the folder of your choosing. This folder is then automatically set as your working directory each time you open the project file.
But what is the benefit of this?
As if auto-setting of the working directory was not enough, R projects also save your untitled script files and your whole working environment, so that next time you want to work on this task you don’t need to re-run all of your code, or worry* about losing all of your work. It allows you to pick up where you left off, and also keep all your tasks separate.
The only thing you will need to re-run is the loading of any packages you need. You might be wondering “what is a package?”
Click for the next topic to find out.
* As much. It is still possible to lose work if R is force quit.
Packages in R
First, Some HistoRy
R is a software package for statistical computation. It was first released in 1993 and was developed by researchers at the University of Auckland as an open source implementation of the S language, developed at Bell Labs and first released in 1976.
In 2011, the first release of RStudio became available, providing a modern graphical user interface (GUI) that was freely available to R users who were sick of using the default R interface (which hasn’t really been updated since 1993, when Windows 3.1 was the latest version of Windows).
Since 1993, many people have developed novel ways of getting R to do things such as visualisation, modelling, interfacing with other software, and reading and writing both common and obscure data file formats, by writing functions that manipulate objects. These functions are bundled up as R packages which are available to install within R from the Comprehensive R Archive Network (CRAN).
More on Packages
A package is a collection of functions that extend the functionality of R; they have been written and contributed back to the community for free by other R users around the world. A list of all available packages can be found at CRAN (mentioned above).
Many of the packages used in subjects throughout your degree are designed to work together as part of a “tidy data” workflow. These include packages for reading data in (readr), manipulating and reshaping it (dplyr, tidyr) and plotting it (ggplot2). Many common R functions that we’ll use in this tutorial series are also included in the base and stats packages, which are installed by default.
Throughout this tutorial series, if a function outside of the base or stats packages is used, we will specify the package from which it came like so: dplyr::select(). Here dplyr is the package, and select() is the function we are using. In general, this specification is not necessary unless there are conflicts between packages. However, in the learning stage it is a helpful tool for familiarisation.
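To see the package::function() notation in action without installing anything, here is a minimal sketch using median() from the stats package, which is installed by default. The vector x is a made-up example.

```r
# A small made-up set of numbers
x <- c(2, 4, 6, 8)

# Naming the package explicitly...
stats::median(x)

# ...gives the same answer as calling the function on its own,
# because stats is loaded automatically when R starts
median(x)
```

Both calls return 5. The prefix only changes which package R looks in, not what the function does.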
Loading a package
To load a package from the library of installed packages, we use the library() function. For example, to load in the dplyr package (mentioned above to contain a select() function, among others) we use
library(dplyr)
However, the dplyr package is part of a larger framework known as the tidyverse, and there is actually a package called tidyverse which, when loaded, loads dplyr as well as a whole collection of other packages, all containing useful functions that will be used throughout this tutorial series. The tidyverse, as of November 2021, contains:
- ggplot2 - for visualisation
- dplyr and tidyr - for data manipulation and cleaning
- readr - for importing various data types into R
- purrr - for applying pre-existing functions to dataframes and other data types
- tibble - for working with nicely formatted data tables
- stringr - for manipulating and/or searching within strings
- forcats - for dealing with factor variables
If you run library() with no package name inside the brackets, R will show you a list of all the packages installed and a description of what they do.
Note: Not all packages print info to the console when loading. It may look as though it didn’t work, but rest assured that unless you see Error in ... in the console, your code has run (though functional does not always mean correct when it comes to code!).
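If a package loads silently and you would like some reassurance, one way to check is with the base R function search(), which lists everything currently attached to your session. The sketch below uses the tools package only because it ships with every R installation; the name is_attached is hypothetical.

```r
# tools ships with R and loads without printing anything to the console
library(tools)

# search() lists what is currently attached; a loaded package
# appears in that list as "package:<name>"
is_attached <- "package:tools" %in% search()
is_attached
```

A result of TRUE means the package has been loaded, even though library() printed nothing.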
Installing a package
One example of an error you might see when loading packages is the following:
# Error in library(tidyverse) : there is no package called 'tidyverse'
This error is telling you that you don’t currently have the package installed.
First, check that the name of the package has been correctly typed (e.g., that you’ve not accidentally typed library(tidyeerse)). If you do need to install the package, you can use
install.packages("tidyverse")
We have to put the name of the package in quotes here because install.packages() wants a collection of strings of text corresponding to the names of packages to install.
Once this line of code has been run, RStudio might ask you to select a mirror to use to download packages. There are many web servers around the world which are mirrors (like a clone) of the main CRAN web server. These mirrors help ensure that people can connect to a web server as close as possible to their city so they can install packages as quickly as possible. There’s a mirror in Canberra and one in Melbourne, so choose one of these to use. R will now download the relevant package, install it for you and make it available to you with the library() function.
You’ll only need to install a package once on your home computer, and the QUT computer lab computers should have everything already installed.
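Because a package only needs installing once, it can be handy to check what is missing before calling install.packages(). The following is a rough sketch rather than a required workflow: the names pkgs, installed and missing_pkgs are hypothetical, and the interactive() guard is an assumption added so that nothing is downloaded when the code is run unattended.

```r
# The packages we would like to have available
pkgs <- c("dplyr", "ggplot2")

# requireNamespace() returns TRUE if a package is installed, FALSE otherwise
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)

# Keep only the names of the packages that were not found
missing_pkgs <- pkgs[!installed]
missing_pkgs

# install.packages() accepts several package names at once;
# only install when someone is actually sitting at the console
if (length(missing_pkgs) > 0 && interactive()) {
  install.packages(missing_pkgs)
}
```

If missing_pkgs comes back empty, everything is already installed and you can go straight to library().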
Alternative Methods
Alternatively, you can load and install packages by going to the “Packages” tab (there should be five tabs in part of the screen, labelled Files, Plots, Packages, Help, Viewer). You can scroll down the list to see what’s installed, and check the corresponding check box to load it.
If you want to install something that’s not there, click the “Install” button just underneath the tab, make sure “Install from:” has “Repository (CRAN, CRANextra)” selected, and type the name of the package or packages you want to install, make sure “Install dependencies” is ticked and then click “Install”. RStudio will then connect to CRAN and download the package(s) you’ve asked for as well as any packages that those packages require to run (as some packages are so good that other package authors use functions from those packages to build their own packages).
You can also reach this package installing window by opening the “Tools” menu from the menu bar at the top of the RStudio window (or the common menu bar on macOS) and selecting “Install packages”.
Other packages
There are packages which exist outside of CRAN (which has very high standards for documentation and error checking). While we won’t be using any of these this semester, websites like GitHub are used by researchers around the world to do things like share R code with each other that is useful but not yet ready for publication at CRAN. For example, the hrbrmstr/hrbrthemes package contains alternative themes for use with ggplot2.
Getting help in R
If you find yourself forgetting what a particular function does or what the names of the arguments (inputs) you can pass it are, you can use the help() function. For example, try running the following:
help(mean)
What you will notice is that it prints the help info about calculating the arithmetic mean, where we are told that it requires an R object x which is either a numeric vector, logical vector or date, date-time or time interval, and we can optionally tell it how much of the data to trim and whether or not we want to drop any missing values from the calculation.
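To connect the help page to actual code, here is a short sketch of the two optional arguments it describes, trim and na.rm. The vector x is made up for illustration and deliberately includes one extreme value and one missing value (NA).

```r
# A made-up set of numbers with one extreme value and one missing value
x <- c(1, 2, 3, 4, 100, NA)

# With a missing value present, the mean is itself missing
mean(x)

# na.rm = TRUE drops the missing value first: (1 + 2 + 3 + 4 + 100) / 5 = 22
mean(x, na.rm = TRUE)

# trim = 0.2 also discards the lowest and highest 20% of the remaining
# values before averaging, leaving (2 + 3 + 4) / 3 = 3
mean(x, trim = 0.2, na.rm = TRUE)
```

Reading the help page first tells you these arguments exist; trying them on a small example like this shows what they actually do.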
If you want help with a function from a particular package, you can specify this using something like ??dplyr::select. Here, dplyr is the package, ?? is shorthand for help.search(), which searches the help pages rather than opening one directly, and select is the term we want help with. Try this for yourself now. We have listed a few package specific functions for you to investigate.
??dplyr::select
??ggplot2::geom_point
When we do this, it shows us info on all the functions within the dplyr and ggplot2 packages whose function description contains the word select or phrase geom_point, respectively, and prints a brief description.
In your RStudio window, this will look a little different. Jump across now and try these out. You should notice that the list pops up in the help window, and then you can click to choose the function you wish to know more about.
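R has two related help operators, and it can be useful to see them side by side. As a minimal sketch, the code below uses median from the stats package (chosen because it is always installed); the names exact_page and matches are hypothetical.

```r
# help() opens the page for one exact topic;
# this is the long form of typing ?stats::median
exact_page <- help("median", package = "stats")

# help.search() looks for a term across help page titles and descriptions;
# this is the long form of typing ??stats::median
matches <- help.search("median", package = "stats")

# Printing each object displays the help page and the search results
exact_page
matches
```

Use the single question mark when you already know the function name, and the double question mark when you only know roughly what you are looking for.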