5 R Objects You Should Learn to Master R Programming
These objects are an integral part of every R coder’s toolkit - in this article we are going to skim through them and learn their properties.
R is an awesome language. Ranked among the most popular programming languages, it is a great starting point for beginners entering the Data Science world. Its accessibility and easy setup make it one of the best languages for people who want to learn how to code, particularly in a Data Analysis and Data Science setting.
When starting, most people immediately jump into learning Data Frames as this is a commonly used object throughout data pipelines. But, the truth is that R contains much more than that. It contains other objects that have their own set of characteristics that are super important to make your R scripts shine. Learning these objects and their characteristics should give you excellent tools to make your R scripts flexible and efficient.
Along the way, you will learn how to debug those weird R errors that may happen when you use base R functions that return one of these objects.
Let’s get to know them!
R Vectors
If you have worked with R before, you were probably expecting this, right? :-)
The simplest R object is the vector. This one-dimensional, single-type object is the basis for many other objects in the language. But don’t be fooled by its simplicity: this is a super powerful object capable of performing many tasks in your scripts. From feeding data to data frames to helping with indexing, it should be pretty straightforward to learn and apply.
There are multiple ways to create vectors, but the most common are the c() function and the : operator, such as:
# Vector with elements 1,2,3,4
c(1, 2, 3, 4)
# Vector with 10 elements, from 1 to 10
1:10
They have a few properties that set them apart from other R objects:
They only support one type of element. If you have a character in your vector such as “A”, all the other elements will be coerced to character type.
They are one-dimensional, meaning that they can’t represent data as a table, for instance.
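To make these two properties concrete, here is a small sketch. The variable names (mixed, nums, doubled) are made up for illustration and are not part of any API.

```r
# Mixing a number with a character coerces every element to character
mixed <- c(1, "A", 3)
class(mixed)          # "character"

# Coercion follows a hierarchy: logical < integer < double < character
class(c(TRUE, 2L))    # "integer"
class(c(2L, 3.5))     # "numeric"

# A plain vector has a length but no dimensions
nums <- 1:10
length(nums)          # 10
dim(nums)             # NULL

# Arithmetic is applied element by element
doubled <- nums * 2

# Indexing starts at 1 and accepts positions or logical conditions
nums[2:4]             # 2 3 4
nums[nums > 8]        # 9 10
```

This silent coercion is a common cause of "weird" errors: a single stray string in a numeric vector turns every value into text, and arithmetic on it then fails.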
Where can you learn more about vectors? Use the following resources:
Arrays
One of the biggest shortcomings of the vector is the fact that it is uni-dimensional. This is, of course, a big pain when you want to do some type of mathematical calculations that involve more than one dimension — something fairly common in math/statistics or machine learning. Luckily, we have the array!
Arrays are single-type objects that can extend to multiple dimensions. The good news is that an array can have as many dimensions as you need, so you are no longer bound to a single one.
Also, when you work with arrays you start to understand how to manipulate multiple dimensions in R and experiment with multiple indexes, something that is extremely important to master in data analysis and wrangling. There’s a low chance that you will work with data that has more than 2 dimensions (you will probably only see that type of data if you are a machine learning engineer or have a use case that involves tensors), but it will definitely not hurt to learn how to work with objects that are not 2D.
Characteristics of arrays:
They are multidimensional;
They can only handle one type of element at a time;
You can create an array with the array() function in R:
# Creating a 3-dimensional array with 2 rows, 5 columns and 2 tables.
# It holds 20 cells, so the 10 values in 1:10 are recycled to fill both tables.
array(
1:10,
dim = c(2,5,2)
)
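Here is a short sketch of how indexing works once you have more than one dimension. It assumes an array filled with 1:20 (rather than 1:10) so that every cell holds a distinct value. The names arr, second_table and table_sums are only illustrative.

```r
# 2 rows, 5 columns, 2 tables; values are filled column by column
arr <- array(1:20, dim = c(2, 5, 2))

dim(arr)            # 2 5 2

# One index per dimension: [row, column, table]
arr[1, 2, 1]        # 3
arr[2, 5, 2]        # 20

# Leaving an index empty selects everything along that dimension
second_table <- arr[, , 2]
dim(second_table)   # 2 5

# apply() can summarise along a chosen dimension (here: one sum per table)
table_sums <- apply(arr, 3, sum)
```

Note that selecting a whole table with arr[, , 2] drops the third dimension, so what comes back is a 2D object.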
Where can you learn more about R arrays? Look into these resources:
Matrices
Matrices are special cases of arrays that only have 2 dimensions and their own constructor. The good part is that after you learn arrays in R you are ready to manipulate matrices!
Matrices are really similar to Data Frames as they have rows and columns — the only shortcoming is that they are only able to deal with a single type of data (just like arrays and vectors). Their characteristics are similar to Arrays (regarding data types) and Data Frames (regarding number of dimensions):
They only have two dimensions;
They can only handle one type of element at a time;
Everything you can do with an array also works on a matrix, but matrices have a friendlier constructor:
# Creating a matrix with two rows and 5 columns
matrix(
data = 1:10,
nrow = 2,
ncol = 5
)
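As a quick check of the claim that a matrix is just a 2D array, here is a small sketch using only base R functions. The names m, m_by_row and cross are made up for the example.

```r
m <- matrix(data = 1:10, nrow = 2, ncol = 5)

# A matrix is an array with exactly two dimensions
is.array(m)        # TRUE
dim(m)             # 2 5

# Values are filled column by column by default
m[2, 3]            # 6

# byrow = TRUE fills row by row instead
m_by_row <- matrix(1:10, nrow = 2, byrow = TRUE)
m_by_row[2, 3]     # 8

# t() transposes and %*% is matrix multiplication: (2x5) %*% (5x2) gives 2x2
cross <- m %*% t(m)
dim(cross)         # 2 2
```

The fill order is worth remembering: if the numbers in your matrix look "shuffled", the byrow argument is usually the reason.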
Some resources to learn about matrices:
Lists
In the previous objects, you might have noticed something — none of them are able to handle data with multiple types (for example, mixing characters and numeric values).
If R only allowed single-type objects, it would be extremely cumbersome to perform some common operations in Data Science, where we normally have multiple types. Think of the data frames that you have analyzed in the past: they normally contain a mixture of data types.
Luckily, we have two main R objects that are ideal for working with multi-type elements. The first one is the list, a super flexible object that enables us to store not only multiple types but also other R objects.
For instance, inside a list we can store a string, a number and an array! This is extremely interesting because now we have an object that is the ultimate flexibility tool, enabling us to store multidimensional objects inside it.
To create a list, you can use the list() command:
# Creating an example list with a character, a number and an array
example_list <- list(
my_char = 'a',
my_number = 1,
my_array = array(1:4, dim=c(2,2))
)
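Getting elements back out of a list is where many beginners hit their first confusing error, so here is a small sketch of the three access styles. It rebuilds the same example_list so that it runs on its own. The names sub_list and err are only illustrative.

```r
example_list <- list(
  my_char = "a",
  my_number = 1,
  my_array = array(1:4, dim = c(2, 2))
)

# $ and [[ ]] return the element itself
example_list$my_number           # 1
example_list[["my_array"]][2, 2] # 4

# Single brackets return a smaller list, not the element
sub_list <- example_list["my_number"]
class(sub_list)                  # "list"

# Doing arithmetic on that smaller list fails; capture the message instead of stopping
err <- tryCatch(
  example_list["my_number"] + 1,
  error = function(e) conditionMessage(e)
)

# The same operation works once the element is extracted with [[ ]]
example_list[["my_number"]] + 1  # 2
```

When a base R function hands you a list and your next operation complains about a non-numeric argument, checking whether you used [ ] instead of [[ ]] is a good first step.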
Some resources to learn about lists:
Data Frames
And now, one of the most famous R objects, the Data Frame!
Data Frames are the holy grail of Data Analysis and are used throughout a significant number of data science projects. In R they are really flexible (one cool thing is that, under the hood, they are really a list!) and display data like other 2-dimensional objects.
With data frames you can work with data similarly to how you work with a table in SQL or a simple table in an Excel file. They are organized in rows and columns and can also have row names.
Most R tutorials contain examples with data frames. They are the most flexible and handy object to use when it comes to Data Analysis in the programming language. Oh, and if you learn them, you will have a head start when you start to code in Python using the Pandas library — one of the most famous libraries to manipulate data frames in another language!
You can create one with the data.frame() command:
# Creating a data frame with two columns and two rows
example_df <- data.frame(
name = c('John','Mary'),
age = c(19,20)
)
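Here is a short sketch that shows the "data frame is really a list" idea and a few everyday operations. It rebuilds example_df so that it runs on its own. The stringsAsFactors = FALSE argument is set explicitly as an assumption, so that the name column is a character column on older R versions too. The names col_types and older are only illustrative.

```r
example_df <- data.frame(
  name = c("John", "Mary"),
  age = c(19, 20),
  stringsAsFactors = FALSE
)

# Under the hood, a data frame is a list of equal-length columns
is.list(example_df)          # TRUE

# Each column keeps its own type
col_types <- sapply(example_df, class)

# Columns can be accessed like list elements
example_df$age               # 19 20
example_df[["name"]]         # "John" "Mary"

# Rows can be filtered with a logical condition
older <- example_df[example_df$age > 19, ]

# Watch the return type: [, "age"] gives a vector, ["age"] gives a data frame
is.numeric(example_df[, "age"])    # TRUE
is.data.frame(example_df["age"])   # TRUE
```

That last pair is a frequent source of surprises: the same column can come back as a vector or as a one-column data frame depending on how you index it.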
You can learn more about data frames in the following resources:
And that’s it! These are 5 objects that are extremely important to grasp when you want to master R. After getting to know these ones, you should be able to understand more advanced data structures, even the ones that are created in external libraries, such as the tibble.
Do you think there is another object that should be included in this list? Let me know in the comments below!
I’ve set up a course on learning R from Scratch on Udemy — the course is structured for beginners, contains more than 100 exercises and I would love to have you around!