Overview:
1. Vector – a simple data structure whose elements all share the same data type.
2. Matrix – a two-dimensional data structure whose elements all share the same data type.
3. Data frame – a two-dimensional data structure whose columns can hold different data types.
4. Array – a multi-dimensional data structure that can store only a single data type.
5. List – an object that can hold different types of data and data structures.
Vector:
A vector is a simple data structure that holds values of a single data type (integer, numeric, logical, or character). In series 2, I explained how to create variables, and the examples always stored one value in a variable. For example, in x <- 3, the variable x is a vector that contains a numeric value of 3. You can also store more than one value in a vector by combining them with c(), the function that concatenates values in R.
The R code my_number <- c(3,4,5,6) stores the values 3, 4, 5, and 6 in the vector my_number.
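As a minimal sketch, the following checks that a single value really is a vector of length one and that c() builds a longer one. The names x and my_number come from the text above; is.vector(), length(), and class() are base R functions.

```r
# A single value is already a vector of length 1
x <- 3
is.vector(x)       # TRUE
length(x)          # 1

# c() combines several values into one vector
my_number <- c(3, 4, 5, 6)
length(my_number)  # 4
class(my_number)   # "numeric"
```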
# Creating a vector with values from 1 to 10:
my_vector <- 1:10 # the colon creates a sequence of numbers from 1 to 10
# Printing my_vector
my_vector
OUTPUT:
 [1]  1  2  3  4  5  6  7  8  9 10
# Creating a vector with seq() function
my_seq <- seq(from = 1, to = 10, by = 1) # seq() function creates a sequence with three arguments from, to, and by.
my_seq
OUTPUT:
 [1]  1  2  3  4  5  6  7  8  9 10
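The by argument does not have to be 1. The short sketch below, using the hypothetical names odd_numbers and five_points, shows a step of 2 and the length.out argument of seq(), which asks for a fixed number of evenly spaced values instead of a step size.

```r
# Step of 2 instead of 1
odd_numbers <- seq(from = 1, to = 10, by = 2)
odd_numbers   # 1 3 5 7 9

# Five evenly spaced values between 0 and 1
five_points <- seq(from = 0, to = 1, length.out = 5)
five_points   # 0.00 0.25 0.50 0.75 1.00
```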
When you mix data types within a vector, R converts every value to a single type that can represent all of them. Look at this vector: test_vector <- c(3, "data", 5). It has both numeric and character data, so R converts the numeric values to character.
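You can confirm the conversion with class(). The sketch below repeats test_vector from the text and adds a second, hypothetical vector, mixed_logical, to show that logical values mixed with numbers become numbers.

```r
# Numbers mixed with text become text
test_vector <- c(3, "data", 5)
test_vector           # "3" "data" "5"
class(test_vector)    # "character"

# Logical values mixed with numbers become numbers (TRUE -> 1, FALSE -> 0)
mixed_logical <- c(TRUE, 2, FALSE)
mixed_logical         # 1 2 0
class(mixed_logical)  # "numeric"
```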
Matrix:
A matrix is a two-dimensional data structure made up of rows and columns. You can create a matrix using the matrix() function. Like a vector, a matrix can only store a single data type.
# Creating a matrix:
my_matrix <- matrix(1:9, nrow=3, ncol=3, byrow=FALSE)
# nrow - refers to the number of rows
# ncol - refers to the number of columns
# byrow - logical argument that controls whether the values are filled row-wise (TRUE) or column-wise (FALSE).
# Check the output to understand the byrow argument. Change it to TRUE and test it.
# Printing my_matrix
my_matrix
OUTPUT:
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9
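For comparison, here is a small sketch of the same call with byrow = TRUE. The name my_matrix_by_row is only illustrative. The values now fill the first row before moving to the second, so the result is the transpose of my_matrix.

```r
# Column-wise fill (the default) and row-wise fill of the same values
my_matrix <- matrix(1:9, nrow = 3, ncol = 3, byrow = FALSE)
my_matrix_by_row <- matrix(1:9, nrow = 3, ncol = 3, byrow = TRUE)

my_matrix_by_row
#      [,1] [,2] [,3]
# [1,]    1    2    3
# [2,]    4    5    6
# [3,]    7    8    9

# The row-wise matrix is the transpose of the column-wise one
identical(my_matrix_by_row, t(my_matrix))  # TRUE
```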
Data frame:
A data frame is a two-dimensional structure like a matrix, but each column can hold a different data type. Data analysts mostly work with data frames. The data.frame() function creates a data frame.
# Creating a data frame:
my_df <- data.frame(x = 1:5, y=letters[1:5])
# I created a data frame with two columns, x and y.
# x consists of values from 1 to 5.
# y consists of letters from a to e.
# You can see that we are using two data types - numeric and character. This is not possible with a matrix.
# Printing my_df
my_df
OUTPUT:
  x y
1 1 a
2 2 b
3 3 c
4 4 d
5 5 e
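To see the type of each column, you can inspect the data frame with dim() and class(). This is a minimal sketch. It sets stringsAsFactors = FALSE explicitly, which is an addition to the original call, because R versions before 4.0 would otherwise store y as a factor rather than as character.

```r
# Same data frame as above, with text columns kept as character
my_df <- data.frame(x = 1:5, y = letters[1:5], stringsAsFactors = FALSE)

dim(my_df)      # 5 2  (five rows, two columns)
class(my_df$x)  # "integer"
class(my_df$y)  # "character"
```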
Array:
An array can store multi-dimensional data. Like a vector and a matrix, however, it can only hold a single data type. The array() function creates an array.
# Creating a vector with values from 1 to 24
my_vector <- 1:24
# Creating an array
my_array <- array(my_vector, dim = c(2, 3, 4))
# The dim argument sets the size of each dimension.
# Let me break it down here.
# dim = c(2, 3, 4) - what do 2, 3, and 4 mean?
# dim - refers to the dimensions
# 2 - the number of rows
# 3 - the number of columns
# 4 - the number of 2 x 3 matrices (the size of the third dimension)
# Print my_array:
my_array
OUTPUT:
, , 1

     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6

, , 2

     [,1] [,2] [,3]
[1,]    7    9   11
[2,]    8   10   12

, , 3

     [,1] [,2] [,3]
[1,]   13   15   17
[2,]   14   16   18

, , 4

     [,1] [,2] [,3]
[1,]   19   21   23
[2,]   20   22   24
# For better understanding, check how the values from 1 to 24 are organized.
# The values are split into four matrices, each with two rows and three columns,
# stored one behind the other along the third dimension.
# For even more clarity, look at the image attached below.
In my own data analysis work, I have never needed arrays.
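A quick way to check how an array is laid out is to ask for its dimensions. The sketch below rebuilds my_array and uses the base R functions dim() and length().

```r
my_array <- array(1:24, dim = c(2, 3, 4))

dim(my_array)          # 2 3 4 -> rows, columns, matrices
length(dim(my_array))  # 3     -> the array has three dimensions
length(my_array)       # 24    -> 2 * 3 * 4 values in total
```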
List:
A list can store any kind of data type or data structure. The list() function creates a list.
# Creating a list
my_vector <- 1:24
my_matrix <- matrix(1:9, nrow = 3, byrow = TRUE)
my_age <- 56
my_list <- list(my_vector, my_matrix, my_age)
# Printing my list
my_list
OUTPUT:
[[1]]
 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24

[[2]]
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9

[[3]]
[1] 56
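The elements of a list can also be given names, which makes the printed output easier to read. This is a sketch, not part of the original example. It reuses the names my_vector, my_matrix, and my_age as element names and uses str() for a compact summary of what each element holds.

```r
# The same three objects, stored with element names
my_list <- list(my_vector = 1:24,
                my_matrix = matrix(1:9, nrow = 3, byrow = TRUE),
                my_age = 56)

length(my_list)  # 3
names(my_list)   # "my_vector" "my_matrix" "my_age"

# Compact summary of the type and size of each element
str(my_list)
```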
In the next series, I will explain how to access and modify elements within each data structure.