Overview:
1. Vector – a one-dimensional data structure whose elements all have the same data type.
2. Matrix – a two-dimensional data structure whose elements all have the same data type.
3. Data frame – a two-dimensional data structure whose columns can have different data types.
4. Array – a multi-dimensional data structure whose elements all have the same data type.
5. List – an object that can hold different data types and data structures.

Vector:
A vector is a simple data structure that holds values of a single data type (integer, numeric, logical, or character). In series 2, I explained how to create variables, and the examples always stored one value in a variable. For example, in x <- 3, the variable x is a vector of length one that contains the numeric value 3. You can also store more than one value in a vector by concatenating them. c() is the function that concatenates (combines) values in R.

The R code my_number <- c(3,4,5,6) stores the values 3, 4, 5, and 6 in the vector my_number.
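
To make this concrete, here is a small sketch that builds the same vector and inspects it. The calls to length(), class(), and is.vector() are my additions for illustration; they are base R functions, but they were not part of the example above.

```r
# Combine four values into one vector with c()
my_number <- c(3, 4, 5, 6)

length(my_number)  # number of elements: 4
class(my_number)   # data type of the elements: "numeric"

# A single value is also a vector, just with length 1
x <- 3
is.vector(x)       # TRUE
length(x)          # 1
```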

# Creating a vector with values from 1 to 10:

my_vector <- 1:10 # the colon operator creates a sequence of numbers from 1 to 10

# Printing my_vector
my_vector

OUTPUT:
[1]  1  2  3  4  5  6  7  8  9 10

# Creating a vector with seq() function
my_seq <- seq(from = 1, to = 10, by = 1) # seq() creates a sequence; here it takes three arguments: from, to, and by.

my_seq

OUTPUT:
[1]  1  2  3  4  5  6  7  8  9 10
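
The by argument does not have to be 1. As a rough sketch, the variable names odd_numbers and five_points below are made up for illustration, and length.out is an extra seq() argument that the example above does not use.

```r
# Odd numbers from 1 to 10: start at 1 and step by 2
odd_numbers <- seq(from = 1, to = 10, by = 2)
odd_numbers        # 1 3 5 7 9

# Five evenly spaced values between 0 and 1
five_points <- seq(from = 0, to = 1, length.out = 5)
five_points        # 0.00 0.25 0.50 0.75 1.00

# The colon operator and seq() with by = 1 give the same values
all(1:10 == seq(from = 1, to = 10, by = 1))  # TRUE
```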

When you mix data types within a vector, R coerces all the values to a single data type that can represent every one of them. Look at this vector: test_vector <- c(3, "data", 5). It contains both numeric and character data, so R converts the numeric values to character.
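
You can check this conversion yourself. The sketch below uses the same test_vector; the second example (mixed_vector, a name I made up) is my addition to show that logical values are converted to numbers in the same way.

```r
# Mixing numbers and text: everything becomes character
test_vector <- c(3, "data", 5)
test_vector         # "3" "data" "5"
class(test_vector)  # "character"

# Mixing logical and numeric: TRUE becomes 1 and FALSE becomes 0
mixed_vector <- c(TRUE, 2, FALSE)
mixed_vector        # 1 2 0
class(mixed_vector) # "numeric"
```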

Matrix:
A matrix is a two-dimensional data structure that contains rows and columns. You can create a matrix with the matrix() function. Like a vector, a matrix can store only a single data type.

# Creating a matrix:

my_matrix <- matrix(1:9, nrow=3, ncol=3, byrow=FALSE)

# nrow - the number of rows
# ncol - the number of columns
# byrow - logical argument that controls whether the values are filled row-wise (TRUE) or column-wise (FALSE, the default).
# Check the output to understand the byrow argument. Change it to TRUE and test it.

# Printing my_matrix
my_matrix 

OUTPUT:
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9
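
Here is a quick sketch of what happens when byrow is changed to TRUE. The names by_col and by_row are made up for this comparison, and t() is the base R transpose function, which I use only to show how the two results relate.

```r
# Fill column by column (the default)
by_col <- matrix(1:9, nrow = 3, ncol = 3, byrow = FALSE)

# Fill row by row
by_row <- matrix(1:9, nrow = 3, ncol = 3, byrow = TRUE)
by_row
#      [,1] [,2] [,3]
# [1,]    1    2    3
# [2,]    4    5    6
# [3,]    7    8    9

# The row-wise matrix is the transpose of the column-wise one
identical(by_row, t(by_col))  # TRUE
```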

Data frame:
A data frame is a two-dimensional structure like a matrix, but each column can store a different data type. Data analysts mostly work with data frames. The data.frame() function creates a data frame.

# Creating a data frame:

my_df <- data.frame(x = 1:5, y=letters[1:5])

# I created a data frame with two columns, x and y.
# x consists of the values from 1 to 5.
# y consists of the letters from a to e.
# You can see that we are using two data types, numeric and character. This is not possible with a matrix.

# Printing my_df

my_df

OUTPUT:
  x y
1 1 a
2 2 b
3 3 c
4 4 d
5 5 e
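
To see the two data types side by side, you can ask R for the class of each column. In this sketch I set stringsAsFactors = FALSE explicitly (my assumption, so that y stays character in older R versions too), and as.matrix() is added only to show what happens when the same data is forced into a matrix.

```r
# Same data frame as above; stringsAsFactors = FALSE keeps y as character
my_df <- data.frame(x = 1:5, y = letters[1:5], stringsAsFactors = FALSE)

dim(my_df)            # 5 rows and 2 columns: 5 2
sapply(my_df, class)  # x is "integer", y is "character"

# A matrix allows only one data type, so every value becomes character
df_as_matrix <- as.matrix(my_df)
typeof(df_as_matrix)  # "character"
```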

Array:
An array can store data in more than two dimensions. However, it has the same limitation as a vector and a matrix: all of its values must have the same data type. The array() function creates an array.

# Creating a vector with values from 1 to 24
my_vector <- 1:24

# Creating an array
my_array <- array(my_vector, dim =c(2,3,4))

# The dim argument sets the size of each dimension.
# Let me break it down here.
# dim = c(2,3,4) - what do 2, 3, and 4 mean?
# dim - refers to the dimensions
# 2 - the number of rows
# 3 - the number of columns
# 4 - the number of 2 x 3 matrices (the size of the third dimension)
# So the array has three dimensions, and the third one has four levels.

# Print my_array:
my_array

OUTPUT:

, , 1

     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6

, , 2

     [,1] [,2] [,3]
[1,]    7    9   11
[2,]    8   10   12

, , 3

     [,1] [,2] [,3]
[1,]   13   15   17
[2,]   14   16   18

, , 4

     [,1] [,2] [,3]
[1,]   19   21   23
[2,]   20   22   24

# For better understanding, check how the values from 1 to 24 are organized.
# The values are split into four matrices, each with two rows and three columns,
# and the matrices are stored one behind the other.
# For even more clarity, look at the image attached below.
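
If you want to confirm the shape without reading the printed output, the sketch below may help. dim() and length() are base R functions that I add here for illustration, and small_array is a made-up name.

```r
my_array <- array(1:24, dim = c(2, 3, 4))

dim(my_array)          # size of each dimension: 2 3 4
length(dim(my_array))  # number of dimensions: 3
length(my_array)       # total number of values: 24

# An array with only two dimensions is simply a matrix
small_array <- array(1:6, dim = c(2, 3))
is.matrix(small_array) # TRUE
```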

In my own data analysis work, I have never needed arrays.

Source: http://venus.ifca.unican.es/Rintro/dataStruct.html

List:
A list can store elements of any data type, including other data structures. The list() function creates a list.

# Creating a list
my_vector <- 1:24
my_matrix <- matrix(1:9, nrow = 3, byrow = TRUE)
my_age <- 56

my_list <- list(my_vector, my_matrix, my_age)

# Printing my list
my_list

OUTPUT:

[[1]]
 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24

[[2]]
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9

[[3]]
[1] 56
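
Unlike a vector, the list keeps each element as it is instead of converting everything to one type. A rough way to check this is shown below; sapply(), length(), and is.matrix() are base R functions that I add here only for illustration.

```r
my_vector <- 1:24
my_matrix <- matrix(1:9, nrow = 3, byrow = TRUE)
my_age <- 56

my_list <- list(my_vector, my_matrix, my_age)

length(my_list)             # the list has three elements: 3
sapply(my_list, length)     # size of each element: 24 9 1
sapply(my_list, is.matrix)  # only the second element is a matrix: FALSE TRUE FALSE
```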

In the next series, I will explain how to access and modify elements within each data structure.

