What are Packages?

Packages are a way to share code. A package contains the code, any documentation, any data it needs, and tests that check the code works. There are well over 14,000 packages available for download.

This is awesome!

And this is terrifying!

You can do it all in base R... but why?

A package can contain many different functions. Functions are what you call to run code in R. For example, summary() is a function that is part of base R, so it is available as soon as you start R. But select() (which I will talk about later) is a function in the dplyr package, so you have to load dplyr before you can use it.

Sometimes you find a function (via Google), but to use it you need to know which package to install. Here is an example.
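If the package is already installed and loaded, base R can tell you where a function lives. Here is a small sketch using find() from the utils package. Keep in mind it only searches the packages currently on your search path, so it won't help with a package you haven't installed yet.

```r
# find() searches the attached packages and reports where a name is defined
find("summary")  # "package:base"
find("median")   # "package:stats"
```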

Where do you get packages?

Most people get their packages from CRAN (Comprehensive R Archive Network), which is the main repository for R packages. It is a controlled repository: for a package to be published there, it has to pass several tests.

Packages are also shared on GitHub, and most of these are development versions (untested). But they can be very useful, like ggbarf().

How to install packages

You can install packages through the point-and-click interface in RStudio, but that is risky.

It is better to install them from the command line with install.packages(), for example install.packages("dplyr").
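If you rerun a script on another machine, you may only want to install what is missing. Here is a minimal sketch of that idea. missing_packages() is a hypothetical helper I made up for illustration; it just compares your list against installed.packages().

```r
# Hypothetical helper: return the packages in `pkgs` that are not installed yet
missing_packages <- function(pkgs) {
  installed <- rownames(installed.packages())
  pkgs[!pkgs %in% installed]
}

missing_packages(c("stats", "notARealPackage123"))  # "notARealPackage123"

# Then install only what is missing, e.g.:
# install.packages(missing_packages(c("dplyr", "reshape2", "lubridate")))
```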

How to use a package

To use a package you have to use a library() statement.

For example, I can’t use any of the functions in the dplyr package until I run the following statement:

library(dplyr)

To list all of the functions in a loaded package, use lsf.str("package:PACKAGENAME"), for example lsf.str("package:dplyr").
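Here it is on the stats package, which ships with R and is loaded by default, so this runs without installing anything. Note there is no space between "package:" and the package name.

```r
library(stats)  # already attached in a default R session

# Every function exported by the attached stats package
stats_functions <- lsf.str("package:stats")

"median" %in% as.character(stats_functions)  # TRUE
```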

How to manage packages

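There is a package you can use to manage all of your packages. It's called pacman. Its p_load() function installs any of the listed packages that are missing and then loads them all, so one line replaces a long stack of install.packages() and library() calls.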

pacman::p_load(dplyr, doParallel, nnet, caret, randomForest, ggplot2, ROCR, ipred, pROC, DataExplorer, stringi, lubridate, reshape2, arules)
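To make the idea concrete, here is a rough sketch of what "install if missing, then load" looks like in base R. This is not pacman's actual code; load_packages() is a hypothetical name, and install.packages() inside it would use your default CRAN mirror.

```r
# Hypothetical helper, a rough sketch of the idea behind pacman::p_load():
# take bare package names, install any that are missing, then attach each one.
load_packages <- function(...) {
  # Turn the unquoted names into a character vector, e.g. c("stats", "tools")
  pkgs <- as.character(substitute(list(...)))[-1]
  for (pkg in pkgs) {
    if (!requireNamespace(pkg, quietly = TRUE)) {
      install.packages(pkg)
    }
    library(pkg, character.only = TRUE)
  }
  invisible(pkgs)
}

load_packages(stats, tools)  # both ship with R, so nothing gets installed
```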

A recommended package

DataExplorer

library(DataExplorer)
library(arules)  # arules is only needed here for its AdultUCI example dataset

data("AdultUCI")
plot_missing(AdultUCI)

plot_bar(AdultUCI)

plot_histogram(AdultUCI)

plot_correlation(AdultUCI, type="continuous")
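plot_missing() charts how much of each column is missing. If you want the raw numbers, a rough base R version looks like this. The tiny data frame is made up for illustration.

```r
# is.na() marks each missing cell TRUE; colMeans() turns each column
# into the proportion of missing values
df <- data.frame(
  a = c(1, NA, 3, 4),
  b = c(NA, NA, "x", "y")
)
colMeans(is.na(df))
#    a    b
# 0.25 0.50
```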

Some of my favorite data manipulation functions

select()

From dplyr.

library(dplyr)
df <- mtcars
df_subset <- select(df, mpg, disp, hp)  # keep only these three columns
df <- select(df, -carb, -gear)          # drop these two columns
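For comparison, here is the same column selection in base R with no packages. It works, but you have to quote every name and use setdiff() to drop columns, which is part of why I like select().

```r
df <- mtcars

# Keep three columns by name
df_subset <- df[, c("mpg", "disp", "hp")]

# Drop two columns by name: keep everything except carb and gear
df <- df[, setdiff(names(df), c("carb", "gear"))]
```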

melt()

From reshape2.

library(reshape2)

x <- data.frame(
  id    = c(1, 1, 2, 2),
  color = c(1, 1, 1, 1),
  blue  = c(1, 0, 1, 0),
  red   = c(0, 1, 0, 1)
)
x
##   id color blue red
## 1  1     1    1   0
## 2  1     1    0   1
## 3  2     1    1   0
## 4  2     1    0   1
x <- melt(x, id.vars = c("id", "color"))
x
##   id color variable value
## 1  1     1     blue     1
## 2  1     1     blue     0
## 3  2     1     blue     1
## 4  2     1     blue     0
## 5  1     1      red     0
## 6  1     1      red     1
## 7  2     1      red     0
## 8  2     1      red     1
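What melt() is doing here is stacking each measure column (blue, then red) on top of each other and repeating the id columns alongside. A base R sketch of the same reshaping, assuming every measure column is numeric:

```r
wide <- data.frame(
  id    = c(1, 1, 2, 2),
  color = c(1, 1, 1, 1),
  blue  = c(1, 0, 1, 0),
  red   = c(0, 1, 0, 1)
)
measure <- c("blue", "red")

long <- data.frame(
  # repeat the id columns once per measure column
  id       = rep(wide$id, times = length(measure)),
  color    = rep(wide$color, times = length(measure)),
  # label each block of rows with the column it came from
  variable = factor(rep(measure, each = nrow(wide)), levels = measure),
  # stack the measure columns end to end
  value    = unlist(wide[measure], use.names = FALSE)
)
long
```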

recode_factor()

From dplyr.

x$variable<-recode_factor(x$variable, "blue"="B", "red"="R")
x
##   id color variable value
## 1  1     1        B     1
## 2  1     1        B     0
## 3  2     1        B     1
## 4  2     1        B     0
## 5  1     1        R     0
## 6  1     1        R     1
## 7  2     1        R     0
## 8  2     1        R     1
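Base R can do the same relabelling by assigning a named list to levels(). Each name is the new level and each element is the old level it replaces. A small sketch on a made-up factor:

```r
f <- factor(c("blue", "red", "blue", "red"))

# new level = old level
levels(f) <- list(B = "blue", R = "red")
f
# [1] B R B R
# Levels: B R
```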

colsplit()

From reshape2

x<-c("jan-20", "feb-13", "mar-14")
colsplit(x, "-", c("month", "day"))
##   month day
## 1   jan  20
## 2   feb  13
## 3   mar  14
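Under the hood this is a string split. A base R sketch of the same result uses strsplit(), which returns a list of pieces, and then binds those pieces into two columns. Unlike colsplit(), this version converts the day column to integer by hand.

```r
x <- c("jan-20", "feb-13", "mar-14")

# Split each string on "-" and bind the pieces into a 2-column matrix
parts <- do.call(rbind, strsplit(x, "-", fixed = TRUE))

dates <- data.frame(
  month = parts[, 1],
  day   = as.integer(parts[, 2]),
  stringsAsFactors = FALSE
)
dates
```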

Other packages that I use

ggplot2

library(ggplot2)

# AdultUCI comes from the arules package, loaded earlier with data("AdultUCI")
ggplot(AdultUCI, aes(x = age, y = `hours-per-week`, color = sex)) + geom_point() + theme_bw()

ggplot(AdultUCI, aes(x = `hours-per-week`, color = sex, fill = sex)) + geom_density() + facet_grid(~sex) + theme_bw()

ggplot(AdultUCI, aes(y = `hours-per-week`, color = sex, fill = sex)) + geom_boxplot() + facet_grid(~sex) + theme_bw()

lubridate

library(lubridate)

ymd("20110604")
## [1] "2011-06-04"
mdy("06-04-2011")
## [1] "2011-06-04"
dmy("04/06/2011")
## [1] "2011-06-04"
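The base R way to parse the same three strings is as.Date() with an explicit format string. It works, but you have to spell out the order of year (%Y), month (%m) and day (%d) and the separators yourself, which is exactly what ymd(), mdy() and dmy() save you from.

```r
# Each format string describes the layout of the input text
as.Date("20110604", format = "%Y%m%d")
as.Date("06-04-2011", format = "%m-%d-%Y")
as.Date("04/06/2011", format = "%d/%m/%Y")
# each one returns [1] "2011-06-04"
```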