Packages are a way to share code. A package contains the code, any documentation, any required data and tests for performance. There are well over 14,000 packages available for download.
This is awesome!
And this is terrifying!
You can do it all in base R... but why?
A package can contain many different functions. Functions are what you call to execute code in R. For example, summary() is a function that comes with base R, so it is always available. But select() (which I will talk about later) is a function in the dplyr package.
Sometimes you find a function (via Google) but to use it you need to know what package to install. Here is an example.
Most people get their packages from CRAN (the Comprehensive R Archive Network), which is the main repository for package files. This is a controlled repository, and for a package to be published there it has to pass several tests.
Packages are also shared on GitHub, and most of these are developmental (untested). But they can be very useful, like ggbarf().
You can install packages through the GUI in RStudio, but that is risky. It is better to install them from the command line with install.packages().
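Here is a minimal sketch of a command-line install that only downloads what is not already on your machine. The helper name missing_packages() is made up for this example (it is not part of base R), and the package names are placeholders that ship with R so the snippet runs without downloading anything.

```r
# Hypothetical helper (not part of base R): returns the packages in `pkgs`
# that are not installed yet.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Placeholder names: stats and utils ship with R, so nothing is installed here.
# Swap in the packages you actually need, e.g. c("dplyr", "ggplot2").
wanted <- c("stats", "utils")

to_install <- missing_packages(wanted)
if (length(to_install) > 0) {
  # Uses the repository set in getOption("repos"), normally a CRAN mirror
  install.packages(to_install)
}
```

Installing from a script like this also leaves a record of exactly which packages your analysis depends on.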
To use a package you have to load it with a library() statement.
For example, I can't call any of the functions in the dplyr package by name until I run the following statement:
library(dplyr)
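To see what library() changes, here is a small sketch using the tools package. I picked tools only because it ships with R but is not loaded at startup, so the example runs without installing anything; the same pattern applies to dplyr. The file name is made up.

```r
# Before library(): the function is reachable only with the package:: prefix
ext1 <- tools::file_ext("report.csv")

# After library(): the package is on the search path and the bare name works
library(tools)
ext2 <- file_ext("report.csv")

# Both calls return the same file extension
identical(ext1, ext2)
```

The package::function() form is handy when you only need one function, or when two loaded packages use the same function name.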
To list all of the functions that are contained in a package, use lsf.str("package:PACKAGENAME"). There is no space after the colon, and the package has to be loaded first.
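As a quick illustration, this sketch lists the functions in the stats package, which is loaded by default in a normal R session. Converting the result with as.character() is my own choice here, to get a plain vector of names that is easy to search.

```r
# stats is attached at startup, so "package:stats" is already on the search path
fns <- as.character(lsf.str("package:stats"))

head(fns)          # the first few function names
length(fns)        # how many functions the package provides
"median" %in% fns  # check whether a particular function lives in this package
```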
There is a package you can use to manage all of your packages. It's called pacman.
# p_load() installs any package that is missing, then loads them all
pacman::p_load(dplyr, doParallel, nnet, caret, randomForest, ggplot2, ROCR, ipred, pROC, DataExplorer, stringi, lubridate, reshape2)
DataExplorer
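library(arules)  # provides the AdultUCI data set
data("AdultUCI")
plot_missing(AdultUCI)  # share of missing values in each column
plot_bar(AdultUCI)  # bar charts of the discrete columns
plot_histogram(AdultUCI)  # histograms of the continuous columns
plot_correlation(AdultUCI, type="continuous")  # correlation heatmap, continuous columns only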
select()
From dplyr.
df<-mtcars
# keep only the named columns
subset<-select(df, mpg, disp, hp)
# a minus sign drops the named columns and keeps everything else
df<-select(df, -carb, -gear)
melt()
From reshape2.
x = data.frame(
id = c(1, 1, 2, 2),
color = c(1,1,1,1),
blue = c(1, 0, 1, 0),
red = c(0, 1, 0, 1)
)
x
##   id color blue red
## 1  1     1    1   0
## 2  1     1    0   1
## 3  2     1    1   0
## 4  2     1    0   1
x<-melt(x, id.vars=c("id", "color"))
x
##   id color variable value
## 1  1     1     blue     1
## 2  1     1     blue     0
## 3  2     1     blue     1
## 4  2     1     blue     0
## 5  1     1      red     0
## 6  1     1      red     1
## 7  2     1      red     0
## 8  2     1      red     1
recode_factor()
From dplyr.
x$variable<-recode_factor(x$variable, "blue"="B", "red"="R")
x
##   id color variable value
## 1  1     1        B     1
## 2  1     1        B     0
## 3  2     1        B     1
## 4  2     1        B     0
## 5  1     1        R     0
## 6  1     1        R     1
## 7  2     1        R     0
## 8  2     1        R     1
colsplit()
From reshape2.
x<-c("jan-20", "feb-13", "mar-14")
colsplit(x, "-", c("month", "day"))
##   month day
## 1   jan  20
## 2   feb  13
## 3   mar  14
library(ggplot2)
# scatter plot of hours worked against age, colored by sex
ggplot(AdultUCI, aes(x=age, y=`hours-per-week`, color=sex))+geom_point()+theme_bw()
# density of hours worked, one panel per sex
ggplot(AdultUCI, aes(x=`hours-per-week`, color=sex, fill=sex))+geom_density()+facet_grid(~sex)+theme_bw()
# box plot of hours worked, one panel per sex
ggplot(AdultUCI, aes(y=`hours-per-week`, color=sex, fill=sex))+geom_boxplot()+facet_grid(~sex)+theme_bw()
ymd("20110604")
## [1] "2011-06-04"
mdy("06-04-2011")
## [1] "2011-06-04"
dmy("04/06/2011")
## [1] "2011-06-04"