What is R?
Introduction to R
R is a language and environment for statistical computing and graphics. It is
a GNU project similar to the S language and environment, which was developed
at Bell Laboratories
(formerly AT&T, now Lucent Technologies) by John Chambers and colleagues. R
can be considered as a different implementation of S. There are some important
differences, but much code written for S runs unaltered under R.
R provides a wide variety of statistical (linear and nonlinear modelling,
classical statistical tests, time-series analysis, classification, clustering,
…) and graphical techniques, and is highly extensible. The S language is often
the vehicle of choice for research in statistical methodology, and R provides
an Open Source route to participation in that activity.
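As a rough illustration of the techniques listed above, the following sketch fits a linear model, runs a classical test and clusters some data. It assumes only the data sets (`cars`, `sleep`, `USArrests`) and functions that ship with a standard R installation; the object names `fit`, `tt`, `hc` and `groups` are chosen here for illustration.

```r
# Linear modelling: stopping distance as a function of speed,
# using the built-in `cars` data set.
fit <- lm(dist ~ speed, data = cars)
coef(fit)

# A classical statistical test: Welch two-sample t-test on the
# built-in `sleep` data set.
tt <- t.test(extra ~ group, data = sleep)
tt$p.value

# Clustering: hierarchical clustering of the standardized
# built-in `USArrests` data, cut into four groups.
hc <- hclust(dist(scale(USArrests)))
groups <- cutree(hc, k = 4)
table(groups)
```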
One of R’s strengths is the ease with which well-designed publication-quality plots
can be produced, including mathematical symbols and formulae where needed.
Great care has been taken over the defaults for the minor design choices in
graphics, but the user retains full control.
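A minimal sketch of a plot with mathematical annotation, assuming base graphics and R's plotmath `expression()` syntax. The choice of the normal density as the example is ours; the null PDF device is used only so the code runs without a display.

```r
# Open a null PDF device so the sketch also runs without a display;
# drop this line (and dev.off()) to draw on screen instead.
pdf(NULL)

x <- seq(-4, 4, length.out = 200)

# Axis labels and title are given as plotmath expressions, so they
# are typeset as mathematics rather than plain text.
plot(x, dnorm(x), type = "l",
     xlab = expression(x),
     ylab = expression(phi(x)),
     main = expression(paste("Standard normal density: ",
                             phi(x) == frac(1, sqrt(2 * pi)) * e^{-x^2 / 2})))

# Override one of the defaults: add a dashed vertical line at the mean.
abline(v = 0, lty = 2)

dev.off()
```

The defaults (margins, tick placement, fonts) are left untouched here; each can be changed through arguments to `plot()` or through `par()`.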
R is available as Free Software under the terms of the Free Software
Foundation’s GNU General Public License in source code
form. It compiles and runs on a wide variety of UNIX platforms and similar
systems (including FreeBSD and Linux), Windows and MacOS.
The R environment
R is an integrated suite of software facilities for data manipulation, calculation
and graphical display. It includes
- an effective data handling and storage facility,
- a suite of operators for calculations on arrays, in particular matrices,
- a large, coherent, integrated collection of intermediate tools for data
  analysis,
- graphical facilities for data analysis and display either on-screen or on
  hardcopy, and
- a well-developed, simple and effective programming language which includes
  conditionals, loops, user-defined recursive functions and input and output
  facilities.
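Several of the facilities in this list can be sketched in a few lines of base R. The data values and the names `df`, `big`, `A`, `b`, `fact` and `total` below are made up for illustration.

```r
# Data handling: a small data frame, subset with a logical condition.
df <- data.frame(id = 1:4, value = c(2.5, 3.0, 4.5, 1.0))
big <- df[df$value > 2, ]

# Array operators: solve a linear system and check it with a
# matrix product.
A <- matrix(c(2, 1, 1, 3), nrow = 2)
b <- c(1, 2)
x <- solve(A, b)          # solves A %*% x = b
A %*% x                   # recovers b (up to rounding)

# Programming language: a conditional, a loop and a user-defined
# recursive function.
fact <- function(n) if (n <= 1) 1 else n * fact(n - 1)
total <- 0
for (i in 1:5) total <- total + fact(i)
total
```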
The term “environment” is intended to characterize it as a fully planned and
coherent system, rather than an incremental accretion of very specific and
inflexible tools, as is frequently the case with other data analysis software.
R, like S, is designed around a true computer language, and it allows users to
add functionality by defining new functions. Much of the system is
itself written in the R dialect of S, which makes it easy for users to follow
the algorithmic choices made. For computationally-intensive tasks, C, C++ and
Fortran code can be linked and called at run time. Advanced users can write C
code to manipulate R objects directly.
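A short sketch of both points, assuming a standard R installation: a user-defined function (the name `standardize` is hypothetical), followed by inspecting a built-in function that is itself written in R.

```r
# Add functionality by defining a new function.
# `standardize` is a hypothetical name chosen for this example.
standardize <- function(x) (x - mean(x)) / sd(x)
z <- standardize(c(2, 4, 6, 8))

# Many built-in functions are themselves written in R. Typing a
# function's name without parentheses prints its source, which
# makes the algorithmic choices visible.
sd
body(sd)
```

Functions implemented in compiled code show a call into C or Fortran instead of a full R body, so not every function can be read this way.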
Many users think of R as a statistics system. We prefer to think of it as an
environment within which statistical techniques are implemented. R can be
extended (easily) via packages. There are about eight packages supplied with
the R distribution and many more are available through the CRAN family of
Internet sites covering a very wide range of modern statistics.
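As a hedged sketch of working with packages: the calls below list the packages that come with the local installation and attach one of them. The exact set reported depends on the R version in use, and `"pkgname"` is a placeholder, not a real package.

```r
# Packages supplied with the distribution can be attached directly.
library(stats)

# List the packages installed with priority "base"; the result
# depends on the version of R that is installed.
base_pkgs <- rownames(installed.packages(priority = "base"))
base_pkgs

# Installing a contributed package from CRAN needs network access,
# so the call is shown commented out. "pkgname" is a placeholder.
# install.packages("pkgname")
```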
R has its own LaTeX-like documentation format, which is used to supply
comprehensive documentation, both on-line in a number of formats and in
hardcopy.
Every data analysis technique at your fingertips
R includes virtually every data manipulation, statistical model, and chart that
the modern data scientist could ever need. You can easily find, download and
use cutting-edge community-reviewed methods in statistics and predictive
modeling from leading researchers in data science, free of charge.
Create beautiful and unique data visualizations
Representing complex data with charts and graphs is an essential part of the data analysis
process, and R goes far beyond the traditional bar chart and line plot. Heavily
influenced by thought leaders in data visualization like Bill Cleveland and
Edward Tufte, R makes it easy to draw meaning from multidimensional data with
multi-panel charts, 3-D surfaces and more. The custom charting capabilities of
R are featured in many of the stunning infographics seen in the New York Times,
The Economist, and the Flowing Data blog.
Get better results faster
Instead of relying on point-and-click menus or inflexible "black-box"
procedures, R users work in a programming language designed expressly for data
analysis. Intermediate-level R programmers create data analyses faster than
users of legacy statistical software, with the flexibility to mix and match
models for the best results. And R scripts are easily automated, promoting both
reproducible research and production deployments.
Draw on the talents of data scientists worldwide
As a thriving open-source project, R is supported by a community of more than 2
million users and thousands of developers worldwide. Whether you're using R to
optimize portfolios, analyze genomic sequences, or predict component failure
times, experts in every