R Package | R Markdown: A Powerful Tool for Generating Data Reports

R Markdown stands on the shoulders of knitr and Pandoc. knitr executes the code embedded in the document and converts R Markdown to Markdown; Pandoc then renders the Markdown in the desired output format (such as PDF, HTML, or Word).

This article is translated from the first three chapters of Xie Yihui's new book “R Markdown: The Definitive Guide”, with some content omitted. It mainly introduces the structure and syntax rules of R Markdown. If you want to learn more detailed content, we recommend reading the original book.

Installation#

This article assumes you have already installed R (https://www.r-project.org) and the RStudio IDE (https://www.rstudio.com). RStudio is not required, but it makes working with R Markdown easier. Pandoc (http://pandoc.org) is bundled with RStudio, so you need to install it separately only if you do not use the RStudio IDE. Next, install the rmarkdown package in R:

# Install from CRAN
install.packages('rmarkdown')

# Or if you want to test the development version,
# install from GitHub
if (!requireNamespace("devtools"))
  install.packages('devtools')
devtools::install_github('rstudio/rmarkdown')

If you want to generate PDF document output, you will need to install LaTeX. For those R Markdown users who have never installed LaTeX before, we recommend installing TinyTeX (https://yihui./tinytex/):

install.packages("tinytex")
tinytex::install_tinytex()  # install TinyTeX

TinyTeX is a lightweight, cross-platform, and easy-to-maintain LaTeX distribution. When compiling LaTeX or R Markdown documents to PDF, the tinytex package can automatically install missing LaTeX packages and ensure that a LaTeX document is compiled the correct number of times to resolve all cross-references. If you do not understand what these two things mean, you should probably follow our advice and install TinyTeX, because these details are rarely worth your time.

With the rmarkdown package, RStudio/Pandoc, and LaTeX, you should be able to compile most R Markdown documents. In some cases, you may need additional packages, which we will mention as necessary.
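Before compiling anything, it can help to confirm which parts of this toolchain are actually present. The following is a minimal sketch rather than an official check: it only reports what it finds and installs nothing, and the helper name has_pkg is made up for this illustration.

```r
# Report which parts of the toolchain are present; nothing is installed here.
# has_pkg() is a hypothetical helper defined only for this sketch.
has_pkg <- function(pkg) requireNamespace(pkg, quietly = TRUE)

status <- c(
  rmarkdown = has_pkg("rmarkdown"),
  tinytex   = has_pkg("tinytex")
)
print(status)

# rmarkdown looks up Pandoc itself, so this check needs rmarkdown to be installed.
if (status[["rmarkdown"]]) {
  cat("Pandoc found:", rmarkdown::pandoc_available(), "\n")
}

# is_tinytex() reports whether the LaTeX distribution in use is TinyTeX.
if (status[["tinytex"]]) {
  cat("TinyTeX in use:", tinytex::is_tinytex(), "\n")
}
```

If any entry is FALSE, the installation commands shown earlier in this section are the place to start.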

Basics#

R Markdown provides a framework for creating documents in data science. R Markdown can accomplish the following two tasks:

Save and execute code;
Generate shareable high-quality reports.

R Markdown is designed to make reports easier to reproduce: the computing code and the narrative live in the same document, and the results are generated automatically from the source code. Additionally, R Markdown supports dozens of static and dynamic/interactive output formats.

If you prefer to learn through videos, we recommend checking the website https://rmarkdown.rstudio.com and watching the videos in "Get Started," which cover the basics of R Markdown.

Document Structure#

Below is a very simple R Markdown document, which is a plain text document with a .Rmd extension.

---
title: "Hello R Markdown"
author: "Awesome Me"
date: "2018-02-14"
output: html_document
---

This is a paragraph in an R Markdown document.

Below is a code chunk:

```{r}
fit = lm(dist ~ speed, data = cars)
b   = coef(fit)
plot(cars)
abline(fit)
```

The slope of the regression is `r b[2]`.

You can create such a text document using any text editor (including but not limited to RStudio). If using RStudio, you can create a new Rmd document from File -> New File -> R Markdown.

An R Markdown document has three basic components: metadata, text, and code. The metadata is the content between the pair of three dashes ---. Its syntax is YAML (YAML Ain't Markup Language), so it is sometimes called the YAML metadata or the YAML frontmatter. Indentation matters in YAML, so a mistake in indentation can change the meaning of the metadata or make it invalid. For some simple examples that demonstrate YAML syntax, see Appendix B.2 of the book "bookdown" (2016) by Xie Yihui.

The body of the document follows the metadata. The syntax for text is Markdown, which will be introduced in Section 2.5. There are two types of computer code, explained in detail in Section 2.6:

Code chunks: Start with three backticks followed by {r}, where r indicates the language name, and end with three backticks. Chunk options go inside the curly braces (e.g., {r, fig.height=5} sets the height of a plot to 5 inches).
Inline code: Starts with a backtick followed by r (that is, `r) and ends with a backtick.
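To see how the three components fit together, the document can also be assembled from R itself. This is only a sketch under some assumptions: the file name hello.Rmd and the temporary directory are arbitrary choices, and the rendering step is skipped unless the rmarkdown package and Pandoc are available.

```r
# Three backticks, built with strrep() so this listing contains no nested fence.
fence <- strrep("`", 3)

rmd_lines <- c(
  "---",                                      # metadata starts
  'title: "Hello R Markdown"',
  "output: html_document",
  "---",                                      # metadata ends
  "",
  "This is a paragraph of Markdown text.",    # text
  "",
  paste0(fence, "{r}"),                       # code chunk starts
  "fit <- lm(dist ~ speed, data = cars)",
  "b <- coef(fit)",
  fence,                                      # code chunk ends
  "",
  "The slope of the regression is `r b[2]`."  # inline code
)

# hello.Rmd is a hypothetical file name used only for this example.
rmd_path <- file.path(tempdir(), "hello.Rmd")
writeLines(rmd_lines, rmd_path)

# Rendering needs the rmarkdown package and Pandoc, so it is guarded.
if (requireNamespace("rmarkdown", quietly = TRUE) &&
    rmarkdown::pandoc_available()) {
  rmarkdown::render(rmd_path, quiet = TRUE)
}
```

The inline expression uses b[2] because coef() returns the intercept first and the slope second.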

Document Compilation#

The simplest way to compile a document is to click the Knit button in RStudio; the keyboard shortcut is Ctrl + Shift + K (Cmd + Shift + K on macOS). You can also call the function rmarkdown::render() directly, for example:

rmarkdown::render('foo.Rmd', 'pdf_document')

When you need to compile multiple documents, calling the function is more convenient than clicking the button, because you can render the documents in a loop.
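A loop over several documents might look like the sketch below. The directory name reports is an assumption made for illustration; if it does not exist or contains no .Rmd files, the loop simply does nothing.

```r
# Collect every .Rmd file in a (hypothetical) reports/ directory.
rmd_files <- list.files("reports", pattern = "\\.Rmd$", full.names = TRUE)

# Render each file in turn to the same output format.
for (f in rmd_files) {
  rmarkdown::render(f, output_format = "html_document")
}
```

By default each output file is written next to its source file, so the loop needs no extra bookkeeping.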

Reference Cards#

RStudio has created a large number of reference cards, which are available for free at https://www.rstudio.com/resources/cheatsheets/.

Output Formats#

Output formats fall into two types: documents and presentations. All formats available in the rmarkdown package are as follows:

beamer_presentation
github_document
html_document
ioslides_presentation
latex_document
md_document
odt_document
pdf_document
powerpoint_presentation
rtf_document
slidy_presentation
word_document

We will document these output formats in detail in Chapters 3 and 4. More output formats are provided in other extension packages (starting from Chapter 5). For the output format names in the YAML metadata of Rmd files, if the format comes from an extension package, you need to include the package name (if the format comes from the rmarkdown package, the rmarkdown:: prefix is not needed), for example:

output: tufte::tufte_html

Each output format typically has several options, all of which are documented on the help page of the corresponding R function. For example, you can type ?rmarkdown::html_document in R to open the help page of the html_document format. To use an option in the YAML metadata, you must translate its value from R to YAML, for example:

html_document(toc = TRUE, toc_depth = 2, dev = 'svg')

Can be written in YAML as:

output:
  html_document:
    toc: true
    toc_depth: 2
    dev: "svg"
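One way to check such a translation, sketched here on the assumption that the yaml package is installed, is to write the options as a nested R list and let yaml::as.yaml() print the result. The verbatim_logical handler is passed so that logical values are written as true/false, and toc_depth is given as an integer (2L) to keep it from being written with a decimal point; the exact formatting may vary between versions of the yaml package.

```r
# The same options as above, expressed as a nested R list.
opts <- list(
  output = list(
    html_document = list(
      toc       = TRUE,
      toc_depth = 2L,   # integer, so no decimal point is emitted
      dev       = "svg"
    )
  )
)

# Printing is guarded because the yaml package may not be installed.
if (requireNamespace("yaml", quietly = TRUE)) {
  cat(yaml::as.yaml(opts, handlers = list(logical = yaml::verbatim_logical)))
}
```

Each level of nesting in the list becomes one level of indentation in the YAML.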

Strings in YAML typically do not require quotes (dev: "svg" and dev: svg are the same), unless they contain special characters, such as a colon :. Note that the space after the colon is required. If you are unsure whether to quote a string, you can test it with the yaml package, for example:

cat(yaml::as.yaml(list(
  title = 'A Wonderful Day',
  subtitle = 'hygge: a quality of coziness'
)))

## title: A Wonderful Day
## subtitle: 'hygge: a quality of coziness'

Note that the subtitle in the above example is quoted with single quotes due to the colon.

If an option has sub-options (which means the value of that option is a list in R), the sub-options need to be further indented, for example:

output:
  html_document:
    toc: true
    includes:
      in_header: header.html
      before_body: before.html

Some options will be passed to knitr, such as dev, fig_width, and fig_height. Detailed documentation for these options can be found on the knitr documentation page. Note that the actual knitr option names may differ. In particular, knitr uses . in names, but rmarkdown uses _, for example, in rmarkdown, fig_width corresponds to fig.width in knitr.

Some options will be passed to Pandoc, such as toc, toc_depth, and number_sections. When in doubt, you should refer to the Pandoc documentation. The R Markdown output format functions usually have a pandoc_args parameter, which should be a character vector of arguments passed to Pandoc. If you find any Pandoc features not represented by output format parameters, you can use this ultimate argument, for example:

output:
  pdf_document:
    toc: true
    pandoc_args: ["--wrap=none", "--top-level-division=chapter"]

Markdown Syntax#

Inline Formatting#

Italic : _text_ or *text*
Bold : **text**
Subscript : H~3~PO~4~ renders as $H_3PO_4$
Superscript : Cu^2+^ renders as $Cu^{2+}$
Footnote : ^[This is a footnote.]
Inline code : `code`. To include n literal backticks in inline code, wrap it in at least n+1 backticks.
Hyperlink : [text](link)
Image link : ![alt text or image title](path/to/image)

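Citation : Cite an entry of a BibTeX database with @key or [@key], where key is the citation key on the first line of the entry (R-base in the entry below):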

@Manual{R-base,
  title = {R: A Language and Environment for Statistical
    Computing},
  author = {{R Core Team}},
  organization = {R Foundation for Statistical Computing},
  address = {Vienna, Austria},
  year = {2017},
  url = {https://www.R-project.org/},
}

Block Elements#

Headers
```
# First-level header

## Second-level header

### Third-level header
```
If you do not want a header to be numbered, you can add {-} or {.unnumbered} after the header, for example:
```
# Preface {-}
```

Unordered List

- one item
- one item
- one item
  - one more item
  - one more item
  - one more item

Ordered List

1. the first item
2. the second item
3. the third item
    - one unordered item
    - one unordered item

Quote

> "I thoroughly disapprove of duels. If a man should challenge me,
  I would take him kindly and forgivingly by the hand and lead him
  to a quiet place and kill him."
>
>                                                 --- Mark Twain

"I thoroughly disapprove of duels. If a man should challenge me,
I would take him kindly and forgivingly by the hand and lead him
to a quiet place and kill him."

--- Mark Twain

Code Block

```
This text is displayed verbatim / preformatted
```

Or indent by four spaces:

    This text is displayed verbatim / preformatted

Mathematical Expressions#

Inline : $f(k) = {n \choose k} p^{k} (1-p)^{n-k}$

Display : $$f(k) = {n \choose k} p^{k} (1-p)^{n-k}$$

Code Chunk Options#

To insert a code chunk in RStudio, click the Insert button on the editor toolbar or press Ctrl + Alt + I (macOS: Cmd + Option + I).

There are many code chunk options available at https://yihui.name/knitr/options, and here we list some commonly used ones:

eval : Whether to execute the code chunk (TRUE by default). The value can be computed in an earlier chunk, for example:

```{r}
# execute code if the date is later than a specified day
do_it = Sys.Date() > '2018-02-14'
```

```{r, eval=do_it}
x = rnorm(100)
```

echo : Whether to display the source code in the output document (TRUE by default);
results : When set to 'hide', text output is hidden; when set to 'asis', text output is written "as is";
collapse=TRUE : Merge the text output and the source code into a single code block in the output, which is more compact;
warning, message, error : Whether to display warnings, messages, and errors in the output document;
include=FALSE : Run the code but show neither the source code nor any output in the document;
cache : Whether to enable caching. If caching is enabled, the code chunk is not evaluated again the next time the document is compiled (as long as the chunk has not been modified), which saves time;
fig.width, fig.height : Size (in inches) of the graphics device used for the chunk's plots. fig.dim = c(6, 4) is shorthand for fig.width = 6 and fig.height = 4;
out.width, out.height : Display size of R plots in the output document. You can use percentages, for example, out.width = '80%' means 80% of the page width;
fig.align : Alignment of the plot (e.g., 'center');
dev : Graphics device used to save R plots (e.g., 'pdf', 'png', 'svg', 'jpeg');
fig.cap : Figure caption;
child : Path to an external child document to include in the main document.

If an option needs the same value in many code chunks, consider setting it globally in the first code chunk of the document:

```{r, setup, include=FALSE}
knitr::opts_chunk$set(fig.width = 8, collapse = TRUE)
```
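The effect of a global setting can also be inspected from the R console. The sketch below, which assumes the knitr package is installed, applies the same two options, reads them back with knitr::opts_chunk$get(), and then restores the previous defaults so that the session is left unchanged. The variable names are chosen for this illustration only.

```r
# Options to apply globally (same values as in the setup chunk above).
new_defaults <- list(fig.width = 8, collapse = TRUE)

if (requireNamespace("knitr", quietly = TRUE)) {
  old <- knitr::opts_chunk$get()               # snapshot of all current defaults
  do.call(knitr::opts_chunk$set, new_defaults) # apply the new global values
  str(knitr::opts_chunk$get(names(new_defaults)))
  knitr::opts_chunk$restore(old)               # put the previous defaults back
}
```

Inside a document the restore step is unnecessary, because the settings are meant to last for the rest of the compilation.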

Images#

Local images can also be adjusted using code chunk options, for example:

```{r, out.width='25%', fig.align='center', fig.cap='...'}
knitr::include_graphics('images/hex-rmarkdown.png')
```

If you want to include a graphic that is not generated by R code, you can use the knitr::include_graphics() function, which gives you more control over the attributes of the image (for example, its width via out.width) than the Markdown syntax ![alt text or image title](path/to/image).

Tables#

You can easily create tables using the knitr::kable() function, and the table caption can be set using caption, for example:

```{r tables-mtcars}
knitr::kable(iris[1:5, ], caption = 'A caption')
```

For finer control over table styling, we recommend the kableExtra package, which provides functions to customize the appearance of PDF and HTML tables. Section 12.3 of the original book explains how the bookdown package extends rmarkdown to allow easy cross-referencing of figures and tables in the text.
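As a rough illustration of table styling with kableExtra, the sketch below adds striped rows to the same five rows of iris. It assumes that both knitr and kableExtra are installed, and the particular styling options are just one possible choice.

```r
# The data shown in the table: the first five rows of iris.
tbl_data <- iris[1:5, ]

if (requireNamespace("knitr", quietly = TRUE) &&
    requireNamespace("kableExtra", quietly = TRUE)) {
  # Build an HTML table, then pass it on for styling.
  tbl <- knitr::kable(tbl_data, format = "html", caption = "A caption")
  tbl <- kableExtra::kable_styling(
    tbl,
    bootstrap_options = c("striped", "hover"),
    full_width = FALSE   # do not stretch the table across the page
  )
  print(tbl)
}
```

Inside a code chunk of an HTML document, the styled table would normally be the last expression of the chunk rather than printed explicitly.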

Output Documents#

HTML Documents#

To output an HTML document, you first need to write output: html_document in the YAML metadata:

---
title: Habits
author: John Doe
date: March 22, 2005
output: html_document
---

Table of Contents#

---
title: "Habits"
output:
  html_document:
    toc: true
    toc_depth: 2
    toc_float:
      collapsed: false
      smooth_scroll: false
---

toc: true : Add a table of contents;
toc_depth : The deepest heading level to include in the table of contents (defaults to 3);
toc_float: true : Float the table of contents to the left of the main content so that it stays visible while the document is scrolled;
collapsed (defaults to TRUE) : Whether the table of contents initially shows only top-level headings; when collapsed, it expands automatically as the document is scrolled;
smooth_scroll (defaults to TRUE) : Whether page scrolling is animated when a table of contents item is clicked.

Section Numbering#

---
title: "Habits"
output:
  html_document:
    toc: true
    number_sections: true
---

Note that if the document has no first-level headings, second-level headings will be numbered 0.1, 0.2, and so on.

Tabbed Sections#

## Quarterly Results {.tabset .tabset-fade .tabset-pills}

### By Product

(tab content)

### By Region

(tab content)

.tabset : Makes all subheadings of a heading that carries the .tabset attribute appear as tabs rather than as standalone sections;
.tabset-fade : Fade in and out when switching tabs;
.tabset-pills : Changes the appearance of the tabs to make them look like "pills".

Appearance and Style#

---
title: "Habits"
output:
  html_document:
    theme: united
    highlight: tango
---

theme : The Bootstrap theme to use for the page, drawn from the Bootswatch theme library. Valid themes include: default, cerulean, journal, flatly, readable, spacelab, united, cosmo, lumen, paper, sandstone, simplex, and yeti.
highlight : Code highlighting style. Supported styles include: default, tango, pygments, kate, monochrome, espresso, zenburn, haddock, and textmate.

Image Options#

---
title: "Habits"
output:
  html_document:
    fig_width: 7
    fig_height: 6
    fig_caption: true
---

fig_width and fig_height : Width and height of images;
fig_caption : Controls whether to include captions for images;
dev : Image rendering format, defaults to png.

Table Printing#

The df_print option controls how data frames are printed. Its possible values are:

Option	Description
default	Call the `print.data.frame` generic method
kable	Use the `knitr::kable` function
tibble	Use the `tibble::print.tbl_df` function
paged	Use `rmarkdown::print.paged_df` to create a pageable table

When df_print is set to paged, data frames are printed as pageable HTML tables, for example:

---
title: "Motor Trend Car Road Tests"
output:
  html_document:
    df_print: paged
---

```{r, rows.print=5}
mtcars
```

TABLE 3.2: The options for paged HTML tables.

Option	Description
max.print	The number of rows to print.
rows.print	The number of rows to display.
cols.print	The number of columns to display.
cols.min.print	The minimum number of columns to display.
pages.print	The number of pages to display under page navigation.
paged.print	When set to `FALSE` turns off paged tables.
rownames.print	When set to `FALSE` turns off row names.

Code Folding#

---
title: "Habits"
output:
  html_document:
    code_folding: hide
---

code_folding: hide : Code is hidden initially; readers can click to show it;
code_folding: show : Code is shown initially; readers can click to hide it.

Advanced Customization#

Keep Markdown Files#

When an R Markdown file (*.Rmd) is knitted, a Markdown file (*.md) is created first and then converted to HTML by Pandoc. The intermediate Markdown file is deleted by default; to keep it, use the keep_md option:

---
title: "Habits"
output:
  html_document:
    keep_md: true
---

Add Local HTML Documents#

You can achieve more advanced output customization by adding extra HTML content or completely replacing the core Pandoc template. To include content before or after the document body or at the document head, you can use the following options:

---
title: "Habits"
output:
  html_document:
    includes:
      in_header: header.html
      before_body: doc_prefix.html
      after_body: doc_suffix.html
---

Custom Template#

---
title: "Habits"
output:
  html_document:
    template: quarterly_report.html
---

For more details on templates, refer to the documentation on Pandoc Templates. You can also explore the default HTML template default.html5.

The control methods for other types of document formats are similar; for more details, please refer to the original work.