---
title: "orderly"
author: "Rich FitzJohn"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{orderly}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

``` {r echo = FALSE, results = "hide"}
lang_output <- function(x, lang) {
  writeLines(c(sprintf("```%s", lang), x, "```"))
}
r_output <- function(x) lang_output(x, "r")
yaml_output <- function(x) lang_output(x, "yaml")
plain_output <- function(x) lang_output(x, "plain")
orderly_file <- function(...) {
  system.file(..., package = "orderly", mustWork = TRUE)
}

path <- orderly:::prepare_orderly_example("nodata")
path_example <- file.path(path, "src", "example")
## Even simpler
unlink(file.path(path, "archive"), recursive = TRUE)
unlink(file.path(path, "draft"), recursive = TRUE)
unlink(file.path(path, "data"), recursive = TRUE)
unlink(file.path(path, "README.md"))
unlink(file.path(path, "src", "README.md"))

dir_tree <- function(path) {
  withr::with_dir(path, fs::dir_tree("."))
}

owd <- getwd()
knitr::knit_hooks$set(with_path = function(before, options, envir) {
  if (before) {
    setwd(options$with_path_value)
  } else {
    setwd(owd)
  }
  invisible()
})
```

## Introduction

<!-- introduction begin -->

`orderly` is a package designed to help make analysis more reproducible.  Its principal aim is to automate a series of basic steps in the process of writing analyses, making it easy to:

* track all inputs into an analysis (packages, code, and data resources)
* store multiple versions of an analysis where it is repeated
* track outputs of an analysis
* create analyses that depend on the outputs of previous analyses

With `orderly` we have two main hopes:

* analysts can write code that will straightforwardly run on someone else's machine (or a remote machine)
* when an analysis that is run several times starts behaving differently it will be easy to see when the outputs started changing, and what inputs started changing at the same time

`orderly` requires a few conventions around organisation of a project, and after that tries to keep out of your way.  These conventions are also designed to make collaborative development with git easier by minimising conflicts, and to make backup easier by using an append-only storage system.

### The problem

One often-touted advantage of R over point-and-click analysis packages is that a scripted analysis is more reproducible.  However, essentially all analyses depend on external resources - packages, data, code, and R itself; any change in these external resources might change the results.  Preventing such changes in external resources is not always possible, but *tracking* changes should be straightforward - all we need to know is what is being used.

For example, while reproducible research [has become synonymous with literate programming](https://cran.r-project.org/view=ReproducibleResearch) this approach often increases the number of external resources.  A typical [`knitr`](https://cran.r-project.org/package=knitr) document will depend on:

* the source file (`.Rmd` or `.Rnw`)
* templates used for styling
* data that is read in for the analysis
* code that is directly read in with `source`

The `orderly` package helps by

* collecting external resources before an analysis
* ensuring that all required external resources are identified
* removing any manual work in tracking information about these external resources
* allowing running reports multiple times and making it easy to see what changed and why

The core problem is that analyses have no general _interface_.  Consider in contrast the role that functions take in programming.  All functions have a set of arguments (inputs) and a return value (outputs).  With `orderly`, we borrow this idea, and each piece of analysis will require that the user describes what is needed and what will be produced.

### The process

The user describes the inputs of their analysis, including:

* SQL queries (if using databases)
* Required R sources
* External resource files (e.g., csv data files, Rmd files, templates)
* Packages required to run the analysis
* Dependencies on previously run analyses

The user also provides a list of "artefacts" (file-based results) that they will produce.

Then `orderly`:

1. creates a new empty directory
2. copies over _only_ the declared file resources
3. loads only the declared packages
4. loads the declared R sources
5. evaluates any SQL queries to create R objects
6. then runs the analysis
7. verifies that the declared artefacts are produced

It then stores metadata alongside the analysis including hashes of all inputs and outputs, copies of data extracted from the database, a record of all R packages loaded at the end of the session, and (if using git) information about the git state (hash, branch and status).

Then if one of the dependencies of a report changes (the used data, code, etc), we have metadata that can be queried to identify the likely source of the change.

<!-- introduction end -->
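As a rough illustration of why storing hashes of the inputs is useful, the sketch below hashes the files of two made-up "runs" and reports which input differs.  The helper `hash_inputs` and the directory contents are hypothetical and are not part of `orderly`; only `tools::md5sum` is a real function.

```r
# Hypothetical helper (not orderly's API): hash every file in a directory.
hash_inputs <- function(dir) {
  files <- sort(list.files(dir))
  hashes <- unname(tools::md5sum(file.path(dir, files)))
  setNames(hashes, files)
}

# Two made-up "runs" whose inputs differ in a single file.
run1 <- tempfile()
run2 <- tempfile()
dir.create(run1)
dir.create(run2)
writeLines("x <- 1", file.path(run1, "script.R"))
writeLines("x <- 1", file.path(run2, "script.R"))
writeLines("a,b\n1,2", file.path(run1, "data.csv"))
writeLines("a,b\n1,3", file.path(run2, "data.csv"))

# Compare the stored hashes to find which input changed between runs.
h1 <- hash_inputs(run1)
h2 <- hash_inputs(run2)
changed <- names(h1)[h1 != h2[names(h1)]]
changed
```

Comparing hashes rather than file contents keeps the stored metadata small while still pointing at the input that changed.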

## Example

To illustrate, we will start with a minimal example (you can use `orderly::orderly_init` to create a similar structure directly), and we will build it up to demonstrate `orderly` features. In the most minimal example, we want to run a script that creates a graph.  It uses no external resources.

``` {r comment = NA, echo = FALSE}
dir_tree(path)
```

In this example, the `orderly_config.yml` file is completely empty, but serves to mark the root of the `orderly` project.  We have one report, called `example`, and its configuration is within `orderly.yml`:

``` {r results = "asis", echo = FALSE}
yaml_output(readLines(file.path(path_example, "orderly.yml")))
```

There are two keys here:

* `script`: the path of the script to run, `script.R`
* `artefacts`: a description of the artefacts (files) that will be produced by running this script.  In this case it is a graph with the filename `mygraph.png`

The script is plain R code:

``` {r results = "asis", echo = FALSE}
r_output(readLines(file.path(path_example, "script.R")))
```

The R code can be as long or as short as needed and can use whatever packages it needs.  `orderly` does not do anything with the script apart from run it, so
it can be formatted freely (there are no magic comments, etc).  There are no restrictions on what the script can do except that it must produce the artefacts listed in `orderly.yml`.  If it does not, an error will be thrown describing what was missing.
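The artefact check described above can be pictured roughly as follows.  This is a minimal sketch, assuming a hypothetical helper `check_artefacts` and a made-up second artefact `table.csv`; it is not how `orderly` itself is implemented.

```r
# Hypothetical helper, not part of orderly: error if declared artefacts
# were not produced in the working directory.
check_artefacts <- function(dir, expected) {
  missing <- expected[!file.exists(file.path(dir, expected))]
  if (length(missing) > 0) {
    stop("Script did not produce expected artefacts: ",
         paste(missing, collapse = ", "))
  }
  invisible(TRUE)
}

workdir <- tempfile()
dir.create(workdir)
file.create(file.path(workdir, "mygraph.png"))

# Passes silently: the only declared artefact exists.
check_artefacts(workdir, "mygraph.png")

# Fails: "table.csv" was declared but never written.
result <- tryCatch(
  check_artefacts(workdir, c("mygraph.png", "table.csv")),
  error = function(e) conditionMessage(e))
result
```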

### Running the report

To run the report, use `orderly::orderly_run` (typically one would be in the `orderly` root and so the `root` argument could be omitted, but within this vignette we use a temporary directory):

```{r}
id <- orderly::orderly_run("example", root = path)
```

The return value is the id of the report (also printed on the
third line of log output) and is always in the format
`YYYYMMDD-HHMMSS-abcdef01` where the last 8 characters are hex
digits (i.e., 4 random bytes).  This means reports will
automatically sort nicely but we'll have some collision resistance.

```{r }
id
```
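To make the id format concrete, here is a minimal sketch of how an id of the form `YYYYMMDD-HHMMSS-abcdef01` could be built in base R.  The function `make_id` and the two timestamps are hypothetical; `orderly`'s own generator may differ in detail.

```r
# Hypothetical sketch of building an id of the form YYYYMMDD-HHMMSS-abcdef01.
make_id <- function(time = Sys.time()) {
  stamp <- format(time, "%Y%m%d-%H%M%S")
  bytes <- sample(0:255, 4, replace = TRUE)  # 4 random bytes
  suffix <- paste(sprintf("%02x", bytes), collapse = "")
  paste(stamp, suffix, sep = "-")
}

# Made-up timestamps, used only to show the sorting behaviour.
ids <- c(make_id(as.POSIXct("2019-04-25 16:36:51", tz = "UTC")),
         make_id(as.POSIXct("2018-12-25 17:29:01", tz = "UTC")))

# Sorting the ids as strings puts them in chronological order.
sort(ids)
```

Because the timestamp comes first, sorting ids as strings orders reports by the time they were run, and the random suffix makes it unlikely that two reports started in the same second share an id.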

Having run the report, the directory layout now looks like:

``` {r comment = NA, echo = FALSE}
dir_tree(path)
```

Within `draft`, the directory ``r file.path("example", id)`` has been
created which contains the result of running the report.  In here
there are the files:

* `orderly.yml`: this is an exact copy of the input file
* `script.R`: this is an exact copy of the script used for the analysis
* `mygraph.png`: the artefact created by the report
* `orderly_run.rds`: this is metadata about the run and includes
  hashes of input files, of the data used, and of the output etc,
  along with details about the packages used and the state of git.
  It is stored in R's internal data format.

Every time a report is run it will create a new directory at this
level with a new id.  Running the report again now might create the
directory ``r file.path("example", orderly:::new_report_id())``.

We store the copies of files as run by `orderly` so that even if the
input files change we can still easily get back to previous
versions of the inputs, alongside the outputs, and these are safe
from any changes to the underlying source.

You can see the list of draft reports like so:

``` {r }
orderly::orderly_list_drafts(root = path)
```

Once you're happy with a report, then "commit" it with

``` {r collapse = TRUE}
orderly::orderly_commit(id, root = path)
```

After this step our directory structure looks like:

``` {r comment = NA, echo = FALSE}
dir_tree(path)
```

This looks very similar to the previous layout, but files have been moved from being within `draft` to being within `archive`.  The other difference is that the index `orderly.sqlite` has been created.  This is a machine-readable index to all the `orderly` metadata that can be used to build applications around `orderly` (for example [OrderlyWeb](https://github.com/vimc/orderly-web), a web portal for `orderly` - see the "remotes" vignette).  The documentation for the database format is available on the [`orderly` website](https://www.vaccineimpact.org/orderly/schema/).

## Creating a report

First, run `orderly::orderly_new` to create a directory within `src`, then edit the file `orderly.yml` within that directory.  The name is important: it should not contain spaces, and it should not change later, as this will change the key report id and you will lose the chain of history.

``` {r }
orderly::orderly_new("new", root = path)
```

which results in a directory structure like:

```{r comment = NA, echo = FALSE}
dir_tree(path)
```

## Resources, sources and artefacts

Resources for a report are expected to be read-only files that are used
by the script to produce the report. Examples of the sort of files that
should be used as resources are:

 * Moderately sized data files (large datasets should be accessed from
   a database),
 * A markdown file used to create a report,
 * Documentation for the report.

"Resources" cannot be modified by the report; if `orderly` detects that a
resource has been changed an error will be thrown.

`orderly` will automatically detect any files named `README.md` in a
report's source directory and copy them to the new directory too.

```yaml
resources: 
  - years.csv
  - data_dictionary.xlsx
  - report.Rmd
  - code_documentation.md
```

"Sources" are files containing R code that will be sourced (via the R
function `source()`) before the main script is run. Often these files
contain functions or variables used by the main script. All of the
copying and sourcing will be handled by `orderly` itself so there is no
need to explicitly source the files in the main script.

"Artefacts" are the output of the report. At least one artefact must be
listed, and any file created while the script runs must either be
listed as an artefact or be deleted before the script finishes;
otherwise an error will be thrown.
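The rule about undeclared files can be sketched as a comparison between what is left in the working directory and what was declared.  The helper `unexpected_files` and the file `scratch.tmp` are hypothetical, and the sketch is not taken from `orderly`'s source.

```r
# Hypothetical check: any file left in the working directory should be a
# declared input or a declared artefact.
unexpected_files <- function(dir, inputs, artefacts) {
  setdiff(list.files(dir, recursive = TRUE), c(inputs, artefacts))
}

workdir <- tempfile()
dir.create(workdir)
file.create(file.path(workdir, c("script.R", "report.html", "scratch.tmp")))

# "scratch.tmp" was created but never declared, so it is reported.
extra <- unexpected_files(workdir, inputs = "script.R",
                          artefacts = "report.html")
extra
```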

Examples of artefacts fields in `orderly.yml`:

```yaml
artefacts:
  - report:
      filenames: report.html
      description: a simple report
```

```yaml
artefacts:
  - report:
      filenames: report.html
      description: a simple report
  - data:
      description:
        - associated data sets
      filenames:
        - data_one.csv
        - data_two.csv
        - data_three.csv
        - data_four.csv
```

When declaring an artefact we have to specify what format the artefact
is. Currently supported formats are: `data`, `report`, `staticgraph`,
`interactivegraph` and `interactivehtml`. These tags reflect the
intended use of the file; they have no special meaning within `orderly` itself.

## Using artefacts from other reports

It is often the case that we would like to write a report that depends
on an earlier report, _e.g._ one report produces a large dataset and a
later report produces a high level summary. `orderly` allows a report to
directly copy an artefact file from an existing report without having
to manually copy it into the report source directory. This is handled
in the `depends` block of the report's `orderly.yml`.

*To use a file as a dependency it must be explicitly listed as an
artefact.*

A simple example might look like:

```yaml
depends:
  - big-data-report:
      id: 20190425-163691-b8451bbf
      use:
        data.rds: huge-data-set.rds
```

This will copy the file `huge-data-set.rds` from the report
`big-data-report` with `id` `20190425-163691-b8451bbf` and rename it
`data.rds`. This file can then be used by the report as if it were in
the source directory.

If we want a report to always use the latest version of the report `big-data-report` we can set the `id` field to `latest`, _e.g._:

```yaml
depends:
  - big-data-report:
      id: latest
      use:
        data.rds: huge-data-set.rds
```

This will find the most recent version of the report `big-data-report`
and copy files from that directory.
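The copy-and-rename behaviour, and the resolution of `latest`, can be sketched in base R as below.  This assumes an archive laid out as `archive/<report>/<id>/`, uses made-up ids, and relies on ids sorting chronologically; the helper `resolve_id` is hypothetical and `orderly` may resolve `latest` differently.

```r
# Hypothetical archive layout: <root>/archive/<report>/<id>/<files>
root <- tempfile()
ids <- c("20181225-172901-34c91ef1", "20190425-163651-b8451bbf")  # made up
for (id in ids) {
  dest <- file.path(root, "archive", "big-data-report", id)
  dir.create(dest, recursive = TRUE)
  saveRDS(id, file.path(dest, "huge-data-set.rds"))
}

# Ids sort chronologically, so "latest" is taken as the largest id.
resolve_id <- function(root, name, id) {
  if (identical(id, "latest")) {
    id <- max(dir(file.path(root, "archive", name)))
  }
  id
}

# Copy the artefact into a working directory under its new name.
workdir <- tempfile()
dir.create(workdir)
id <- resolve_id(root, "big-data-report", "latest")
file.copy(
  file.path(root, "archive", "big-data-report", id, "huge-data-set.rds"),
  file.path(workdir, "data.rds"))
readRDS(file.path(workdir, "data.rds"))
```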

To use multiple artefacts from a single report add the files into the
`use` block _e.g._:

```yaml
depends:
  - big-data-report:
      id: latest
      use:
        data.rds: huge-data-set.rds
        pop.csv: population_data.csv
```

To use artefacts from multiple reports we add multiple entries to the
`depends` field _e.g._:

```yaml
depends:
  - big-data-report:
      id: latest
      use:
        data.rds: huge-data-set.rds
        pop.csv: population_data.csv
  - report_two:
      id: latest
      use:
        data_b.rds: filename.rds
```

We can also use the same artefact from _different versions_ of the same
report. This might come up if we want to write a report that compares
the output from different versions of another report. The yaml pattern for this
is:

```yaml
depends:
  - big-data-report:
      id: 20190425-163691-b8451bbf
      use:
        data_latest.rds: huge-data-set.rds
  - big-data-report:
      id: 20181225-172991-34c91ef1
      use:
        data_old.rds: huge-data-set.rds
```

The important feature in this example is the dashes before the report
name. When all the report names are different these dashes can be
omitted, but they are necessary when the report depends on different
versions of the same report. Since including the dashes will never
cause a problem but omitting them might, we advise that they should
always be included.

## Parameterised reports

Sometimes it can be useful to control how a report runs by a parameter.  This could be the name of a country that an analysis applies to (though we hope to develop a better interface for this soon) through to controlling the number of iterations that an analysis runs for.  Parameters are declared in the `orderly.yml` like:

```yaml
parameters:
  a:
    default: 1
  b: ~
```

This would declare that a report takes two parameters `a` (with a default of 1), and `b` (with no default).  Running the report would then look like:

```r
orderly::orderly_run("reportname", list(a = 10, b = 100))
```

These parameters are then present in the environment of the report, so the code can use values `a` and `b`.
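One way to picture how declared defaults and supplied values combine is the sketch below.  The helper `resolve_parameters` is hypothetical, and the assumption that a parameter without a default must be supplied is ours rather than something stated above.

```r
# Hypothetical helper: combine declared parameters with values supplied at
# run time.  `spec` mirrors the yaml above: `a` has a default, `b` has none.
resolve_parameters <- function(spec, given = list()) {
  unknown <- setdiff(names(given), names(spec))
  if (length(unknown) > 0) {
    stop("Unknown parameters: ", paste(unknown, collapse = ", "))
  }
  defaults <- lapply(spec, function(x) x$default)
  has_default <- !vapply(defaults, is.null, logical(1))
  values <- modifyList(defaults[has_default], given)
  missing <- setdiff(names(spec), names(values))
  if (length(missing) > 0) {
    stop("Missing parameters: ", paste(missing, collapse = ", "))
  }
  values[names(spec)]
}

spec <- list(a = list(default = 1), b = NULL)

# `a` falls back to its default, `b` is taken from the supplied values.
params <- resolve_parameters(spec, list(b = 100))
params

# Supplying both overrides the default for `a`.
resolve_parameters(spec, list(a = 10, b = 100))
```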

The parameters will also be interpolated into any SQL queries before they are run, so if the `orderly.yml` contains:

```yaml
data:
  cars:
    query: SELECT * FROM mtcars WHERE cyl > ?a
```

then this will be evaluated on the SQL server with `a` substituted in where the query says `?a` (this is done with `DBI::sqlInterpolate`).
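The interpolation step can be tried without a database server.  The sketch below assumes the `DBI` package is installed and uses `DBI::ANSI()` as a stand-in connection; the second query and its `name` column are made up for illustration.

```r
# Interpolate a parameter into a query without connecting to a real server.
# DBI::ANSI() is a dummy connection that follows ANSI SQL quoting rules.
sql <- DBI::sqlInterpolate(DBI::ANSI(),
                           "SELECT * FROM mtcars WHERE cyl > ?a",
                           a = 4)
sql

# String values are quoted as SQL literals rather than pasted in verbatim.
sql_name <- DBI::sqlInterpolate(DBI::ANSI(),
                                "SELECT * FROM mtcars WHERE name = ?name",
                                name = "Mazda RX4")
sql_name
```

Interpolating values this way, rather than building the query with `paste`, means that parameter values are quoted for the target database instead of being inserted as raw text.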


## Using global resources

There might be files that are used in (almost) every report.  Examples
of these sorts of files might be document templates or organisation
logos. To set up a global resource, create a directory
`your_global_dir` in `<root>` and add the following to
`orderly_config.yml`:

```yaml
global_resources:
  your_global_dir
```

Then to use any file in `your_global_dir` in your report add a
`global_resources` field to that report's `orderly.yml`:

```yaml
global_resources:
  logo.jpg: org_logo.jpg
  latex_class.cls: org_latex_class.cls
  styles.css: org_styles.css
```

Currently code (_i.e._, R source code) cannot be sourced from the global
resources directory. So, for example, utility functions common across
multiple reports must be included in each report directory separately.
The functionality to include global source code may be added in future
versions.

## Using version control

`orderly` is designed to work well with [git](https://git-scm.com/) (or any other version control system).  The general principle is that `src/` and any configuration files should be added to git, while most of the generated files (`data`, `draft`, `archive`, and any SQLite databases such as `orderly.sqlite`) should be excluded, which can be done by creating a `.gitignore` file.

This process can be automated by running

```{r}
orderly::orderly_use_gitignore(root = path, prompt = FALSE)
```

(the default, `prompt = TRUE`, will request user confirmation before writing the `.gitignore` file).

You should arrange to back up the entire orderly directory through some other means.

If your generated files are particularly small you might leave them in git, but in our experience this will result in a git repository that is unpleasant to use.

## Using SQL databases

One of the original aims of `orderly` was to provide a set of tools for use of SQL databases within reproducible reporting.  Because the SQL database is an external global resource it is difficult to work with any concept of "versioning" from R (there is no git history, no way of easily rolling back to previous versions etc).  If using a central SQL server, there is configuration that should be kept *out* of any analysis, particularly things like passwords.  Configuration problems multiply when using both "production" and "staging" systems as we would like to be able to switch between different configurations.

### Configuration

The root `orderly_config.yml` configuration specifies the locations of databases (there can be any number), for example:

```yaml
database:
  source:
    driver: RPostgres::Postgres
    args:
      host: dbhost.example.org
      port: 5432
      user: myusername
      password: s3cret
      dbname: mydb
```

This database will be referred to elsewhere as `source` and it will be connected with the `RPostgres::Postgres` driver (from the [RPostgres](https://cran.r-project.org/package=RPostgres) package).  Arguments within the `args` block will be passed to the driver, in this case being the equivalent of:

```r
# The driver is a function that must be called to create the driver object
DBI::dbConnect(RPostgres::Postgres(), host = "dbhost.example.org",
               port = 5432, user = "myusername", password = "s3cret",
               dbname = "mydb")
```

The values used in the `args` blocks can be environment variables (e.g., `password: $DB_PASSWORD`) in which case they will be resolved from the environment before connecting.  This is useful for keeping secrets out of source control.

For [SQLite](https://cran.r-project.org/package=RSQLite) databases, the `args` block will typically contain only `dbname` which is the path to the database file.

### Use within a report

A report configuration (`orderly.yml`) can contain a `data` block, which contains SQL queries, such as:

```yaml
data:
  cars:
    query: SELECT * FROM mtcars WHERE cyl = 4
    database: source
```

In this case, the query `SELECT * FROM mtcars WHERE cyl = 4` will be run against the `source` database to create an object `cars` in the report environment.  The actual report code can use that object without ever having created the database connection or evaluated the query.

Further, the data used in the query will be captured in `orderly`'s `data` directory, and hashes of the data will be stored alongside the results.  This means that even if the data in the database is a constantly moving target we can still detect if changes to the data are responsible for changes in the result of a report.

### Advanced use

If you need to perform complicated SQL queries, then you can export the database connection directly by adding a block:

```yaml
connection:
  con: source
```

which will save the connection to the `source` database as the R object `con`.  We have used this where a report requires running queries in a loop that depend on the results of a previous query or additional data loaded into a report, or where the result of the query will be very large and we do not want to save it to disk.

Note that this reduces the amount of tracking that `orderly` can do, as we have no way of knowing what is done with the connection once passed to the script.

### Customising the database configuration

The contents of `orderly_config.yml` may contain things like secrets
(passwords) or hostnames that vary depending on deployment (e.g.,
testing locally vs running on a remote system).  To customise this,
you can use environment variables within the configuration.  So
rather than writing

```yaml
database:
  source:
    driver: RPostgres::Postgres
    args:
      host: localhost
      port: 5432
      user: myuser
      dbname: databasename
      password: p4ssw0rd
```

you might write

```yaml
database:
  source:
    driver: RPostgres::Postgres
    args:
      host: $MY_DBHOST
      port: $MY_DBPORT
      user: $MY_DBUSER
      dbname: $MY_DBNAME
      password: $MY_PASSWORD
```

Environment variables used this way **must** begin with a
dollar sign and consist only of uppercase letters, numbers and the
underscore character.  You can then set the environment variables
in an `.Renviron` file (either within the project or in your home
directory) or in your `.profile` file.  Alternatively, you can
create a file `orderly_envir.yml` in the same directory as
`orderly_config.yml` with key-value pairs, such as

```yaml
MY_DBHOST: localhost
MY_DBPORT: 5432
MY_DBUSER: myuser
MY_DBNAME: databasename
MY_PASSWORD: p4ssw0rd
```

This will be read every time that `orderly_config.yml` is read (in
contrast with `.Renviron`, which is read only at the start of a
session).  This will likely be more pleasant to work with.
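The substitution of `$NAME` values can be sketched as follows.  The helper `resolve_env` is hypothetical and only mirrors the naming rule stated above (a dollar sign followed by uppercase letters, numbers and underscores); it is not `orderly`'s implementation.

```r
# Hypothetical helper: replace values of the form $NAME with the contents of
# the corresponding environment variable, leaving other values untouched.
resolve_env <- function(args) {
  lapply(args, function(value) {
    if (is.character(value) && grepl("^\\$[A-Z0-9_]+$", value)) {
      name <- sub("^\\$", "", value)
      resolved <- Sys.getenv(name, unset = NA_character_)
      if (is.na(resolved)) {
        stop("Environment variable '", name, "' is not set")
      }
      resolved
    } else {
      value
    }
  })
}

Sys.setenv(MY_DBHOST = "localhost", MY_DBUSER = "myuser")
args <- list(host = "$MY_DBHOST", port = 5432, user = "$MY_DBUSER")
resolved <- resolve_env(args)
resolved
```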

The advantage of using environment variables is that you can add the
`orderly_envir.yml` file to your `.gitignore` and avoid committing
system-dependent data to the central repository (see
`orderly::orderly_use_gitignore`, which helps automate this).

To avoid leaving passwords in plain text, you can use [`vault`](https://www.vaultproject.io) (along with the R client [`vaultr`](https://cran.r-project.org/package=vaultr)) to
retrieve them.

To do this, you should include the address of your vault server in the `orderly_config.yml` as

```yaml
vault:
  addr: https://example.com:8200
```

Then, for values that you want to retrieve from the vault, set the value of the field to `VAULT:<path>:<field>`, where `<path>` is the name of a vault
secret path (probably beginning with `/secret/`) and `<field>` is the name of the field at that path.  So, for example:

```yaml
      password: VAULT:/secret/users/database_user:password
```

would look up the field `password` at the path
`/secret/users/database_user`.  This can be stored in
`orderly_config.yml`, in the contents of an environment variable or
in `orderly_envir.yml` (currently this only uses the vault [version 1 key-value storage](https://www.vaultproject.io/docs/secrets/kv/kv-v1.html)).
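To make the `VAULT:<path>:<field>` format concrete, the sketch below splits such a value into its two parts.  The function `parse_vault_ref` is hypothetical; the actual lookup is done by `vaultr` and is not shown here.

```r
# Hypothetical parser for values of the form VAULT:<path>:<field>.
parse_vault_ref <- function(value) {
  m <- regmatches(value, regexec("^VAULT:([^:]+):([^:]+)$", value))[[1]]
  if (length(m) == 0) {
    return(NULL)  # not a vault reference; use the value as it is
  }
  list(path = m[[2]], field = m[[3]])
}

ref <- parse_vault_ref("VAULT:/secret/users/database_user:password")
ref

# Ordinary values do not match the pattern.
parse_vault_ref("p4ssw0rd")
```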

If you need to control how the vault server is accessed, then you can pass additional arguments alongside `addr` in the `vault` block:

```yaml
vault:
  addr: https://example.com:8200
  login: userpass
  username: alice
  password: secret
  mount: userpass
```

This is equivalent to connecting to the vault, using `vaultr` as

```r
vaultr::vault_client(addr = "https://example.com:8200", login = "userpass",
                     username = "alice", password = "secret",
                     mount = "userpass")
```

Environment variables here will be respected so you could write:

```yaml
vault:
  addr: https://example.com:8200
  login: userpass
  username: $VAULT_USER
  password: $VAULT_PASSWORD
```

and the username and password will be found from environment variables (the actual secret resolution uses `vaultr::vault_resolve_secrets` - see the documentation in `vaultr` for further details).

### Advanced database configuration

_In general, you can ignore this section if you only use one global database._

The above approach can be used to switch databases by using different environment variables, but that can become tiresome.  If you have multiple database "instances" corresponding to different realisations of the same logical database (e.g., production and staging), then you can configure and switch between these directly from `orderly` commands.  At [VIMC](https://www.vaccineimpact.org) we have several copies of our main database: one called `production`, which is the canonical copy, and then several `staging` copies that we use for experimentation.

To configure this situation, list common arguments within the `args` block as before, then add each instance as a named entry in an `instances` field:

```yaml
database:
  source:
    driver: RPostgres::Postgres
    args:
      port: 5432
      user: user
      dbname: mydb
    instances:
      production:
        host: production.example.org
        password: $PASSWORD_PRODUCTION
      staging:
        host: staging.example.org
        password: $PASSWORD_STAGING
    default_instance: $DEFAULT_INSTANCE
```

Here, staging and production have different hostnames (`production.example.org` and `staging.example.org`) and different passwords (retrieved using environment variables), and the default instance is set with another environment variable (`$DEFAULT_INSTANCE`, which must be one of `production` or `staging`).  To switch between databases, you can set that variable, or pass the `instance` argument to `orderly::orderly_run` and friends, as:

```r
orderly::orderly_run(name, instance = "production")
```

or

```r
orderly::orderly_run(name, instance = "staging")
```

If there is more than one database configured, then the interpretation of `instance` is a little more nuanced.  For example, suppose we have this (abridged) database configuration:

```yaml
database:
  source:
    driver: RPostgres::Postgres
    args:
      port: 5432
    instances:
      production:
        host: production.example.org
      staging:
        host: staging.example.org
    default_instance: staging
  extra:
    driver: RPostgres::Postgres
    args:
      port: 5432
      host: extra.example.org
```

Then we can pass in a string like `production` as the instance, e.g.,

```r
orderly::orderly_run(name, instance = "production")
```

Here the `source` database will select the `production` instance.  As there are no instances configured for the `extra` database, the argument is ignored when connecting to `extra`.
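The way shared `args` and instance-specific settings combine can be sketched with nested lists that mirror the configuration above.  The helper `instance_args` is hypothetical and only illustrates the behaviour described in this section.

```r
# Hypothetical helper: build connection arguments for one database by
# layering the chosen instance's settings over the shared `args`.
instance_args <- function(db, instance = NULL) {
  if (is.null(db$instances)) {
    return(db$args)  # no instances configured: `instance` is ignored
  }
  if (is.null(instance)) {
    instance <- db$default_instance
  }
  if (!instance %in% names(db$instances)) {
    stop("Unknown instance '", instance, "'")
  }
  modifyList(db$args, db$instances[[instance]])
}

# Mirrors the (abridged) yaml configuration above.
config <- list(
  source = list(
    args = list(port = 5432),
    instances = list(production = list(host = "production.example.org"),
                     staging = list(host = "staging.example.org")),
    default_instance = "staging"),
  extra = list(args = list(port = 5432, host = "extra.example.org")))

instance_args(config$source, "production")$host  # chosen instance
instance_args(config$source)$host                # falls back to the default
instance_args(config$extra, "production")$host   # instance is ignored
```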

However, if *both* databases had two instances, such as:

```yaml
database:
  source:
    driver: RPostgres::Postgres
    args:
      port: 5432
    instances:
      production:
        host: production.example.org
      staging:
        host: staging.example.org
    default_instance: staging
  extra:
    driver: RPostgres::Postgres
    args:
      port: 15432
    instances:
      production:
        host: production.example.org
      staging:
        host: staging.example.org
    default_instance: staging
```

Then it is possible to select *different* instances for each database, such as:

```r
orderly::orderly_run(name,
                     instance = c(source = "production", extra = "staging"))
```

## Using environment variables and secrets

Environment variables declared in `orderly.yml` are made available in the report script (since orderly 1.1.11). For example, if your `orderly.yml` contains

```yaml
environment:
  external_file: EXTERNAL_FILE_PATH
  variable_2: ENV_VAR_2
```

Then orderly will evaluate the variables and make them available in your script via the specified name. For example in your script you can use

```r
external_data <- read.csv(external_file)
```

The expected use case for this is if you have data which you want to use in a report but do not want to be in the orderly repository (either because it is sensitive or particularly large). You can have the external files somewhere else on your machine and specify the path to them via an environment variable so they can then be accessed from the report script.
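As a rough picture of what "made available via the specified name" means, the sketch below copies environment variables into an R environment under the names from the `environment:` block.  The environment `report_env` and the values set with `Sys.setenv` are made up for illustration; `orderly` handles this step itself when it runs a report.

```r
# Mapping from R variable name to environment variable name, as in the
# `environment:` block above.
spec <- list(external_file = "EXTERNAL_FILE_PATH", variable_2 = "ENV_VAR_2")

# Made-up values, set here only so the sketch is self-contained.
Sys.setenv(EXTERNAL_FILE_PATH = "/path/to/data.csv", ENV_VAR_2 = "value2")

# Copy each environment variable into the environment the script runs in.
report_env <- new.env()
for (name in names(spec)) {
  assign(name, Sys.getenv(spec[[name]]), envir = report_env)
}

# Code evaluated in `report_env` sees `external_file` as an ordinary variable.
evalq(external_file, report_env)
```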

Environment variables can also be defined in the top-level `orderly_envir.yml` (since orderly 1.1.6) and accessed in the script via `Sys.getenv()`. For example, if your `orderly_envir.yml` contains

```yaml
DATA_URL: https://www.example.com/thedata
```

Then in your script you can use

```r
url <- Sys.getenv("DATA_URL")
# download.file() requires a destination; here the last part of the URL
download.file(url, destfile = basename(url))
```

If the values are sensitive, then this is not ideal, as you will store your values in plain text in the `orderly_envir.yml`.  Instead, if using vault, you can use a `secrets:` section in `orderly.yml`, like:

```yaml
secrets:
  password: /secret/myproject/login:value
```

and then an R variable `password` will be available to all the code in your report, containing the result of looking up `/secret/myproject/login` in your vault and getting the `value` field.

## Developing a report

Because orderly works in a directory that is not the same as the source directory (e.g., `draft/myreport/YYYYMMDD-HHMMSS-abcdefgh`), because global resources, dependencies, etc. are copied in only when a report is run, and because `orderly` takes control of things like parameters and sourcing files, it may not seem straightforward to develop a report as you would ordinarily.

In order to make this easier, orderly has a set of functions to help develop a report within the source directory.  These functions are:

* `orderly::orderly_develop_start` which copies files *into* your source copy
* `orderly::orderly_develop_status` which reports on the status of your source
* `orderly::orderly_develop_clean` which cleans up files that orderly copied

This section illustrates the idea, creating a new report that will depend on an artefact from a previous report.  Here is the state of our orderly tree from before:

```{r, include = FALSE}
path_new <- file.path(path, "src", "new")
local({
  yml <- c(
    "script: script.R",
    "",
    "artefacts:",
    "  - staticgraph:",
    "      description: Mean of the values",
    "      filenames: mean.txt",
    "",
    "depends:",
    "  example:",
    "    id: latest",
    "    use:",
    "      data.csv: mydata.csv")
  code <- c(
    'data <- read.csv("data.csv")',
    'writeLines(as.character(mean(data$y)), "mean.txt")')

  writeLines(yml, file.path(path_new, "orderly.yml"))
  writeLines(code, file.path(path_new, "script.R"))
})
```

``` {r comment = NA, echo = FALSE}
dir_tree(path)
```

The new report has an `orderly.yml` containing

``` {r results = "asis", echo = FALSE}
yaml_output(readLines(file.path(path_new, "orderly.yml")))
```

In order to run this report, we need the file `data.csv` (which contains the output `mydata.csv` from the latest version of the `example` report) to exist.  If we wanted to interactively develop the `script.R` we'd have to run `orderly::orderly_run` repeatedly, which will be annoying - and impractical if the report is slow to run.  So we can run instead:

```{r}
orderly::orderly_develop_start("new", root = path)
```

which copies in the required artefact and we can then change the directory (using something like `setwd("src/new")`) and start working on the report directly.

You can see the status of the directory by running

```{r}
orderly::orderly_develop_status("new", root = path)
```

(the output of this object is likely to change in future versions), which shows that `data.csv` is present, and that it is _derived_, by which we mean that orderly knows that it should not persist in the source tree.  Files with `derived` set to `TRUE` are liable for deletion by `orderly::orderly_develop_clean`.  The output also shows that `mean.txt` is not present.

If you have changed directory into the path under development (via `setwd(file.path(path, "src/new"))`) then you can omit the arguments and simply call

```{r, with_path = TRUE, with_path_value = path_new}
orderly::orderly_develop_status()
```

You can then run your script as if it was a normal R script:

```{r, with_path = TRUE, with_path_value = path_new}
source("script.R", echo = TRUE)
```

(the above code assumes that we are within `src/new`, alongside the report source).

After which `mean.txt` is present:

```{r, with_path = TRUE, with_path_value = path_new}
orderly::orderly_develop_status()
```

If a newer version of the upstream dependency has become available, you can update the file by running `orderly::orderly_develop_start()` again:

```{r, with_path = TRUE, with_path_value = path_new}
id <- orderly::orderly_run("example", root = path)
orderly::orderly_commit(id, root = path)
orderly::orderly_develop_start()
```

(note that this has updated `data.csv` to use this new id `r id`).

Finally, you can delete the files that orderly has copied into the source tree:

```{r, with_path = TRUE, with_path_value = path_new, collapse = TRUE}
orderly::orderly_develop_clean()
orderly::orderly_develop_status()
```

(this function also accepts `name` and `root` as above, which are required if not working in the source directory).