User Manual
With the addition of a single line of code to the top of a Python script, recipy logs each run of your code to a database, keeping track of the input files, output files and the version of your code. It then lets you query this database to help you to recall the exact steps you took to create a certain output file.
Logging Provenance Information
To log provenance information, simply add the following line to the top of your code:
import recipy
Note that this must be the very top line of your script, before you import anything else.
Then just run your script as usual, and all of the data will be logged into the
TinyDB database (don’t worry, the database is automatically created if needed).
You can then use the recipy command to quickly query the database to find out
which run of your code produced which output file. For example, consider a
script like this:
import recipy
import numpy
arr = numpy.arange(10)
arr = arr + 500
numpy.save('test.npy', arr)
(Note the addition of import recipy at the beginning of the script; there are
no other changes from a standard script.)
Alternatively, run an unmodified script with python -m recipy SCRIPT [ARGS ...]
to enable recipy logging. This invokes recipy’s module entry point, which takes
care of import recipy for you, before running your script.
Retrieving Information about Runs
The example script above produces an output file called test.npy. To find out
the details of the run that created this file, you can search using
recipy search test.npy
and it will display information like the following:
Created by robin on 2015-05-25 19:00:15.631000
Ran /Users/robin/code/recipy/example_script.py using /usr/local/opt/python/bin/python2.7
Git: commit 91a245e5ea82f33ae58380629b6586883cca3ac4, in repo /Users/robin/code/recipy, with origin git@github.com:recipy/recipy.git
Environment: Darwin-14.3.0-x86_64-i386-64bit, python 2.7.9 (default, Feb 10 2015, 03:28:08)
Inputs:
Outputs:
/Users/robin/code/recipy/test.npy
An alternative way to view this is to use the GUI. Just run recipy gui and
a browser window will open with an interface that you can use to search all of
your recipy ‘runs’.
Logging Files Using Built-In Open
If you want to log inputs and outputs of files read or written with built-in
open, you need to do a little more work. Either use recipy.open (only requires
import recipy at the top of your script), or add from recipy import open and
just use open.
This workaround is required, because many libraries use built-in open internally, and you only want to record the files you explicitly opened yourself.
If you use Python 2, you can pass an encoding parameter to recipy.open. In this
case codecs is used to open the file with the proper encoding.
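As a rough illustration of what opening a file through codecs involves, the standard-library call below writes and reads a file with an explicit encoding. This is a sketch of codecs.open on its own, not recipy's implementation, and the file name notes.txt is hypothetical.

```python
import codecs
import os
import tempfile

# Hypothetical file name, created in a temporary directory for the demo.
path = os.path.join(tempfile.mkdtemp(), 'notes.txt')

# codecs.open encodes text on write and decodes it on read, which is
# how an explicit encoding can be honoured on Python 2.
with codecs.open(path, 'w', encoding='utf-8') as f:
    f.write(u'caf\u00e9')

with codecs.open(path, 'r', encoding='utf-8') as f:
    text = f.read()
```

The text read back is a unicode string identical to the one written, regardless of the platform's default encoding.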
Annotating Runs
Once you’ve got some runs in your database, you can ‘annotate’ these runs with any notes that you want to keep about them. This can be particularly useful for recording which runs worked well, or particular problems you ran into. This can be done from the ‘details’ page in the GUI, or by running
recipy annotate [run-id]
which will open an editor to allow you to write notes that will be attached to the run. These will then be viewable via the command-line and the GUI when searching for runs.
Saving Custom Values
In your script, you can also add custom key-value pairs to the run:
recipy.log_values(key='value')
recipy.log_values({'key': 'value'})
Please note that, at the moment, these values are not displayed in the CLI or in the GUI.
Command Line Interface
There are other features in the command-line interface too: run recipy --help
to see the other options. You can view diffs, see all runs that created a file
with a given name, search based on IDs, show the latest entry and more:
recipy - a frictionless provenance tool for Python

Usage:
  recipy search [options] <outputfile>
  recipy latest [options]
  recipy gui [options]
  recipy annotate [<idvalue>]
  recipy pm [--format <rst|plain>]
  recipy (-h | --help)
  recipy --version

Options:
  -h --help      Show this screen
  --version      Show version
  -p --filepath  Search based on filepath rather than hash
  -f --fuzzy     Use fuzzy searching on filename
  -r --regex     Use regex searching on filename
  -i --id        Search based on (a fragment of) the run ID
  -a --all       Show all results (otherwise just latest result given)
  -v --verbose   Be verbose
  -d --diff      Show diff
  -j --json      Show output as JSON
  --no-browser   Do not open browser window
  --debug        Turn on debugging mode
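By default, recipy search matches on a file's hash rather than its path, and the Configuration section refers to SHA-1 hashes of input and output files. The following is a minimal sketch of computing such a hash with hashlib, assuming the hash covers the raw file contents. sha1_of_file is a hypothetical helper, and recipy's own hashing code may differ in detail.

```python
import hashlib
import os
import tempfile


def sha1_of_file(path, chunk_size=65536):
    """Return the hex SHA-1 digest of a file's contents (hypothetical helper)."""
    digest = hashlib.sha1()
    with open(path, 'rb') as f:
        # Read in chunks so large output files need not fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


# Demo on a throwaway file containing the bytes b'abc'.
path = os.path.join(tempfile.mkdtemp(), 'test.bin')
with open(path, 'wb') as f:
    f.write(b'abc')

print(sha1_of_file(path))
```

Two files with identical contents give the same digest whatever they are called, which is presumably why a hash-based search can still match an output file after it has been moved or renamed.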
Configuration
By default, recipy stores all of its configuration and the database itself in
~/.recipy. Recipy’s main configuration file is inside this folder, called
recipyrc. The configuration file format is very simple, and is based on
Windows INI files. Having a configuration file is completely optional:
the defaults will work fine with no configuration file.
An example configuration is:
[ignored metadata]
diff
[general]
debug
This simply instructs recipy not to save git diff information when it
records metadata about a run, and also to print debug messages (which can be
really useful if you’re trying to work out why certain functions aren’t
patched). At the moment, the only possible options are:
[general]
- debug: print debug messages
- editor = vi: configure the default text editor that will be used when recipy needs you to type in a message. Use notepad if on Windows, for example
- quiet: don’t print any messages
- port: specify port to use for the GUI
[data]
- file_diff_outputs: store diff between the old output and new output file, if the output file exists before the script is executed
[database]
- path = /path/to/file.json: set the path to the database file
[ignored metadata]
- diff: don’t store the output of git diff in the metadata for a recipy run
- git: don’t store anything relating to git (origin, commit, repo etc) in the metadata for a recipy run
- input_hashes: don’t compute and store SHA-1 hashes of input files
- output_hashes: don’t compute and store SHA-1 hashes of output files
[ignored inputs]
- List any module here (eg. numpy) to instruct recipy not to record inputs from this module, or all to ignore inputs from all modules
[ignored outputs]
- List any module here (eg. numpy) to instruct recipy not to record outputs from this module, or all to ignore outputs from all modules
By default all metadata is stored (ie. no metadata is ignored) and debug messages
are not shown. A .recipyrc file in the current directory takes precedence over
the ~/.recipy/recipyrc file, allowing per-project configurations to be easily
handled.
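The lookup order described above can be sketched as follows. find_config is a hypothetical helper written for illustration and is not part of recipy's API; it assumes only the two locations named in this section.

```python
import os


def find_config(cwd, home):
    """Return the config file that takes precedence, or None (hypothetical helper)."""
    candidates = [
        os.path.join(cwd, '.recipyrc'),             # per-project file wins
        os.path.join(home, '.recipy', 'recipyrc'),  # user-wide fallback
    ]
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None  # no file found: the built-in defaults apply


# Typical call: check the current directory, then the user's home.
chosen = find_config(os.getcwd(), os.path.expanduser('~'))
```

Passing the two directories as arguments keeps the helper easy to try out against temporary directories without touching a real ~/.recipy folder.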
Note: No default configuration file is provided with recipy, so if you wish to configure anything you will need to create a properly-formatted file yourself.
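Since the file has to be written by hand, a short sketch of generating one with the standard-library configparser module (Python 3) may help. It assumes that configparser with allow_no_value=True is a reasonable stand-in for the value-less keys used in the example configuration. The file is written to a temporary directory so that the demo does not touch a real ~/.recipy folder.

```python
import configparser
import os
import tempfile

# allow_no_value lets keys such as 'debug' appear without '= value'.
config = configparser.ConfigParser(allow_no_value=True)

config.add_section('general')
config.set('general', 'debug')          # bare key, no value
config.set('general', 'editor', 'vi')   # key with a value

config.add_section('ignored metadata')
config.set('ignored metadata', 'diff')  # bare key, no value

# Write to a throwaway location; copy to ~/.recipy/recipyrc when happy.
path = os.path.join(tempfile.mkdtemp(), 'recipyrc')
with open(path, 'w') as f:
    config.write(f)

# Read the file back to confirm that it parses as intended.
parsed = configparser.ConfigParser(allow_no_value=True)
parsed.read(path)
print(parsed.sections())
```

Reading the file back before copying it into place is a cheap way to catch a malformed section header or key.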