Using Compression for `Artifacts` in `Peak-SDK` and `Peak-CLI`

Why do we need Artifacts and Compression

If you are creating a resource that depends on an image (such as an App spec, Block spec or Image) and you don’t want it to be sourced directly from the source code repository, you need to pack all the files required to build the image into a zip archive (all platform APIs accept only zip-compressed files), which we call an Artifact. Both the CLI and the SDK have utilities to help you create and manage Artifact(s).

To prepare an artifact for upload, you specify the path to the directory that contains all the files needed to build the image, and optionally a list of ignore files. The CLI and SDK then create a zip archive of all the files in that directory, skipping the files that match the patterns in the ignore files.
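The idea described above can be sketched with the Python standard library alone. This is a minimal illustration, not the SDK's implementation: build_zip and its parameters are hypothetical names, and the matching uses fnmatch, whose `*` also crosses directory separators, so it is looser than real ignore-file semantics.

```python
import fnmatch
import io
import os
import zipfile
from typing import List


def build_zip(root: str, ignore_patterns: List[str]) -> io.BytesIO:
    """Zip every file under `root`, skipping paths that match any pattern.

    Hypothetical helper for illustration; not part of peak.compression.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in sorted(filenames):
                full_path = os.path.join(dirpath, name)
                # Store paths relative to root, always with forward slashes.
                rel_path = os.path.relpath(full_path, root).replace(os.sep, "/")
                if any(fnmatch.fnmatchcase(rel_path, p) for p in ignore_patterns):
                    continue  # matched an ignore pattern, leave it out
                archive.write(full_path, arcname=rel_path)
    buffer.seek(0)  # rewind so the caller can read the archive from the start
    return buffer
```

Calling build_zip("root_dir", ["*.txt"]) would return an in-memory archive without any .txt files, which can then be written to disk or uploaded.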

Usage Examples

Consider the following directory structure:

root_dir
├── .dockerignore
├── .gitignore
├── Dockerfile
├── dir1
│   ├── file.txt
│   └── main.py
└── dir2
    ├── file.txt
    └── main.py

Contents of .gitignore file

*
!**/*.txt

Contents of .dockerignore file

*
!Dockerfile
!**/*.py

Basic usage with `path` and `ignore_files`

This example uses the compression module to create a zip file named artifact.zip from the files in the current directory, skipping every file that matches the patterns listed in the .gitignore file.

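from peak.compression import compress

# compress() provides a readable handle to the generated zip archive.
with compress(".", ignore_files=[".gitignore"]) as zip_file:
    # Write the archive to disk as artifact.zip.
    with open("artifact.zip", "wb") as f:
        f.write(zip_file.read())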

File structure of the zip file:

root_dir
├── dir1
│   └── file.txt
└── dir2
    └── file.txt
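To see why only the .txt files survive, it helps to trace how an ignore-all pattern followed by a negation resolves. The sketch below is a simplified matcher and an assumption about the semantics, not the SDK's code: the helper names are hypothetical, patterns are applied in order, and the last pattern that matches a path decides whether it is ignored.

```python
import fnmatch
from typing import List


def matches(path: str, pattern: str) -> bool:
    """Simplified glob match: a leading `**/` may also match zero directories."""
    if fnmatch.fnmatchcase(path, pattern):
        return True
    return pattern.startswith("**/") and fnmatch.fnmatchcase(path, pattern[3:])


def is_ignored(path: str, patterns: List[str]) -> bool:
    """Apply patterns in order; the last pattern that matches decides."""
    ignored = False
    for pattern in patterns:
        negated = pattern.startswith("!")
        glob = pattern[1:] if negated else pattern
        if matches(path, glob):
            ignored = not negated
    return ignored


gitignore = ["*", "!**/*.txt"]  # contents of the .gitignore shown earlier
files = ["Dockerfile", "dir1/file.txt", "dir1/main.py", "dir2/file.txt", "dir2/main.py"]
kept = [f for f in files if not is_ignored(f, gitignore)]
print(kept)  # ['dir1/file.txt', 'dir2/file.txt']
```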

Usage without `ignore_files`

ignore_files is an optional argument. If it is not provided, compress looks for a .dockerignore file in the given path and uses it as the ignore file.

from peak.compression import compress

# No ignore_files given, so the .dockerignore in "." is used.
with compress(".") as zip_file:
    with open("artifact.zip", "wb") as f:
        f.write(zip_file.read())

File structure of the zip file:

root_dir
├── Dockerfile
├── dir1
│   └── main.py
└── dir2
    └── main.py

Compress without `ignore_files` and a missing `.dockerignore`

If the ignore_files argument is not given and no .dockerignore file is present in path, all the files under path are included in the zip file.

from peak.compression import compress

# No ignore_files and no .dockerignore: every file under "." is zipped.
with compress(".") as zip_file:
    with open("artifact.zip", "wb") as f:
        f.write(zip_file.read())

File structure of the zip file:

root_dir
├── .gitignore
├── Dockerfile
├── dir1
│   ├── file.txt
│   └── main.py
└── dir2
    ├── file.txt
    └── main.py
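The fallback behaviour described in the last two sections can be summarised as a small decision function. This is a hedged sketch rather than the SDK's code: resolve_ignore_files is a hypothetical name, and it assumes that ignore file names are resolved relative to the given path.

```python
import tempfile
from pathlib import Path
from typing import List, Optional


def resolve_ignore_files(path: str, ignore_files: Optional[List[str]] = None) -> List[Path]:
    """Pick the ignore files to read (hypothetical helper, for illustration)."""
    root = Path(path)
    if ignore_files is not None:
        # Explicit list: use exactly these files, in the order given.
        return [root / name for name in ignore_files]
    default = root / ".dockerignore"
    # No list given: fall back to .dockerignore if present, else ignore nothing.
    return [default] if default.is_file() else []


with tempfile.TemporaryDirectory() as tmp:
    print([p.name for p in resolve_ignore_files(tmp)])  # []
    (Path(tmp) / ".dockerignore").write_text("*\n!Dockerfile\n")
    print([p.name for p in resolve_ignore_files(tmp)])  # ['.dockerignore']
    print([p.name for p in resolve_ignore_files(tmp, [".gitignore"])])  # ['.gitignore']
```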

Usage with multiple ignore files

If multiple ignore files are passed, precedence runs from lowest to highest: the first file in the list has the lowest priority and the last file has the highest. In the example below, the patterns in .dockerignore take precedence over the ones in .gitignore.

from peak.compression import compress

# Later entries in ignore_files take precedence over earlier ones.
with compress(".", ignore_files=[".gitignore", ".dockerignore"]) as zip_file:
    with open("artifact.zip", "wb") as f:
        f.write(zip_file.read())

File structure of the zip file:

root_dir
├── Dockerfile
├── dir1
│   └── main.py
└── dir2
    └── main.py
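One way to picture this precedence rule is to concatenate the patterns of all ignore files in list order and let the last matching pattern win. The sketch below makes that assumption explicit; kept_files is a hypothetical helper, the matching is a simplified fnmatch-based stand-in, and the pattern lists are the ignore files shown at the top of this page.

```python
import fnmatch
from typing import List

IGNORE_FILES = {
    ".gitignore": ["*", "!**/*.txt"],
    ".dockerignore": ["*", "!Dockerfile", "!**/*.py"],
}
FILES = ["Dockerfile", "dir1/file.txt", "dir1/main.py", "dir2/file.txt", "dir2/main.py"]


def kept_files(ignore_file_order: List[str]) -> List[str]:
    """Return the files that survive when ignore files are applied in order."""
    # Concatenate patterns in list order, so later files are evaluated last.
    patterns = [p for name in ignore_file_order for p in IGNORE_FILES[name]]
    kept = []
    for path in FILES:
        ignored = False
        for pattern in patterns:
            negated = pattern.startswith("!")
            glob = pattern[1:] if negated else pattern
            # Simplification: fnmatch's `*` also crosses `/`.
            if fnmatch.fnmatchcase(path, glob):
                ignored = not negated  # last matching pattern wins
        if not ignored:
            kept.append(path)
    return kept


print(kept_files([".gitignore", ".dockerignore"]))
# ['Dockerfile', 'dir1/main.py', 'dir2/main.py']
print(kept_files([".dockerignore", ".gitignore"]))
# ['dir1/file.txt', 'dir2/file.txt']
```

Reversing the list flips the result, which is why the order of ignore_files matters whenever the files contain conflicting patterns.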

The recommended way

Here are some recommendations to keep in mind when working with commands that require an artifact:

- Although ignore_files is an optional argument, it is optional mainly for brevity: it supports the default use case of reading a .dockerignore file at the project root, which is assumed to always be present.
  - The other assumption here is that .dockerignore is always a strict superset of .gitignore.
- If your use case does not match the above assumptions, always pass the ignore_files argument explicitly.
  - This ensures that only the files you need for building the image are present in the artifact.
  - Because there are limits on both the size of the artifact and the number of files inside it, you should not include any extra files in the artifact.
- Always check the patterns in your ignore files so that you know exactly which files get ignored.
  - Directories such as .git, .venv, node_modules, and __pycache__ should always be ignored, not only because they can get large, but also because they are not safe to read.
- Use the print_zip_content function from the SDK or --dry-run from the CLI when in doubt.
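As a complement to those tools, an artifact can also be checked with the standard zipfile module before upload. The sketch below is independent of print_zip_content and is only an illustration: audit_zip is a hypothetical helper, and max_files and max_bytes are placeholders you supply yourself, since the actual platform limits are not stated here.

```python
import io
import zipfile
from typing import List

# Directories that the recommendations above say should never be shipped.
UNWANTED_DIRS = {".git", ".venv", "node_modules", "__pycache__"}


def audit_zip(data: bytes, max_files: int, max_bytes: int) -> List[str]:
    """Return a list of human-readable problems found in a zip archive."""
    problems = []
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        names = archive.namelist()
    if len(names) > max_files:
        problems.append(f"too many files: {len(names)} > {max_files}")
    if len(data) > max_bytes:
        problems.append(f"archive too large: {len(data)} > {max_bytes} bytes")
    for name in names:
        # Look only at the directory components of each entry.
        hits = set(name.split("/")[:-1]) & UNWANTED_DIRS
        if hits:
            problems.append(f"unwanted directory in {name}: {sorted(hits)[0]}")
    return problems
```

With an archive written by compress, this could be called as audit_zip(open("artifact.zip", "rb").read(), max_files=..., max_bytes=...), filling in the limits that apply to your platform.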
