Using Compression for Artifacts in Peak-SDK and Peak-CLI
Why do we need Artifacts and Compression
If you are creating a resource that depends on an image (such as an App spec, Block spec or Image) and you don't want it to be built directly from the source code repository, you need to pack all the files required to build the image into a zip archive (all platform APIs accept only zip compressed files), which we call an Artifact. Both the CLI and the SDK have utilities to help you create and manage Artifacts.
To prepare an artifact for upload, you need to specify the path to the directory that contains all the files needed to build the image and, optionally, a list of ignore files. The CLI and SDK then create a zip archive of the files in that directory, skipping any file that matches the patterns in the ignore files.
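To make the idea concrete, here is a minimal sketch of the same kind of packing using only the Python standard library. It is an illustration under stated assumptions, not how the SDK or CLI is implemented: `zip_directory` and its `should_skip` callback are hypothetical names, and the callback stands in for ignore-file pattern matching.

```python
import io
import os
import tempfile
import zipfile


def zip_directory(path, should_skip=lambda rel_path: False):
    """Return an in-memory zip of the files under `path`.

    `should_skip` is a hypothetical callback that receives each file's
    path relative to `path` (with "/" separators) and returns True to
    leave that file out of the archive.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for root, _dirs, files in os.walk(path):
            for name in files:
                full_path = os.path.join(root, name)
                rel_path = os.path.relpath(full_path, path).replace(os.sep, "/")
                if not should_skip(rel_path):
                    archive.write(full_path, arcname=rel_path)
    buffer.seek(0)
    return buffer


# Demo on a throwaway directory: keep only the Python sources.
with tempfile.TemporaryDirectory() as root_dir:
    os.makedirs(os.path.join(root_dir, "dir1"))
    for rel in ("Dockerfile", "dir1/main.py", "dir1/file.txt"):
        with open(os.path.join(root_dir, rel), "w") as f:
            f.write("example\n")

    artifact = zip_directory(root_dir, should_skip=lambda p: not p.endswith(".py"))
    with zipfile.ZipFile(artifact) as archive:
        names = sorted(archive.namelist())

print(names)  # ['dir1/main.py']
```

Storing paths relative to the directory keeps the archive layout independent of where the project lives on disk.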
Usage Examples
Consider the following directory structure:
root_dir
├── .dockerignore
├── .gitignore
├── Dockerfile
├── dir1
│   ├── file.txt
│   └── main.py
└── dir2
    ├── file.txt
    └── main.py
Contents of the .gitignore file:
*
!**/*.txt
Contents of the .dockerignore file:
*
!Dockerfile
!**/*.py
Basic usage with path and ignore_files
This example uses the compression module to create a zip file named artifact.zip containing the files in the current directory, skipping any file that matches the patterns listed in the .gitignore file.
from peak.compression import compress

# Zip the current directory, skipping files matched by .gitignore.
with compress(".", ignore_files=[".gitignore"]) as archive:
    with open("artifact.zip", "wb") as f:
        f.write(archive.read())
File structure of the zip file:
root_dir
├── dir1
│   └── file.txt
└── dir2
    └── file.txt
Usage without ignore_files
ignore_files is an optional argument. If it is not provided, compress looks for a .dockerignore file in the given path and uses that as the ignore file.
from peak.compression import compress

# No ignore_files given: the .dockerignore found in the path is used.
with compress(".") as archive:
    with open("artifact.zip", "wb") as f:
        f.write(archive.read())
File structure of the zip file:
root_dir
├── Dockerfile
├── dir1
│   └── main.py
└── dir2
    └── main.py
Compress without ignore_files and a missing .dockerignore
If the ignore_files argument is not given and no .dockerignore file is present in path, then all the files under path are included in the zip file.
from peak.compression import compress

# No ignore_files and no .dockerignore: every file under the path is zipped.
with compress(".") as archive:
    with open("artifact.zip", "wb") as f:
        f.write(archive.read())
File structure of the zip file:
root_dir
├── .gitignore
├── Dockerfile
├── dir1
│   ├── file.txt
│   └── main.py
└── dir2
    ├── file.txt
    └── main.py
Compress without both ignore_files and .dockerignore
If the ignore_files argument is not given and no .dockerignore file is present at the provided path, then compress zips all the files under that path.
from peak.compression import compress

# Nothing to filter with, so the whole directory goes into the archive.
with compress(".") as archive:
    with open("artifact.zip", "wb") as f:
        f.write(archive.read())
File structure of the zip file:
root_dir
├── .gitignore
├── Dockerfile
├── dir1
│   ├── file.txt
│   └── main.py
└── dir2
    ├── file.txt
    └── main.py
Usage with multiple ignore files
If multiple ignore files are passed, precedence runs from lowest to highest: the first file in the list has the lowest priority and the last file has the highest. In the example below, the patterns in .dockerignore take precedence over the ones in .gitignore.
from peak.compression import compress

# .dockerignore comes last in the list, so its patterns win over .gitignore.
with compress(".", ignore_files=[".gitignore", ".dockerignore"]) as archive:
    with open("artifact.zip", "wb") as f:
        f.write(archive.read())
File structure of the zip file:
root_dir
├── Dockerfile
├── dir1
│   └── main.py
└── dir2
    └── main.py
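The precedence rule can be modelled as "concatenate the pattern lists in order and let the last matching pattern win". The sketch below is a simplified model written for this example only. It is not the matcher the SDK uses: `files_to_keep` and `_matches` are hypothetical helpers, and real ignore-file syntax has more rules than the two handled here (plain globs and a leading `**/`).

```python
import fnmatch


def _matches(rel_path, glob):
    """Match the full relative path, or only the file name for "**/" globs."""
    if glob.startswith("**/"):
        name = rel_path.rsplit("/", 1)[-1]
        return fnmatch.fnmatchcase(name, glob[3:])
    return fnmatch.fnmatchcase(rel_path, glob)


def files_to_keep(files, ignore_files):
    """Apply the pattern lists in order; the last matching pattern wins."""
    patterns = [p for file_patterns in ignore_files for p in file_patterns]
    kept = []
    for rel_path in files:
        ignored = False
        for pattern in patterns:
            negated = pattern.startswith("!")
            if _matches(rel_path, pattern[1:] if negated else pattern):
                # A "!" pattern re-includes the file, any other ignores it.
                ignored = not negated
        if not ignored:
            kept.append(rel_path)
    return kept


FILES = [
    ".dockerignore", ".gitignore", "Dockerfile",
    "dir1/file.txt", "dir1/main.py", "dir2/file.txt", "dir2/main.py",
]
GITIGNORE = ["*", "!**/*.txt"]
DOCKERIGNORE = ["*", "!Dockerfile", "!**/*.py"]

print(files_to_keep(FILES, [GITIGNORE, DOCKERIGNORE]))
# ['Dockerfile', 'dir1/main.py', 'dir2/main.py']
print(files_to_keep(FILES, [DOCKERIGNORE, GITIGNORE]))
# ['dir1/file.txt', 'dir2/file.txt']
```

Swapping the order of the two lists flips the result, which is why the position of each ignore file in the list matters.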
The recommended way
Here are some recommendations to keep in mind when working with commands that require an artifact:
Although ignore_files is an optional argument, it is optional mostly for brevity, in order to better support the default use case of reading a .dockerignore file at the project root, which it assumes is always present. The other assumption here is that .dockerignore is always a strict superset of .gitignore.
If your use case does not match the above assumptions, then you should always use the ignore_files argument. This ensures that only the files that you need for building the image are present in the artifact.
Since there are limits on both the size of an artifact and the number of files inside it, avoid including any extra files in the artifact.
Always check the patterns in your ignore files, so as to be explicitly aware of what files get ignored.
Directories such as .git, .venv, node_modules and __pycache__ should always be ignored, not only because they can get large, but also because they are not safe to read.
Use the print_zip_content function from the SDK or --dry-run from the CLI when in doubt.
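If you already have a zip file on disk or in memory, you can also check it with the Python standard library alone. The sketch below is an independent illustration and does not use or reproduce print_zip_content: `summarize_zip` is a hypothetical helper that reports the file names and the total uncompressed size, which you can compare against the platform limits that apply to you.

```python
import io
import zipfile


def summarize_zip(zip_file):
    """Return (sorted file names, total uncompressed bytes) for a zip."""
    with zipfile.ZipFile(zip_file) as archive:
        entries = [info for info in archive.infolist() if not info.is_dir()]
    names = sorted(info.filename for info in entries)
    total_bytes = sum(info.file_size for info in entries)
    return names, total_bytes


# Build a tiny in-memory archive to stand in for artifact.zip.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("Dockerfile", "FROM python:3.11\n")
    archive.writestr("dir1/main.py", "print('hello')\n")
buffer.seek(0)

names, total_bytes = summarize_zip(buffer)
print(f"{len(names)} files, {total_bytes} bytes uncompressed")
for name in names:
    print(name)
```

`summarize_zip` accepts a file path as well as a file-like object, so `summarize_zip("artifact.zip")` works on an archive written by the earlier examples.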