
Storage and versioning

  • Phaeton Hub provides an S3-compatible (Simple Storage Service) API, backed by Garage, for saving and versioning files, folders and objects.
  • To use this service, you need an access key and a secret key. You can find them under the Storage tab of the Hub Control Panel (in JupyterLab, choose File -> Hub Control Panel).

Figure: Hub Control Panel, Storage API tab

  • Files are stored in so-called buckets. Each user gets a bucket with the same name as their user name.
  • Currently, the storage and versioning service is only available in Python. You can also reach the storage with S3 client libraries in other languages, but versioning will not be available in that case.

Uploading and downloading files

  • To upload and download files, first create a client API object as below. Files are always uploaded and downloaded as tarballs.
```python
from phaeton_storage import ClientAPI

storage = ClientAPI(
    "garage:3900",                           # Default address of the service; do not change this
    bucket_name="serdar",                    # Your user name (your bucket has the same name)
    access_key="-- very long access key --", # Your access key from Hub Control Panel -> Storage
    secret_key="-- very long secret key --", # Your secret key from Hub Control Panel -> Storage
    region="garage",                         # Region name; do not change this
    secure=False,                            # SSL is off because the service is only reachable inside JupyterHub; do not change this
)
```
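Hard-coding the keys in a notebook makes them easy to leak when the notebook is shared. A minimal sketch of an alternative, assuming you have exported both keys as environment variables beforehand; the variable names PHAETON_ACCESS_KEY and PHAETON_SECRET_KEY and the helper storage_credentials are hypothetical and not part of phaeton_storage:

```python
import os


def storage_credentials(
    access_var: str = "PHAETON_ACCESS_KEY",  # hypothetical variable name
    secret_var: str = "PHAETON_SECRET_KEY",  # hypothetical variable name
) -> dict:
    """Read the storage keys from environment variables instead of the notebook source."""
    missing = [name for name in (access_var, secret_var) if not os.environ.get(name)]
    if missing:
        raise RuntimeError("Set these environment variables first: " + ", ".join(missing))
    # Keys match the keyword arguments accepted by ClientAPI.
    return {
        "access_key": os.environ[access_var],
        "secret_key": os.environ[secret_var],
    }
```

The returned dictionary can then be unpacked into the constructor shown above, for example ClientAPI("garage:3900", bucket_name="serdar", region="garage", secure=False, **storage_credentials()).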
  • Upload a file or folder as follows:
```python
storage.upload_file(
    "phaeton-tutorial",   # Name to record the upload under; it need not match the name of the file/folder
    "./phaeton-tutorial"  # Path of the file/folder to upload
)
```
  • This method first packs the file or folder into a tarball and then uploads the tarball to the server.
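For intuition, the packing step can be approximated with Python's standard tarfile module. This is a minimal sketch, not the phaeton_storage implementation: it assumes gzip compression (matching the .tar.gz downloads) and that the archive stores the folder under its own base name; make_tarball is a hypothetical helper.

```python
import os
import tarfile


def make_tarball(source_path: str, archive_path: str) -> str:
    """Pack a file or folder into a gzip-compressed tarball and return its path."""
    with tarfile.open(archive_path, "w:gz") as tar:
        # tar.add recurses into folders; arcname keeps member paths relative,
        # e.g. "phaeton-tutorial/notebook.ipynb" instead of an absolute path.
        tar.add(source_path, arcname=os.path.basename(os.path.normpath(source_path)))
    return archive_path
```

For example, make_tarball("./phaeton-tutorial", "phaeton-tutorial.tar.gz") produces a single archive containing the whole folder, which is why every upload stores the complete folder contents.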
  • You can list the uploaded files with:
```python
storage.list_objects()
```
  • To download a file (or a specific version of it), provide the hash of that version. The download is again a tarball, saved as file_name.tar.gz.
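```python
storage._download_file(
    'cGJ8MjAyNS0wOS0wMSAxOToxNjo1NiA3MDc2MDM=',  # Hash of the version to download
    overwrite=True,                               # Replace an existing local file with the same name
    download_path="./",                           # Directory where the tarball is saved
)
```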
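The identifier used in the download example is Base64-encoded text, so it can be decoded with the standard base64 module to see what it refers to. In this example it decodes to what appears to be an object name (pb) and a recording timestamp. This is an observation about this one example, not a documented format, so rely on the service's own listing to pick versions; describe_version_id is a hypothetical helper.

```python
import base64


def describe_version_id(version_id: str) -> str:
    """Decode a Base64 version identifier into readable text (assumed UTF-8)."""
    return base64.b64decode(version_id).decode("utf-8")


print(describe_version_id("cGJ8MjAyNS0wOS0wMSAxOToxNjo1NiA3MDc2MDM="))
# prints: pb|2025-09-01 19:16:56 707603
```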
  • Because downloads are compressed tarballs, they need to be extracted, for example with the tar -xvf file_name.tar.gz command. Inside a notebook, prefix the command with ! as below:
```ipython
!tar -xvf file_name.tar.gz
```
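If you prefer to stay in Python rather than calling the shell, the standard tarfile module can perform the same extraction. A minimal sketch; extract_tarball is a hypothetical helper, and the filter="data" safety option is used only where the running Python provides it (3.12+, and recent security releases of older versions).

```python
import tarfile


def extract_tarball(archive_path: str, target_dir: str = ".") -> list:
    """Extract a .tar.gz archive into target_dir and return the member names."""
    with tarfile.open(archive_path, "r:gz") as tar:
        names = tar.getnames()
        if hasattr(tarfile, "data_filter"):
            # Rejects absolute paths and entries that would escape target_dir.
            tar.extractall(target_dir, filter="data")
        else:
            tar.extractall(target_dir)
    return names
```

Calling extract_tarball("file_name.tar.gz") extracts into the current directory, like the tar -xvf command, and returns the extracted paths in place of the verbose listing.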

⚠️ Warning: Folders are saved as archives in Garage. Unlike version control systems such as git, recording a new version of a folder stores every file again, including unchanged ones. It is therefore inefficient to create new versions of a folder that contains large data files; instead, version the data and the code folders separately.
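Before recording a new version, it can help to check which parts of a project would dominate the archive. A minimal sketch using only the standard library; folder_sizes is a hypothetical helper that reports the total size of each top-level entry, so that large data folders can be identified and versioned on their own.

```python
from pathlib import Path


def folder_sizes(root: str) -> dict:
    """Return {top-level entry name: total size in bytes}, largest first."""
    sizes = {}
    for entry in Path(root).iterdir():
        if entry.is_file():
            sizes[entry.name] = entry.stat().st_size
        elif entry.is_dir():
            # Sum the sizes of all regular files anywhere below this folder.
            sizes[entry.name] = sum(
                p.stat().st_size for p in entry.rglob("*") if p.is_file()
            )
    return dict(sorted(sizes.items(), key=lambda item: item[1], reverse=True))
```

If, say, a data folder dominates the result, it can be uploaded under its own name, e.g. storage.upload_file("project-data", "./project/data"), with the code uploaded separately as storage.upload_file("project-code", "./project/code"); these names and paths are illustrative only. New code versions then no longer re-store the data.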