S3 access

Since DC datasets can become quite large, it often makes sense to put them somewhere centrally, such as a shared network drive or DCOR. You may also choose to upload your files directly to an S3-compatible object store, which dclab supports as well (It is actually in integral part of the DCOR format).

Public data

Opening public datasets on S3 is straight forward. To get started, you only need to know the URL of the object:

import dclab
s3_url = "https://objectstore.hpccloud.mpcdf.mpg.de/circle-5a7a053d-55fb-4f99-960c-f478d0bd418f/resource/fb7/19f/b2-bd9f-817a-7d70-f4002af916f0"
ds = dclab.new_dataset(s3_url)
print(ds.config)

Private data

Accessing private data requires you to pass the key ID and the access secret like so:

import dclab
s3_url = "..."
ds = dclab.new_dataset(s3_url,
                       secret_id="YOUR-S3-KEY-ID",
                       secret_key="YOUR-S3-ACCESS-SECRET")

Alternatively, you can also set the environment variables DCLAB_S3_ACCESS_KEY_ID and DCLAB_S3_SECRET_ACCESS_KEY, and optionally the DCLAB_S3_ENDPOINT_URL. If you cannot edit environment variables, you can also modify the environment before importing dclab like so:

import os
os.environ["DCLAB_S3_ACCESS_KEY_ID"] = "4f4bf368365967466be9baf07028a5f3"
os.environ["DCLAB_S3_ACCESS_KEY_ID"] = "12cd2fe004bc0f17fe9cd76dae412e0d"
os.environ["DCLAB_S3_ENDPOINT_URL"] = "https://objectstore.hpccloud.mpcdf.mpg.de"

import dclab
dclab.new_dataset(...)