Cloud Storage¶
DEFAULT CLOUD STORAGE: data is stored on TensorBay's cloud
AUTHORIZED CLOUD STORAGE: data is stored on other providers' clouds authorized to TensorBay
Default Cloud Storage¶
Create a dataset with default cloud storage:
gas.create_dataset("DatasetName")
Authorized Cloud Storage¶
Aliyun OSS
Amazon S3
Azure Blob
Config¶
See the cloud storage instructions for details about how to configure cloud storage on TensorBay.
TensorBay SDK supports the following methods to configure cloud storage.
For example:
gas.create_oss_storage_config(
    "oss_config",
    "tests",
    endpoint="<YOUR_ENDPOINT>",  # like oss-cn-qingdao.aliyuncs.com
    accesskey_id="<YOUR_ACCESSKEYID>",
    accesskey_secret="<YOUR_ACCESSKEYSECRET>",
    bucket_name="<YOUR_BUCKETNAME>",
)
TensorBay SDK also supports a method to list all of a user's previous configurations.
gas.list_auth_storage_configs()
Create Authorized Storage Dataset¶
Create a dataset with authorized cloud storage:
dataset_client = gas.create_dataset("dataset_name", config_name="config_name")
Import Cloud Files into Authorized Storage Dataset¶
Take the following cloud directory as an example:
data/
├── images/
│   ├── 00001.png
│   ├── 00002.png
│   └── ...
├── labels/
│   ├── 00001.json
│   ├── 00002.json
│   └── ...
└── ...
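In this layout, each image has a label file sharing its stem (`00001.png` pairs with `00001.json`). The mapping can be sketched with the standard library; the helper name below is ours for illustration, not part of the SDK:

```python
from pathlib import PurePosixPath

def label_path_for(image_path: str) -> str:
    # Map an image path to its sibling label path,
    # e.g. data/images/00001.png -> data/labels/00001.json
    p = PurePosixPath(image_path)
    return str(p.parent.parent / "labels" / (p.stem + ".json"))

print(label_path_for("data/images/00001.png"))  # data/labels/00001.json
```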
Get a cloud client:
from tensorbay import GAS
gas = GAS("Accesskey-*****")
cloud_client = gas.get_cloud_client("config_name")
Import the AuthData from the cloud platform and load the label files into an authorized storage dataset:
import json

from tensorbay.dataset import Dataset
from tensorbay.label import Classification

# Use AuthData to organize a dataset by the "Dataset" class before importing.
dataset = Dataset("DatasetName")

# TensorBay uses "segment" to separate different parts in a dataset.
segment = dataset.create_segment()

images = cloud_client.list_auth_data("data/images/")
labels = cloud_client.list_auth_data("data/labels/")

for auth_data, label in zip(images, labels):
    with label.open() as fp:
        auth_data.label.classification = Classification.loads(json.load(fp))
    segment.append(auth_data)

dataset_client = gas.upload_dataset(dataset, jobs=8)
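Note that the `zip` call above pairs images with labels purely by listing order, so the two listings must be sorted consistently. A self-contained sketch of a stem check that could catch misalignment before uploading (the file names are hypothetical, mirroring the directory layout above):

```python
from pathlib import PurePosixPath

# Hypothetical listings mirroring the cloud directory above.
images = ["data/images/00001.png", "data/images/00002.png"]
labels = ["data/labels/00001.json", "data/labels/00002.json"]

# Collect any image/label pairs whose stems do not match.
mismatches = [
    (img, lbl)
    for img, lbl in zip(images, labels)
    if PurePosixPath(img).stem != PurePosixPath(lbl).stem
]
assert not mismatches, f"misaligned pairs: {mismatches}"
```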
Important
Files will be copied from the raw directory to the authorized cloud storage dataset path, so the storage space used on the cloud platform will be doubled.