Shortcuts

easyfl

easyfl.init(conf=None, init_all=True)[source]

Initialize EasyFL.

Parameters
  • conf (dict, optional) – Configurations.

  • init_all (bool, optional) – Whether to initialize the dataset, model, server, and client in addition to the configuration.
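
Example

A minimal usage sketch (standalone training with default configurations):

>>> import easyfl
>>> easyfl.init()  # initialize configuration, dataset, model, server, and client
>>> easyfl.run()   # start the federated learning process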

easyfl.init_dataset()[source]

Initialize dataset, either using a registered dataset or an out-of-the-box dataset specified in the config.

easyfl.init_model()[source]

Initialize model, either using a registered model or an out-of-the-box model specified in the config.

Returns

Model used in federated learning.

Return type

nn.Module

easyfl.load_config(file, conf=None)[source]

Load and merge configurations from file and input.

Parameters
  • file (str) – Filename of the configuration.

  • conf (dict) – Configurations.

Returns

Internal configurations managed by OmegaConf.

Return type

omegaconf.dictconfig.DictConfig
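
Example

A hedged sketch; the filename and override keys below are hypothetical and depend on your configuration schema:

>>> import easyfl
>>> overrides = {"data": {"dataset": "cifar10"}}  # hypothetical override keys
>>> conf = easyfl.load_config("config.yaml", overrides)  # hypothetical filename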

easyfl.register_client(client)[source]

Register federated learning client.

Parameters

client (BaseClient) – Customized federated learning client.

easyfl.register_dataset(train_data, test_data, val_data=None)[source]

Register datasets for federated learning training.

Parameters
  • train_data (FederatedDataset) – Training dataset.

  • test_data (FederatedDataset) – Testing dataset.

  • val_data (FederatedDataset) – Validation dataset.
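
Example

A sketch of registering in-memory data; train_x, train_y, test_x, and test_y are placeholder arrays you supply:

>>> import easyfl
>>> from easyfl.datasets import FederatedTensorDataset
>>> train_data = FederatedTensorDataset({"x": train_x, "y": train_y}, num_of_clients=10)
>>> test_data = FederatedTensorDataset({"x": test_x, "y": test_y}, num_of_clients=10)
>>> easyfl.register_dataset(train_data, test_data)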

easyfl.register_model(model)[source]

Register model for federated learning training.

Parameters

model (nn.Module) – PyTorch model, both class and instance are acceptable.
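
Example

A sketch using a torchvision model; whether its output dimension fits your dataset is up to you:

>>> import easyfl
>>> from torchvision.models import resnet18
>>> easyfl.register_model(resnet18)  # registering the class; an instance also works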

easyfl.register_server(server)[source]

Register federated learning server.

Parameters

server (BaseServer) – Customized federated learning server.

easyfl.run()[source]

Run federated learning process.

easyfl.start_client(args=None)[source]

Start federated learning client service for remote training.

Parameters

args (argparse.Namespace) – Configurations passed in as arguments.

easyfl.start_remote_client(conf=None, train_data=None, test_data=None, model=None, client=None)[source]

Start a remote client.

Parameters
  • conf (dict, optional) – Configurations. If not provided, the configuration loaded from file is used; if provided, it overwrites the configurations from file.

  • train_data (FederatedDataset) – Training dataset.

  • test_data (FederatedDataset) – Testing dataset.

  • model (nn.Module) – Model used in client training.

  • client (BaseClient) – Customized federated learning client class.
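
Example

A hedged sketch; it assumes train_data and test_data are FederatedDataset instances constructed beforehand, with the server address taken from the configuration:

>>> import easyfl
>>> easyfl.start_remote_client(train_data=train_data, test_data=test_data)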

easyfl.start_remote_server(conf=None, test_data=None, model=None, server=None)[source]

Start a remote server.

Parameters
  • conf (dict, optional) – Configurations. If not provided, the configuration loaded from file is used; if provided, it overwrites the configurations from file.

  • test_data (FederatedDataset) – Test dataset for centralized testing on server.

  • model (nn.Module) – Model used in client training.

  • server (BaseServer) – Customized federated learning server class.

easyfl.start_server(args=None)[source]

Start federated learning server service for remote training.

Parameters

args (argparse.Namespace) – Configurations passed in as arguments.

easyfl.server

class easyfl.server.BaseServer(conf, test_data=None, val_data=None, is_remote=False, local_port=22999)[source]

Default implementation of federated learning server.

Parameters
  • conf (omegaconf.dictconfig.DictConfig) – Configurations of EasyFL.

  • test_data (FederatedDataset) – Test dataset for centralized testing in server, optional.

  • val_data (FederatedDataset) – Validation dataset for centralized validation in server, optional.

  • is_remote (bool) – A flag to indicate whether to start remote training.

  • local_port (int) – The port of remote server service.

Override the class and functions to implement customized server.

Example

>>> from easyfl.server import BaseServer
>>> class CustomizedServer(BaseServer):
>>>     def __init__(self, conf, test_data=None, val_data=None, is_remote=False, local_port=22999):
>>>         super(CustomizedServer, self).__init__(conf, test_data, val_data, is_remote, local_port)
>>>         pass  # more initialization of attributes.
>>>
>>>     def aggregation(self):
>>>         # Implement customized aggregation method, which overwrites the default aggregation method.
>>>         pass
aggregate(models, weights)[source]

Aggregate models uploaded from clients via federated averaging.

Parameters
  • models (list[nn.Module]) – List of models.

  • weights (list[float]) – List of weights, corresponding to each model. Weights are dataset size of clients by default.

Returns

nn.Module: Aggregated model.

aggregation()[source]

Aggregate training updates from clients. Server aggregates trained models from clients via federated averaging.

aggregation_test()[source]

Aggregate testing results from clients.

Returns

Test metrics, format in {"test_loss": value, "test_accuracy": value}.

Return type

dict

compression()[source]

Model compression to reduce communication cost.

decompression(model)[source]

Decompress the models from clients.

distribution_to_test()[source]

Distribute to conduct testing on clients.

distribution_to_test_locally()[source]

Conduct testing sequentially for selected testing clients.

distribution_to_test_remotely()[source]

Distribute testing requests to remote clients through multiple threads. The main thread waits for a signal to proceed. The signal can be triggered via notification, as in the example below.

Example to trigger signal:
>>> with self.condition():
>>>     self.notify_all()
distribution_to_train()[source]

Distribute model and configurations to selected clients to train.

distribution_to_train_locally()[source]

Conduct training sequentially for selected clients in the group.

distribution_to_train_remotely()[source]

Distribute training requests to remote clients through multiple threads. The main thread waits for a signal to proceed. The signal can be triggered via notification, as in the example below.

Example to trigger signal:
>>> with self.condition():
>>>     self.notify_all()
gather_client_train_metrics()[source]

Gather client train metrics from other ranks in distributed training when testing all clients (test_all). When testing all clients, the train metrics may be overridden by the test metrics because clients may be placed on different GPUs for training and testing, leading to loss of the train metrics. So we gather the train metrics and set them in the test metrics. TODO: gather is not progressing. Need fix.

get_client_uploads()[source]

Get client uploaded contents.

Returns

A dictionary that contains client uploaded contents.

Return type

dict

get_test_clients()[source]

Get clients to run testing.

Returns

Clients to test.

Return type

(list[BaseClient]|list[str])

grouping_for_distributed()[source]

Divide the selected clients into groups for distributed training. Each group of clients is assigned to conduct training on one GPU, so the number of groups equals the number of GPUs.

When not in distributed training, the selected clients are in the same group. In distributed training, the selected clients are grouped with different strategies: greedy and random.

init_etcd(addresses)[source]

Initialize etcd as the registry for client registration.

Parameters

addresses (str) – The etcd addresses, split by ","

init_tracker()[source]

Initialize tracking.

is_primary_server()[source]

Check whether the current process is the primary server. In standalone or remote training, the server is primary. In distributed training, the server on rank0 is primary.

Returns

A flag to indicate whether current process is the primary server.

Return type

bool

is_training()[source]

Check whether the server is in training or has stopped training.

Returns

A flag to indicate whether server is in training.

Return type

bool

post_test()[source]

Postprocessing after testing.

post_train()[source]

Postprocessing after training.

pre_test()[source]

Preprocessing before testing.

pre_train()[source]

Preprocessing before training.

print_(content)[source]

Print content only when the current server is the primary server.

Parameters

content (str) – The content to log.

profile_training_speed()[source]

Manage profiling of client training speeds for distributed training optimization.

save_model()[source]

Save the model in the server.

save_tracker()[source]

Save metrics in the tracker to database.

selection(clients, clients_per_round)[source]

Select a fraction of total clients for training. Two selection strategies are implemented: 1. random selection; 2. select the first K clients.

Parameters
  • clients (list[BaseClient]|list[str]) – Available clients.

  • clients_per_round (int) – Number of clients to participate in training each round.

Returns

The selected clients.

Return type

(list[BaseClient]|list[str])

set_client_uploads(key, value)[source]

A general function to set uploaded content from clients.

Parameters
  • key (str) – Dictionary key.

  • value – Uploaded content.

set_client_uploads_test(accuracies, losses, test_sizes, metrics=None)[source]

Set testing results uploaded from clients.

Parameters
  • accuracies (list[float]) – Testing accuracies of clients.

  • losses (list[float]) – Testing losses of clients.

  • test_sizes (list[float]) – Test dataset sizes of clients.

  • metrics (dict) – Client testing metrics.

set_client_uploads_train(models, weights, metrics=None)[source]

Set training updates uploaded from clients.

Parameters
  • models (dict) – A collection of models.

  • weights (dict) – A collection of weights.

  • metrics (dict) – Client training metrics.

set_model(model, load_dict=False)[source]

Update the universal model in the server.

Parameters
  • model (nn.Module) – New model.

  • load_dict (bool) – A flag to indicate whether load state dict or copy the model.

should_stop()[source]

Check whether training should stop. Training stops under two conditions: 1. The maximum number of training rounds is reached. 2. TODO: Accuracy higher than a certain threshold.

Returns

A flag to indicate whether training should stop.

Return type

bool

start(model, clients)[source]

Start federated learning process, including training and testing.

Parameters
  • model (nn.Module) – The model to train.

  • clients (list[BaseClient]|list[str]) – Available clients. Clients are actually client grpc addresses when in remote training.

start_remote_training(model, clients)[source]

Start federated learning in the remote training mode. Before training, the server first establishes gRPC connections with clients that are not yet connected.

Parameters
  • model (nn.Module) – The model to train.

  • clients (list[str]) – Client addresses.

start_service()[source]

Start the federated learning server gRPC service.

stop()[source]

Set the flag to indicate training should stop.

test()[source]

Testing process of federated learning.

test_in_client()[source]

Conduct testing in clients. Currently, it supports testing on the clients selected for training. TODO: Add options to select clients for testing.

Returns

Test metrics, {"test_loss": value, "test_accuracy": value, "test_time": value}.

Return type

dict

test_in_server(device='cpu')[source]

Conduct testing in the server.

Parameters

device (str) – The hardware device to conduct testing, either cpu or cuda devices.

Returns

Test metrics, {"test_loss": value, "test_accuracy": value, "test_time": value}.

Return type

dict

track(metric_name, value)[source]

Track a metric.

Parameters
  • metric_name (str) – Name of the metric of a round.

  • value (str|int|float|bool|dict|list) – Value of the metric.
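
Example

A hypothetical call inside a customized server method; the metric name and value are made up:

>>> self.track("round_time", 12.3)  # hypothetical metric name and value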

track_communication_cost()[source]

Track communication costs between the server and clients. Communication costs occur in both training and testing, comprising downlink and uplink costs.

track_test_results(results)[source]

Track test results collected from clients.

Parameters

results (dict) – Test metrics, format in {"test_loss": value, "test_accuracy": value, "test_time": value}.

train()[source]

Training process of federated learning.

update_default_time()[source]

Update the estimated default training time of clients using actual training time from profiled clients.

class easyfl.server.ServerService(server)[source]

Remote gRPC server service.

Parameters

server (BaseServer) – Federated learning server instance.

Run(request, context)[source]

Trigger federated learning process.

Stop(request, context)[source]

Stop federated learning process.

Upload(request, context)[source]

Handle upload from clients.

easyfl.server.federated_averaging(models, weights)[source]

Compute the weighted average of model parameters and persistent buffers. It uses the state_dict of the model, which includes persistent buffers such as BN statistics.

Parameters
  • models (list[nn.Module]) – List of models to average.

  • weights (list[float]) – List of weights, corresponding to each model. Weights are dataset size of clients by default.

Returns

nn.Module: Weighted averaged model.
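
Example

An illustrative sketch of weighted state_dict averaging, not the library's exact implementation; dtype handling of integer buffers (e.g., num_batches_tracked) is glossed over:

>>> import copy
>>> def fedavg_sketch(models, weights):
>>>     total = sum(weights)
>>>     avg = copy.deepcopy(models[0])
>>>     state = avg.state_dict()
>>>     for key in state:
>>>         # weighted sum of each parameter/buffer across client models
>>>         state[key] = sum(m.state_dict()[key] * (w / total) for m, w in zip(models, weights))
>>>     avg.load_state_dict(state)
>>>     return avg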

easyfl.server.federated_averaging_only_params(models, weights)[source]

Compute weighted average of model parameters. Use model parameters only.

Parameters
  • models (list[nn.Module]) – List of models to average.

  • weights (list[float]) – List of weights, corresponding to each model. Weights are dataset size of clients by default.

Returns

nn.Module: Weighted averaged model.

easyfl.server.weighted_sum(models, weights)[source]

Compute the weighted sum of model parameters and persistent buffers. It uses the state_dict of the model, which includes persistent buffers such as BN statistics.

Parameters
  • models (list[nn.Module]) – List of models to average.

  • weights (list[float]) – List of weights, corresponding to each model. Weights are dataset size of clients by default.

Returns

nn.Module: Weighted sum of models. float: Sum of weights.

easyfl.server.weighted_sum_only_params(models, weights)[source]

Compute weighted sum of model parameters. Use model parameters only.

Parameters
  • models (list[nn.Module]) – List of models to average.

  • weights (list[float]) – List of weights, corresponding to each model. Weights are dataset size of clients by default.

Returns

nn.Module: Weighted sum of models. float: Sum of weights.

easyfl.client

class easyfl.client.BaseClient(cid, conf, train_data, test_data, device, sleep_time=0, is_remote=False, local_port=23000, server_addr='localhost:22999', tracker_addr='localhost:12666')[source]

Default implementation of federated learning client.

Parameters
  • cid (str) – Client id.

  • conf (omegaconf.dictconfig.DictConfig) – Client configurations.

  • train_data (FederatedDataset) – Training dataset.

  • test_data (FederatedDataset) – Test dataset.

  • device (str) – Hardware device for training, cpu or cuda devices.

  • sleep_time (float) – Duration to hold after training, to simulate stragglers.

  • is_remote (bool) – Whether to start remote training.

  • local_port (int) – Port of remote client service.

  • server_addr (str) – Remote server service grpc address.

  • tracker_addr (str) – Remote tracking service grpc address.

Override the class and functions to implement customized client.

Example

>>> from easyfl.client import BaseClient
>>> class CustomizedClient(BaseClient):
>>>     def __init__(self, cid, conf, train_data, test_data, device, **kwargs):
>>>         super(CustomizedClient, self).__init__(cid, conf, train_data, test_data, device, **kwargs)
>>>         pass  # more initialization of attributes.
>>>
>>>     def train(self, conf, device=CPU):
>>>         # Implement customized client training method, which overwrites the default training method.
>>>         pass
compression()[source]

Compress the client local model after training and before uploading to the server.

connect_to_server()[source]

Establish connection between the client and the server.

construct_upload_request()[source]

Construct client upload request for training updates and testing results.

Returns

The upload request defined in protobuf to unify local and remote operations.

Return type

UploadRequest

decompression()[source]

Decompress the model. It can be further implemented when the model is compressed in the server.

download(model)[source]

Download model from the server.

Parameters

model (nn.Module) – Global model distributed from the server.

encryption()[source]

Encrypt the client local model.

load_loader(conf)[source]

Load the training data loader.

Parameters

conf (omegaconf.dictconfig.DictConfig) – Client configurations.

Returns

Data loader.

Return type

torch.utils.data.DataLoader

load_optimizer(conf)[source]

Load the training optimizer. Adam and SGD are implemented.

operate(model, conf, index, is_train=True)[source]

A wrapper over operations (training/testing) on clients.

Parameters
  • model (nn.Module) – Model for operations.

  • conf (omegaconf.dictconfig.DictConfig) – Client configurations.

  • index (int) – Client index in the client list, for retrieving data. TODO: improvement.

  • is_train (bool) – The flag to indicate whether the operation is training, otherwise testing.

post_test()[source]

Postprocessing after testing.

post_train()[source]

Postprocessing after training.

post_upload()[source]

Postprocessing after uploading training/testing results.

pre_test()[source]

Preprocessing before testing.

pre_train()[source]

Preprocessing before training.

pretrain_setup(conf, device)[source]

Setup loss function and optimizer before training.

run_test(model, conf)[source]

Conduct testing on clients.

Parameters
  • model (nn.Module) – Model to test.

  • conf (omegaconf.dictconfig.DictConfig) – Client configurations.

Returns

Testing contents. The interface is unified for both local and remote operations.

Return type

UploadRequest

run_train(model, conf)[source]

Conduct training on clients.

Parameters
  • model (nn.Module) – Model to train.

  • conf (omegaconf.dictconfig.DictConfig) – Client configurations.

Returns

Training contents. The interface is unified for both local and remote operations.

Return type

UploadRequest

save_metrics()[source]

Save client metrics to database.

simulate_straggler()[source]

Simulate straggler effect of system heterogeneity.

start_service()[source]

Start client service.

test(conf, device='cpu')[source]

Execute client testing.

Parameters
  • conf (omegaconf.dictconfig.DictConfig) – Client configurations.

  • device (str) – Hardware device for training, cpu or cuda devices.

test_local()[source]

Test client local model after training.

track(metric_name, value)[source]

Track a metric.

Parameters
  • metric_name (str) – The name of the metric.

  • value (str|int|float|bool|dict|list) – The value of the metric.

train(conf, device='cpu')[source]

Execute client training.

Parameters
  • conf (omegaconf.dictconfig.DictConfig) – Client configurations.

  • device (str) – Hardware device for training, cpu or cuda devices.

upload()[source]

Upload the messages from the client to the server.

Returns

The upload request defined in protobuf to unify local and remote operations.

Only applicable for local training, as remote training uploads through a gRPC request.

Return type

UploadRequest

upload_remotely(request)[source]

Send upload request to remote server via gRPC.

Parameters

request (UploadRequest) – Upload request.

class easyfl.client.ClientService(client)[source]

Remote gRPC client service.

Parameters

client (BaseClient) – Federated learning client instance.

Operate(request, context)[source]

Perform training/testing operations.

easyfl.distributed

easyfl.distributed.dist_init(backend, init_method, world_size, rank, local_rank)[source]

Initialize PyTorch distributed.

Parameters
  • backend (str or Backend) – Distributed backend to use, e.g., nccl, gloo.

  • init_method (str, optional) – URL specifying how to initialize the process group.

  • world_size (int, optional) – Number of processes participating in the job.

  • rank (int) – Rank of the current process.

  • local_rank (int) – Local rank of the current process.

Returns

Rank of current process. int: Total number of processes.

Return type

int
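
Example

A hedged single-process example with the gloo backend; the init method URL is hypothetical:

>>> from easyfl.distributed import dist_init
>>> rank, world_size = dist_init("gloo", "tcp://127.0.0.1:23456", 1, 0, 0)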

easyfl.distributed.gather_value(value, world_size, device)[source]

Gather the value from devices to a list.

Parameters
  • value (float|int) – The value to gather.

  • world_size (int) – The number of processes.

  • device (str) – The device where the value is on, either cpu or cuda devices.

Returns

A list of gathered values.

Return type

list[torch.Tensor]

easyfl.distributed.get_device(gpu, world_size, local_rank)[source]

Obtain the device by checking the number of GPUs and distributed settings.

Parameters
  • gpu (int) – The number of requested GPUs.

  • world_size (int) – The number of processes.

  • local_rank (int) – The local rank of the current process.

Returns

The device to be used in PyTorch, e.g., tensor.to(device).

Return type

str
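
Example

A sketch assuming four GPUs are requested across four processes:

>>> import torch
>>> from easyfl.distributed import get_device
>>> device = get_device(gpu=4, world_size=4, local_rank=1)
>>> x = torch.zeros(8).to(device)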

easyfl.distributed.get_ip(node_list)[source]

Get the IP address of nodes.

Parameters

node_list (str) – Name of the nodes.

Returns

The first node in the node list.

Return type

str

easyfl.distributed.grouping(clients, world_size, default_time=10, strategy='random', seed=1)[source]

Divide clients into groups with different strategies.

Parameters
  • clients (list[BaseClient]) – A list of clients.

  • world_size (int) – The number of processes; it represents the number of groups here.

  • default_time (float, optional) – The default training time for not profiled clients.

  • strategy (str, optional) – Strategy of grouping, options: random, greedy, worst. When no strategy is applied, each client is a group.

  • seed (int, optional) – Random seed.

Returns

Groups of clients, each group is a sub-list.

Return type

list[list[BaseClient]]

easyfl.distributed.reduce_models(model, sample_sum)[source]

Aggregate models across devices and update the model with the new aggregated model parameters.

Parameters
  • model (nn.Module) – The model in a device to aggregate.

  • sample_sum (int) – Sum of the total dataset sizes of clients in a device.

easyfl.distributed.reduce_models_only_params(model, sample_sum)[source]

Aggregate models across devices and update the model with the new aggregated model parameters, excluding the persistent buffers like BN stats.

Parameters
  • model (nn.Module) – The model in a device to aggregate.

  • sample_sum (torch.Tensor) – Sum of the total dataset sizes of clients in a device.

easyfl.distributed.reduce_value(value, device)[source]

Calculate the sum of the value across devices.

Parameters
  • value (float|int) – Value to sum.

  • device (str) – The device where the value is on, either cpu or cuda devices.

Returns

Sum of the values.

Return type

torch.Tensor

easyfl.distributed.reduce_values(values, device)[source]

Calculate the average of values across devices.

Parameters
  • values (list[float|int]) – Values to average.

  • device (str) – The device where the value is on, either cpu or cuda devices.

Returns

The average of the values across devices.

Return type

torch.Tensor

easyfl.distributed.reduce_weighted_values(values, weights, device)[source]

Calculate the weighted average of values across devices.

Parameters
  • values (list[float|int]) – Values to average.

  • weights (list[float|int]) – The weights to calculate weighted average.

  • device (str) – The device where the value is on, either cpu or cuda devices.

Returns

The average of values across devices.

Return type

torch.Tensor

easyfl.distributed.setup(port=23344)[source]

Set up the distributed settings for Slurm.

Parameters

port (int, optional) – The port of the primary server. It auto-increments by 1 when the port is in use.

Returns

The rank of current process. int: The local rank of current process. int: Total number of processes. str: The address of the distributed init method.

Return type

int
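
Example

A hedged sketch combining setup and dist_init under Slurm; the unpacking order follows the return description above:

>>> from easyfl.distributed import setup, dist_init
>>> rank, local_rank, world_size, addr = setup()
>>> dist_init("nccl", addr, world_size, rank, local_rank)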

easyfl.datasets

class easyfl.datasets.BaseDataset(root, dataset_name, fraction, split_type, user, iid_user_fraction, train_test_split, minsample, num_class, num_of_client, class_per_client, setting_folder, seed=-1, **kwargs)[source]

The internal base dataset implementation.

Parameters
  • root (str) – The root directory where datasets are stored.

  • dataset_name (str) – The name of the dataset.

  • fraction (float) – The fraction of the data chosen from the raw data to use.

  • num_of_clients (int) – The targeted number of clients to construct.

  • split_type (str) – The type of statistical simulation, options: iid, dir, and class. iid means independent and identically distributed data. niid means non-independent and identically distributed data for Femnist and Shakespeare. dir means using Dirichlet process to simulate non-iid data, for CIFAR-10 and CIFAR-100 datasets. class means partitioning the dataset by label classes, for datasets like CIFAR-10, CIFAR-100.

  • minsample (int) – The minimal number of samples in each client. It is applicable for LEAF datasets and dir simulation of CIFAR-10 and CIFAR-100.

  • class_per_client (int) – The number of classes in each client. Only applicable when the split_type is ‘class’.

  • iid_user_fraction (float) – The fraction of the number of clients used when the split_type is ‘iid’.

  • user (bool) – A flag to indicate whether to partition users of the dataset into train-test groups. Only applicable to LEAF datasets. True means partitioning users of the dataset into train-test groups. False means partitioning each user's samples into train-test groups.

  • train_test_split (float) – The fraction of data for training; the rest are for testing. e.g., 0.9 means 90% of data are used for training and 10% are used for testing.

  • num_class – The number of classes in this dataset.

  • seed – Random seed.

class easyfl.datasets.Cifar10(root, fraction, split_type, user, iid_user_fraction=0.1, train_test_split=0.9, minsample=10, num_class=80, num_of_client=100, class_per_client=2, setting_folder=None, seed=-1, weights=None, alpha=0.5)[source]

class easyfl.datasets.Cifar100(root, fraction, split_type, user, iid_user_fraction=0.1, train_test_split=0.9, minsample=10, num_class=80, num_of_client=100, class_per_client=2, setting_folder=None, seed=-1, weights=None, alpha=0.5)[source]

class easyfl.datasets.FederatedDataset[source]

The abstract class of federated dataset for EasyFL.

abstract loader(batch_size, shuffle=True)[source]

Get data loader.

Parameters
  • batch_size (int) – The batch size of the data loader.

  • shuffle (bool) – Whether to shuffle the data in the loader.

abstract size(cid)[source]

Get dataset size.

Parameters

cid (str) – client id.

property users

Get client ids of the federated dataset.

class easyfl.datasets.FederatedImageDataset(root, simulated, do_simulate=True, extensions=('.jpg', '.jpeg', '.png', '.ppm', '.bmp', '.pgm', '.tif', '.tiff', '.webp'), is_valid_file=None, transform=None, target_transform=None, client_ids='default', num_of_clients=10, simulation_method='iid', weights=None, alpha=0.5, min_size=10, class_per_client=1)[source]

Federated image dataset, data of clients are in format of image folder.

Parameters
  • root (str|list[str]) – The root directory or directories of image data folder. If the dataset is simulated to multiple clients, the root is a list of directories. Otherwise, it is the directory of an image data folder.

  • simulated (bool) – Whether the dataset is simulated to federated learning settings.

  • do_simulate (bool, optional) – Whether to conduct simulation. It is only effective if the dataset is not already simulated.

  • extensions (list[str], optional) – A list of allowed image extensions. Only one of extensions and is_valid_file can be specified.

  • is_valid_file (function, optional) – A function that takes the path of an image file and checks whether it is valid. Only one of extensions and is_valid_file can be specified.

  • transform (torchvision.transforms.transforms.Compose, optional) – Transformation for data.

  • target_transform (torchvision.transforms.transforms.Compose, optional) – Transformation for data labels.

  • num_of_clients (int, optional) – The number of clients for simulation. Only needed if doing simulation.

  • simulation_method (optional) – The split method. Only needed if doing simulation.

  • weights (list[float], optional) – The targeted distribution of quantities to simulate quantity heterogeneity. The values should sum up to 1. e.g., [0.1, 0.2, 0.7]. The num_of_clients should be divisible by len(weights). None means clients are simulated with the same data quantity.

  • alpha (float, optional) – The parameter for Dirichlet distribution simulation, only for dir simulation.

  • min_size (int, optional) – The minimal number of samples in each client, only for dir simulation.

  • class_per_client (int, optional) – The number of classes in each client, only for non-iid by class simulation.

  • client_ids (list[str], optional) – A list of client ids. Each client id matches with an element in roots. The client ids are ["f0000001", "f00000002", ...] if not specified.
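
Example

A hedged instantiation sketch; the directory paths and client ids are hypothetical:

>>> from easyfl.datasets import FederatedImageDataset
>>> data = FederatedImageDataset(root=["./data/client1", "./data/client2"],
>>>                              simulated=True, do_simulate=False,
>>>                              client_ids=["c1", "c2"])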

loader(batch_size, client_id=None, shuffle=True, seed=0, num_workers=2, transform=None)[source]

Get dataset loader.

Parameters
  • batch_size (int) – The batch size.

  • client_id (str, optional) – The id of client.

  • shuffle (bool, optional) – Whether to shuffle before batching.

  • seed (int, optional) – The shuffle seed.

  • transform (torchvision.transforms.transforms.Compose, optional) – Data transformation.

  • num_workers (int, optional) – The number of workers for dataset loader.

Returns

The data loader to load data.

Return type

torch.utils.data.DataLoader

size(cid=None)[source]

Get dataset size.

Parameters

cid (str) – client id.

property users

Get client ids of the federated dataset.

class easyfl.datasets.FederatedTensorDataset(data, transform=None, target_transform=None, process_x=<function default_process_x>, process_y=<function default_process_x>, simulated=False, do_simulate=True, num_of_clients=10, simulation_method='iid', weights=None, alpha=0.5, min_size=10, class_per_client=1)[source]

Federated tensor dataset, data of clients are in format of tensor or list.

Parameters
  • data (dict) – A dictionary of data, e.g., {"id1": {"x": [[], [], ...], "y": [...]}}. If simulation is not done previously, it is in format of {"x": [[], [], ...], "y": [...]}.

  • transform (torchvision.transforms.transforms.Compose, optional) – Transformation for data.

  • target_transform (torchvision.transforms.transforms.Compose, optional) – Transformation for data labels.

  • process_x (function, optional) – A function to preprocess training data.

  • process_y (function, optional) – A function to preprocess data labels.

  • simulated (bool, optional) – Whether the dataset is simulated to federated learning settings.

  • do_simulate (bool, optional) – Whether to conduct simulation. It is only effective if the dataset is not already simulated.

  • num_of_clients (int, optional) – The number of clients for simulation. Only needed if doing simulation.

  • simulation_method (optional) – The split method. Only needed if doing simulation.

  • weights (list[float], optional) – The targeted distribution of quantities to simulate quantity heterogeneity. The values should sum up to 1. e.g., [0.1, 0.2, 0.7]. The num_of_clients should be divisible by len(weights). None means clients are simulated with the same data quantity.

  • alpha (float, optional) – The parameter for Dirichlet distribution simulation, only for dir simulation.

  • min_size (int, optional) – The minimal number of samples in each client, only for dir simulation.

  • class_per_client (int, optional) – The number of classes in each client, only for non-iid by class simulation.

loader(batch_size, client_id=None, shuffle=True, seed=0, transform=None, drop_last=False)[source]

Get dataset loader.

Parameters
  • batch_size (int) – The batch size.

  • client_id (str, optional) – The id of client.

  • shuffle (bool, optional) – Whether to shuffle before batching.

  • seed (int, optional) – The shuffle seed.

  • transform (torchvision.transforms.transforms.Compose, optional) – Data transformation.

  • drop_last (bool, optional) – Whether to drop the last batch if its size is smaller than batch size.

Returns

The data loader to load data.

Return type

torch.utils.data.DataLoader
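
Example

A hedged usage sketch; it assumes train_data is a FederatedTensorDataset, and the client id is hypothetical:

>>> loader = train_data.loader(batch_size=32, client_id="f0000001", shuffle=True)
>>> for batched_x, batched_y in loader:
>>>     pass  # replace with a training or testing step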

size(cid=None)[source]

Get dataset size.

Parameters

cid (str) – client id.

property users

Get client ids of the federated dataset.

class easyfl.datasets.FederatedTorchDataset(data, users)[source]

Wrapper over PyTorch dataset.

Parameters

data (dict) – A dictionary of client datasets, format {"client_id": loader1, "client_id2": loader2}.

loader(batch_size, client_id=None, shuffle=True, seed=0, num_workers=2, transform=None)[source]

Get data loader.

Parameters
  • batch_size (int) – The batch size of the data loader.

  • shuffle (bool) – Whether to shuffle the data in the loader.

size(cid=None)[source]

Get dataset size.

Parameters

cid (str) – client id.

property users

Get client ids of the federated dataset.

class easyfl.datasets.Femnist(root, fraction, split_type, user, iid_user_fraction=0.1, train_test_split=0.9, minsample=10, num_class=62, num_of_client=100, class_per_client=2, setting_folder=None, seed=-1, **kwargs)[source]

FEMNIST dataset implementation. It gets the FEMNIST dataset according to configurations.

It stores the processed datasets locally.

base_folder

The base folder path of the datasets folder.

Type

str

class_url

The url to get the by_class split FEMNIST.

Type

str

write_url

The url to get the by_write split FEMNIST.

Type

str

class easyfl.datasets.Shakespeare(root, fraction, split_type, user, iid_user_fraction=0.1, train_test_split=0.9, minsample=10, num_class=80, num_of_client=100, class_per_client=2, setting_folder=None, seed=-1, **kwargs)[source]

Shakespeare dataset implementation. It gets the Shakespeare dataset according to configurations.

base_folder

The base folder path of the datasets folder.

Type

str

raw_data_url

The url to get the by_class split shakespeare.

Type

str

write_url

The url to get the by_write split shakespeare.

Type

str

easyfl.datasets.construct_datasets(root, dataset_name, num_of_clients, split_type, min_size, class_per_client, data_amount, iid_fraction, user, train_test_split, quantity_weights, alpha)[source]

Construct and load provided federated learning datasets.

Parameters
  • root (str) – The root directory where datasets are stored.

  • dataset_name (str) – The name of the dataset. It currently supports: femnist, shakespeare, cifar10, and cifar100. Among them, femnist and shakespeare are adopted from LEAF benchmark.

  • num_of_clients (int) – The targeted number of clients to construct.

  • split_type (str) – The type of statistical simulation, options: iid, dir, and class. iid means independent and identically distributed data. niid means non-independent and identically distributed data for Femnist and Shakespeare. dir means using Dirichlet process to simulate non-iid data, for CIFAR-10 and CIFAR-100 datasets. class means partitioning the dataset by label classes, for datasets like CIFAR-10, CIFAR-100.

  • min_size (int) – The minimal number of samples in each client. It is applicable for LEAF datasets and dir simulation of CIFAR-10 and CIFAR-100.

  • class_per_client (int) – The number of classes in each client. Only applicable when the split_type is ‘class’.

  • data_amount (float) – The fraction of data sampled for LEAF datasets. e.g., 10% means that only 10% of the total dataset size is used.

  • iid_fraction (float) – The fraction of the number of clients used when the split_type is ‘iid’.

  • user (bool) – A flag to indicate whether to partition users of the dataset into train-test groups. Only applicable to LEAF datasets. True means partitioning users of the dataset into train-test groups. False means partitioning each user's samples into train-test groups.

  • train_test_split (float) – The fraction of data for training; the rest are for testing. e.g., 0.9 means 90% of data are used for training and 10% are used for testing.

  • quantity_weights (list[float]) – The targeted distribution of quantities to simulate data quantity heterogeneity. The values should sum up to 1. e.g., [0.1, 0.2, 0.7]. The num_of_clients should be divisible by len(weights). None means clients are simulated with the same data quantity.

  • alpha (float) – The parameter for Dirichlet distribution simulation, applicable only when split_type is dir.

Returns

Training dataset. FederatedDataset: Testing dataset.

Return type

FederatedDataset
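
Example

A hedged sketch for CIFAR-10; the argument values are illustrative, not recommendations:

>>> from easyfl.datasets import construct_datasets
>>> train_data, test_data = construct_datasets(
>>>     root="./data", dataset_name="cifar10", num_of_clients=100,
>>>     split_type="iid", min_size=10, class_per_client=1,
>>>     data_amount=0.05, iid_fraction=0.1, user=False,
>>>     train_test_split=0.9, quantity_weights=None, alpha=0.5)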

easyfl.datasets.data_simulation(data_x, data_y, num_of_clients, data_distribution, weights=None, alpha=0.5, min_size=10, class_per_client=1, stack_x=True)[source]

Simulate federated learning datasets by partitioning data into multiple clients using different strategies.

Parameters
  • data_x (list[Object]) – A list of data.

  • data_y (list[Object]) – A list of dataset labels.

  • num_of_clients (int) – The number of clients to partition to.

  • data_distribution (str) – The way to partition the dataset, options: iid: partition the dataset into multiple clients with equal quantity (difference is less than 1) randomly; dir: partition the dataset into multiple clients following the Dirichlet process; class: partition the dataset into multiple clients based on classes.

  • weights (list[float], optional) – The targeted distribution of data quantities, for simulating data quantity heterogeneity. The values should sum up to 1, e.g., [0.1, 0.2, 0.7]. Note: num_of_clients should be divisible by len(weights). When weights=None, the data quantity of clients only depends on data_distribution.

  • alpha (float, optional) – The parameter for Dirichlet process simulation. It is only applicable when data_distribution is dir.

  • min_size (int, optional) – The minimum number of data size of a client. It is only applicable when data_distribution is dir.

  • class_per_client (int) – The number of classes in each client. It is only applicable when data_distribution is class.

  • stack_x (bool, optional) – A flag to indicate whether using np.vstack or append to construct dataset. It is only applicable when data_distribution is class.

Raises

ValueError – When the simulation method data_distribution is not supported.

Returns

A list of client ids. dict: The partitioned data, key is client id, value is the client data. e.g., {"client_1": {"x": [data_x], "y": [data_y]}}.

Return type

list[str]
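
Example

A hedged usage sketch; data_x and data_y are placeholder lists you supply:

>>> from easyfl.datasets import data_simulation
>>> clients, partitioned = data_simulation(data_x, data_y, num_of_clients=10, data_distribution="iid")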

easyfl.datasets.equal_division(num_groups, data_x, data_y=None)[source]

Partition data into multiple clients with equal quantity.

Parameters
  • num_groups (int) – The number of groups to partition into.

  • data_x (list[Object]) – A list of elements to be divided.

  • data_y (list[Object], optional) – A list of data labels to be divided together with the data.

Returns

A list where each element is a list of data of a group/client. list[list]: A list where each element is a list of data label of a group/client.

Return type

list[list]

Example

>>> equal_division(3, list(range(9)))
>>> ([[0,4,2],[3,1,7],[6,5,8]], [])
easyfl.datasets.iid(data_x, data_y, num_of_clients, x_dtype, y_dtype)[source]

Partition dataset into multiple clients with equal data quantity (difference is less than 1) randomly.

Parameters
  • data_x (list[Object]) – A list of data.

  • data_y (list[Object]) – A list of dataset labels.

  • num_of_clients (int) – The number of clients to partition to.

  • x_dtype (numpy.dtype) – The type of data.

  • y_dtype (numpy.dtype) – The type of data label.

Returns

A list of client ids. dict: The partitioned data, key is client id, value is the client data. e.g., {"client_1": {"x": [data_x], "y": [data_y]}}.

Return type

list[str]

easyfl.datasets.non_iid_class(data_x, data_y, class_per_client, num_of_clients, x_dtype, y_dtype, stack_x=True)[source]

Partition dataset into multiple clients based on label classes. Each client contains [1, n] classes, where n is the number of classes of a dataset.

Note: Each class is divided into ceil(class_per_client * num_of_clients / num_class) parts, and each client chooses class_per_client parts from each class to construct its dataset.

Parameters
  • data_x (list[Object]) – A list of data.

  • data_y (list[Object]) – A list of dataset labels.

  • class_per_client (int) – The number of classes in each client.

  • num_of_clients (int) – The number of clients to partition to.

  • x_dtype (numpy.dtype) – The type of data.

  • y_dtype (numpy.dtype) – The type of data label.

  • stack_x (bool, optional) – A flag to indicate whether using np.vstack or append to construct dataset.

Returns

A list of client ids. dict: The partitioned data, key is client id, value is the client data. e.g., {"client_1": {"x": [data_x], "y": [data_y]}}.

Return type

list[str]

easyfl.datasets.non_iid_dirichlet(data_x, data_y, num_of_clients, alpha, min_size, x_dtype, y_dtype)[source]

Partition dataset into multiple clients following the Dirichlet process.

Parameters
  • data_x (list[Object]) – A list of data.

  • data_y (list[Object]) – A list of dataset labels.

  • num_of_clients (int) – The number of clients to partition to.

  • alpha (float) – The parameter for Dirichlet process simulation.

  • min_size (int) – The minimum number of data size of a client.

  • x_dtype (numpy.dtype) – The type of data.

  • y_dtype (numpy.dtype) – The type of data label.

Returns

A list of client ids. dict: The partitioned data, key is client id, value is the client data. e.g., {"client_1": {"x": [data_x], "y": [data_y]}}.

Return type

list[str]

easyfl.datasets.quantity_hetero(weights, data_x, data_y=None)[source]

Partition data into multiple clients with different quantities. The number of groups is the same as the number of elements of weights. The quantity of each group depends on the values of weights.

Parameters
  • weights (list[float]) – The targeted distribution of data quantities. The values should sum up to 1. e.g., [0.1, 0.2, 0.7].

  • data_x (list[Object]) – A list of elements to be divided.

  • data_y (list[Object], optional) – A list of data labels to be divided together with the data.

Returns

A list where each element is a list of data of a group/client. list[list]: A list where each element is a list of data label of a group/client.

Return type

list[list]

Example

>>> quantity_hetero([0.1, 0.2, 0.7], list(range(0, 10)))
>>> ([[4], [8, 9], [6, 0, 1, 7, 3, 2, 5]], [])

easyfl.models

easyfl.communication

easyfl.communication.init_stub(typ, address)[source]

Initialize gRPC stub.

Parameters
  • typ (str) – Type of service, options: client, server, tracking.

  • address (str) – Address of the gRPC service.

Returns

Stub of the gRPC service.

Return type

(ClientServiceStub|ServerServiceStub|TrackingServiceStub)

easyfl.communication.start_service(typ, service, port)[source]

Start gRPC service.

Parameters
  • typ (str) – Type of service, options: client, server, tracking.

  • service (ClientService|ServerService|TrackingService) – gRPC service to start.

  • port (int) – The port of the service.

easyfl.registry

class easyfl.registry.EtcdClient(name, addrs, base_dir, use_mock_etcd=False)[source]

Etcd client to connect and communicate with the etcd service. Etcd serves as the registry for remote training: clients register themselves in etcd, and the server queries etcd to get client addresses.

Parameters
  • name (str) – The name of etcd.

  • addrs (str) – Etcd addresses, format: "<ip>:<port>,<ip>:<port>".

  • base_dir (str) – The prefix of all etcd requests, defaults to "backends".

  • use_mock_etcd (bool) – Whether to use a mocked etcd for testing.

get_clients(prefix)[source]

Retrieve client addresses from etcd using prefix.

Parameters

prefix (str) – The prefix of client addresses; the default is the Docker image name "easyfl-client".

Returns

A list of clients.

Return type

list[VirtualClient]

easyfl.registry.get_clients(source, etcd_addresses=None)[source]

Get clients from registry.

Parameters
  • source (str) – Registry source, options: manual, etcd, kubernetes.

  • etcd_addresses (str, optional) – The addresses of etcd service.

Returns

A list of clients with addresses.

Return type

list[VirtualClient]
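
Example

A hedged sketch; the etcd address is hypothetical:

>>> from easyfl.registry import get_clients
>>> clients = get_clients("etcd", etcd_addresses="127.0.0.1:2379")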
