Shortcuts

easyfl

easyfl.init(conf=None, init_all=True)[source]

Initialize EasyFL.

Parameters
  • conf (dict, optional) – Configurations.

  • init_all (bool, optional) – Whether to initialize the dataset, model, server, and client in addition to the configuration.
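
Example

A minimal usage sketch (standalone training with default configurations):

>>> import easyfl
>>> easyfl.init()  # initialize configuration, dataset, model, server, and client
>>> easyfl.run()   # start the federated learning process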

easyfl.init_dataset()[source]

Initialize dataset, either using a registered dataset or an out-of-the-box dataset specified in the config.

easyfl.init_model()[source]

Initialize model, either using a registered model or an out-of-the-box model specified in the config.

Returns

Model used in federated learning.

Return type

nn.Module

easyfl.load_config(file, conf=None)[source]

Load and merge configurations from file and input.

Parameters
  • file (str) – Filename of the configuration.

  • conf (dict) – Configurations.

Returns

Internal configurations managed by OmegaConf.

Return type

omegaconf.dictconfig.DictConfig
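
Example

A hedged sketch; the filename and override keys below are hypothetical and depend on your configuration schema:

>>> import easyfl
>>> overrides = {"data": {"dataset": "cifar10"}}  # hypothetical override keys
>>> conf = easyfl.load_config("config.yaml", overrides)  # hypothetical filename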

easyfl.register_client(client)[source]

Register federated learning client.

Parameters

client (BaseClient) – Customized federated learning client.

easyfl.register_dataset(train_data, test_data, val_data=None)[source]

Register datasets for federated learning training.

Parameters
  • train_data (FederatedDataset) – Training dataset.

  • test_data (FederatedDataset) – Testing dataset.

  • val_data (FederatedDataset) – Validation dataset.
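
Example

A sketch of registering in-memory data; train_x, train_y, test_x, and test_y are placeholder arrays you supply:

>>> import easyfl
>>> from easyfl.datasets import FederatedTensorDataset
>>> train_data = FederatedTensorDataset({"x": train_x, "y": train_y}, num_of_clients=10)
>>> test_data = FederatedTensorDataset({"x": test_x, "y": test_y}, num_of_clients=10)
>>> easyfl.register_dataset(train_data, test_data)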

easyfl.register_model(model)[source]

Register model for federated learning training.

Parameters

model (nn.Module) – PyTorch model, both class and instance are acceptable.
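
Example

A sketch using a torchvision model; whether its output dimension fits your dataset is up to you:

>>> import easyfl
>>> from torchvision.models import resnet18
>>> easyfl.register_model(resnet18)  # registering the class; an instance also works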

easyfl.register_server(server)[source]

Register federated learning server.

Parameters

server (BaseServer) – Customized federated learning server.

easyfl.run()[source]

Run federated learning process.

easyfl.start_client(args=None)[source]

Start federated learning client service for remote training.

Parameters

args (argparse.Namespace) – Configurations passed in as arguments.

easyfl.start_remote_client(conf=None, train_data=None, test_data=None, model=None, client=None)[source]

Start a remote client.

Parameters
  • conf (dict, optional) – Configurations. If not provided, the configuration loaded from file is used; if provided, it overwrites the configurations from file.

  • train_data (FederatedDataset) – Training dataset.

  • test_data (FederatedDataset) – Testing dataset.

  • model (nn.Module) – Model used in client training.

  • client (BaseClient) – Customized federated learning client class.
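
Example

A hedged sketch; it assumes train_data and test_data are FederatedDataset instances constructed beforehand, with the server address taken from the configuration:

>>> import easyfl
>>> easyfl.start_remote_client(train_data=train_data, test_data=test_data)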

easyfl.start_remote_server(conf=None, test_data=None, model=None, server=None)[source]

Start a remote server.

Parameters
  • conf (dict, optional) – Configurations. If not provided, the configuration loaded from file is used; if provided, it overwrites the configurations from file.

  • test_data (FederatedDataset) – Test dataset for centralized testing on server.

  • model (nn.Module) – Model used in client training.

  • server (BaseServer) – Customized federated learning server class.

easyfl.start_server(args=None)[source]

Start federated learning server service for remote training.

Parameters

args (argparse.Namespace) – Configurations passed in as arguments.

easyfl.server

class easyfl.server.BaseServer(conf, test_data=None, val_data=None, is_remote=False, local_port=22999)[source]

Default implementation of federated learning server.

Parameters
  • conf (omegaconf.dictconfig.DictConfig) – Configurations of EasyFL.

  • test_data (FederatedDataset) – Test dataset for centralized testing in server, optional.

  • val_data (FederatedDataset) – Validation dataset for centralized validation in server, optional.

  • is_remote (bool) – A flag to indicate whether to start remote training.

  • local_port (int) – The port of remote server service.

Override the class and functions to implement customized server.

Example

>>> from easyfl.server import BaseServer
>>> class CustomizedServer(BaseServer):
>>>     def __init__(self, conf, test_data=None, val_data=None, is_remote=False, local_port=22999):
>>>         super(CustomizedServer, self).__init__(conf, test_data, val_data, is_remote, local_port)
>>>         pass  # more initialization of attributes.
>>>
>>>     def aggregation(self):
>>>         # Implement customized aggregation method, which overwrites the default aggregation method.
>>>         pass
aggregate(models, weights)[source]

Aggregate models uploaded from clients via federated averaging.

Parameters
  • models (list[nn.Module]) – List of models.

  • weights (list[float]) – List of weights, corresponding to each model. Weights are dataset size of clients by default.

Returns

nn.Module: Aggregated model.

aggregation()[source]

Aggregate training updates from clients. Server aggregates trained models from clients via federated averaging.

aggregation_test()[source]

Aggregate testing results from clients.

Returns

Test metrics, format in {"test_loss": value, "test_accuracy": value}.

Return type

dict

compression()[source]

Model compression to reduce communication cost.

decompression(model)[source]

Decompress the models from clients.

distribution_to_test()[source]

Distribute to conduct testing on clients.

distribution_to_test_locally()[source]

Conduct testing sequentially for selected testing clients.

distribution_to_test_remotely()[source]

Distribute testing requests to remote clients through multiple threads. The main thread waits for a signal to proceed. The signal can be triggered via notification, as in the example below.

Example to trigger signal:
>>> with self.condition():
>>>     self.notify_all()
distribution_to_train()[source]

Distribute model and configurations to selected clients to train.

distribution_to_train_locally()[source]

Conduct training sequentially for selected clients in the group.

distribution_to_train_remotely()[source]

Distribute training requests to remote clients through multiple threads. The main thread waits for a signal to proceed. The signal can be triggered via notification, as in the example below.

Example to trigger signal:
>>> with self.condition():
>>>     self.notify_all()
gather_client_train_metrics()[source]

Gather client train metrics from other ranks in distributed training when testing all clients (test_all). When testing all clients, the train metrics may be overridden by the test metrics because clients may be placed on different GPUs for training and testing, leading to loss of the train metrics. So we gather the train metrics and set them in the test metrics. TODO: gather is not progressing. Need fix.

get_client_uploads()[source]

Get client uploaded contents.

Returns

A dictionary that contains client uploaded contents.

Return type

dict

get_test_clients()[source]

Get clients to run testing.

Returns

Clients to test.

Return type

(list[BaseClient]|list[str])

grouping_for_distributed()[source]

Divide the selected clients into groups for distributed training. Each group of clients is assigned to conduct training on one GPU, so the number of groups equals the number of GPUs.

When not in distributed training, the selected clients are in the same group. In distributed training, the selected clients are grouped with different strategies: greedy and random.

init_etcd(addresses)[source]

Initialize etcd as the registry for client registration.

Parameters

addresses (str) – The etcd addresses, split by ","

init_tracker()[source]

Initialize tracking.

is_primary_server()[source]

Check whether the current process is the primary server. In standalone or remote training, the server is primary. In distributed training, the server on rank0 is primary.

Returns

A flag to indicate whether current process is the primary server.

Return type

bool

is_training()[source]

Check whether the server is in training or has stopped training.

Returns

A flag to indicate whether server is in training.

Return type

bool

post_test()[source]

Postprocessing after testing.

post_train()[source]

Postprocessing after training.

pre_test()[source]

Preprocessing before testing.

pre_train()[source]

Preprocessing before training.

print_(content)[source]

Print content only when the current server is the primary server.

Parameters

content (str) – The content to log.

profile_training_speed()[source]

Manage profiling of client training speeds for distributed training optimization.

save_model()[source]

Save the model in the server.

save_tracker()[source]

Save metrics in the tracker to database.

selection(clients, clients_per_round)[source]

Select a fraction of total clients for training. Two selection strategies are implemented: 1. random selection; 2. select the first K clients.

Parameters
  • clients (list[BaseClient]|list[str]) – Available clients.

  • clients_per_round (int) – Number of clients to participate in training each round.

Returns

The selected clients.

Return type

(list[BaseClient]|list[str])

set_client_uploads(key, value)[source]

A general function to set uploaded content from clients.

Parameters
  • key (str) – Dictionary key.

  • value – Uploaded content.

set_client_uploads_test(accuracies, losses, test_sizes, metrics=None)[source]

Set testing results uploaded from clients.

Parameters
  • accuracies (list[float]) – Testing accuracies of clients.

  • losses (list[float]) – Testing losses of clients.

  • test_sizes (list[float]) – Test dataset sizes of clients.

  • metrics (dict) – Client testing metrics.

set_client_uploads_train(models, weights, metrics=None)[source]

Set training updates uploaded from clients.

Parameters
  • models (dict) – A collection of models.

  • weights (dict) – A collection of weights.

  • metrics (dict) – Client training metrics.

set_model(model, load_dict=False)[source]

Update the universal model in the server.

Parameters
  • model (nn.Module) – New model.

  • load_dict (bool) – A flag to indicate whether load state dict or copy the model.

should_stop()[source]

Check whether training should stop. Training stops under two conditions: 1. The maximum number of training rounds is reached. 2. TODO: Accuracy higher than a certain threshold.

Returns

A flag to indicate whether training should stop.

Return type

bool

start(model, clients)[source]

Start federated learning process, including training and testing.

Parameters
  • model (nn.Module) – The model to train.

  • clients (list[BaseClient]|list[str]) – Available clients. Clients are actually client grpc addresses when in remote training.

start_remote_training(model, clients)[source]

Start federated learning in the remote training mode. Before training, the server first establishes gRPC connections with clients that are not yet connected.

Parameters
  • model (nn.Module) – The model to train.

  • clients (list[str]) – Client addresses.

start_service()[source]

Start the federated learning server gRPC service.

stop()[source]

Set the flag to indicate training should stop.

test()[source]

Testing process of federated learning.

test_in_client()[source]

Conduct testing in clients. Currently, it supports testing on the clients selected for training. TODO: Add options to select clients for testing.

Returns

Test metrics, {"test_loss": value, "test_accuracy": value, "test_time": value}.

Return type

dict

test_in_server(device='cpu')[source]

Conduct testing in the server.

Parameters

device (str) – The hardware device to conduct testing, either cpu or cuda devices.

Returns

Test metrics, {"test_loss": value, "test_accuracy": value, "test_time": value}.

Return type

dict

track(metric_name, value)[source]

Track a metric.

Parameters
  • metric_name (str) – Name of the metric of a round.

  • value (str|int|float|bool|dict|list) – Value of the metric.
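
Example

A hypothetical call inside a customized server method; the metric name and value are made up:

>>> self.track("round_time", 12.3)  # hypothetical metric name and value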

track_communication_cost()[source]

Track communication costs between the server and clients. Communication costs occur in both training and testing, comprising downlink and uplink costs.

track_test_results(results)[source]

Track test results collected from clients.

Parameters

results (dict) – Test metrics, format in {"test_loss": value, "test_accuracy": value, "test_time": value}.

train()[source]

Training process of federated learning.

update_default_time()[source]

Update the estimated default training time of clients using actual training time from profiled clients.

class easyfl.server.ServerService(server)[source]

Remote gRPC server service.

Parameters

server (BaseServer) – Federated learning server instance.

Run(request, context)[source]

Trigger federated learning process.

Stop(request, context)[source]

Stop federated learning process.

Upload(request, context)[source]

Handle upload from clients.

easyfl.server.federated_averaging(models, weights)[source]

Compute the weighted average of model parameters and persistent buffers. It uses the state_dict of the model, which includes persistent buffers such as BN statistics.

Parameters
  • models (list[nn.Module]) – List of models to average.

  • weights (list[float]) – List of weights, corresponding to each model. Weights are dataset size of clients by default.

Returns

nn.Module: Weighted averaged model.
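
Example

An illustrative sketch of weighted state_dict averaging, not the library's exact implementation; dtype handling of integer buffers (e.g., num_batches_tracked) is glossed over:

>>> import copy
>>> def fedavg_sketch(models, weights):
>>>     total = sum(weights)
>>>     avg = copy.deepcopy(models[0])
>>>     state = avg.state_dict()
>>>     for key in state:
>>>         # weighted sum of each parameter/buffer across client models
>>>         state[key] = sum(m.state_dict()[key] * (w / total) for m, w in zip(models, weights))
>>>     avg.load_state_dict(state)
>>>     return avg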

easyfl.server.federated_averaging_only_params(models, weights)[source]

Compute weighted average of model parameters. Use model parameters only.

Parameters
  • models (list[nn.Module]) – List of models to average.

  • weights (list[float]) – List of weights, corresponding to each model. Weights are dataset size of clients by default.

Returns

nn.Module: Weighted averaged model.

easyfl.server.weighted_sum(models, weights)[source]

Compute the weighted sum of model parameters and persistent buffers. It uses the state_dict of the model, which includes persistent buffers such as BN statistics.

Parameters
  • models (list[nn.Module]) – List of models to average.

  • weights (list[float]) – List of weights, corresponding to each model. Weights are dataset size of clients by default.

Returns

nn.Module: Weighted sum of models. float: Sum of weights.

easyfl.server.weighted_sum_only_params(models, weights)[source]

Compute weighted sum of model parameters. Use model parameters only.

Parameters
  • models (list[nn.Module]) – List of models to average.

  • weights (list[float]) – List of weights, corresponding to each model. Weights are dataset size of clients by default.

Returns

nn.Module: Weighted sum of models. float: Sum of weights.

easyfl.client

class easyfl.client.BaseClient(cid, conf, train_data, test_data, device, sleep_time=0, is_remote=False, local_port=23000, server_addr='localhost:22999', tracker_addr='localhost:12666')[source]

Default implementation of federated learning client.

Parameters
  • cid (str) – Client id.

  • conf (omegaconf.dictconfig.DictConfig) – Client configurations.

  • train_data (FederatedDataset) – Training dataset.

  • test_data (FederatedDataset) – Test dataset.

  • device (str) – Hardware device for training, cpu or cuda devices.

  • sleep_time (float) – Duration to hold after training, to simulate stragglers.

  • is_remote (bool) – Whether to start remote training.

  • local_port (int) – Port of remote client service.

  • server_addr (str) – Remote server service grpc address.

  • tracker_addr (str) – Remote tracking service grpc address.

Override the class and functions to implement customized client.

Example

>>> from easyfl.client import BaseClient
>>> class CustomizedClient(BaseClient):
>>>     def __init__(self, cid, conf, train_data, test_data, device, **kwargs):
>>>         super(CustomizedClient, self).__init__(cid, conf, train_data, test_data, device, **kwargs)
>>>         pass  # more initialization of attributes.
>>>
>>>     def train(self, conf, device=CPU):
>>>         # Implement customized client training method, which overwrites the default training method.
>>>         pass
compression()[source]

Compress the client local model after training and before uploading to the server.

connect_to_server()[source]

Establish connection between the client and the server.

construct_upload_request()[source]

Construct client upload request for training updates and testing results.

Returns

The upload request defined in protobuf to unify local and remote operations.

Return type

UploadRequest

decompression()[source]

Decompress the model. It can be further implemented when the model is compressed in the server.

download(model)[source]

Download model from the server.

Parameters

model (nn.Module) – Global model distributed from the server.

encryption()[source]

Encrypt the client local model.

load_loader(conf)[source]

Load the training data loader.

Parameters

conf (omegaconf.dictconfig.DictConfig) – Client configurations.

Returns

Data loader.

Return type

torch.utils.data.DataLoader

load_optimizer(conf)[source]

Load the training optimizer. Adam and SGD are implemented.

operate(model, conf, index, is_train=True)[source]

A wrapper over operations (training/testing) on clients.

Parameters
  • model (nn.Module) – Model for operations.

  • conf (omegaconf.dictconfig.DictConfig) – Client configurations.

  • index (int) – Client index in the client list, for retrieving data. TODO: improvement.

  • is_train (bool) – The flag to indicate whether the operation is training, otherwise testing.

post_test()[source]

Postprocessing after testing.

post_train()[source]

Postprocessing after training.

post_upload()[source]

Postprocessing after uploading training/testing results.

pre_test()[source]

Preprocessing before testing.

pre_train()[source]

Preprocessing before training.

pretrain_setup(conf, device)[source]

Setup loss function and optimizer before training.

run_test(model, conf)[source]

Conduct testing on clients.

Parameters
  • model (nn.Module) – Model to test.

  • conf (omegaconf.dictconfig.DictConfig) – Client configurations.

Returns

Testing contents. The interface is unified for both local and remote operations.

Return type

UploadRequest

run_train(model, conf)[source]

Conduct training on clients.

Parameters
  • model (nn.Module) – Model to train.

  • conf (omegaconf.dictconfig.DictConfig) – Client configurations.

Returns

Training contents. The interface is unified for both local and remote operations.

Return type

UploadRequest

save_metrics()[source]

Save client metrics to database.

simulate_straggler()[source]

Simulate straggler effect of system heterogeneity.

start_service()[source]

Start client service.

test(conf, device='cpu')[source]

Execute client testing.

Parameters
  • conf (omegaconf.dictconfig.DictConfig) – Client configurations.

  • device (str) – Hardware device for training, cpu or cuda devices.

test_local()[source]

Test client local model after training.

track(metric_name, value)[source]

Track a metric.

Parameters
  • metric_name (str) – The name of the metric.

  • value (str|int|float|bool|dict|list) – The value of the metric.

train(conf, device='cpu')[source]

Execute client training.

Parameters
  • conf (omegaconf.dictconfig.DictConfig) – Client configurations.

  • device (str) – Hardware device for training, cpu or cuda devices.

upload()[source]

Upload the messages from the client to the server.

Returns

The upload request defined in protobuf to unify local and remote operations.

Only applicable for local training, as remote training uploads through a gRPC request.

Return type

UploadRequest

upload_remotely(request)[source]

Send upload request to remote server via gRPC.

Parameters

request (UploadRequest) – Upload request.

class easyfl.client.ClientService(client)[source]

Remote gRPC client service.

Parameters

client (BaseClient) – Federated learning client instance.

Operate(request, context)[source]

Perform training/testing operations.

easyfl.distributed

easyfl.distributed.dist_init(backend, init_method, world_size, rank, local_rank)[source]

Initialize PyTorch distributed.

Parameters
  • backend (str or Backend) – Distributed backend to use, e.g., nccl, gloo.

  • init_method (str, optional) – URL specifying how to initialize the process group.

  • world_size (int, optional) – Number of processes participating in the job.

  • rank (int) – Rank of the current process.

  • local_rank (int) – Local rank of the current process.

Returns

Rank of current process. int: Total number of processes.

Return type

int
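
Example

A hedged single-process example with the gloo backend; the init method URL is hypothetical:

>>> from easyfl.distributed import dist_init
>>> rank, world_size = dist_init("gloo", "tcp://127.0.0.1:23456", 1, 0, 0)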

easyfl.distributed.gather_value(value, world_size, device)[source]

Gather the value from devices to a list.

Parameters
  • value (float|int) – The value to gather.

  • world_size (int) – The number of processes.

  • device (str) – The device where the value is on, either cpu or cuda devices.

Returns

A list of gathered values.

Return type

list[torch.Tensor]

easyfl.distributed.get_device(gpu, world_size, local_rank)[source]

Obtain the device by checking the number of GPUs and distributed settings.

Parameters
  • gpu (int) – The number of requested GPUs.

  • world_size (int) – The number of processes.

  • local_rank (int) – The local rank of the current process.

Returns

The device to be used in PyTorch, e.g., tensor.to(device).

Return type

str
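
Example

A sketch assuming four GPUs are requested across four processes:

>>> import torch
>>> from easyfl.distributed import get_device
>>> device = get_device(gpu=4, world_size=4, local_rank=1)
>>> x = torch.zeros(8).to(device)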

easyfl.distributed.get_ip(node_list)[source]

Get the IP address of nodes.

Parameters

node_list (str) – Name of the nodes.

Returns

The first node in the node list.

Return type

str

easyfl.distributed.grouping(clients, world_size, default_time=10, strategy='random', seed=1)[source]

Divide clients into groups with different strategies.

Parameters
  • clients (list[BaseClient]) – A list of clients.

  • world_size (int) – The number of processes; it represents the number of groups here.

  • default_time (float, optional) – The default training time for not profiled clients.

  • strategy (str, optional) – Strategy of grouping, options: random, greedy, worst. When no strategy is applied, each client is a group.

  • seed (int, optional) – Random seed.

Returns

Groups of clients, each group is a sub-list.

Return type

list[list[BaseClient]]

easyfl.distributed.reduce_models(model, sample_sum)[source]

Aggregate models across devices and update the model with the new aggregated model parameters.

Parameters
  • model (nn.Module) – The model in a device to aggregate.

  • sample_sum (int) – Sum of the total dataset sizes of clients in a device.

easyfl.distributed.reduce_models_only_params(model, sample_sum)[source]

Aggregate models across devices and update the model with the new aggregated model parameters, excluding the persistent buffers like BN stats.

Parameters
  • model (nn.Module) – The model in a device to aggregate.

  • sample_sum (torch.Tensor) – Sum of the total dataset sizes of clients in a device.

easyfl.distributed.reduce_value(value, device)[source]

Calculate the sum of the value across devices.

Parameters
  • value (float|int) – Value to sum.

  • device (str) – The device where the value is on, either cpu or cuda devices.

Returns

Sum of the values.

Return type

torch.Tensor

easyfl.distributed.reduce_values(values, device)[source]

Calculate the average of values across devices.

Parameters
  • values (list[float|int]) – Values to average.

  • device (str) – The device where the value is on, either cpu or cuda devices.

Returns

The average of the values across devices.

Return type

torch.Tensor

easyfl.distributed.reduce_weighted_values(values, weights, device)[source]

Calculate the weighted average of values across devices.

Parameters
  • values (list[float|int]) – Values to average.

  • weights (list[float|int]) – The weights to calculate weighted average.

  • device (str) – The device where the value is on, either cpu or cuda devices.

Returns

The average of values across devices.

Return type

torch.Tensor

easyfl.distributed.setup(port=23344)[source]

Set up the distributed settings for Slurm.

Parameters

port (int, optional) – The port of the primary server. It auto-increments by 1 when the port is in use.

Returns

The rank of current process. int: The local rank of current process. int: Total number of processes. str: The address of the distributed init method.

Return type

int
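
Example

A hedged sketch combining setup and dist_init under Slurm; the unpacking order follows the return description above:

>>> from easyfl.distributed import setup, dist_init
>>> rank, local_rank, world_size, addr = setup()
>>> dist_init("nccl", addr, world_size, rank, local_rank)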

easyfl.datasets

class easyfl.datasets.BaseDataset(root, dataset_name, fraction, split_type, user, iid_user_fraction, train_test_split, minsample, num_class, num_of_client, class_per_client, setting_folder, seed=-1, **kwargs)[source]

The internal base dataset implementation.

Parameters
  • root (str) – The root directory where datasets are stored.

  • dataset_name (str) – The name of the dataset.

  • fraction (float) – The fraction of the data chosen from the raw data to use.

  • num_of_clients (int) – The targeted number of clients to construct.

  • split_type (str) – The type of statistical simulation, options: iid, dir, and class. iid means independent and identically distributed data. niid means non-independent and identically distributed data for Femnist and Shakespeare. dir means using Dirichlet process to simulate non-iid data, for CIFAR-10 and CIFAR-100 datasets. class means partitioning the dataset by label classes, for datasets like CIFAR-10, CIFAR-100.

  • minsample (int) – The minimal number of samples in each client. It is applicable for LEAF datasets and dir simulation of CIFAR-10 and CIFAR-100.

  • class_per_client (int) – The number of classes in each client. Only applicable when the split_type is ‘class’.

  • iid_user_fraction (float) – The fraction of the number of clients used when the split_type is ‘iid’.

  • user (bool) – A flag to indicate whether to partition users of the dataset into train-test groups. Only applicable to LEAF datasets. True means partitioning users of the dataset into train-test groups. False means partitioning each user's samples into train-test groups.

  • train_test_split (float) – The fraction of data for training; the rest are for testing. e.g., 0.9 means 90% of data are used for training and 10% are used for testing.

  • num_class – The number of classes in this dataset.

  • seed – Random seed.

class easyfl.datasets.Cifar10(root, fraction, split_type, user, iid_user_fraction=0.1, train_test_split=0.9, minsample=10, num_class=80, num_of_client=100, class_per_client=2, setting_folder=None, seed=-1, weights=None, alpha=0.5)[source]

class easyfl.datasets.Cifar100(root, fraction, split_type, user, iid_user_fraction=0.1, train_test_split=0.9, minsample=10, num_class=80, num_of_client=100, class_per_client=2, setting_folder=None, seed=-1, weights=None, alpha=0.5)[source]

class easyfl.datasets.FederatedDataset[source]

The abstract class of federated dataset for EasyFL.

abstract loader(batch_size, shuffle=True)[source]

Get data loader.

Parameters
  • batch_size (int) – The batch size of the data loader.

  • shuffle (bool) – Whether to shuffle the data in the loader.

abstract size(cid)[source]

Get dataset size.

Parameters

cid (str) – client id.

property users

Get client ids of the federated dataset.

class easyfl.datasets.FederatedImageDataset(root, simulated, do_simulate=True, extensions=('.jpg', '.jpeg', '.png', '.ppm', '.bmp', '.pgm', '.tif', '.tiff', '.webp'), is_valid_file=None, transform=None, target_transform=None, client_ids='default', num_of_clients=10, simulation_method='iid', weights=None, alpha=0.5, min_size=10, class_per_client=1)[source]

Federated image dataset, data of clients are in format of image folder.

Parameters
  • root (str|list[str]) – The root directory or directories of image data folder. If the dataset is simulated to multiple clients, the root is a list of directories. Otherwise, it is the directory of an image data folder.

  • simulated (bool) – Whether the dataset is simulated to federated learning settings.

  • do_simulate (bool, optional) – Whether to conduct simulation. It is only effective if the dataset is not already simulated.

  • extensions (list[str], optional) – A list of allowed image extensions. Only one of extensions and is_valid_file can be specified.

  • is_valid_file (function, optional) – A function that takes the path of an image file and checks whether it is valid. Only one of extensions and is_valid_file can be specified.

  • transform (torchvision.transforms.transforms.Compose, optional) – Transformation for data.

  • target_transform (torchvision.transforms.transforms.Compose, optional) – Transformation for data labels.

  • num_of_clients (int, optional) – The number of clients for simulation. Only needed if doing simulation.

  • simulation_method (optional) – The split method. Only needed if doing simulation.

  • weights (list[float], optional) – The targeted distribution of quantities to simulate quantity heterogeneity. The values should sum up to 1. e.g., [0.1, 0.2, 0.7]. The num_of_clients should be divisible by len(weights). None means clients are simulated with the same data quantity.

  • alpha (float, optional) – The parameter for Dirichlet distribution simulation, only for dir simulation.

  • min_size (int, optional) – The minimal number of samples in each client, only for dir simulation.

  • class_per_client (int, optional) – The number of classes in each client, only for non-iid by class simulation.

  • client_ids (list[str], optional) – A list of client ids. Each client id matches with an element in roots. The client ids are ["f0000001", "f00000002", ...] if not specified.
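
Example

A hedged instantiation sketch; the directory paths and client ids are hypothetical:

>>> from easyfl.datasets import FederatedImageDataset
>>> data = FederatedImageDataset(root=["./data/client1", "./data/client2"],
>>>                              simulated=True, do_simulate=False,
>>>                              client_ids=["c1", "c2"])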

loader(batch_size, client_id=None, shuffle=True, seed=0, num_workers=2, transform=None)[source]

Get dataset loader.

Parameters
  • batch_size (int) – The batch size.

  • client_id (str, optional) – The id of client.

  • shuffle (bool, optional) – Whether to shuffle before batching.

  • seed (int, optional) – The shuffle seed.

  • transform (torchvision.transforms.transforms.Compose, optional) – Data transformation.

  • num_workers (int, optional) – The number of workers for dataset loader.

Returns

The data loader to load data.

Return type

torch.utils.data.DataLoader

size(cid=None)[source]

Get dataset size.

Parameters

cid (str) – client id.

property users

Get client ids of the federated dataset.

class easyfl.datasets.FederatedTensorDataset(data, transform=None, target_transform=None, process_x=<function default_process_x>, process_y=<function default_process_x>, simulated=False, do_simulate=True, num_of_clients=10, simulation_method='iid', weights=None, alpha=0.5, min_size=10, class_per_client=1)[source]

Federated tensor dataset, data of clients are in format of tensor or list.

Parameters
  • data (dict) – A dictionary of data, e.g., {"id1": {"x": [[], [], ...], "y": [...]}}. If simulation is not done previously, it is in format of {"x": [[], [], ...], "y": [...]}.

  • transform (torchvision.transforms.transforms.Compose, optional) – Transformation for data.

  • target_transform (torchvision.transforms.transforms.Compose, optional) – Transformation for data labels.

  • process_x (function, optional) – A function to preprocess training data.

  • process_y (function, optional) – A function to preprocess data labels.

  • simulated (bool, optional) – Whether the dataset is simulated to federated learning settings.

  • do_simulate (bool, optional) – Whether to conduct simulation. It is only effective if the dataset is not already simulated.

  • num_of_clients (int, optional) – The number of clients for simulation. Only needed if doing simulation.

  • simulation_method (optional) – The split method. Only needed if doing simulation.

  • weights (list[float], optional) – The targeted distribution of quantities to simulate quantity heterogeneity. The values should sum up to 1. e.g., [0.1, 0.2, 0.7]. The num_of_clients should be divisible by len(weights). None means clients are simulated with the same data quantity.

  • alpha (float, optional) – The parameter for Dirichlet distribution simulation, only for dir simulation.

  • min_size (int, optional) – The minimal number of samples in each client, only for dir simulation.

  • class_per_client (int, optional) – The number of classes in each client, only for non-iid by class simulation.

loader(batch_size, client_id=None, shuffle=True, seed=0, transform=None, drop_last=False)[source]

Get dataset loader.

Parameters
  • batch_size (int) – The batch size.

  • client_id (str, optional) – The id of client.

  • shuffle (bool, optional) – Whether to shuffle before batching.

  • seed (int, optional) – The shuffle seed.

  • transform (torchvision.transforms.transforms.Compose, optional) – Data transformation.

  • drop_last (bool, optional) – Whether to drop the last batch if its size is smaller than batch size.

Returns

The data loader to load data.

Return type

torch.utils.data.DataLoader
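
Example

A hedged usage sketch; it assumes train_data is a FederatedTensorDataset, and the client id is hypothetical:

>>> loader = train_data.loader(batch_size=32, client_id="f0000001", shuffle=True)
>>> for batched_x, batched_y in loader:
>>>     pass  # replace with a training or testing step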

size(cid=None)[source]

Get dataset size.

Parameters

cid (str) – client id.

property users

Get client ids of the federated dataset.

class easyfl.datasets.FederatedTorchDataset(data, users)[source]

Wrapper over PyTorch dataset.

Parameters

data (dict) – A dictionary of client datasets, format {"client_id": loader1, "client_id2": loader2}.

loader(batch_size, client_id=None, shuffle=True, seed=0, num_workers=2, transform=None)[source]

Get data loader.

Parameters
  • batch_size (int) – The batch size of the data loader.

  • shuffle (bool) – Whether to shuffle the data in the loader.

size(cid=None)[source]

Get dataset size.

Parameters

cid (str) – client id.

property users

Get client ids of the federated dataset.

class easyfl.datasets.Femnist(root, fraction, split_type, user, iid_user_fraction=0.1, train_test_split=0.9, minsample=10, num_class=62, num_of_client=100, class_per_client=2, setting_folder=None, seed=-1, **kwargs)[source]

FEMNIST dataset implementation. It gets the FEMNIST dataset according to configurations.

It stores the processed datasets locally.

base_folder

The base folder path of the datasets folder.

Type

str

class_url

The url to get the by_class split FEMNIST.

Type

str

write_url

The url to get the by_write split FEMNIST.

Type

str

class easyfl.datasets.Shakespeare(root, fraction, split_type, user, iid_user_fraction=0.1, train_test_split=0.9, minsample=10, num_class=80, num_of_client=100, class_per_client=2, setting_folder=None, seed=-1, **kwargs)[source]

Shakespeare dataset implementation. It gets the Shakespeare dataset according to configurations.

base_folder

The base folder path of the datasets folder.

Type

str

raw_data_url

The url to get the by_class split shakespeare.

Type

str

write_url

The url to get the by_write split shakespeare.

Type

str

easyfl.datasets.construct_datasets(root, dataset_name, num_of_clients, split_type, min_size, class_per_client, data_amount, iid_fraction, user, train_test_split, quantity_weights, alpha)[source]

Construct and load provided federated learning datasets.

Parameters
  • root (str) – The root directory where datasets are stored.

  • dataset_name (str) – The name of the dataset. It currently supports: femnist, shakespeare, cifar10, and cifar100. Among them, femnist and shakespeare are adopted from LEAF benchmark.

  • num_of_clients (int) – The targeted number of clients to construct.

  • split_type (str) – The type of statistical simulation, options: iid, dir, and class. iid means independent and identically distributed data. niid means non-independent and identically distributed data for Femnist and Shakespeare. dir means using Dirichlet process to simulate non-iid data, for CIFAR-10 and CIFAR-100 datasets. class means partitioning the dataset by label classes, for datasets like CIFAR-10, CIFAR-100.

  • min_size (int) – The minimal number of samples in each client. It is applicable for LEAF datasets and dir simulation of CIFAR-10 and CIFAR-100.

  • class_per_client (int) – The number of classes in each client. Only applicable when the split_type is ‘class’.

  • data_amount (float) – The fraction of data sampled for LEAF datasets. e.g., 10% means that only 10% of the total dataset size is used.

  • iid_fraction (float) – The fraction of the number of clients used when the split_type is ‘iid’.

  • user (bool) – A flag to indicate whether to partition users of the dataset into train-test groups. Only applicable to LEAF datasets. True means partitioning users of the dataset into train-test groups. False means partitioning each user's samples into train-test groups.

  • train_test_split (float) – The fraction of data for training; the rest are for testing. e.g., 0.9 means 90% of data are used for training and 10% are used for testing.

  • quantity_weights (list[float]) – The targeted distribution of quantities to simulate data quantity heterogeneity. The values should sum up to 1. e.g., [0.1, 0.2, 0.7]. The num_of_clients should be divisible by len(weights). None means clients are simulated with the same data quantity.

  • alpha (float) – The parameter for Dirichlet distribution simulation, applicable only when split_type is dir.

Returns

Training dataset. FederatedDataset: Testing dataset.

Return type

FederatedDataset
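
Example

A hedged sketch for CIFAR-10; the argument values are illustrative, not recommendations:

>>> from easyfl.datasets import construct_datasets
>>> train_data, test_data = construct_datasets(
>>>     root="./data", dataset_name="cifar10", num_of_clients=100,
>>>     split_type="iid", min_size=10, class_per_client=1,
>>>     data_amount=0.05, iid_fraction=0.1, user=False,
>>>     train_test_split=0.9, quantity_weights=None, alpha=0.5)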

easyfl.datasets.data_simulation(data_x, data_y, num_of_clients, data_distribution, weights=None, alpha=0.5, min_size=10, class_per_client=1, stack_x=True)[source]

Simulate federated learning datasets by partitioning data into multiple clients using different strategies.

Parameters
  • data_x (list[Object]) – A list of data.

  • data_y (list[Object]) – A list of dataset labels.

  • num_of_clients (int) – The number of clients to partition to.

  • data_distribution (str) – The way to partition the dataset, options: iid: partition the dataset into multiple clients with equal quantity (difference is less than 1) randomly; dir: partition the dataset into multiple clients following the Dirichlet process; class: partition the dataset into multiple clients based on classes.

  • weights (list[float], optional) – The targeted distribution of data quantities, for simulating data quantity heterogeneity. The values should sum up to 1, e.g., [0.1, 0.2, 0.7]. Note: num_of_clients should be divisible by len(weights). When weights=None, the data quantity of clients only depends on data_distribution.

  • alpha (float, optional) – The parameter for Dirichlet process simulation. It is only applicable when data_distribution is dir.

  • min_size (int, optional) – The minimum number of data size of a client. It is only applicable when data_distribution is dir.

  • class_per_client (int) – The number of classes in each client. It is only applicable when data_distribution is class.

  • stack_x (bool, optional) – A flag to indicate whether using np.vstack or append to construct dataset. It is only applicable when data_distribution is class.

Raises

ValueError – When the simulation method data_distribution is not supported.

Returns

A list of client ids. dict: The partitioned data, key is client id, value is the client data. e.g., {"client_1": {"x": [data_x], "y": [data_y]}}.

Return type

list[str]
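
Example

A hedged usage sketch; data_x and data_y are placeholder lists you supply:

>>> from easyfl.datasets import data_simulation
>>> clients, partitioned = data_simulation(data_x, data_y, num_of_clients=10, data_distribution="iid")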

easyfl.datasets.equal_division(num_groups, data_x, data_y=None)[source]

Partition data into multiple clients with equal quantity.

Parameters
  • num_groups (int) – The number of groups to partition into.

  • data_x (list[Object]) – A list of elements to be divided.

  • data_y (list[Object], optional) – A list of data labels to be divided together with the data.

Returns

A list where each element is a list of data of a group/client. list[list]: A list where each element is a list of data label of a group/client.

Return type

list[list]

Example

>>> equal_division(3, list(range(9)))
>>> ([[0,4,2],[3,1,7],[6,5,8]], [])
easyfl.datasets.iid(data_x, data_y, num_of_clients, x_dtype, y_dtype)[source]

Partition dataset into multiple clients with equal data quantity (difference is less than 1) randomly.

Parameters
  • data_x (list[Object]) – A list of data.

  • data_y (list[Object]) – A list of dataset labels.

  • num_of_clients (int) – The number of clients to partition to.

  • x_dtype (numpy.dtype) – The type of data.

  • y_dtype (numpy.dtype) – The type of data label.

Returns

A list of client ids. dict: The partitioned data, key is client id, value is the client data. e.g., {"client_1": {"x": [data_x], "y": [data_y]}}.

Return type

list[str]

easyfl.datasets.non_iid_class(data_x, data_y, class_per_client, num_of_clients, x_dtype, y_dtype, stack_x=True)[source]

Partition dataset into multiple clients based on label classes. Each client contains [1, n] classes, where n is the number of classes of a dataset.

Note: Each class is divided into ceil(class_per_client * num_of_clients / num_class) parts, and each client chooses class_per_client parts from each class to construct its dataset.

Parameters
  • data_x (list[Object]) – A list of data.

  • data_y (list[Object]) – A list of dataset labels.

  • class_per_client (int) – The number of classes in each client.

  • num_of_clients (int) – The number of clients to partition to.

  • x_dtype (numpy.dtype) – The type of data.

  • y_dtype (numpy.dtype) – The type of data label.

  • stack_x (bool, optional) – A flag to indicate whether using np.vstack or append to construct dataset.

Returns

A list of client ids. dict: The partitioned data, key is client id, value is the client data. e.g., {"client_1": {"x": [data_x], "y": [data_y]}}.

Return type

list[str]

easyfl.datasets.non_iid_dirichlet(data_x, data_y, num_of_clients, alpha, min_size, x_dtype, y_dtype)[source]

Partition dataset into multiple clients following the Dirichlet process.

Parameters
  • data_x (list[Object]) – A list of data.

  • data_y (list[Object]) – A list of dataset labels.

  • num_of_clients (int) – The number of clients to partition to.

  • alpha (float) – The parameter for Dirichlet process simulation.

  • min_size (int) – The minimum number of data size of a client.

  • x_dtype (numpy.dtype) – The type of data.

  • y_dtype (numpy.dtype) – The type of data label.

Returns

A list of client ids. dict: The partitioned data, key is client id, value is the client data. e.g., {"client_1": {"x": [data_x], "y": [data_y]}}.

Return type

list[str]

easyfl.datasets.quantity_hetero(weights, data_x, data_y=None)[source]

Partition data into multiple clients with different quantities. The number of groups is the same as the number of elements of weights. The quantity of each group depends on the values of weights.

Parameters
  • weights (list[float]) – The targeted distribution of data quantities. The values should sum up to 1. e.g., [0.1, 0.2, 0.7].

  • data_x (list[Object]) – A list of elements to be divided.

  • data_y (list[Object], optional) – A list of data labels to be divided together with the data.

Returns

A list where each element is a list of data of a group/client. list[list]: A list where each element is a list of data label of a group/client.

Return type

list[list]

Example

>>> quantity_hetero([0.1, 0.2, 0.7], list(range(0, 10)))
>>> ([[4], [8, 9], [6, 0, 1, 7, 3, 2, 5]], [])

easyfl.models

easyfl.communication

easyfl.communication.init_stub(typ, address)[source]

Initialize gRPC stub.

Parameters
  • typ (str) – Type of service, options: client, server, tracking.

  • address (str) – Address of the gRPC service.

Returns

Stub of the gRPC service.

Return type

(ClientServiceStub|ServerServiceStub|TrackingServiceStub)

easyfl.communication.start_service(typ, service, port)[source]

Start gRPC service.

Parameters
  • typ (str) – Type of service, options: client, server, tracking.

  • service (ClientService|ServerService|TrackingService) – gRPC service to start.

  • port (int) – The port of the service.

easyfl.registry

class easyfl.registry.EtcdClient(name, addrs, base_dir, use_mock_etcd=False)[source]

Etcd client to connect and communicate with the etcd service. Etcd serves as the registry for remote training: clients register themselves in etcd, and the server queries etcd to get client addresses.

Parameters
  • name (str) – The name of etcd.

  • addrs (str) – Etcd addresses, format: "<ip>:<port>,<ip>:<port>".

  • base_dir (str) – The prefix of all etcd requests, defaults to "backends".

  • use_mock_etcd (bool) – Whether to use a mocked etcd for testing.

get_clients(prefix)[source]

Retrieve client addresses from etcd using prefix.

Parameters

prefix (str) – The prefix of client addresses; the default is the Docker image name "easyfl-client".

Returns

A list of clients.

Return type

list[VirtualClient]

easyfl.registry.get_clients(source, etcd_addresses=None)[source]

Get clients from registry.

Parameters
  • source (str) – Registry source, options: manual, etcd, kubernetes.

  • etcd_addresses (str, optional) – The addresses of etcd service.

Returns

A list of clients with addresses.

Return type

list[VirtualClient]
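
Example

A hedged sketch; the etcd address is hypothetical:

>>> from easyfl.registry import get_clients
>>> clients = get_clients("etcd", etcd_addresses="127.0.0.1:2379")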
