easyfl¶
- easyfl.init(conf=None, init_all=True)[source]¶
Initialize EasyFL.
- Parameters
conf (dict, optional) – Configurations.
init_all (bool, optional) – Whether to initialize the dataset, model, server, and client in addition to the configuration.
- easyfl.init_dataset()[source]¶
Initialize dataset, either using registered dataset or out-of-the-box datasets set in config.
- easyfl.init_model()[source]¶
Initialize model, either using registered model or out-of-the-box model set in config.
- Returns
Model used in federated learning.
- Return type
nn.Module
- easyfl.load_config(file, conf=None)[source]¶
Load and merge configurations from file and input.
- Parameters
file (str) – filename of the configuration.
conf (dict) – Configurations.
- Returns
Internal configurations managed by OmegaConf.
- Return type
omegaconf.dictconfig.DictConfig
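As a rough illustration of the merge described above, the sketch below combines a dictionary standing in for the file contents with a dictionary passed as input. It uses plain dicts and a hypothetical helper `merge_configs`; EasyFL itself manages configurations with OmegaConf. The precedence (input values win) is an assumption borrowed from the description of `conf` in start_remote_client, and the keys are illustrative only.

```python
import copy


def merge_configs(base, override):
    """Recursively merge `override` into a copy of `base` (hypothetical helper)."""
    merged = copy.deepcopy(base)
    for key, value in (override or {}).items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are nested sections: merge them key by key.
            merged[key] = merge_configs(merged[key], value)
        else:
            # Scalars, lists, and new keys: the input value replaces the file value.
            merged[key] = copy.deepcopy(value)
    return merged


# Illustrative keys only; they are not taken from EasyFL's real configuration schema.
file_conf = {"server": {"rounds": 10, "clients_per_round": 5}, "gpu": 0}
input_conf = {"server": {"rounds": 20}}

merged = merge_configs(file_conf, input_conf)
print(merged)
```

Only the overridden leaf changes; sibling keys from the file are kept, and the original dictionaries are left untouched.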
- easyfl.register_client(client)[source]¶
Register federated learning client.
- Parameters
client (BaseClient) – Customized federated learning client.
- easyfl.register_dataset(train_data, test_data, val_data=None)[source]¶
Register datasets for federated learning training.
- Parameters
train_data (FederatedDataset) – Training dataset.
test_data (FederatedDataset) – Testing dataset.
val_data (FederatedDataset) – Validation dataset.
- easyfl.register_model(model)[source]¶
Register model for federated learning training.
- Parameters
model (nn.Module) – PyTorch model, both class and instance are acceptable.
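The note that both a class and an instance are acceptable can be pictured with the sketch below. It is not EasyFL's code: `resolve_model` is a hypothetical helper and `TinyModel` is a plain Python stand-in for an `nn.Module` subclass, used so the example runs without PyTorch.

```python
import inspect


def resolve_model(model, *args, **kwargs):
    """Return a model instance whether `model` is a class or an instance (hypothetical helper)."""
    if inspect.isclass(model):
        # A class was passed in: instantiate it.
        return model(*args, **kwargs)
    # An instance was passed in: use it as is.
    return model


class TinyModel:
    """Stand-in for an nn.Module subclass."""

    def __init__(self, hidden=4):
        self.hidden = hidden


from_class = resolve_model(TinyModel)
from_instance = resolve_model(TinyModel(hidden=8))
print(from_class.hidden, from_instance.hidden)
```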
- easyfl.register_server(server)[source]¶
Register federated learning server.
- Parameters
server (BaseServer) – Customized federated learning server.
- easyfl.start_client(args=None)[source]¶
Start federated learning client service for remote training.
- Parameters
args (argparse.Namespace) – Configurations passed in as arguments.
- easyfl.start_remote_client(conf=None, train_data=None, test_data=None, model=None, client=None)[source]¶
Start a remote client.
- Parameters
conf (dict, optional) – Configurations. If not provided, the configuration loaded from file is used; if provided, it overwrites the configurations from file.
train_data (FederatedDataset) – Training dataset.
test_data (FederatedDataset) – Testing dataset.
model (nn.Module) – Model used in client training.
client (BaseClient) – Customized federated learning client class.
- easyfl.start_remote_server(conf=None, test_data=None, model=None, server=None)[source]¶
Start a remote server.
- Parameters
conf (dict, optional) – Configurations. If not provided, the configuration loaded from file is used; if provided, it overwrites the configurations from file.
test_data (FederatedDataset) – Test dataset for centralized testing on server.
model (nn.Module) – Model used in client training.
server (BaseServer) – Customized federated learning server class.
easyfl.server¶
- class easyfl.server.BaseServer(conf, test_data=None, val_data=None, is_remote=False, local_port=22999)[source]¶
Default implementation of federated learning server.
- Parameters
conf (omegaconf.dictconfig.DictConfig) – Configurations of EasyFL.
test_data (FederatedDataset) – Test dataset for centralized testing in server, optional.
val_data (FederatedDataset) – Validation dataset for centralized validation in server, optional.
is_remote (bool) – A flag to indicate whether to start remote training.
local_port (int) – The port of remote server service.
Override the class and functions to implement customized server.
Example
>>> from easyfl.server import BaseServer
>>> class CustomizedServer(BaseServer):
>>>     def __init__(self, conf, test_data=None, val_data=None, is_remote=False, local_port=22999):
>>>         super(CustomizedServer, self).__init__(conf, test_data, val_data, is_remote, local_port)
>>>         pass  # more initialization of attributes.
>>>
>>>     def aggregation(self):
>>>         # Implement customized aggregation method, which overwrites the default aggregation method.
>>>         pass
- aggregate(models, weights)[source]¶
Aggregate models uploaded from clients via federated averaging.
- Parameters
models (list[nn.Module]) – List of models.
weights (list[float]) – List of weights, corresponding to each model. Weights are dataset size of clients by default.
- Returns
nn.Module: Aggregated model.
- aggregation()[source]¶
Aggregate training updates from clients. Server aggregates trained models from clients via federated averaging.
- aggregation_test()[source]¶
Aggregate testing results from clients.
- Returns
Test metrics, in the format {“test_loss”: value, “test_accuracy”: value}.
- Return type
dict
- distribution_to_test_remotely()[source]¶
Distribute testing requests to remote clients through multiple threads. The main thread waits for a signal to proceed. The signal can be triggered via notification, as in the example below.
- Example to trigger signal:
>>> with self.condition():
>>>     self.notify_all()
- distribution_to_train_locally()[source]¶
Conduct training sequentially for selected clients in the group.
- distribution_to_train_remotely()[source]¶
Distribute training requests to remote clients through multiple threads. The main thread waits for a signal to proceed. The signal can be triggered via notification, as in the example below.
- Example to trigger signal:
>>> with self.condition():
>>>     self.notify_all()
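The wait-and-notify pattern described for the two remote distribution methods can be sketched with the standard library's `threading.Condition`. This is a generic illustration, not EasyFL's implementation: `handle_client`, `finished`, and `NUM_CLIENTS` are hypothetical names, and the worker body stands in for sending a gRPC request and receiving the client's response.

```python
import threading

condition = threading.Condition()
finished = []      # client ids whose (simulated) responses have arrived
NUM_CLIENTS = 3    # hypothetical number of remote clients


def handle_client(client_id):
    # Stand-in for sending a request to a remote client and getting a reply.
    with condition:
        finished.append(client_id)
        condition.notify_all()  # wake the waiting main thread


threads = [threading.Thread(target=handle_client, args=(i,)) for i in range(NUM_CLIENTS)]
for thread in threads:
    thread.start()

with condition:
    # wait_for re-checks the predicate after every notification, so a
    # notification sent before the wait starts is not lost.
    all_done = condition.wait_for(lambda: len(finished) == NUM_CLIENTS, timeout=5)

for thread in threads:
    thread.join()

print(all_done, sorted(finished))
```

Waiting on a predicate rather than on a bare notification keeps the main thread from proceeding before every client has reported.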
- gather_client_train_metrics()[source]¶
Gather client train metrics from other ranks for distributed training when testing all clients (test_all). When testing all clients, the train metrics may be overridden by the test metrics because clients may be placed on different GPUs in training and testing, leading to loss of train metrics. So we gather train metrics and set them in test metrics. TODO: gather is not progressing. Need fix.
- get_client_uploads()[source]¶
Get client uploaded contents.
- Returns
A dictionary that contains client uploaded contents.
- Return type
dict
- get_test_clients()[source]¶
Get clients to run testing.
- Returns
Clients to test.
- Return type
(list[BaseClient]|list[str])
- grouping_for_distributed()[source]¶
Divide the selected clients into groups for distributed training. Each group of clients is assigned to conduct training in one GPU. The number of groups = the number of gpus.
When not in distributed training, all selected clients are in the same group. In distributed training, selected clients are grouped with different strategies: greedy and random.
- init_etcd(addresses)[source]¶
Initialize etcd as the registry for client registration.
- Parameters
addresses (str) – The etcd addresses split by “,”
- is_primary_server()[source]¶
Check whether the current process is the primary server. In standalone or remote training, the server is primary. In distributed training, the server on rank0 is primary.
- Returns
A flag to indicate whether current process is the primary server.
- Return type
bool
- is_training()[source]¶
Check whether the server is in training or has stopped training.
- Returns
A flag to indicate whether server is in training.
- Return type
bool
- print_(content)[source]¶
Print only if the server is the primary server.
- Parameters
content (str) – The content to log.
- profile_training_speed()[source]¶
Manage profiling of client training speeds for distributed training optimization.
- selection(clients, clients_per_round)[source]¶
Select a fraction of total clients for training. Two selection strategies are implemented: 1. random selection; 2. select the first K clients.
- Parameters
clients (list[BaseClient]|list[str]) – Available clients.
clients_per_round (int) – Number of clients to participate in training each round.
- Returns
The selected clients.
- Return type
(list[BaseClient]|list[str])
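The two selection strategies named above (random selection and taking the first K clients) might look like the sketch below. The function name `select_clients`, the strategy labels `"random"` and `"first"`, and the `seed` argument are assumptions made for illustration; the documentation does not state how EasyFL names or configures them.

```python
import random


def select_clients(clients, clients_per_round, strategy="random", seed=None):
    """Pick `clients_per_round` clients from `clients` (illustrative sketch)."""
    k = min(clients_per_round, len(clients))
    if strategy == "random":
        # Sample without replacement; a seed makes the choice repeatable.
        return random.Random(seed).sample(clients, k)
    if strategy == "first":
        # Deterministic: always the first K clients in the list.
        return list(clients[:k])
    raise ValueError(f"Unsupported selection strategy: {strategy}")


clients = [f"client_{i}" for i in range(10)]
first_k = select_clients(clients, 3, strategy="first")
random_k = select_clients(clients, 3, strategy="random", seed=0)
print(first_k)
```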
- set_client_uploads(key, value)[source]¶
A general function to set uploaded content from clients.
- Parameters
key (str) – Dictionary key.
value – Uploaded content.
- set_client_uploads_test(accuracies, losses, test_sizes, metrics=None)[source]¶
Set testing results uploaded from clients.
- Parameters
accuracies (list[float]) – Testing accuracies of clients.
losses (list[float]) – Testing losses of clients.
test_sizes (list[float]) – Test dataset sizes of clients.
metrics (dict) – Client testing metrics.
- set_client_uploads_train(models, weights, metrics=None)[source]¶
Set training updates uploaded from clients.
- Parameters
models (dict) – A collection of models.
weights (dict) – A collection of weights.
metrics (dict) – Client training metrics.
- set_model(model, load_dict=False)[source]¶
Update the universal model in the server.
- Parameters
model (nn.Module) – New model.
load_dict (bool) – A flag to indicate whether to load the state dict or copy the model.
- should_stop()[source]¶
Check whether training should stop. Training stops under two conditions: 1. The maximum number of training rounds is reached. 2. TODO: Accuracy is higher than a certain amount.
- Returns
A flag to indicate whether should stop training.
- Return type
bool
- start(model, clients)[source]¶
Start federated learning process, including training and testing.
- Parameters
model (nn.Module) – The model to train.
clients (list[BaseClient]|list[str]) – Available clients. In remote training, clients are client gRPC addresses.
- start_remote_training(model, clients)[source]¶
Start federated learning in the remote training mode. Before training, the server first establishes gRPC connections with clients that are not yet connected.
- Parameters
model (nn.Module) – The model to train.
clients (list[str]) – Client addresses.
- test_in_client()[source]¶
Conduct testing in clients. Currently, it supports testing on the clients selected for training. TODO: Add options to select clients for testing.
- Returns
Test metrics, {“test_loss”: value, “test_accuracy”: value, “test_time”: value}.
- Return type
dict
- test_in_server(device='cpu')[source]¶
Conduct testing in the server.
- Parameters
device (str) – The hardware device to conduct testing, either cpu or cuda devices.
- Returns
Test metrics, {“test_loss”: value, “test_accuracy”: value, “test_time”: value}.
- Return type
dict
- track(metric_name, value)[source]¶
Track a metric.
- Parameters
metric_name (str) – Name of the metric of a round.
value (str|int|float|bool|dict|list) – Value of the metric.
- track_communication_cost()[source]¶
Track communication cost among server and clients. Communication cost occurs in training and testing with downlink and uplink costs.
- class easyfl.server.ServerService(server)[source]¶
Remote gRPC server service.
- Parameters
server (BaseServer) – Federated learning server instance.
- easyfl.server.federated_averaging(models, weights)[source]¶
Compute weighted average of model parameters and persistent buffers. Using state_dict of model, including persistent buffers like BN stats.
- Parameters
models (list[nn.Module]) – List of models to average.
weights (list[float]) – List of weights, corresponding to each model. Weights are dataset size of clients by default.
- Returns
nn.Module: Weighted averaged model.
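To make the weighting concrete, the sketch below averages two toy "state dicts" with client dataset sizes as weights. It is a plain-Python illustration under stated assumptions: parameters are lists of floats rather than tensors, and `weighted_average_state` is a hypothetical helper, not the library function.

```python
def weighted_average_state(states, weights):
    """Weighted average of per-parameter lists of floats (illustrative sketch)."""
    total = float(sum(weights))
    averaged = {}
    for name in states[0]:
        length = len(states[0][name])
        # Each entry is sum_k(weight_k * value_k) / sum_k(weight_k).
        averaged[name] = [
            sum(state[name][i] * w for state, w in zip(states, weights)) / total
            for i in range(length)
        ]
    return averaged


# Two clients holding 100 and 300 samples respectively.
states = [
    {"layer.weight": [1.0, 2.0], "layer.bias": [0.0]},
    {"layer.weight": [3.0, 6.0], "layer.bias": [4.0]},
]
weights = [100, 300]

avg = weighted_average_state(states, weights)
print(avg)  # {'layer.weight': [2.5, 5.0], 'layer.bias': [3.0]}
```

The client with three times as much data pulls the average three times as hard, which is why the result sits closer to the second client's values.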
- easyfl.server.federated_averaging_only_params(models, weights)[source]¶
Compute weighted average of model parameters. Use model parameters only.
- Parameters
models (list[nn.Module]) – List of models to average.
weights (list[float]) – List of weights, corresponding to each model. Weights are dataset size of clients by default.
- Returns
nn.Module: Weighted averaged model.
- easyfl.server.weighted_sum(models, weights)[source]¶
Compute weighted sum of model parameters and persistent buffers. Using state_dict of model, including persistent buffers like BN stats.
- Parameters
models (list[nn.Module]) – List of models to average.
weights (list[float]) – List of weights, corresponding to each model. Weights are dataset size of clients by default.
- Returns
nn.Module: Model holding the weighted sum. float: Sum of weights.
- easyfl.server.weighted_sum_only_params(models, weights)[source]¶
Compute weighted sum of model parameters. Use model parameters only.
- Parameters
models (list[nn.Module]) – List of models to average.
weights (list[float]) – List of weights, corresponding to each model. Weights are dataset size of clients by default.
- Returns
nn.Module: Model holding the weighted sum. float: Sum of weights.
easyfl.client¶
- class easyfl.client.BaseClient(cid, conf, train_data, test_data, device, sleep_time=0, is_remote=False, local_port=23000, server_addr='localhost:22999', tracker_addr='localhost:12666')[source]¶
Default implementation of federated learning client.
- Parameters
cid (str) – Client id.
conf (omegaconf.dictconfig.DictConfig) – Client configurations.
train_data (FederatedDataset) – Training dataset.
test_data (FederatedDataset) – Test dataset.
device (str) – Hardware device for training, cpu or cuda devices.
sleep_time (float) – Duration to stay on hold after training, to simulate stragglers.
is_remote (bool) – Whether to start remote training.
local_port (int) – Port of remote client service.
server_addr (str) – Remote server service grpc address.
tracker_addr (str) – Remote tracking service grpc address.
Override the class and functions to implement customized client.
Example
>>> from easyfl.client import BaseClient
>>> class CustomizedClient(BaseClient):
>>>     def __init__(self, cid, conf, train_data, test_data, device, **kwargs):
>>>         super(CustomizedClient, self).__init__(cid, conf, train_data, test_data, device, **kwargs)
>>>         pass  # more initialization of attributes.
>>>
>>>     def train(self, conf, device='cpu'):
>>>         # Implement customized client training method, which overwrites the default training method.
>>>         pass
- compression()[source]¶
Compress the client local model after training and before uploading to the server.
- construct_upload_request()[source]¶
Construct client upload request for training updates and testing results.
- Returns
The upload request defined in protobuf to unify local and remote operations.
- Return type
UploadRequest
- decompression()[source]¶
Decompress the model. It can be further implemented when the model is compressed in the server.
- download(model)[source]¶
Download model from the server.
- Parameters
model (nn.Module) – Global model distributed from the server.
- load_loader(conf)[source]¶
Load the training data loader.
- Parameters
conf (omegaconf.dictconfig.DictConfig) – Client configurations.
- Returns
Data loader.
- Return type
torch.utils.data.DataLoader
- operate(model, conf, index, is_train=True)[source]¶
A wrapper over operations (training/testing) on clients.
- Parameters
model (nn.Module) – Model for operations.
conf (omegaconf.dictconfig.DictConfig) – Client configurations.
index (int) – Client index in the client list, for retrieving data. TODO: improvement.
is_train (bool) – The flag to indicate whether the operation is training, otherwise testing.
- run_test(model, conf)[source]¶
Conduct testing on clients.
- Parameters
model (nn.Module) – Model to test.
conf (omegaconf.dictconfig.DictConfig) – Client configurations.
- Returns
Testing contents. Unify the interface for both local and remote operations.
- Return type
UploadRequest
- run_train(model, conf)[source]¶
Conduct training on clients.
- Parameters
model (nn.Module) – Model to train.
conf (omegaconf.dictconfig.DictConfig) – Client configurations.
- Returns
Training contents. Unify the interface for both local and remote operations.
- Return type
UploadRequest
- test(conf, device='cpu')[source]¶
Execute client testing.
- Parameters
conf (omegaconf.dictconfig.DictConfig) – Client configurations.
device (str) – Hardware device for testing, cpu or cuda devices.
- track(metric_name, value)[source]¶
Track a metric.
- Parameters
metric_name (str) – The name of the metric.
value (str|int|float|bool|dict|list) – The value of the metric.
- train(conf, device='cpu')[source]¶
Execute client training.
- Parameters
conf (omegaconf.dictconfig.DictConfig) – Client configurations.
device (str) – Hardware device for training, cpu or cuda devices.
- class easyfl.client.ClientService(client)[source]¶
Remote gRPC client service.
- Parameters
client (BaseClient) – Federated learning client instance.
easyfl.distributed¶
- easyfl.distributed.dist_init(backend, init_method, world_size, rank, local_rank)[source]¶
Initialize PyTorch distributed.
- Parameters
backend (str or Backend) – Distributed backend to use, e.g., nccl, gloo.
init_method (str, optional) – URL specifying how to initialize the process group.
world_size (int, optional) – Number of processes participating in the job.
rank (int) – Rank of the current process.
local_rank (int) – Local rank of the current process.
- Returns
Rank of current process. int: Total number of processes.
- Return type
int
- easyfl.distributed.gather_value(value, world_size, device)[source]¶
Gather the value from devices to a list.
- Parameters
value (float|int) – The value to gather.
world_size (int) – The number of processes.
device (str) – The device where the value is on, either cpu or cuda devices.
- Returns
A list of gathered values.
- Return type
list[torch.Tensor]
- easyfl.distributed.get_device(gpu, world_size, local_rank)[source]¶
Obtain the device by checking the number of GPUs and distributed settings.
- Parameters
gpu (int) – The number of requested GPUs.
world_size (int) – The number of processes.
local_rank (int) – The local rank of the current process.
- Returns
Device to be used in PyTorch like tensor.to(device).
- Return type
str
- easyfl.distributed.get_ip(node_list)[source]¶
Get the ip address of nodes.
- Parameters
node_list (str) – Name of the nodes.
- Returns
The first node in the nodes.
- Return type
str
- easyfl.distributed.grouping(clients, world_size, default_time=10, strategy='random', seed=1)[source]¶
Divide clients into groups with different strategies.
- Parameters
clients (list[BaseClient]) – A list of clients.
world_size (int) – The number of processes; it represents the number of groups here.
default_time (float, optional) – The default training time for not profiled clients.
strategy (str, optional) – Strategy of grouping, options: random, greedy, worst. When no strategy is applied, each client is a group.
seed (int, optional) – Random seed.
- Returns
Groups of clients, each group is a sub-list.
- Return type
list[list[BaseClient]]
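The sketch below shows one plausible reading of the random and greedy strategies. It is an assumption, not EasyFL's algorithm: "greedy" is interpreted here as the classic longest-processing-time heuristic (give the slowest remaining client to the group with the smallest total time), and clients are represented by ids mapped to estimated training times instead of BaseClient objects.

```python
import random


def group_clients(times, world_size, strategy="random", seed=1):
    """Split client ids into `world_size` groups (illustrative sketch).

    `times` maps a client id to its estimated training time.
    """
    groups = [[] for _ in range(world_size)]
    if strategy == "random":
        ids = list(times)
        random.Random(seed).shuffle(ids)
        # Deal shuffled clients round-robin so group sizes stay balanced.
        for position, cid in enumerate(ids):
            groups[position % world_size].append(cid)
    elif strategy == "greedy":
        loads = [0.0] * world_size
        # Slowest clients first, each to the currently least-loaded group.
        for cid in sorted(times, key=times.get, reverse=True):
            target = loads.index(min(loads))
            groups[target].append(cid)
            loads[target] += times[cid]
    else:
        raise ValueError(f"Unsupported grouping strategy: {strategy}")
    return groups


times = {"a": 9.0, "b": 7.0, "c": 4.0, "d": 3.0, "e": 1.0}
greedy_groups = group_clients(times, 2, strategy="greedy")
random_groups = group_clients(times, 2, strategy="random")
print(greedy_groups)  # [['a', 'd'], ['b', 'c', 'e']]
```

With these times both greedy groups total 12.0, so neither GPU would sit idle waiting for the other; a random split gives no such guarantee.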
- easyfl.distributed.reduce_models(model, sample_sum)[source]¶
Aggregate models across devices and update the model with the new aggregated model parameters.
- Parameters
model (nn.Module) – The model in a device to aggregate.
sample_sum (int) – Sum of the total dataset sizes of clients in a device.
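One way to see why a weighted sum and a sample count are enough to aggregate across devices: if each device reports the weighted sum of its clients' values together with the sum of its weights, dividing the combined sums by the combined weights gives the same result as one weighted average over all clients. The sketch below checks this with scalars; it is an assumption-level illustration of the arithmetic and does not reproduce the cross-device communication.

```python
def weighted_sum(values, weights):
    """Return (sum of value * weight, sum of weights) for one device."""
    return sum(v * w for v, w in zip(values, weights)), sum(weights)


# Device 0 trained two clients, device 1 trained one client.
device_0 = weighted_sum([1.0, 3.0], [100, 300])
device_1 = weighted_sum([5.0], [100])

# Combine the per-device sums, then normalise once.
total_sum = device_0[0] + device_1[0]
total_weight = device_0[1] + device_1[1]
global_average = total_sum / total_weight

# Same result as averaging over all three clients directly.
direct_sum, direct_weight = weighted_sum([1.0, 3.0, 5.0], [100, 300, 100])
print(global_average, direct_sum / direct_weight)
```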
- easyfl.distributed.reduce_models_only_params(model, sample_sum)[source]¶
Aggregate models across devices and update the model with the new aggregated model parameters, excluding the persistent buffers like BN stats.
- Parameters
model (nn.Module) – The model in a device to aggregate.
sample_sum (torch.Tensor) – Sum of the total dataset sizes of clients in a device.
- easyfl.distributed.reduce_value(value, device)[source]¶
Calculate the sum of the value across devices.
- Parameters
value (float/int) – Value to sum.
device (str) – The device where the value is on, either cpu or cuda devices.
- Returns
Sum of the values.
- Return type
torch.Tensor
- easyfl.distributed.reduce_values(values, device)[source]¶
Calculate the average of values across devices.
- Parameters
values (list[float|int]) – Values to average.
device (str) – The device where the value is on, either cpu or cuda devices.
- Returns
The average of the values across devices.
- Return type
torch.Tensor
- easyfl.distributed.reduce_weighted_values(values, weights, device)[source]¶
Calculate the weighted average of values across devices.
- Parameters
values (list[float|int]) – Values to average.
weights (list[float|int]) – The weights to calculate weighted average.
device (str) – The device where the value is on, either cpu or cuda devices.
- Returns
The average of values across devices.
- Return type
torch.Tensor
- easyfl.distributed.setup(port=23344)[source]¶
Setup distributed settings of slurm.
- Parameters
port (int, optional) – The port of the primary server. It auto-increments by 1 when the port is in use.
- Returns
The rank of current process. int: The local rank of current process. int: Total number of processes. str: The address of the distributed init method.
- Return type
int
easyfl.datasets¶
- class easyfl.datasets.BaseDataset(root, dataset_name, fraction, split_type, user, iid_user_fraction, train_test_split, minsample, num_class, num_of_client, class_per_client, setting_folder, seed=-1, **kwargs)[source]¶
The internal base dataset implementation.
- Parameters
root (str) – The root directory where datasets stored.
dataset_name (str) – The name of the dataset.
fraction (float) – The fraction of the data chosen from the raw data to use.
num_of_clients (int) – The targeted number of clients to construct.
split_type (str) – The type of statistical simulation, options: iid, niid, dir, and class. iid means independent and identically distributed data. niid means non-independent and identically distributed data for Femnist and Shakespeare. dir means using Dirichlet process to simulate non-iid data, for CIFAR-10 and CIFAR-100 datasets. class means partitioning the dataset by label classes, for datasets like CIFAR-10, CIFAR-100.
minsample (int) – The minimal number of samples in each client. It is applicable for LEAF datasets and dir simulation of CIFAR-10 and CIFAR-100.
class_per_client (int) – The number of classes in each client. Only applicable when the split_type is ‘class’.
iid_user_fraction (float) – The fraction of the number of clients used when the split_type is ‘iid’.
user (bool) – A flag to indicate whether to partition users of the dataset into train-test groups. Only applicable to LEAF datasets. True means partitioning users of the dataset into train-test groups. False means partitioning each user’s samples into train-test groups.
train_test_split (float) – The fraction of data for training; the rest are for testing. e.g., 0.9 means 90% of data are used for training and 10% are used for testing.
num_class – The number of classes in this dataset.
seed – Random seed.
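The difference between the two settings of `user` can be sketched as follows, with `train_test_split=0.9`. The helpers `split_by_user` and `split_by_sample` are hypothetical and deliberately simple (no shuffling); the actual LEAF preprocessing may order or sample data differently.

```python
def split_by_user(data, train_fraction):
    """user=True: whole users go to either the train or the test group."""
    users = sorted(data)
    cut = int(round(len(users) * train_fraction))
    train = {u: data[u] for u in users[:cut]}
    test = {u: data[u] for u in users[cut:]}
    return train, test


def split_by_sample(data, train_fraction):
    """user=False: every user keeps part of its samples for testing."""
    train, test = {}, {}
    for user, samples in data.items():
        cut = int(round(len(samples) * train_fraction))
        train[user], test[user] = samples[:cut], samples[cut:]
    return train, test


# Ten toy users with ten samples each.
data = {f"u{i}": list(range(10)) for i in range(10)}

user_train, user_test = split_by_user(data, 0.9)
sample_train, sample_test = split_by_sample(data, 0.9)
print(len(user_train), len(user_test), len(sample_train["u0"]), len(sample_test["u0"]))
```

Splitting by user tests generalization to unseen clients, while splitting by sample tests each client on held-out data of its own.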
- class easyfl.datasets.Cifar10(root, fraction, split_type, user, iid_user_fraction=0.1, train_test_split=0.9, minsample=10, num_class=80, num_of_client=100, class_per_client=2, setting_folder=None, seed=-1, weights=None, alpha=0.5)[source]¶
- class easyfl.datasets.Cifar100(root, fraction, split_type, user, iid_user_fraction=0.1, train_test_split=0.9, minsample=10, num_class=80, num_of_client=100, class_per_client=2, setting_folder=None, seed=-1, weights=None, alpha=0.5)[source]¶
- class easyfl.datasets.FederatedDataset[source]¶
The abstract class of federated dataset for EasyFL.
- abstract loader(batch_size, shuffle=True)[source]¶
Get data loader.
- Parameters
batch_size (int) – The batch size of the data loader.
shuffle (bool) – Whether to shuffle the data in the loader.
- property users¶
Get client ids of the federated dataset.
- class easyfl.datasets.FederatedImageDataset(root, simulated, do_simulate=True, extensions=('.jpg', '.jpeg', '.png', '.ppm', '.bmp', '.pgm', '.tif', '.tiff', '.webp'), is_valid_file=None, transform=None, target_transform=None, client_ids='default', num_of_clients=10, simulation_method='iid', weights=None, alpha=0.5, min_size=10, class_per_client=1)[source]¶
Federated image dataset, data of clients are in format of image folder.
- Parameters
root (str|list[str]) – The root directory or directories of image data folder. If the dataset is simulated to multiple clients, the root is a list of directories. Otherwise, it is the directory of an image data folder.
simulated (bool) – Whether the dataset is simulated to federated learning settings.
do_simulate (bool, optional) – Whether to conduct simulation. It is only effective if the dataset is not already simulated.
extensions (list[str], optional) – A list of allowed image extensions. Only one of extensions and is_valid_file can be specified.
is_valid_file (function, optional) – A function that takes path of an Image file and check if it is valid. Only one of extensions and is_valid_file can be specified.
transform (torchvision.transforms.transforms.Compose, optional) – Transformation for data.
target_transform (torchvision.transforms.transforms.Compose, optional) – Transformation for data labels.
num_of_clients (int, optional) – Number of clients for simulation. Only needed when doing simulation.
simulation_method (optional) – Split method. Only needed when doing simulation.
weights (list[float], optional) – The targeted distribution of quantities to simulate quantity heterogeneity. The values should sum up to 1. e.g., [0.1, 0.2, 0.7]. The num_of_clients should be divisible by len(weights). None means clients are simulated with the same data quantity.
alpha (float, optional) – The parameter for Dirichlet distribution simulation, only for dir simulation.
min_size (int, optional) – The minimal number of samples in each client, only for dir simulation.
class_per_client (int, optional) – The number of classes in each client, only for non-iid by class simulation.
client_ids (list[str], optional) – A list of client ids. Each client id matches with an element in roots. The client ids are [“f0000001”, “f00000002”, …] if not specified.
- loader(batch_size, client_id=None, shuffle=True, seed=0, num_workers=2, transform=None)[source]¶
Get dataset loader.
- Parameters
batch_size (int) – The batch size.
client_id (str, optional) – The id of client.
shuffle (bool, optional) – Whether to shuffle before batching.
seed (int, optional) – The shuffle seed.
transform (torchvision.transforms.transforms.Compose, optional) – Data transformation.
num_workers (int, optional) – The number of workers for dataset loader.
- Returns
The data loader to load data.
- Return type
torch.utils.data.DataLoader
- property users¶
Get client ids of the federated dataset.
- class easyfl.datasets.FederatedTensorDataset(data, transform=None, target_transform=None, process_x=<function default_process_x>, process_y=<function default_process_x>, simulated=False, do_simulate=True, num_of_clients=10, simulation_method='iid', weights=None, alpha=0.5, min_size=10, class_per_client=1)[source]¶
Federated tensor dataset, data of clients are in format of tensor or list.
- Parameters
data (dict) – A dictionary of data, e.g., {“id1”: {“x”: [[], [], …], “y”: […]}}. If simulation is not done previously, it is in format of {‘x’:[[],[], …], ‘y’: […]}.
transform (torchvision.transforms.transforms.Compose, optional) – Transformation for data.
target_transform (torchvision.transforms.transforms.Compose, optional) – Transformation for data labels.
process_x (function, optional) – A function to preprocess training data.
process_y (function, optional) – A function to preprocess testing data.
simulated (bool, optional) – Whether the dataset is simulated to federated learning settings.
do_simulate (bool, optional) – Whether to conduct simulation. It is only effective if the dataset is not already simulated.
num_of_clients (int, optional) – Number of clients for simulation. Only needed when doing simulation.
simulation_method (optional) – Split method. Only needed when doing simulation.
weights (list[float], optional) – The targeted distribution of quantities to simulate quantity heterogeneity. The values should sum up to 1. e.g., [0.1, 0.2, 0.7]. The num_of_clients should be divisible by len(weights). None means clients are simulated with the same data quantity.
alpha (float, optional) – The parameter for Dirichlet distribution simulation, only for dir simulation.
min_size (int, optional) – The minimal number of samples in each client, only for dir simulation.
class_per_client (int, optional) – The number of classes in each client, only for non-iid by class simulation.
- loader(batch_size, client_id=None, shuffle=True, seed=0, transform=None, drop_last=False)[source]¶
Get dataset loader.
- Parameters
batch_size (int) – The batch size.
client_id (str, optional) – The id of client.
shuffle (bool, optional) – Whether to shuffle before batching.
seed (int, optional) – The shuffle seed.
transform (torchvision.transforms.transforms.Compose, optional) – Data transformation.
drop_last (bool, optional) – Whether to drop the last batch if its size is smaller than batch size.
- Returns
The data loader to load data.
- Return type
torch.utils.data.DataLoader
- property users¶
Get client ids of the federated dataset.
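The role of `alpha` in the dir simulation can be illustrated with a standard-library sketch of a label-wise Dirichlet partition: for each class, client proportions are drawn from a Dirichlet distribution and the class's samples are split accordingly. This is a common construction, offered as an assumption about the general technique rather than as EasyFL's implementation; `dirichlet_partition` is a hypothetical helper and the `min_size` check is omitted.

```python
import random


def dirichlet_partition(labels, num_of_clients, alpha=0.5, seed=0):
    """Assign sample indices to clients with label-wise Dirichlet proportions."""
    rng = random.Random(seed)
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)

    clients = [[] for _ in range(num_of_clients)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # A Dirichlet(alpha, ..., alpha) sample is a vector of independent
        # Gamma(alpha, 1) draws divided by their sum.
        draws = [rng.gammavariate(alpha, 1.0) for _ in range(num_of_clients)]
        total = sum(draws)
        start, cumulative = 0, 0.0
        for client_id, draw in enumerate(draws):
            cumulative += draw / total
            if client_id == num_of_clients - 1:
                end = len(indices)  # last client takes whatever remains
            else:
                end = round(cumulative * len(indices))
            clients[client_id].extend(indices[start:end])
            start = end
    return clients


labels = [i % 10 for i in range(200)]  # 10 classes, 20 samples each
parts = dirichlet_partition(labels, num_of_clients=4, alpha=0.5, seed=0)
print([len(p) for p in parts])
```

Smaller `alpha` values concentrate each class on fewer clients (more heterogeneity); larger values push the split toward an even, iid-like distribution.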
- class easyfl.datasets.FederatedTorchDataset(data, users)[source]¶
Wrapper over PyTorch dataset.
- Parameters
data (dict) – A dictionary of client datasets, format {“client_id”: loader1, “client_id2”: loader2}.
- loader(batch_size, client_id=None, shuffle=True, seed=0, num_workers=2, transform=None)[source]¶
Get data loader.
- Parameters
batch_size (int) – The batch size of the data loader.
shuffle (bool) – Whether to shuffle the data in the loader.
- property users¶
Get client ids of the federated dataset.
- class easyfl.datasets.Femnist(root, fraction, split_type, user, iid_user_fraction=0.1, train_test_split=0.9, minsample=10, num_class=62, num_of_client=100, class_per_client=2, setting_folder=None, seed=-1, **kwargs)[source]¶
FEMNIST dataset implementation. It gets FEMNIST dataset according to configurations. It stores the processed datasets locally.
- base_folder¶
The base folder path of the datasets folder.
- Type
str
- class_url¶
The url to get the by_class split FEMNIST.
- Type
str
- write_url¶
The url to get the by_write split FEMNIST.
- Type
str
- class easyfl.datasets.Shakespeare(root, fraction, split_type, user, iid_user_fraction=0.1, train_test_split=0.9, minsample=10, num_class=80, num_of_client=100, class_per_client=2, setting_folder=None, seed=-1, **kwargs)[source]¶
Shakespeare dataset implementation. It gets Shakespeare dataset according to configurations.
- base_folder¶
The base folder path of the datasets folder.
- Type
str
- raw_data_url¶
The url to get the by_class split shakespeare.
- Type
str
- write_url¶
The url to get the by_write split shakespeare.
- Type
str
- easyfl.datasets.construct_datasets(root, dataset_name, num_of_clients, split_type, min_size, class_per_client, data_amount, iid_fraction, user, train_test_split, quantity_weights, alpha)[source]¶
Construct and load provided federated learning datasets.
- Parameters
root (str) – The root directory where datasets stored.
dataset_name (str) – The name of the dataset. It currently supports: femnist, shakespeare, cifar10, and cifar100. Among them, femnist and shakespeare are adopted from LEAF benchmark.
num_of_clients (int) – The targeted number of clients to construct.
split_type (str) – The type of statistical simulation, options: iid, niid, dir, and class. iid means independent and identically distributed data. niid means non-independent and identically distributed data for Femnist and Shakespeare. dir means using Dirichlet process to simulate non-iid data, for CIFAR-10 and CIFAR-100 datasets. class means partitioning the dataset by label classes, for datasets like CIFAR-10, CIFAR-100.
min_size (int) – The minimal number of samples in each client. It is applicable for LEAF datasets and dir simulation of CIFAR-10 and CIFAR-100.
class_per_client (int) – The number of classes in each client. Only applicable when the split_type is ‘class’.
data_amount (float) – The fraction of data sampled for LEAF datasets. e.g., 10% means that only 10% of total dataset size are used.
iid_fraction (float) – The fraction of the number of clients used when the split_type is ‘iid’.
user (bool) – A flag to indicate whether to partition users of the dataset into train-test groups. Only applicable to LEAF datasets. True means partitioning users of the dataset into train-test groups. False means partitioning each user’s samples into train-test groups.
train_test_split (float) – The fraction of data for training; the rest are for testing. e.g., 0.9 means 90% of data are used for training and 10% are used for testing.
quantity_weights (list[float]) – The targeted distribution of quantities to simulate data quantity heterogeneity. The values should sum up to 1. e.g., [0.1, 0.2, 0.7]. The num_of_clients should be divisible by len(weights). None means clients are simulated with the same data quantity.
alpha (float) – The parameter for Dirichlet distribution simulation, applicable only when split_type is dir.
- Returns
Training dataset. FederatedDataset: Testing dataset.
- Return type
FederatedDataset
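The difference between the two values of the user flag can be illustrated with a minimal sketch. The helper name split_train_test, its arguments, and the seed handling are assumptions made for illustration only; this is not EasyFL's implementation.

```python
import random


def split_train_test(user_data, train_fraction=0.9, by_user=True, seed=0):
    """Hypothetical helper illustrating the `user` flag; not EasyFL's code.

    user_data maps a user id to that user's list of samples.
    """
    rng = random.Random(seed)
    train, test = {}, {}
    if by_user:
        # Whole users are assigned to either the training or the testing group.
        users = sorted(user_data)
        rng.shuffle(users)
        cut = round(len(users) * train_fraction)
        for u in users[:cut]:
            train[u] = list(user_data[u])
        for u in users[cut:]:
            test[u] = list(user_data[u])
    else:
        # Every user appears in both groups; each user's samples are split.
        for u, samples in user_data.items():
            shuffled = list(samples)
            rng.shuffle(shuffled)
            cut = round(len(shuffled) * train_fraction)
            train[u] = shuffled[:cut]
            test[u] = shuffled[cut:]
    return train, test


data = {f"user_{i}": list(range(10)) for i in range(10)}
train_u, test_u = split_train_test(data, 0.9, by_user=True)
train_s, test_s = split_train_test(data, 0.9, by_user=False)
```

With by_user=True the training and testing groups contain disjoint users; with by_user=False every user contributes 90% of its samples to training and 10% to testing.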
- easyfl.datasets.data_simulation(data_x, data_y, num_of_clients, data_distribution, weights=None, alpha=0.5, min_size=10, class_per_client=1, stack_x=True)[source]¶
Simulate federated learning datasets by partitioning data across multiple clients using different strategies.
- Parameters
data_x (list[Object]) – A list of data.
data_y (list[Object]) – A list of dataset labels.
num_of_clients (int) – The number of clients to partition to.
data_distribution (str) – The way to partition the dataset, options: iid, dir, and class. iid partitions the dataset randomly into multiple clients with equal quantity (difference is less than 1). dir partitions the dataset into multiple clients following the Dirichlet process. class partitions the dataset into multiple clients based on classes.
weights (list[float], optional) – The targeted distribution of data quantities, for simulating data quantity heterogeneity. The values should sum up to 1, e.g., [0.1, 0.2, 0.7], and num_of_clients should be divisible by len(weights). When weights=None, no quantity heterogeneity is simulated and the data quantity of clients depends only on data_distribution.
alpha (float, optional) – The parameter for Dirichlet process simulation. It is only applicable when data_distribution is dir.
min_size (int, optional) – The minimum number of samples per client. It is only applicable when data_distribution is dir.
class_per_client (int) – The number of classes in each client. It is only applicable when data_distribution is class.
stack_x (bool, optional) – A flag to indicate whether to use np.vstack or append to construct the dataset. It is only applicable when data_distribution is class.
- Raises
ValueError – When the simulation method data_distribution is not supported.
- Returns
A list of client ids. dict: The partitioned data, key is client id, value is the client data. e.g., {‘client_1’: {‘x’: [data_x], ‘y’: [data_y]}}.
- Return type
list[str]
- easyfl.datasets.equal_division(num_groups, data_x, data_y=None)[source]¶
Partition data into multiple clients with equal quantity.
- Parameters
num_groups (int) – The number of groups to partition to.
data_x (list[Object]) – A list of elements to be divided.
data_y (list[Object], optional) – A list of data labels to be divided together with the data.
- Returns
A list where each element is a list of data of a group/client. list[list]: A list where each element is a list of data label of a group/client.
- Return type
list[list]
Example
>>> equal_division(3, list(range(9)))
([[0,4,2],[3,1,7],[6,5,8]], [])
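The behaviour documented above (a random split into groups of near-equal size, with an empty label list when data_y is omitted) can be sketched with the standard library. The function name equal_division_sketch and the seed argument are assumptions for illustration; EasyFL's own implementation may differ.

```python
import random


def equal_division_sketch(num_groups, data_x, data_y=None, seed=None):
    """Shuffle, then split into num_groups groups whose sizes differ by at most 1."""
    rng = random.Random(seed)
    order = list(range(len(data_x)))
    rng.shuffle(order)

    # The first `extra` groups receive one additional element.
    base, extra = divmod(len(order), num_groups)
    groups_x, groups_y = [], []
    start = 0
    for g in range(num_groups):
        size = base + (1 if g < extra else 0)
        idx = order[start:start + size]
        groups_x.append([data_x[i] for i in idx])
        if data_y is not None:
            # Labels follow the same indices so x and y stay aligned.
            groups_y.append([data_y[i] for i in idx])
        start += size
    return groups_x, groups_y


xs, ys = equal_division_sketch(3, list(range(10)), seed=0)
```

The element order inside each group depends on the shuffle, so only the group sizes are deterministic.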
- easyfl.datasets.iid(data_x, data_y, num_of_clients, x_dtype, y_dtype)[source]¶
Partition dataset into multiple clients with equal data quantity (difference is less than 1) randomly.
- Parameters
data_x (list[Object]) – A list of data.
data_y (list[Object]) – A list of dataset labels.
num_of_clients (int) – The number of clients to partition to.
x_dtype (numpy.dtype) – The type of data.
y_dtype (numpy.dtype) – The type of data label.
- Returns
A list of client ids. dict: The partitioned data, key is client id, value is the client data. e.g., {‘client_1’: {‘x’: [data_x], ‘y’: [data_y]}}.
- Return type
list[str]
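The documented return shape (a list of client ids plus a dict keyed by client id) can be sketched as follows. The name iid_sketch, the seed argument, and the client id format are assumptions; dtype conversion with x_dtype and y_dtype is omitted to keep the sketch free of dependencies.

```python
import random


def iid_sketch(data_x, data_y, num_of_clients, seed=0):
    """Illustrative IID partition; not EasyFL's implementation."""
    rng = random.Random(seed)
    order = list(range(len(data_x)))
    rng.shuffle(order)

    clients, federated = [], {}
    for c in range(num_of_clients):
        client_id = f"client_{c}"  # hypothetical id format
        # Dealing shuffled indices round-robin keeps sizes within 1 of each other.
        idx = order[c::num_of_clients]
        federated[client_id] = {
            "x": [data_x[i] for i in idx],
            "y": [data_y[i] for i in idx],
        }
        clients.append(client_id)
    return clients, federated


x = [[i, i] for i in range(7)]
y = [i % 2 for i in range(7)]
ids, parts = iid_sketch(x, y, 3)
```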
- easyfl.datasets.non_iid_class(data_x, data_y, class_per_client, num_of_clients, x_dtype, y_dtype, stack_x=True)[source]¶
Partition dataset into multiple clients based on label classes. Each client contains [1, n] classes, where n is the number of classes of a dataset.
- Note: Each class is divided into ceil(class_per_client * num_of_clients / num_class) parts, and each client chooses class_per_client parts from each class to construct its dataset.
- Parameters
data_x (list[Object]) – A list of data.
data_y (list[Object]) – A list of dataset labels.
class_per_client (int) – The number of classes in each client.
num_of_clients (int) – The number of clients to partition to.
x_dtype (numpy.dtype) – The type of data.
y_dtype (numpy.dtype) – The type of data label.
stack_x (bool, optional) – A flag to indicate whether to use np.vstack or append to construct the dataset.
- Returns
A list of client ids. dict: The partitioned data, key is client id, value is the client data. e.g., {‘client_1’: {‘x’: [data_x], ‘y’: [data_y]}}.
- Return type
list[str]
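One way to realise the note above is sketched below: each class is cut into ceil(class_per_client * num_of_clients / num_class) shards, and every client takes one shard from each of its class_per_client classes. The round-robin class assignment, the function name, and the client id format are assumptions; the library may select classes differently.

```python
import math
import random
from collections import defaultdict


def class_partition_sketch(data_x, data_y, class_per_client, num_of_clients, seed=0):
    """Illustrative label-based partition; not EasyFL's implementation."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(data_y):
        by_class[label].append(i)
    classes = sorted(by_class)
    if class_per_client > len(classes):
        raise ValueError("class_per_client exceeds the number of classes")

    # Each class is cut into ceil(class_per_client * num_of_clients / num_class) shards.
    parts = math.ceil(class_per_client * num_of_clients / len(classes))
    shards = {}
    for label in classes:
        idx = by_class[label]
        rng.shuffle(idx)
        shards[label] = [idx[p::parts] for p in range(parts)]

    federated = {}
    for c in range(num_of_clients):
        # Assumption: classes are handed out round-robin, one shard per class.
        chosen = [classes[(c * class_per_client + k) % len(classes)]
                  for k in range(class_per_client)]
        idx = [i for label in chosen for i in shards[label].pop()]
        federated[f"client_{c}"] = {
            "x": [data_x[i] for i in idx],
            "y": [data_y[i] for i in idx],
        }
    return list(federated), federated


labels = [i % 4 for i in range(40)]
features = [[float(i)] for i in range(40)]
client_ids, class_parts = class_partition_sketch(features, labels, 2, 6)
```

Here 4 classes, 6 clients, and 2 classes per client give ceil(12 / 4) = 3 shards per class, so every shard is used exactly once.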
- easyfl.datasets.non_iid_dirichlet(data_x, data_y, num_of_clients, alpha, min_size, x_dtype, y_dtype)[source]¶
Partition dataset into multiple clients following the Dirichlet process.
- Parameters
data_x (list[Object]) – A list of data.
data_y (list[Object]) – A list of dataset labels.
num_of_clients (int) – The number of clients to partition to.
alpha (float) – The parameter for Dirichlet process simulation.
min_size (int) – The minimum number of samples per client.
x_dtype (numpy.dtype) – The type of data.
y_dtype (numpy.dtype) – The type of data label.
- Returns
A list of client ids. dict: The partitioned data, key is client id, value is the client data. e.g., {‘client_1’: {‘x’: [data_x], ‘y’: [data_y]}}.
- Return type
list[str]
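A common way to simulate label skew with a Dirichlet distribution is sketched below: for each class, per-client proportions are drawn from Dir(alpha) and the class's samples are split accordingly, and the draw is repeated until every client holds at least min_size samples. This is a sketch under those assumptions, not necessarily what EasyFL does internally; the function names, the seed, and the max_tries limit are hypothetical, and the sketch returns index lists rather than the client dict.

```python
import random


def dirichlet_sample(rng, alpha, n):
    """Draw one sample from a symmetric Dirichlet by normalising Gamma draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(n)]
    total = sum(draws)
    return [d / total for d in draws]


def dirichlet_partition_sketch(data_y, num_of_clients, alpha=0.5, min_size=10,
                               seed=0, max_tries=1000):
    """Return one list of sample indices per client."""
    rng = random.Random(seed)
    by_class = {}
    for i, label in enumerate(data_y):
        by_class.setdefault(label, []).append(i)

    for _ in range(max_tries):
        client_idx = [[] for _ in range(num_of_clients)]
        for label in sorted(by_class):
            idx = list(by_class[label])
            rng.shuffle(idx)
            props = dirichlet_sample(rng, alpha, num_of_clients)
            # Convert cumulative proportions into cut points over this class.
            start, cum = 0, 0.0
            for c, p in enumerate(props):
                cum += p
                last = c == num_of_clients - 1
                end = len(idx) if last else round(cum * len(idx))
                client_idx[c].extend(idx[start:end])
                start = end
        # Accept the draw only if every client is large enough.
        if min(len(ix) for ix in client_idx) >= min_size:
            return client_idx
    raise RuntimeError("could not satisfy min_size; try a larger alpha")


toy_labels = [i % 10 for i in range(1000)]
dir_parts = dirichlet_partition_sketch(toy_labels, 5, alpha=0.5, min_size=10)
```

Smaller alpha values concentrate each class on fewer clients, while larger values approach an even split.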
- easyfl.datasets.quantity_hetero(weights, data_x, data_y=None)[source]¶
Partition data into multiple clients with different quantities. The number of groups is the same as the number of elements of weights. The quantity of each group depends on the values of weights.
- Parameters
weights (list[float]) – The targeted distribution of data quantities. The values should sum up to 1. e.g., [0.1, 0.2, 0.7].
data_x (list[Object]) – A list of elements to be divided.
data_y (list[Object], optional) – A list of data labels to be divided together with the data.
- Returns
A list where each element is a list of data of a group/client. list[list]: A list where each element is a list of data label of a group/client.
- Return type
list[list]
Example
>>> quantity_hetero([0.1, 0.2, 0.7], list(range(0, 10)))
([[4], [8, 9], [6, 0, 1, 7, 3, 2, 5]], [])
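The weighted split documented above can be sketched by shuffling the data and cutting it at the cumulative weights. The name quantity_hetero_sketch, the seed argument, and the rounding rule are assumptions for illustration, not the library's code.

```python
import random


def quantity_hetero_sketch(weights, data_x, data_y=None, seed=None):
    """Split data into len(weights) groups sized in proportion to weights."""
    if abs(sum(weights) - 1.0) > 1e-6:
        raise ValueError("weights should sum up to 1")
    rng = random.Random(seed)
    order = list(range(len(data_x)))
    rng.shuffle(order)

    groups_x, groups_y = [], []
    start, cum = 0, 0.0
    for g, w in enumerate(weights):
        cum += w
        # The last group takes whatever remains, so no element is dropped.
        last = g == len(weights) - 1
        end = len(order) if last else round(cum * len(order))
        idx = order[start:end]
        groups_x.append([data_x[i] for i in idx])
        if data_y is not None:
            groups_y.append([data_y[i] for i in idx])
        start = end
    return groups_x, groups_y


hx, hy = quantity_hetero_sketch([0.1, 0.2, 0.7], list(range(10)), seed=0)
```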
easyfl.models¶
easyfl.communication¶
- easyfl.communication.init_stub(typ, address)[source]¶
Initialize gRPC stub.
- Parameters
typ (str) – Type of service, options: client, server, tracking.
address (str) – Address of the gRPC service.
- Returns
Stub of the gRPC service.
- Return type
ClientServiceStub | ServerServiceStub | TrackingServiceStub
- easyfl.communication.start_service(typ, service, port)[source]¶
Start gRPC service.
- Parameters
typ (str) – Type of service, options: client, server, tracking.
service (ClientService | ServerService | TrackingService) – gRPC service to start.
port (int) – The port of the service.
easyfl.registry¶
- class easyfl.registry.EtcdClient(name, addrs, base_dir, use_mock_etcd=False)[source]¶
Etcd client to connect to and communicate with the etcd service. Etcd serves as the registry for remote training. Clients register themselves in etcd, and the server queries etcd to get client addresses.
- Parameters
name (str) – The name of etcd.
addrs (str) – Etcd addresses, format: “<ip>:<port>,<ip>:<port>”.
base_dir (str) – The prefix of all etcd requests, defaults to “backends”.
use_mock_etcd (bool) – Whether to use a mocked etcd for testing.
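The addrs format above is a comma-separated list of <ip>:<port> pairs. A minimal sketch of how such a string could be validated before it is passed on is shown below; parse_etcd_addrs is a hypothetical helper and not part of easyfl.registry.

```python
def parse_etcd_addrs(addrs):
    """Split "<ip>:<port>,<ip>:<port>" into a list of (host, port) tuples."""
    endpoints = []
    for item in addrs.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing comma
        host, sep, port = item.rpartition(":")
        if not sep or not host or not port.isdigit():
            raise ValueError(f"expected <ip>:<port>, got {item!r}")
        endpoints.append((host, int(port)))
    return endpoints


endpoints = parse_etcd_addrs("10.0.0.1:2379,10.0.0.2:2379")
```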
- easyfl.registry.get_clients(source, etcd_addresses=None)[source]¶
Get clients from registry.
- Parameters
source (str) – Registry source, options: manual, etcd, kubernetes.
etcd_addresses (str, optional) – The addresses of etcd service.
- Returns
A list of clients with addresses.
- Return type
list[VirtualClient]