easyfl¶
easyfl.server¶
easyfl.client¶
- class easyfl.client.BaseClient(cid, conf, train_data, test_data, device, sleep_time=0, is_remote=False, local_port=23000, server_addr='localhost:22999', tracker_addr='localhost:12666')[source]¶
Default implementation of federated learning client.
- Parameters
cid (str) – Client id.
conf (omegaconf.dictconfig.DictConfig) – Client configurations.
train_data (FederatedDataset) – Training dataset.
test_data (FederatedDataset) – Test dataset.
device (str) – Hardware device for training, cpu or cuda devices.
sleep_time (float) – Duration to stay on hold after training, to simulate stragglers.
is_remote (bool) – Whether to start remote training.
local_port (int) – Port of remote client service.
server_addr (str) – Remote server service grpc address.
tracker_addr (str) – Remote tracking service grpc address.
Override the class and functions to implement customized client.
Example
>>> from easyfl.client import BaseClient
>>> class CustomizedClient(BaseClient):
...     def __init__(self, cid, conf, train_data, test_data, device, **kwargs):
...         super(CustomizedClient, self).__init__(cid, conf, train_data, test_data, device, **kwargs)
...         pass  # more initialization of attributes.
...
...     def train(self, conf, device='cpu'):
...         # Implement customized client training method, which overrides the default training method.
...         pass
- compression()[source]¶
Compress the client local model after training and before uploading to the server.
- construct_upload_request()[source]¶
Construct client upload request for training updates and testing results.
- Returns
The upload request defined in protobuf to unify local and remote operations.
- Return type
UploadRequest
- decompression()[source]¶
Decompress the model. It can be further implemented when the model is compressed on the server.
- download(model)[source]¶
Download model from the server.
- Parameters
model (nn.Module) – Global model distributed from the server.
- load_loader(conf)[source]¶
Load the training data loader.
- Parameters
conf (omegaconf.dictconfig.DictConfig) – Client configurations.
- Returns
Data loader.
- Return type
torch.utils.data.DataLoader
- operate(model, conf, index, is_train=True)[source]¶
A wrapper over operations (training/testing) on clients.
- Parameters
model (nn.Module) – Model for operations.
conf (omegaconf.dictconfig.DictConfig) – Client configurations.
index (int) – Client index in the client list, for retrieving data. TODO: improvement.
is_train (bool) – The flag to indicate whether the operation is training, otherwise testing.
- run_test(model, conf)[source]¶
Conduct testing on clients.
- Parameters
model (nn.Module) – Model to test.
conf (omegaconf.dictconfig.DictConfig) – Client configurations.
- Returns
Testing contents, in a format that unifies the interface for local and remote operations.
- Return type
UploadRequest
- run_train(model, conf)[source]¶
Conduct training on clients.
- Parameters
model (nn.Module) – Model to train.
conf (omegaconf.dictconfig.DictConfig) – Client configurations.
- Returns
Training contents, in a format that unifies the interface for local and remote operations.
- Return type
UploadRequest
- test(conf, device='cpu')[source]¶
Execute client testing.
- Parameters
conf (omegaconf.dictconfig.DictConfig) – Client configurations.
device (str) – Hardware device for training, cpu or cuda devices.
- track(metric_name, value)[source]¶
Track a metric.
- Parameters
metric_name (str) – The name of the metric.
value (str|int|float|bool|dict|list) – The value of the metric.
- train(conf, device='cpu')[source]¶
Execute client training.
- Parameters
conf (omegaconf.dictconfig.DictConfig) – Client configurations.
device (str) – Hardware device for training, cpu or cuda devices.
- class easyfl.client.ClientService(client)[source]¶
Remote gRPC client service.
- Parameters
client (BaseClient) – Federated learning client instance.
easyfl.distributed¶
- easyfl.distributed.dist_init(backend, init_method, world_size, rank, local_rank)[source]¶
Initialize PyTorch distributed.
- Parameters
backend (str or Backend) – Distributed backend to use, e.g., nccl, gloo.
init_method (str, optional) – URL specifying how to initialize the process group.
world_size (int, optional) – Number of processes participating in the job.
rank (int) – Rank of the current process.
local_rank (int) – Local rank of the current process.
- Returns
Rank of the current process and the total number of processes.
- Return type
(int, int)
- easyfl.distributed.gather_value(value, world_size, device)[source]¶
Gather the value from devices to a list.
- Parameters
value (float|int) – The value to gather.
world_size (int) – The number of processes.
device (str) – The device where the value is on, either cpu or cuda devices.
- Returns
A list of gathered values.
- Return type
list[torch.Tensor]
- easyfl.distributed.get_device(gpu, world_size, local_rank)[source]¶
Obtain the device by checking the number of GPUs and distributed settings.
- Parameters
gpu (int) – The number of requested GPUs.
world_size (int) – The number of processes.
local_rank (int) – The local rank of the current process.
- Returns
Device to be used in PyTorch like tensor.to(device).
- Return type
str
- easyfl.distributed.get_ip(node_list)[source]¶
Get the ip address of nodes.
- Parameters
node_list (str) – Name of the nodes.
- Returns
The first node in the nodes.
- Return type
str
- easyfl.distributed.grouping(clients, world_size, default_time=10, strategy='random', seed=1)[source]¶
Divide clients into groups with different strategies.
- Parameters
clients (list[BaseClient]) – A list of clients.
world_size (int) – The number of processes; it represents the number of groups here.
default_time (float, optional) – The default training time for clients that have not been profiled.
strategy (str, optional) – Strategy of grouping, options: random, greedy, worst. When no strategy is applied, each client is a group.
seed (int, optional) – Random seed.
- Returns
Groups of clients, each group is a sub-list.
- Return type
list[list[BaseClient]]
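To illustrate how a greedy strategy could balance training time across groups, the sketch below places the slowest clients first, always into the group with the smallest accumulated time. It is an assumption about what a greedy strategy might look like, not EasyFL's implementation; `greedy_grouping` and the `(client_id, time)` tuples are hypothetical names used only for this example.

```python
import heapq


def greedy_grouping(client_times, world_size):
    """Assign clients to `world_size` groups, slowest client first,
    always placing the next client in the group with the least total time."""
    # Heap entries are (total_time, group_index); the lightest group pops first.
    heap = [(0.0, i) for i in range(world_size)]
    heapq.heapify(heap)
    groups = [[] for _ in range(world_size)]
    # Hypothetical input: a list of (client_id, profiled_training_time) pairs.
    for cid, t in sorted(client_times, key=lambda item: item[1], reverse=True):
        total, idx = heapq.heappop(heap)
        groups[idx].append(cid)
        heapq.heappush(heap, (total + t, idx))
    return groups


times = [("c1", 10.0), ("c2", 7.0), ("c3", 5.0), ("c4", 3.0), ("c5", 2.0)]
groups = greedy_grouping(times, world_size=2)
print(groups)  # [['c1', 'c4'], ['c2', 'c3', 'c5']]
```

The two groups end up with total times of 13 and 14, which is closer than a random assignment would typically give.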
- easyfl.distributed.reduce_models(model, sample_sum)[source]¶
Aggregate models across devices and update the model with the new aggregated model parameters.
- Parameters
model (nn.Module) – The model in a device to aggregate.
sample_sum (int) – Sum of the total dataset sizes of clients in a device.
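The aggregation described here can be pictured as a sample-size-weighted average of parameters. The sketch below shows that arithmetic on plain Python dicts; it assumes each device contributes one parameter set and one sample count, and it does not use `torch.distributed`. The name `weighted_average_params` is hypothetical.

```python
def weighted_average_params(device_params, device_samples):
    """Average parameter dicts across devices, weighting each device by
    the number of samples it trained on."""
    total = sum(device_samples)
    keys = device_params[0].keys()
    return {
        k: sum(p[k] * n for p, n in zip(device_params, device_samples)) / total
        for k in keys
    }


# Two devices holding scalar "parameters" for readability.
params = [{"w": 1.0, "b": 0.0}, {"w": 3.0, "b": 2.0}]
samples = [100, 300]
averaged = weighted_average_params(params, samples)
print(averaged)  # {'w': 2.5, 'b': 1.5}
```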
- easyfl.distributed.reduce_models_only_params(model, sample_sum)[source]¶
Aggregate models across devices and update the model with the new aggregated model parameters, excluding the persistent buffers like BN stats.
- Parameters
model (nn.Module) – The model in a device to aggregate.
sample_sum (torch.Tensor) – Sum of the total dataset sizes of clients in a device.
- easyfl.distributed.reduce_value(value, device)[source]¶
Calculate the sum of the value across devices.
- Parameters
value (float|int) – Value to sum.
device (str) – The device where the value is on, either cpu or cuda devices.
- Returns
Sum of the values.
- Return type
torch.Tensor
- easyfl.distributed.reduce_values(values, device)[source]¶
Calculate the average of values across devices.
- Parameters
values (list[float|int]) – Values to average.
device (str) – The device where the value is on, either cpu or cuda devices.
- Returns
The average of the values across devices.
- Return type
torch.Tensor
- easyfl.distributed.reduce_weighted_values(values, weights, device)[source]¶
Calculate the weighted average of values across devices.
- Parameters
values (list[float|int]) – Values to average.
weights (list[float|int]) – The weights to calculate weighted average.
device (str) – The device where the value is on, either cpu or cuda devices.
- Returns
The weighted average of values across devices.
- Return type
torch.Tensor
- easyfl.distributed.setup(port=23344)[source]¶
Set up distributed settings for Slurm.
- Parameters
port (int, optional) – The port of the primary server. It auto-increments by 1 when the port is in use.
- Returns
The rank of the current process, the local rank of the current process, the total number of processes, and the address of the distributed init method.
- Return type
(int, int, int, str)
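The port auto-increment behaviour can be sketched as a simple probe loop. This is a minimal illustration under stated assumptions, not the code `setup` uses: `port_in_use` and `next_free_port` are hypothetical helpers, and the `in_use` argument is injectable so the logic can be exercised without opening sockets.

```python
import socket


def port_in_use(port, host="localhost"):
    """Return True if something is already listening on (host, port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        return sock.connect_ex((host, port)) == 0


def next_free_port(port, in_use=port_in_use, max_tries=100):
    """Increment `port` by 1 until `in_use(port)` reports it is free."""
    for candidate in range(port, port + max_tries):
        if not in_use(candidate):
            return candidate
    raise RuntimeError("no free port found")


# Simulate two occupied ports instead of probing the network.
busy = {23344, 23345}
chosen = next_free_port(23344, in_use=lambda p: p in busy)
print(chosen)  # 23346
```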
easyfl.datasets¶
- class easyfl.datasets.BaseDataset(root, dataset_name, fraction, split_type, user, iid_user_fraction, train_test_split, minsample, num_class, num_of_client, class_per_client, setting_folder, seed=-1, **kwargs)[source]¶
The internal base dataset implementation.
- Parameters
root (str) – The root directory where datasets are stored.
dataset_name (str) – The name of the dataset.
fraction (float) – The fraction of the data chosen from the raw data to use.
num_of_client (int) – The targeted number of clients to construct.
split_type (str) – The type of statistical simulation, options: iid, niid, dir, and class. iid means independent and identically distributed data. niid means non-independent and identically distributed data, for Femnist and Shakespeare. dir means using a Dirichlet process to simulate non-iid data, for CIFAR-10 and CIFAR-100 datasets. class means partitioning the dataset by label classes, for datasets like CIFAR-10 and CIFAR-100.
minsample (int) – The minimal number of samples in each client. It is applicable for LEAF datasets and dir simulation of CIFAR-10 and CIFAR-100.
class_per_client (int) – The number of classes in each client. Only applicable when the split_type is ‘class’.
iid_user_fraction (float) – The fraction of the number of clients used when the split_type is ‘iid’.
user (bool) – A flag to indicate whether to partition users of the dataset into train-test groups. Only applicable to LEAF datasets. True means partitioning users of the dataset into train-test groups. False means partitioning each user’s samples into train-test groups.
train_test_split (float) – The fraction of data for training; the rest are for testing. e.g., 0.9 means 90% of data are used for training and 10% are used for testing.
num_class – The number of classes in this dataset.
seed – Random seed.
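The difference between the two `user` modes, combined with `train_test_split`, can be shown on toy data. The sketch below is an illustration only and assumes a plain dict mapping user ids to sample lists; `split_train_test` and `by_user` are hypothetical names, and no shuffling is done.

```python
def split_train_test(user_data, train_fraction, by_user):
    """Split either the set of users (by_user=True) or each user's
    samples (by_user=False) into train and test parts."""
    users = sorted(user_data)
    if by_user:
        # Whole users go to either the train side or the test side.
        cut = int(len(users) * train_fraction)
        train = {u: user_data[u] for u in users[:cut]}
        test = {u: user_data[u] for u in users[cut:]}
        return train, test
    # Every user appears on both sides, with their samples divided.
    train, test = {}, {}
    for u in users:
        cut = int(len(user_data[u]) * train_fraction)
        train[u] = user_data[u][:cut]
        test[u] = user_data[u][cut:]
    return train, test


data = {f"u{i}": list(range(10)) for i in range(10)}
train_users, test_users = split_train_test(data, 0.9, by_user=True)
train_samples, test_samples = split_train_test(data, 0.9, by_user=False)
print(len(train_users), len(test_users))          # 9 1
print(len(train_samples["u0"]), len(test_samples["u0"]))  # 9 1
```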
- class easyfl.datasets.Cifar10(root, fraction, split_type, user, iid_user_fraction=0.1, train_test_split=0.9, minsample=10, num_class=80, num_of_client=100, class_per_client=2, setting_folder=None, seed=-1, weights=None, alpha=0.5)[source]¶
- class easyfl.datasets.Cifar100(root, fraction, split_type, user, iid_user_fraction=0.1, train_test_split=0.9, minsample=10, num_class=80, num_of_client=100, class_per_client=2, setting_folder=None, seed=-1, weights=None, alpha=0.5)[source]¶
- class easyfl.datasets.FederatedDataset[source]¶
The abstract class of federated dataset for EasyFL.
- abstract loader(batch_size, shuffle=True)[source]¶
Get data loader.
- Parameters
batch_size (int) – The batch size of the data loader.
shuffle (bool) – Whether to shuffle the data in the loader.
- property users¶
Get client ids of the federated dataset.
- class easyfl.datasets.FederatedImageDataset(root, simulated, do_simulate=True, extensions=('.jpg', '.jpeg', '.png', '.ppm', '.bmp', '.pgm', '.tif', '.tiff', '.webp'), is_valid_file=None, transform=None, target_transform=None, client_ids='default', num_of_clients=10, simulation_method='iid', weights=None, alpha=0.5, min_size=10, class_per_client=1)[source]¶
Federated image dataset, data of clients are in format of image folder.
- Parameters
root (str|list[str]) – The root directory or directories of image data folder. If the dataset is simulated to multiple clients, the root is a list of directories. Otherwise, it is the directory of an image data folder.
simulated (bool) – Whether the dataset is simulated to federated learning settings.
do_simulate (bool, optional) – Whether to conduct simulation. It only takes effect if the dataset is not already simulated.
extensions (list[str], optional) – A list of allowed image extensions. Only one of extensions and is_valid_file can be specified.
is_valid_file (function, optional) – A function that takes the path of an image file and checks if it is valid. Only one of extensions and is_valid_file can be specified.
transform (torchvision.transforms.transforms.Compose, optional) – Transformation for data.
target_transform (torchvision.transforms.transforms.Compose, optional) – Transformation for data labels.
num_of_clients (int, optional) – The number of clients for simulation. Only needed when doing simulation.
simulation_method (str, optional) – The split method. Only needed when doing simulation.
weights (list[float], optional) – The targeted distribution of quantities to simulate quantity heterogeneity. The values should sum up to 1. e.g., [0.1, 0.2, 0.7]. The num_of_clients should be divisible by len(weights). None means clients are simulated with the same data quantity.
alpha (float, optional) – The parameter for Dirichlet distribution simulation, only for dir simulation.
min_size (int, optional) – The minimal number of samples in each client, only for dir simulation.
class_per_client (int, optional) – The number of classes in each client, only for non-iid by class simulation.
client_ids (list[str], optional) – A list of client ids. Each client id matches with an element in roots. The client ids are [“f0000001”, “f00000002”, …] if not specified.
- loader(batch_size, client_id=None, shuffle=True, seed=0, num_workers=2, transform=None)[source]¶
Get dataset loader.
- Parameters
batch_size (int) – The batch size.
client_id (str, optional) – The id of client.
shuffle (bool, optional) – Whether to shuffle before batching.
seed (int, optional) – The shuffle seed.
transform (torchvision.transforms.transforms.Compose, optional) – Data transformation.
num_workers (int, optional) – The number of workers for dataset loader.
- Returns
The data loader to load data.
- Return type
torch.utils.data.DataLoader
- property users¶
Get client ids of the federated dataset.
- class easyfl.datasets.FederatedTensorDataset(data, transform=None, target_transform=None, process_x=<function default_process_x>, process_y=<function default_process_x>, simulated=False, do_simulate=True, num_of_clients=10, simulation_method='iid', weights=None, alpha=0.5, min_size=10, class_per_client=1)[source]¶
Federated tensor dataset, data of clients are in format of tensor or list.
- Parameters
data (dict) – A dictionary of data, e.g., {“id1”: {“x”: [[], [], …], “y”: […]}}. If simulation has not been done previously, it is in the format of {‘x’: [[], [], …], ‘y’: […]}.
transform (torchvision.transforms.transforms.Compose, optional) – Transformation for data.
target_transform (torchvision.transforms.transforms.Compose, optional) – Transformation for data labels.
process_x (function, optional) – A function to preprocess training data.
process_y (function, optional) – A function to preprocess testing data.
simulated (bool, optional) – Whether the dataset is simulated to federated learning settings.
do_simulate (bool, optional) – Whether to conduct simulation. It only takes effect if the dataset is not already simulated.
num_of_clients (int, optional) – The number of clients for simulation. Only needed when doing simulation.
simulation_method (str, optional) – The split method. Only needed when doing simulation.
weights (list[float], optional) – The targeted distribution of quantities to simulate quantity heterogeneity. The values should sum up to 1. e.g., [0.1, 0.2, 0.7]. The num_of_clients should be divisible by len(weights). None means clients are simulated with the same data quantity.
alpha (float, optional) – The parameter for Dirichlet distribution simulation, only for dir simulation.
min_size (int, optional) – The minimal number of samples in each client, only for dir simulation.
class_per_client (int, optional) – The number of classes in each client, only for non-iid by class simulation.
- loader(batch_size, client_id=None, shuffle=True, seed=0, transform=None, drop_last=False)[source]¶
Get dataset loader.
- Parameters
batch_size (int) – The batch size.
client_id (str, optional) – The id of client.
shuffle (bool, optional) – Whether to shuffle before batching.
seed (int, optional) – The shuffle seed.
transform (torchvision.transforms.transforms.Compose, optional) – Data transformation.
drop_last (bool, optional) – Whether to drop the last batch if its size is smaller than batch size.
- Returns
The data loader to load data.
- Return type
torch.utils.data.DataLoader
- property users¶
Get client ids of the federated dataset.
- class easyfl.datasets.FederatedTorchDataset(data, users)[source]¶
Wrapper over PyTorch dataset.
- Parameters
data (dict) – A dictionary of client datasets, format {“client_id”: dataset1, “client_id2”: dataset2}.
- loader(batch_size, client_id=None, shuffle=True, seed=0, num_workers=2, transform=None)[source]¶
Get data loader.
- Parameters
batch_size (int) – The batch size of the data loader.
shuffle (bool) – Whether to shuffle the data in the loader.
- property users¶
Get client ids of the federated dataset.
- class easyfl.datasets.Femnist(root, fraction, split_type, user, iid_user_fraction=0.1, train_test_split=0.9, minsample=10, num_class=62, num_of_client=100, class_per_client=2, setting_folder=None, seed=-1, **kwargs)[source]¶
FEMNIST dataset implementation. It gets FEMNIST dataset according to configurations. It stores the processed datasets locally.
- base_folder¶
The base folder path of the datasets folder.
- Type
str
- class_url¶
The url to get the by_class split FEMNIST.
- Type
str
- write_url¶
The url to get the by_write split FEMNIST.
- Type
str
- class easyfl.datasets.Shakespeare(root, fraction, split_type, user, iid_user_fraction=0.1, train_test_split=0.9, minsample=10, num_class=80, num_of_client=100, class_per_client=2, setting_folder=None, seed=-1, **kwargs)[source]¶
Shakespeare dataset implementation. It gets Shakespeare dataset according to configurations.
- base_folder¶
The base folder path of the datasets folder.
- Type
str
- raw_data_url¶
The url to get the by_class split shakespeare.
- Type
str
- write_url¶
The url to get the by_write split shakespeare.
- Type
str
- easyfl.datasets.construct_datasets(root, dataset_name, num_of_clients, split_type, min_size, class_per_client, data_amount, iid_fraction, user, train_test_split, quantity_weights, alpha)[source]¶
Construct and load provided federated learning datasets.
- Parameters
root (str) – The root directory where datasets are stored.
dataset_name (str) – The name of the dataset. It currently supports: femnist, shakespeare, cifar10, and cifar100. Among them, femnist and shakespeare are adopted from LEAF benchmark.
num_of_clients (int) – The targeted number of clients to construct.
split_type (str) – The type of statistical simulation, options: iid, niid, dir, and class. iid means independent and identically distributed data. niid means non-independent and identically distributed data, for Femnist and Shakespeare. dir means using a Dirichlet process to simulate non-iid data, for CIFAR-10 and CIFAR-100 datasets. class means partitioning the dataset by label classes, for datasets like CIFAR-10 and CIFAR-100.
min_size (int) – The minimal number of samples in each client. It is applicable for LEAF datasets and dir simulation of CIFAR-10 and CIFAR-100.
class_per_client (int) – The number of classes in each client. Only applicable when the split_type is ‘class’.
data_amount (float) – The fraction of data sampled for LEAF datasets. e.g., 10% means that only 10% of total dataset size are used.
iid_fraction (float) – The fraction of the number of clients used when the split_type is ‘iid’.
user (bool) – A flag to indicate whether to partition users of the dataset into train-test groups. Only applicable to LEAF datasets. True means partitioning users of the dataset into train-test groups. False means partitioning each user’s samples into train-test groups.
train_test_split (float) – The fraction of data for training; the rest are for testing. e.g., 0.9 means 90% of data are used for training and 10% are used for testing.
quantity_weights (list[float]) – The targeted distribution of quantities to simulate data quantity heterogeneity. The values should sum up to 1. e.g., [0.1, 0.2, 0.7]. The num_of_clients should be divisible by len(weights). None means clients are simulated with the same data quantity.
alpha (float) – The parameter for Dirichlet distribution simulation, applicable only when split_type is dir.
- Returns
The training dataset and the testing dataset.
- Return type
(FederatedDataset, FederatedDataset)
- easyfl.datasets.data_simulation(data_x, data_y, num_of_clients, data_distribution, weights=None, alpha=0.5, min_size=10, class_per_client=1, stack_x=True)[source]¶
Simulate federated learning datasets by partitioning a data into multiple clients using different strategies.
- Parameters
data_x (list[Object]) – A list of data.
data_y (list[Object]) – A list of dataset labels.
num_of_clients (int) – The number of clients to partition to.
data_distribution (str) – The ways to partition the dataset, options: iid: Partition dataset into multiple clients with equal quantity (difference is less than 1) randomly. dir: partition dataset into multiple clients following the Dirichlet process. class: partition dataset into multiple clients based on classes.
weights (list[float], optional) – The targeted distribution of data quantities, for simulating data quantity heterogeneity. The values should sum up to 1, e.g., [0.1, 0.2, 0.7]. The num_of_clients should be divisible by len(weights). When weights=None, the data quantity of clients only depends on data_distribution.
alpha (float, optional) – The parameter for Dirichlet process simulation. It is only applicable when data_distribution is dir.
min_size (int, optional) – The minimum number of samples of a client. It is only applicable when data_distribution is dir.
class_per_client (int) – The number of classes in each client. It is only applicable when data_distribution is class.
stack_x (bool, optional) – A flag to indicate whether using np.vstack or append to construct dataset. It is only applicable when data_distribution is class.
- Raises
ValueError – When the simulation method data_distribution is not supported.
- Returns
A list of client ids, and a dict of the partitioned data, where the key is the client id and the value is the client data, e.g., {‘client_1’: {‘x’: [data_x], ‘y’: [data_y]}}.
- Return type
(list[str], dict)
- easyfl.datasets.equal_division(num_groups, data_x, data_y=None)[source]¶
Partition data into multiple clients with equal quantity.
- Parameters
num_groups (int) – The number of groups to partition to.
data_x (list[Object]) – A list of elements to be divided.
data_y (list[Object], optional) – A list of data labels to be divided together with the data.
- Returns
A list where each element is a list of data of a group/client, and a list where each element is a list of data labels of a group/client.
- Return type
(list[list], list[list])
Example
>>> equal_division(3, list(range(9)))
([[0, 4, 2], [3, 1, 7], [6, 5, 8]], [])
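For readers who want to see the idea without installing EasyFL, here is a minimal standalone sketch of an equal-quantity split: shuffle once, then deal elements round-robin so group sizes differ by at most one. `equal_split` and its `seed` argument are hypothetical and are not part of the documented API.

```python
import random


def equal_split(num_groups, data_x, data_y=None, seed=0):
    """Shuffle, then deal elements into `num_groups` groups whose sizes
    differ by at most one. Labels, if given, follow the same order."""
    order = list(range(len(data_x)))
    random.Random(seed).shuffle(order)
    groups_x = [[data_x[i] for i in order[g::num_groups]] for g in range(num_groups)]
    groups_y = []
    if data_y is not None:
        groups_y = [[data_y[i] for i in order[g::num_groups]] for g in range(num_groups)]
    return groups_x, groups_y


xs, ys = equal_split(3, list(range(10)))
print([len(g) for g in xs])  # [4, 3, 3]
```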
- easyfl.datasets.iid(data_x, data_y, num_of_clients, x_dtype, y_dtype)[source]¶
Partition dataset into multiple clients with equal data quantity (difference is less than 1) randomly.
- Parameters
data_x (list[Object]) – A list of data.
data_y (list[Object]) – A list of dataset labels.
num_of_clients (int) – The number of clients to partition to.
x_dtype (numpy.dtype) – The type of data.
y_dtype (numpy.dtype) – The type of data label.
- Returns
A list of client ids, and a dict of the partitioned data, where the key is the client id and the value is the client data, e.g., {‘client_1’: {‘x’: [data_x], ‘y’: [data_y]}}.
- Return type
(list[str], dict)
- easyfl.datasets.non_iid_class(data_x, data_y, class_per_client, num_of_clients, x_dtype, y_dtype, stack_x=True)[source]¶
Partition dataset into multiple clients based on label classes. Each client contains [1, n] classes, where n is the number of classes of a dataset.
Note: Each class is divided into ceil(class_per_client * num_of_clients / num_class) parts, and each client chooses class_per_client parts from each class to construct its dataset.
- Parameters
data_x (list[Object]) – A list of data.
data_y (list[Object]) – A list of dataset labels.
class_per_client (int) – The number of classes in each client.
num_of_clients (int) – The number of clients to partition to.
x_dtype (numpy.dtype) – The type of data.
y_dtype (numpy.dtype) – The type of data label.
stack_x (bool, optional) – A flag to indicate whether using np.vstack or append to construct dataset.
- Returns
A list of client ids, and a dict of the partitioned data, where the key is the client id and the value is the client data, e.g., {‘client_1’: {‘x’: [data_x], ‘y’: [data_y]}}.
- Return type
(list[str], dict)
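The shard arithmetic in the note can be made concrete with a small standalone sketch. It assumes classes are assigned to clients round-robin and that class_per_client does not exceed the number of classes; the real function may choose classes differently (for example, randomly). `split_by_class` is a hypothetical name.

```python
import math
from collections import defaultdict


def split_by_class(data_x, data_y, class_per_client, num_of_clients):
    """Give every client shards from `class_per_client` distinct classes."""
    by_class = defaultdict(list)
    for x, y in zip(data_x, data_y):
        by_class[y].append(x)
    classes = sorted(by_class)
    # Number of shards each class is cut into, as in the note above.
    parts = math.ceil(class_per_client * num_of_clients / len(classes))
    shards = {c: [by_class[c][p::parts] for p in range(parts)] for c in classes}
    used = defaultdict(int)  # shards of each class already handed out
    clients = {}
    for i in range(num_of_clients):
        xs, ys = [], []
        for j in range(class_per_client):
            # Assumption: round-robin class assignment.
            c = classes[(i * class_per_client + j) % len(classes)]
            shard = shards[c][used[c]]
            used[c] += 1
            xs.extend(shard)
            ys.extend([c] * len(shard))
        clients[f"client_{i}"] = {"x": xs, "y": ys}
    return clients


data_x = list(range(12))
data_y = [i % 4 for i in range(12)]  # 4 classes, 3 samples each
clients = split_by_class(data_x, data_y, class_per_client=2, num_of_clients=4)
```

With 4 classes, 4 clients, and 2 classes per client, each class is cut into ceil(2 * 4 / 4) = 2 shards, and every shard is used exactly once.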
- easyfl.datasets.non_iid_dirichlet(data_x, data_y, num_of_clients, alpha, min_size, x_dtype, y_dtype)[source]¶
Partition dataset into multiple clients following the Dirichlet process.
- Parameters
data_x (list[Object]) – A list of data.
data_y (list[Object]) – A list of dataset labels.
num_of_clients (int) – The number of clients to partition to.
alpha (float) – The parameter for Dirichlet process simulation.
min_size (int) – The minimum number of samples of a client.
x_dtype (numpy.dtype) – The type of data.
y_dtype (numpy.dtype) – The type of data label.
- Returns
A list of client ids, and a dict of the partitioned data, where the key is the client id and the value is the client data, e.g., {‘client_1’: {‘x’: [data_x], ‘y’: [data_y]}}.
- Return type
(list[str], dict)
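A Dirichlet-based split can be sketched with the standard library alone: for each class, draw client proportions from a Dirichlet(alpha) distribution (here by normalizing Gamma draws) and cut that class's samples accordingly, retrying until every client holds at least min_size samples. This is a sketch of the general technique, not EasyFL's code; `dirichlet_partition`, `seed`, and `max_tries` are hypothetical.

```python
import random


def dirichlet_partition(data_y, num_of_clients, alpha, min_size, seed=0, max_tries=1000):
    """Return one list of sample indices per client."""
    rng = random.Random(seed)
    classes = sorted(set(data_y))
    for _ in range(max_tries):
        clients = [[] for _ in range(num_of_clients)]
        for c in classes:
            idx = [i for i, y in enumerate(data_y) if y == c]
            rng.shuffle(idx)
            # Dirichlet(alpha) sample: normalize independent Gamma(alpha, 1) draws.
            draws = [rng.gammavariate(alpha, 1.0) for _ in range(num_of_clients)]
            total = sum(draws)
            start, cum = 0, 0.0
            for k, g in enumerate(draws):
                cum += g / total
                last = k == num_of_clients - 1
                end = len(idx) if last else min(round(cum * len(idx)), len(idx))
                clients[k].extend(idx[start:end])
                start = end
        # Retry with fresh draws if any client ended up too small.
        if min(len(p) for p in clients) >= min_size:
            return clients
    raise RuntimeError("min_size could not be satisfied")


labels = [i % 10 for i in range(1000)]
parts = dirichlet_partition(labels, num_of_clients=5, alpha=0.5, min_size=10)
```

Smaller values of alpha concentrate each class on fewer clients, which makes the split more heterogeneous.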
- easyfl.datasets.quantity_hetero(weights, data_x, data_y=None)[source]¶
Partition data into multiple clients with different quantities. The number of groups is the same as the number of elements of weights. The quantity of each group depends on the values of weights.
- Parameters
weights (list[float]) – The targeted distribution of data quantities. The values should sum up to 1. e.g., [0.1, 0.2, 0.7].
data_x (list[Object]) – A list of elements to be divided.
data_y (list[Object], optional) – A list of data labels to be divided together with the data.
- Returns
A list where each element is a list of data of a group/client, and a list where each element is a list of data labels of a group/client.
- Return type
(list[list], list[list])
Example
>>> quantity_hetero([0.1, 0.2, 0.7], list(range(0, 10)))
([[4], [8, 9], [6, 0, 1, 7, 3, 2, 5]], [])
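The weighting logic can also be sketched independently of the library: shuffle the data, then cut it at the cumulative weight boundaries. `split_by_weights` and `seed` are hypothetical names, and the rounding rule is an assumption made for this illustration.

```python
import random


def split_by_weights(weights, data_x, seed=0):
    """Shuffle `data_x`, then cut it into len(weights) groups whose sizes
    follow the cumulative weights."""
    order = list(data_x)
    random.Random(seed).shuffle(order)
    groups, start, cum = [], 0, 0.0
    for k, w in enumerate(weights):
        cum += w
        # The last group takes whatever remains, so nothing is dropped.
        end = len(order) if k == len(weights) - 1 else round(cum * len(order))
        groups.append(order[start:end])
        start = end
    return groups


groups = split_by_weights([0.1, 0.2, 0.7], list(range(10)))
print([len(g) for g in groups])  # [1, 2, 7]
```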
easyfl.models¶
easyfl.communication¶
- easyfl.communication.init_stub(typ, address)[source]¶
Initialize gRPC stub.
- Parameters
typ (str) – Type of service, option: client, server, tracking
address (str) – Address of the gRPC service.
- Returns
Stub of the gRPC service.
- Return type
ClientServiceStub|ServerServiceStub|TrackingServiceStub
- easyfl.communication.start_service(typ, service, port)[source]¶
Start gRPC service.
- Parameters
typ (str) – Type of service, option: client, server, tracking.
service (ClientService|ServerService|TrackingService) – gRPC service to start.
port (int) – The port of the service.