When the Curious Abandon Honesty: Federated Learning Is Not Private


Previously on Private AI’s speaker series, CEO Patricia Thaine sat down with Franziska Boenisch to discuss her latest paper, ‘When the Curious Abandon Honesty: Federated Learning Is Not Private’.

Franziska completed a Master’s degree in Computer Science at Freie University Berlin and Technical University Eindhoven. For the past 2.5 years, she has been working at Fraunhofer AISEC as a Research Associate in topics related to Privacy Preserving Machine Learning, Data Protection, and Intellectual Property Protection for Neural Networks. Currently, she is a visiting research intern at the Vector Institute in Toronto in Nicolas Papernot’s group on Trustworthy ML. Additionally, she is pursuing her PhD in Berlin.

During this session, Private AI deep-dives into Franziska’s discoveries, the role of differential privacy in federated learning, and what it would take for federated learning to be private against malicious adversaries. 

If you missed the last session, scroll down to recap Patricia and Franziska’s discussion or watch the full session below.


PAI: Can you please explain what federated learning is and how it works? 

Franziska: In short, federated learning is just a framework for distributed machine learning. To go into a bit more detail, I’ll explain the vanilla version of it. Of course, there are several flavors, but I think for our purposes here it’s enough to understand the general concepts. So basically, in federated learning, we have two central roles. One is a central party, basically a server that orchestrates this distributed machine learning protocol. The server creates and initializes a shared machine learning model and then organizes the training among the clients. The clients are often mobile devices that hold their own, potentially sensitive, data, and they participate in the training. Not all of the clients participate in every round of the protocol. Instead, in every round in which the machine learning model is trained, a subset of clients, such as several hundred or several thousand, is selected by the server, and then the real process starts.

The first step in the process is that the server sends out the shared model to all the clients. The clients then compute the local gradients or, more generally speaking, just updates for this model on their local data for the received model. And then once they’ve calculated these, the clients send their gradients back to the server. The server aggregates the clients’ updates and then applies them to the shared model, and afterwards the whole process starts again. So there’s a new round, a new client selection, sending out the model again, calculating the gradient of the users and aggregating them, and finally the server updating the model. And therefore, we can basically think of federated learning as a form of decentralized mini-batch Stochastic Gradient Descent (SGD), where basically every client calculates the gradient of their batch of data and then the server does the aggregation of the different batches. 
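The round structure described above can be sketched in a few lines of NumPy. This is only a minimal illustration, not the protocol of any particular system: the linear model, the squared-error loss, the learning rate, and the names `local_gradient` and `federated_round` are all assumptions made for the example.

```python
import numpy as np

def local_gradient(w, X, y):
    """Client side: gradient of half the mean squared error on local data."""
    residual = X @ w - y
    return X.T @ residual / len(y)

def federated_round(w, clients, lr=0.1):
    """Server side: broadcast w, collect one gradient per client, average, update."""
    grads = [local_gradient(w, X, y) for X, y in clients]
    return w - lr * np.mean(grads, axis=0)

rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
clients = []
for _ in range(5):                      # five clients with 20 private samples each
    X = rng.normal(size=(20, 2))
    clients.append((X, X @ true_w))

w = np.zeros(2)
for _ in range(200):                    # 200 rounds; here every client is selected each round
    w = federated_round(w, clients)
```

Because every client in this toy setup holds the same number of samples, the average of the client gradients equals the gradient over the pooled data, which is the "decentralized mini-batch SGD" view. The raw `X` and `y` never leave the client functions; only gradients do.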

And of course, as I said, there are different flavors of federated learning. For example, there can be variation in how the clients are selected and by whom they are selected. Therefore, federated learning is more like an umbrella term for all similar methods. But in general, it works like the vanilla version I explained. The question of course arises: why is federated learning used in the first place and what are the benefits? In general, we know that machine learning models train much better if we have larger amounts of data and/or more diverse data. This is what we can achieve through federated learning: because we have data from many clients, we have larger amounts of data, and since these clients are potentially all different, we also have more heterogeneous data and might thereby end up with a better model.

Another benefit is that we have some continuity in the learning. The models are constantly improved based on up-to-date data from the clients, and there is no need to aggregate data again over time or to check whether the model is “up to date”. And the question is of course, why do we not just collect the data from the clients like we used to do, take it to the server, and train a centralized machine learning model? Well, there are several reasons why we wouldn’t do that. The first one is that sending large amounts of data creates a communication overhead. Furthermore, the server would have to store all this data, and the server would also be in charge of processing the data according to prevailing privacy laws. Most importantly, sharing sensitive data like the data we have on user devices in plain format with the server might represent a privacy threat for the users. And therefore, the idea behind federated learning is that we can get rid of all of this by just keeping the data local and only sending the client updates. For a long time, federated learning was also sold as a kind of privacy-preserving technology because the user data never leaves the devices, and it was thought to provide some form of confidentiality.

However, some prior research had already shown that this protection is just a thin facade, because the model gradients that are sent from the users to the server still leak some information about the users’ training data. That said, much of the previous work that looked into reconstructing the data from the gradients was limited because it requires a lot of computation.

For example, those methods relied on higher-order optimization or generative adversarial networks (GANs) to reconstruct data that would produce gradients like the ones that were observed. Because of this computational complexity, they were usually not well suited to reconstructing high-dimensional data, data from the same class, or data from large mini-batches, all of which make the optimization more complicated. Therefore, in most of the difficult cases that I just named, they failed to produce high-fidelity reconstructions.

PAI: Fascinating. And you’ve managed to perfectly reconstruct user inputs from model updates, is that correct? 

Franziska: Yes, exactly. Our method does not suffer from this drawback because, in contrast to the prior work, instead of doing a reconstruction, we extract the data directly from the gradients. This extraction is, first of all, much more computationally efficient. It also has the advantage that it is high fidelity, in the sense that it extracts the data perfectly, without any error, even if the data comes from the same class, the gradients are calculated over large mini-batches, and the data is very high dimensional. So these are all the benefits that we get from actual data extraction.

PAI: How does your attack work and how does it differ from others proposed by the community?

Franziska: In contrast to previous methods, we do not perform data reconstruction but actual data extraction. We observed for the first time that even for large mini-batches of data, when we calculate a gradient over a fully connected model layer, we can directly extract individual data points from the gradients. This is because the individual data points are just inside the gradients, only scaled by a factor. This factor depends only on the gradient of the bias. The gradient of the bias is shared by the users with the server along with the gradient of the weights, so in the end the server just needs to take the weight gradients, rescale them by the inverse of this factor, and it ends up with the data points.

So we don’t need to do any optimization, we don’t need to do any attempts for reconstruction, and we basically just take what is given for free. What’s most interesting is the question: why is this data directly extractable? Prior work had already shown that when we have a fully connected model layer and we just input one data point through this layer, this data point is directly extractable from the gradients. 
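The single-data-point case can be checked by hand. For a layer z = Wx + b, the chain rule gives row i of the weight gradient as (dLoss/dz_i) · x and the bias gradient as dLoss/dz_i, so dividing one by the other returns x. The sketch below verifies this numerically; the layer sizes and the linear readout `v` that stands in for the rest of the network are assumptions made for the example, not details from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0.0, 1.0, size=8)        # the client's private input
W = rng.normal(size=(32, 8))             # fully connected layer with 32 neurons
b = rng.normal(size=32)
v = rng.normal(size=32)                  # hypothetical readout: loss = v . relu(W x + b)

z = W @ x + b
delta = v * (z > 0)                      # dLoss/dz, zero wherever the ReLU is inactive
grad_W = np.outer(delta, x)              # row i equals delta[i] * x
grad_b = delta                           # the bias gradient is the scaling factor itself

i = np.flatnonzero(grad_b)[0]            # any neuron with a non-zero bias gradient
x_extracted = grad_W[i] / grad_b[i]      # rescale by the inverse of the factor
```

No optimization is involved: `x_extracted` matches `x` up to floating-point error, using only the two gradients a client would send.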

To give a brief intuition for why this is the case: this happens when we use the ReLU activation function. If you remember, the ReLU activation function is zero for negative inputs and linear for positive inputs. Hence, for positive inputs we have the linearity of the input multiplied with the weights plus the bias, and this linearity is invertible. However, why does that also work for larger batch sizes? We have the gradients of all individual data points of the batch, and these gradients are just overlaid. If something is overlaid, we would expect that when we reconstruct it, we just end up with some average or overlay of all data points in which we cannot see individual data points clearly.

And here is one of the key contributions of our work: the observation that even in cases where we have many more than just one data point, we are able to extract individual data points. Why does that happen? Again, the ReLU activation function helps us, because the ReLU function is zero for all negative inputs. When the input to the ReLU activation function for one data point is negative, the output is zero. With zero output, no information on that data point is propagated through the machine learning model, and when no information is propagated, the gradients at this place will be zero because there is nothing to update on. If this happens for all data points in the mini-batch but one, we have all-zero gradients for all but one data point. Even if we overlay them or average them, this overlay will still be the one perfect individual data point. Thereby the case of having large mini-batches reduces to the case of a mini-batch with one data point, where it has been mathematically shown that perfect extraction is possible. As a side note, this does not only hold for the ReLU activation function, but also for other activation functions that can produce gradients of zero. If you think, for example, about the tanh or the sigmoid, they have large regions where the function is very flat, and in these flat regions we can also get gradients that are practically zero. Then we have the same case again.
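The batch argument above can be reproduced in a toy setting. The sketch below averages the gradients of a ReLU layer over a mini-batch and then checks every neuron that fired for exactly one sample. All sizes, the strongly negative bias (chosen only to make activations sparse), and the linear readout `v` are assumptions made for the example.

```python
import numpy as np

def layer_gradients(W, b, X, v):
    """Batch-averaged gradients of a ReLU layer for the toy loss mean_n v . relu(W x_n + b)."""
    active = (X @ W.T + b) > 0                 # (batch, neurons): which ReLUs fire
    D = active * v                             # per-sample dLoss/dz
    grad_W = D.T @ X / len(X)                  # overlay of all samples' weight gradients
    grad_b = D.mean(axis=0)
    return grad_W, grad_b, active

rng = np.random.default_rng(2)
X = rng.uniform(0.0, 1.0, size=(16, 20))       # mini-batch of 16 private samples
W = rng.normal(size=(200, 20))
b = np.full(200, -4.0)                         # negative bias makes activations sparse
v = rng.normal(size=200)

grad_W, grad_b, active = layer_gradients(W, b, X, v)

# Neurons activated by exactly one sample leak that sample verbatim.
single = np.flatnonzero(active.sum(axis=0) == 1)
leaked = {int(np.argmax(active[:, i])): grad_W[i] / grad_b[i] for i in single}
```

Each entry of `leaked` maps a batch index to a rescaled gradient row that equals that sample exactly. The `active` mask is used here only to verify the claim; a real server would not have it and would only see `grad_W` and `grad_b`.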

PAI: You describe two threat models in your paper. So there’s the honest but curious model, and then there’s a malicious model. Can you explain what these two are and how they affect the way we understand the vulnerability of a central party?

Franziska: Sure, exactly. We consider two threat models for the central party in federated learning, and we can think of the honest but curious central party as a form of passive attacker who does not really deviate from the original federated learning protocol, but it just works kind of with what is given naturally to them. So this honest but curious central party tries to disclose the privacy of the users by just, for example, looking at the gradients that it receives. And for our attack, we show that even such a passive attacker can really benefit from the concept of this leakage that I’ve just described of the individual data points because it just receives the gradient and it can basically take out some data points from there for free. 

Just to give you an idea of how much data we are talking about: even if the users calculate their gradients on rather large mini-batches of 100 samples from very complex datasets such as ImageNet, a passive attacker who honestly follows the protocol but is just curious can still extract over 20 individual data points, or 20% of the data, practically for free. And then we turn to a second scenario, which is the scenario of a malicious server.

The malicious server is basically someone who can deviate from the original federated learning protocol and take actions that they are not supposed to take. More concretely, we looked at the case of an occasionally Byzantine server, which means that the server mostly behaves correctly and follows the protocol, but in some cases it manipulates something in order to extract even more data. We thought that this scenario of an occasionally Byzantine server is quite realistic in federated learning because the central parties are often large companies, and of course they have a reputation to lose, so they wouldn’t want to be malicious all the time and be discovered. That would be terrible. Instead they might just step in at moments when they’re “less observed”. We can even imagine that it’s not the company as a whole but just some malicious employees. A group of two or three employees is usually all it takes in a large company to get a code commit through, so they could collaborate, alter the model maliciously, and for a very short time deviate from the protocol as it is supposed to run. In our attack, we exploit this power to adversarially initialize the weights of the shared model, and we are the first to consider an active attacker who does exactly this. Previous work that looked into malicious servers usually had them use their power to, for example, control some clients to get the behavior that they want.

But we are controlling the model weights, and by controlling the weights, we make individual user data extraction more efficient. Usually the weights in a neural network are initialized with a random function, for example a Gaussian or some other initialization function. What we do is also a random initialization, but we make the negative components of the weight vectors slightly larger in magnitude than the positive ones, so that the impact of the negative weights dominates. This creates the case where the product of the weights and the input data (which we assume to be in the range (0,1), a standard pre-processing) is negative much more often than it would normally be. When that happens really often, we increase the chance that, in a mini-batch, only one data point multiplied with the weights yields a positive input to the ReLU, gets propagated, and can thereby be extracted perfectly.
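The effect of such an initialization can be illustrated with a small simulation. The sketch below is not the paper's exact construction: it simply draws Gaussian weights and enlarges the negative components by a hypothetical `scale` factor, then counts how many neurons fire for exactly one sample in a batch. The layer has no bias, and all sizes and the value `scale=1.3` are assumptions chosen for the example.

```python
import numpy as np

def single_activation_fraction(W, X):
    """Fraction of neurons whose ReLU fires for exactly one sample in the batch."""
    active = (X @ W.T) > 0
    return float(np.mean(active.sum(axis=0) == 1))

def trap_weights(shape, scale, rng):
    """Gaussian init with the negative components enlarged by `scale` (hypothetical knob)."""
    W = rng.normal(size=shape)
    return np.where(W < 0, scale * W, W)

rng = np.random.default_rng(3)
X = rng.uniform(0.0, 1.0, size=(32, 100))      # batch of 32 inputs scaled to (0, 1)

standard = rng.normal(size=(5000, 100))        # ordinary Gaussian initialization
trapped = trap_weights((5000, 100), scale=1.3, rng=rng)

frac_standard = single_activation_fraction(standard, X)
frac_trapped = single_activation_fraction(trapped, X)
```

In this setup `frac_trapped` comes out larger than `frac_standard`, because shifting the pre-activations toward negative values makes "exactly one sample survives the ReLU" a more likely event. Pushing `scale` too far silences every neuron, so the factor has to stay moderate.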

PAI: In this paper you talk about making trap weights to understand differential impact on clients. Can this be considered to be similar to data poisoning? 

Franziska: Yes, indeed. It can be seen as some form of poisoning. Definitely. And usually what you want to do if you poison data points is that you want to bring an undesired behaviour into the machine learning model and we just take the direct path. Instead of changing some data, which then changes the model, we directly change the model. So it can be considered a form of poisoning. 

PAI: Do you think secure MPC frameworks can help mitigate a few of these issues?

Franziska: There are some ways in which people have tried to mitigate threats in federated learning. One way, for example, is secure aggregation. Instead of each client sending their gradients directly to the server, where the server can extract data from them, the clients work together to aggregate their gradients and then only share this aggregate with the server.

In theory that sounds like a good idea for protection. First, you have more data aggregated together, and with more data, individual extraction becomes less successful. Second, even if the server is able to extract some of the data, it no longer knows which client this data belongs to. However, that only works in the case of honest-but-curious servers, because the server has an ability that has been studied at length in the literature: it can control some fraction of the devices that participate in the federated learning training. And as I said, per round we don’t have all the millions of clients participating; we just have a few hundred or a few thousand. It is very likely that a large company has the power to maliciously control these clients such that, for example, they all contribute only zero gradients to the secure aggregation. Then one target client is securely aggregated with all the other, controlled clients, the aggregated gradients still contain only the information of the target client, and its privacy can still be broken. Recent work has even shown a more subtle way of doing so, in which the server sends out different models to different clients. All non-targeted clients would receive a model for which they produce just zero gradients, and the target client would receive, for example, a normal model, or, combined with our attack, our trap-weighted model. Then again, in the aggregation, all these zero gradients would be aggregated with the target client’s gradients, and the aggregate would still contain only the information of this particular client.
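The zero-gradient attack on secure aggregation described above is easy to demonstrate with a toy protocol. The sketch below uses pairwise random masks that cancel in the sum, which is a heavily simplified stand-in for real secure aggregation (no key agreement, no dropout handling); the function name and all sizes are assumptions made for the example.

```python
import numpy as np

def secure_aggregate(updates, rng):
    """Toy secure aggregation: pairwise random masks hide each update but cancel in the sum."""
    masked = [np.array(u, dtype=float) for u in updates]
    for i in range(len(masked)):
        for j in range(i + 1, len(masked)):
            mask = rng.normal(scale=100.0, size=masked[i].shape)
            masked[i] += mask                  # client i adds the shared mask
            masked[j] -= mask                  # client j subtracts the same mask
    return masked, np.sum(masked, axis=0)

rng = np.random.default_rng(4)
target_update = rng.normal(size=6)             # the one honest client's gradient
controlled = [np.zeros(6) for _ in range(9)]   # server-controlled clients send zeros

masked, aggregate = secure_aggregate([target_update] + controlled, rng)
```

The individual masked messages look like noise, yet `aggregate` equals `target_update` up to floating-point error, so the aggregation hides nothing once every other participant contributes zeros.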

PAI: In the paper, you say that “For complex image datasets such as ImageNet, the attack yields perfect reconstruction of more than 50% of the training data points, even for large training data mini-batches that contain as many as 100 data points. For textual tasks such as IMDB sentiment analysis, it perfectly extracts more than 65% of the data points for mini-batches with 100 data points.” Can you tell us more about the different types of experiments you ran?

Franziska: Sure. As you said, we looked into some image datasets, like the classic MNIST, CIFAR-10, and ImageNet, and then the Internet Movie Database (IMDB) text dataset for sentiment classification. We basically had two branches of experiments: one for the passive extraction and the other for the active extraction. In the passive extraction, we took a fully connected neural network and looked at how different parameters influence the extractability of the individual data points that I’ve been talking about. We looked into two main parameters that have an impact on the success. The first one is the number of neurons in the first fully connected layer, where we do our extraction. The second one is the batch size, that is, the mini-batch size of the clients.

If we have more neurons, then we have more gradients, and each neuron’s gradients give us a chance for individual extraction. So it’s just like having more chances. With the batch size, we expected that smaller batches leave fewer data points that can overlay. And indeed, what we found in our experiments was that the more neurons we have, and the smaller the clients’ batch sizes are, the better the extraction gets. With small enough batch sizes and a large enough number of neurons, we are even able to extract all data points that the clients have trained on. We also looked at whether it is necessary to re-initialize the model every epoch in order to perform this type of attack. It turns out that it is not, because even over the course of training, as the model improves in accuracy, a relatively constant number of data points remains individually extractable.

So this is great news for the server that wants to learn about private data. Less great news for the users who want to protect their privacy. For the active attacks we basically ran the same experiments, but we also played with different initializations for our trap weights. We looked into how to make the trap weights so that they extract even more data points. And we looked into how to make them unnoticeable, so that clients would not be suspicious: the weights would just look like normal model weights, and it would be impossible for clients to tell whether the model they received is a maliciously initialized one or just the result of previous training.

PAI: Fascinating experiments. What are some of the most surprising discoveries that you made doing this research? 

Franziska: Well, I guess the most surprising thing for us was that all you need to break privacy in federated learning in this setup is basically matplotlib, because we showed that the gradients of even large batches contain individual training data points. So if we just take the gradients, rescale them, and plug them into a matplotlib plotting function, we will see individual data points. When I first ran this experiment, I was completely stunned to see some of my individual data points back, because I had really just expected an average. It was also really surprising for us to see how powerful the weight initialization of a model can be. Previously, weight initializations have been heavily studied with respect to model convergence, like how we can train the model faster, more efficiently, etc. But we have now also shown that weight initialization can have an impact on privacy. And I think this is a very surprising and also very promising future direction.

PAI: What would you say is the role of differential privacy in federated learning now that you’ve done this research? 

Franziska: So basically, federated learning in its standard form does not give any privacy guarantees, right? It’s not per se a privacy technology; it is just supposed to offer some form of confidentiality because the raw data never leaves the device. Through our attack, we saw that this is not even the case. Differential privacy, however, is a technique that is supposed to give us privacy guarantees that bound the leakage of specific, sensitive user information. And there are different ways to implement differential privacy in the federated learning protocol.

The first one is the standard differentially private federated averaging, which is a differentially private way of performing distributed SGD. The clients take the gradients, clip them locally, and send their clipped gradients to the server. The server applies the noise and then does the aggregation and the model update. Of course, you can imagine that this is not suitable when the server is malicious, because the server might just not add the noise, or the server might extract the data from the gradients and only then add the noise. So there’s basically no protection in this scenario.
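A short sketch makes the weakness of server-side noise concrete. Below, the client clips its whole update to a fixed L2 norm and the server adds Gaussian noise afterwards. The helper names, the joint clipping over the concatenated weight and bias gradients, and the made-up value of `delta` are assumptions made for the example, not the exact algorithm of any library.

```python
import numpy as np

def clip_update(update, clip_norm):
    """Client side: scale the whole update so its L2 norm is at most clip_norm."""
    norm = np.linalg.norm(update)
    return update * min(1.0, clip_norm / max(norm, 1e-12))

def noisy_average(clipped_updates, clip_norm, noise_multiplier, rng):
    """Server side: sum the clipped updates, add Gaussian noise, then average."""
    total = np.sum(clipped_updates, axis=0)
    noise = rng.normal(scale=noise_multiplier * clip_norm, size=total.shape)
    return (total + noise) / len(clipped_updates)

rng = np.random.default_rng(5)
x = rng.uniform(0.0, 1.0, size=8)                  # private input of one client
delta = 3.0                                        # dLoss/dz of one active neuron (made up)
update = np.concatenate([delta * x, [delta]])      # [weight-gradient row, bias gradient]

clipped = clip_update(update, clip_norm=1.0)
seen_by_server = clipped[:-1] / clipped[-1]        # computed before any noise is added
released = noisy_average([clipped], clip_norm=1.0, noise_multiplier=1.0, rng=rng)
```

Clipping multiplies the weight and bias gradients by the same scalar, so their ratio is unchanged and `seen_by_server` still equals `x`. Only `released` carries noise, and a malicious server has already seen `clipped` by then.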

And then there is local differential privacy, where users locally add enough noise to protect their own data. It is always an open question what “enough” noise is. But let’s say they manage to add “enough” noise and then share the gradients with the server. This is potentially a case where privacy can be completely protected. However, it has been shown that local differential privacy has a terrible privacy-utility trade-off, because the amounts of noise that are added are just too large. As a mix between the two, distributed differential privacy has been introduced, where the clients locally add some amount of noise and then, through secure aggregation for example, all these slightly noisy gradients are put together so that the total amount of noise is enough to protect the joint gradients and yield high enough privacy guarantees. But of course we have already seen that there is a problem with secure aggregation: even here, the server could manipulate some of the clients to always send zero gradients, and if the other clients don’t have a way to verify that, it will again be the case that the gradients of one client are aggregated with all-zero gradients. And if the local noise of this client was not sufficient to protect the client alone, we will still end up with privacy guarantees that are not high enough.
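The arithmetic behind this failure mode is short. Independent Gaussian noise adds in variance, so if each of n clients contributes a share sized for the honest case, the protection collapses when only one of them actually adds noise. The function names and the numbers below are assumptions chosen for illustration.

```python
import math

def per_client_std(target_std, n_clients):
    """Noise each client adds so that the sum over n_clients reaches target_std."""
    return target_std / math.sqrt(n_clients)

def aggregate_std(client_std, n_noisy_clients):
    """Standard deviation of the summed noise from independent Gaussian contributions."""
    return client_std * math.sqrt(n_noisy_clients)

local = per_client_std(target_std=1.0, n_clients=100)
honest_round = aggregate_std(local, n_noisy_clients=100)    # everyone adds their share
attacked_round = aggregate_std(local, n_noisy_clients=1)    # 99 controlled clients add nothing
```

With 100 clients, `honest_round` reaches the intended standard deviation of 1.0, while `attacked_round` is only 0.1: the target client is left with a tenth of the noise the guarantee was designed around.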

PAI: So what would it take to make federated learning robust against the malicious adversary? 

Franziska: We thought about this question for quite some time, and to us it seems difficult, even if the server acts maliciously only in very few cases. There are of course some things that we can prevent, for example the server sending out different models to different clients. This can be prevented if, for example, the model is logged in some kind of blockchain that is accessible by anyone, so that the clients can verify whether they obtained the right model or not. It is probably also possible to prevent our active adversarial reinitialization. We could think of a scenario in which the server initializes the model in a trusted execution environment and all the updates are performed there.

So at least in this case, it will not be possible for the server to put some wrong initialization or to reinitialize in the meantime. Still, there’s this problem that this does not prevent the passive leakage and we’ve shown that even in this passive case 20% of the data can be extracted, which is of course too much to say that it’s privacy preserving. And the question is also on the control of other clients. So if the server even controls a small fraction of clients, it still can do the attacks like it’s described. And it’s very difficult for clients to verify that other clients are not corrupt because they usually don’t have direct communication channels. And even if they did, what would it take for someone to prove to someone else that they are not malicious or that they are not controlled? 

So this is an open question and a problem in making it private. And with differential privacy we’ve also seen that it only works if the clients add enough local noise, which then again comes at the cost of utility or accuracy of the model, or if we find a way to verify that over all clients enough noise is added. 

So our baseline message, at least for clients, is simply not to participate in federated learning protocols when the server is not trusted. And if you are a trusted party and you want to employ federated learning in a private way, the minimum requirement would be to have the clients add enough noise locally, perform some form of secure aggregation, have the clients compute their gradients over large batches, version the model and upload it to a place where it can be verified by the clients, and perform your updates in a trusted execution environment.
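The "version the model so clients can verify it" requirement could look roughly like the sketch below, in which a published SHA-256 digest stands in for the public log. The helper names and the choice of hashing raw float64 bytes are assumptions made for the example, and a real deployment would also need a trustworthy channel for the digest.

```python
import hashlib
import numpy as np

def model_digest(weights):
    """SHA-256 over the raw bytes of every weight array, in order."""
    h = hashlib.sha256()
    for w in weights:
        h.update(np.ascontiguousarray(w, dtype=np.float64).tobytes())
    return h.hexdigest()

def client_accepts(received_weights, published_digest):
    """A client trains only if its model matches the publicly logged digest."""
    return model_digest(received_weights) == published_digest

rng = np.random.default_rng(6)
official = [rng.normal(size=(4, 3)), rng.normal(size=4)]
published = model_digest(official)                 # stand-in for the public log entry

tampered = [official[0].copy(), official[1].copy()]
tampered[0][0, 0] += 1e-3                          # a targeted client's altered model
```

Here `client_accepts(official, published)` is true and `client_accepts(tampered, published)` is false. A check like this only detects a server that sends different models to different clients; it does not help if the logged model itself was maliciously initialized, nor against passive leakage.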

Click to view Franziska’s website for more information on her publications and research.
