The Trust Fallacy in Zero Knowledge Web Application

When I first heard about Marco Barulli’s idea for a “Zero-Knowledge Web Application” I was happy. I felt I had found someone who thought the same way I do, someone with whom I could collaborate, and I told him so. But when I read the first definition, I was less impressed. I tried to open a discussion with Marco about it, but to no avail.

Since then, he has continued on his way, and I have continued on my own. Recently, thanks to a big name, Richard Stallman, there has been much buzz about Freedom in the cloud and Zero Knowledge Web Applications.

So it’s time for me to attempt to open a discussion again. Last time I did it privately and got no response. This time, I’ll talk here on the Passpack blog (for the first time ever – Tara is helping me with my English).

Host-Proof Hosting and Data Privacy

From Ajax Patterns:

  • Problem: How can you mitigate the effects of unauthorized access to your Application data?
  • Solution: Host sensitive data in encrypted form, so that clients can only access and manipulate it by providing a pass-phrase which is never transmitted to the server.

This is the core of a Host-Proof Hosting approach to secure data management.

Zero Knowledge Web Application

A Zero-Knowledge Web Application is a Host-Proof Hosting application which adds additional restrictions:

“The basic idea was to deliver a no trust needed service, where users had the ability to inspect and verify anything running in their browser. We had to drift the attention away from trusting us and let users focus on trusting the application.”

Here is the definition of how this is intended to work.

It is my belief that the definition adds un-needed elements which do not stand the test of an in-depth analysis.

The Trust Fallacy

The concept that an application can be trusted without having to trust the application provider is pure Utopia. The Zero-Knowledge Web Application is a risky type of idealism which attempts to convince people that they need not trust an application provider, because the technology is inherently trustworthy.

It is no secret that anyone who provides you with a web service has the ability to change the source code behind that service, perhaps even to put in a back-door if and when they want to. But why would they want to? Luckily, few people would risk completely ruining their good name after years of building a reputation for themselves.

Before using any web application, you must first decide if you trust the people or company offering the service.

You must first decide what sort of reputation they have built for themselves. You cannot trust what you do not know. And if you know it, and you don’t like it, you should not use the service.

Your freedom is at risk if you do not “think before doing” on the Internet.

The Zero Knowledge Web Application as-is, is a theory. This is not to say that there couldn’t be a future where it might become a credible solution for privacy, but until that happens, it is inappropriate to ask people to trust a theory with just too many inconsistencies.

Analysis of the Zero Knowledge Web Application

I am sorry for the long post, but I’d like to explain point by point what I mean by “inconsistencies”. The heading numbers refer to the headings in this post.

1. Host-Proof Hosting — Obviously, I agree.

2. Hide Nothing — I disagree

A Host-Proof Hosting application receives encrypted data from the server and manages your plain data in the browser. Only encrypted data is sent to the server. To understand if the process is correctly implemented, you need only analyze the client-side Javascript code. If the client sends your encryption keys to the server – beware, it is not really Host-Proof Hosting. If the client only sends encrypted data, it’s OK, the server can do nothing.

There is no need to know what the server is doing. The client is the focal point.
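As a rough sketch of the pattern described above, the following shows a client that derives its keys from a passphrase and hands the server nothing but ciphertext. It is not Passpack’s code: the function names, the record layout and the iteration count are assumptions made for illustration, and the keystream built from HMAC-SHA256 is a toy stand-in for a real cipher such as AES, used only because it needs nothing beyond the Python standard library.

```python
import hashlib
import hmac
import os


def derive_keys(passphrase: str, salt: bytes) -> tuple:
    """Stretch the passphrase into two 32-byte keys. Runs only on the client."""
    material = hashlib.pbkdf2_hmac(
        "sha256", passphrase.encode("utf-8"), salt, 100_000, dklen=64
    )
    return material[:32], material[32:]  # (encryption key, authentication key)


def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream: HMAC-SHA256 over nonce + counter. Illustration only."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        block = hmac.new(key, nonce + counter.to_bytes(8, "big"), hashlib.sha256)
        out += block.digest()
        counter += 1
    return bytes(out[:length])


def client_encrypt(passphrase: str, plaintext: bytes) -> dict:
    """Build the record the client uploads. The passphrase is not part of it."""
    salt, nonce = os.urandom(16), os.urandom(16)
    enc_key, mac_key = derive_keys(passphrase, salt)
    stream = keystream(enc_key, nonce, len(plaintext))
    ciphertext = bytes(p ^ s for p, s in zip(plaintext, stream))
    tag = hmac.new(mac_key, nonce + ciphertext, hashlib.sha256).digest()
    return {"salt": salt, "nonce": nonce, "ciphertext": ciphertext, "tag": tag}


def client_decrypt(passphrase: str, record: dict) -> bytes:
    """Check the tag, then decrypt. Also runs only on the client."""
    enc_key, mac_key = derive_keys(passphrase, record["salt"])
    expected = hmac.new(
        mac_key, record["nonce"] + record["ciphertext"], hashlib.sha256
    ).digest()
    if not hmac.compare_digest(expected, record["tag"]):
        raise ValueError("wrong passphrase or tampered record")
    stream = keystream(enc_key, record["nonce"], len(record["ciphertext"]))
    return bytes(c ^ s for c, s in zip(record["ciphertext"], stream))
```

A code review of a client like this comes down to one question: does anything other than the returned record ever get sent to the server?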

For example, Passpack code (like many other Ajax applications) is optimized, but you can perform a code review by downloading it and running it through an online tool like Beautify Javascript which makes it perfectly legible.

2.2. Code Integrity

To date, the only application which defines itself as a “Zero Knowledge Web Application” is Clipperz, so I apologize in advance if I seem to be singling them out in my critique of the Zero Knowledge Web Application, but it is impossible to avoid.

Clipperz themselves recognize the “less than ideal solution” of declaring their own checksum. A user downloads all the code, runs an MD5 checker and verifies that the checksum is (obviously) correct. But who’s to say my IP hasn’t been read, the code changed on-the-fly for my benefit, a different checksum given to me, etc.?

Would we not (and do we not) trust it, if a reputable site could check the web application and host the checksum? Would a third party not guarantee its validity?

Let’s put it this way – suppose that you ask me for a valid ID, and I provide you with a passport that I printed myself. Would you accept it? Is the US government stamp/seal really that important as a third-party verification?

I know Marco and Giulio Cesare personally and I can vouch for their absolute transparency, as I hope they would mine, but that does not mean that people do not need to arrive at such a conclusion on their own. People have to understand that it is their right to make an informed decision and decide to trust. And what to trust. Whether you trust a provider or an application itself you must always be aware that the two are connected.

If you trust an application, by default you are trusting the providers behind it. It is a fact, whereas pure trust in an application alone is a theory.

3. Prevent code changes — I can’t see why.

As a possible solution to the code integrity issue, Clipperz wrote:

“Ideally we envision a solution that is completely browser based and relies on a redundant and distributed network of servers not associated with the application provider. Each third party server hosts the fingerprint of the Zero-Knowledge Web Application, i.e. the checksum of its source code.”

I will indeed be very happy when this happens. Let us suppose that such a network exists. Surely, if my browser can check the code at the first loading, it can check every subsequent loading as well. If any of the libraries’ checksums are incorrect, the browser can stop loading.

By loading only the necessary code on-demand, we can optimize the user experience and overall speed of the application. This is a big benefit of Ajax, why forbid it?
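To make the idea concrete, here is a minimal sketch of how a loader could check each piece of code, including code fetched on demand, against fingerprints published by independent hosts. The host names, the quorum rule and the function names are assumptions made for illustration, and SHA-256 is swapped in for the MD5 checksum mentioned earlier; no such network or browser feature is implied to exist.

```python
import hashlib
from collections import Counter
from typing import Dict, Optional


def fingerprint(code: bytes) -> str:
    """Checksum of one code module as downloaded."""
    return hashlib.sha256(code).hexdigest()


def agreed_fingerprint(published: Dict[str, str], quorum: int) -> Optional[str]:
    """Return the fingerprint that at least `quorum` third-party hosts agree on.

    `published` maps a (hypothetical) host name to the fingerprint it serves.
    """
    if not published:
        return None
    value, votes = Counter(published.values()).most_common(1)[0]
    return value if votes >= quorum else None


def may_load(code: bytes, published: Dict[str, str], quorum: int) -> bool:
    """Decide whether a module may run. Called on every load, not just the first."""
    expected = agreed_fingerprint(published, quorum)
    return expected is not None and fingerprint(code) == expected
```

Because the check is per module, on-demand loading and code integrity do not conflict: each chunk is verified when it arrives, and loading stops at the first mismatch.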

4. Learn Nothing — I have some doubts.

Passpack knows a little bit about its users. Specifically, Passpack knows the User ID, necessary to authenticate the user; the IP address of the user when he connects; and optionally, if the user wants to receive emergency support, a working email address. This can be (and in a large percentage of cases is) an address from an anonymous email service. Passpack doesn’t care who you are in the real world, but if someone requests an emergency account suspension, we need to make sure the request is coming from the rightful owner of the account. Users who choose not to provide an email can remain completely anonymous, knowing that we cannot give them the same level of support. They are free to choose.

Clipperz is a self-defined Zero-Knowledge Web Application so I must suppose that they know nothing about me. But… I connect to Clipperz and am greeted with a box that tells me I am connected from Rome, and a week ago I connected from Milan, etc. This is a nice security feature… but it is in direct contrast with the Zero-Knowledge Web Application definition. Can even Clipperz not fully adhere to its own definition?

Personally, I am convinced that my privacy (as that of everyone) is sacred and I go to great lengths to protect it. But (on a more practical level) I want a certain amount of support if ever I do encounter an emergency situation. I want to be free to choose.

For example, suppose I remember my Packing Key but I have forgotten my Pass (users do this frequently). If I chose to provide an email for emergency support, Passpack can verify that I am the rightful account owner and reset the Pass for me, and my data is still confidential thanks to the Packing Key. With a simple email, I have avoided a situation that, with Zero-Knowledge, would have led to a permanent loss of availability of all my data.
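The separation between the two secrets can be sketched as follows. This is a hypothetical model, not a description of how Passpack implements it: the class name, the stored fields and the PBKDF2 parameters are all assumptions. The point it illustrates is only that a Pass reset changes the login verifier and leaves the blob encrypted under the Packing Key untouched.

```python
import hashlib
import hmac
import os


def stretch(secret: str, salt: bytes) -> bytes:
    """Slow hash of a secret, so the server never stores the Pass itself."""
    return hashlib.pbkdf2_hmac("sha256", secret.encode("utf-8"), salt, 100_000)


class ServerAccount:
    """What a hypothetical server keeps: a Pass verifier and an opaque blob."""

    def __init__(self, initial_pass: str, encrypted_blob: bytes):
        self.pass_salt = os.urandom(16)
        self.pass_verifier = stretch(initial_pass, self.pass_salt)
        # Encrypted in the browser under the Packing Key; the server cannot read it.
        self.encrypted_blob = encrypted_blob

    def check_pass(self, attempt: str) -> bool:
        candidate = stretch(attempt, self.pass_salt)
        return hmac.compare_digest(candidate, self.pass_verifier)

    def reset_pass(self, new_pass: str) -> None:
        """Support action once the owner is verified by email. Blob is not touched."""
        self.pass_salt = os.urandom(16)
        self.pass_verifier = stretch(new_pass, self.pass_salt)
```

After a reset, the owner logs in with the new Pass and decrypts the same blob with the Packing Key they still remember.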

The three pillars of information security are confidentiality, integrity, and availability.

A system that gives me more options for maintaining the availability of my data, without sacrificing the confidentiality and integrity of my data, is more secure than a system that cannot.


Standards are born in two ways: 1) because a plethora of people using the same technique make it a de-facto standard; or 2) because it goes through a proposal process, is analyzed by the international community, becomes a release candidate, and when everyone is convinced of the validity, it is finally granted the status of standard.

The definition of the “Zero-Knowledge Web Application” has done neither. It’s being helped along by the enthusiasm of people who, in good faith, applaud the intentions without digging into how to make it work properly. I hope my observations can spark a discussion to put things back on the right path.


19 responses to “The Trust Fallacy in Zero Knowledge Web Application”

  1. Pingback: Ajaxian » Passpack releases Host-Proof Hosting Library

  2. Pingback: Some Passpack Buzz n’ Love « Passpack Blog

  3. “If the client only sends encrypted data, it’s OK, the server can do nothing.”

    I am afraid that this is wrong. The server can try guessing the decryption key. And if this key is derived from a user-defined password (passphrase), then chances are that, for at least a large percentage of the users, the correct passphrase can be guessed, e.g. by means of an offline dictionary attack, and the (people behind the) server can decrypt the information despite the encryption. [never mind the recommendation that the passphrase should be >X characters long; most users will still use low-entropy passphrases I would expect]

    This is surely more than “nothing”…

    What does “host proof hosting” really mean? Well, it means that those who are able and willing to analyze the javascript, may do so and, in order to convince themselves that their passphrase is indeed used correctly to encrypt the data before it leaves the computer, must do so *every time they are about to send something to the server*.

    Note: it is not sufficient to check the code only once or twice; it must be checked sufficiently many times in order to ensure that the server cannot cheat by selectively sending “bad” versions of the code only sometimes (e.g. in order to selectively “attack” a targeted subset of users)…

    What does “host proof hosting” buy the average user? Given that there is no proven expert community that has done any analysis on the passpack javascript that the user can refer to, and given that it is very unlikely that any serious reviewing process will ever be willing to review passpack’s javascript *continuously and over an indefinite time period*, – “host proof hosting” does not appear to buy the average user anything (at least not in terms of security).

    Of course, the situation is similar with respect to “zero-knowledge web apps”. I think that the fact that “zero knowledge web apps” and “host proof hosting” do not remove the need to trust the providers of the software/service *at all*, means that these terms are really nothing much more than marketing buzzwords.

  4. @anonymous
    Brilliant! Thanks for chiming in. You hit some key points. Just one thing, Host-Proof Hosting isn’t just a buzzword. We didn’t build Passpack then say… hm what can we call this. We specifically chose the pattern because it’s what we would have chosen for our own data.

    I’m going to let Francesco answer more fully, but expect that to take a little longer… I’m really just writing now to say: reply forthcoming.


  5. …in the meantime let me add some minor clarifications. I did not mean to imply that you built passpack first and then started looking for suitable marketing terms. (although I do not imply the contrary, either :-) )

    What I want to say is that “Host proof hosting” (i.e. encrypting data items on the client side based on a passphrase before sending them to the server, using software provided by the same server) serves more political than security reasons.

    Why? Because
    this is a very good argument to convince some people that “hey – we do not really want to just get hold of your passwords and run off. See? We do everything we can think of to demonstrate that our intentions are good. We even let you encrypt everything before sending it to us!” It is indeed an argument that, by and large, works. It convinces me, too, that you do not intend to abuse the information people entrust you with.
    (although, personally, the argument does not convince me in isolation, but in combination with many other factors about this site).

    But that’s about it. Users need to trust passpack one way or another – “host proof hosting” does not remove that requirement.

    That’s why I say that “host proof hosting” is more a ‘marketing buzzword’ (i.e. a term used to establish trust regarding the intentions of the people behind passpack or any similar service) rather than a security feature.

  6. @anonymous

    First of all, let me say that I am very happy for your comment. It puts the attention on the central question of my post: trust.

    The server can do something.
    It is true. But, Host-proof hosting is a methodology. How it is implemented makes the difference between a good service, and a bad service. If a service allows its users to choose a weak password, it is not a good service. Passpack forces the user to choose a Packing Key of almost 80-bits, so that they are obligated to type a key sufficiently hard to guess. We lost a lot of users for this, but we prefer it that way. Also, we use a non-standard hash to avoid dictionary attacks. Security is security of the entire system.

    The code review
    You are right. It is actually impossible to make a continuous code review. We know that the web site can change the code just for one user, and under very specific circumstances. But a similar selective attack is possible with every site on the Internet (Ajax or traditional, HPH or not). A large percentage of computer frauds are due to employees. Banks know this very well and try to hide the phenomenon. The difference between a standard site and an HPH site is that: in the first case an employee can quietly work on the database without being discovered, and when he’s ready, copy the database; with HPH, the bad employee would have to inject malicious code into the web site code and wait to sniff a user’s credentials. This is definitely more difficult, it raises the bar. With HPH if the company has good anti-hacking controls, it is more likely to discover the code anomaly quickly, lowering the potential damage. With a traditional site, normally the theft is discovered once the hacker is long gone.

    I think that this is the principal difference between the two approaches. In every case *you must trust* your service provider. I think that people trust Passpack not just because we adopt HPH, but because we have demonstrated in this year and half that we are serious. HPH adds a level of security, but trust is between you and us.

    PS. IMHO if Host-Proof Hosting were a buzzword, the web would be full of HPH sites, but actually there are just a few. ;)

  7. “First of all, let me say that I am very happy for your comment.”

    I am happy, too, for your prompt response and the openness with which you people talk about things :-)

    “The difference between a standard site and an HPH site is: in the first case an employee can quietly work on the database without being discovered, and when he’s ready, copy the database; with HPH, the bad employee would have to inject malicious code into the web site code […]”

    Well, in the HPH case the “bad” employee could also copy the database (with the same ease); then he has to decrypt at least some entries of it. While this is an additional hurdle imposed by HPH (and this is the security benefit of HPH as you pointed out), the “bad” employee can do this task secretly on his own computer(s) with this copy of the database, by means of, for example, an offline dictionary attack.

    For passpack in particular:
    Requiring “packing passphrases” to have a minimum length of 80 bits is, I believe, a good, practically enforceable tradeoff between usability and effective key strength. However, 80 bits of length does *not* mean 80 bits of entropy and, hence, dictionary attacks (with word combinations) can still be successful.

    So – yes – there is a security benefit in HPH. In particular, it provides a certain degree of protection against a (of course hypothetical) “bad” passpack employee. But this benefit is only there under the assumption that we trust passpack in the first place – this goes back to the issue of having to trust the code.

    Hmm, what does “trusting passpack” mean? Does it not mean that one is willing to believe that it has no “bad” employees – certainly not the kind that would want to get hold of people’s passwords?

    Well, then, the security benefit offered by HPH is not needed! Why? Because, in order for the security offered by HPH to work, we already must trust passpack (‘s employees).

    “I think that people trust Passpack not just because we adopt HPH, but because we have demonstrated in this year and half that we are serious.”

    Agree :-)

  8. @Anonymous

    A dictionary attack is based on statistically common strings and combinations of strings. Suppose you obtain in some way an encrypted Pack. Sure, you can try to attack it, but the probability that you will succeed in a reasonable amount of time is very low. The key’s bits are measured by estimating the entropy, not by size. For example, 1212121212121212121212121212121212121212121212 is only… 12-bit, while George, Luis and Peter is 114-bit. Essentially, requiring an almost 80-bit key forces users to choose a non-common password, which counters dictionary attacks well. But I don’t want to just talk about Passpack.

    If you lose your password and a web site can send it to you, it means that the password is someplace where someone can obtain it. Yes, someone could copy a ciphered database and attack it. But this is very different from the case where someone can copy a database, have the keys and decrypt it entirely in just a few minutes (and grab a lot of secrets).

    What I try to say is that an HPH system, at the same level of base security, is intrinsically more secure. Obviously, if the base security is low… the entire system is less secure.

    No company wants “bad employees” :) but there isn’t a fool-proof way to know beforehand if an employee is honest or not. So if you don’t consider the risk, you would develop a system that a “bad employee” will bypass on the first occasion. A lot of banks have learned this lesson the hard way.

  9. Whether or not offline dictionary and similar password guessing attacks are going to be successful depends entirely on how users select their passphrases. In particular, on how much *entropy* they have – not how lengthy they are. While a short passphrase cannot have high entropy, a long passphrase does not necessarily have high entropy.

    [assuming the rest of the code is flawless].

    But all this aside; the point I am trying to make – and it seems to me that at least partially it has been taken – is that HPH does not buy the user anything substantial in terms of security, because while it raises the bar for a “bad” employee that tries to decrypt database entries (and we keep arguing about by how much it does this), it depends on there being no “bad” employee in the first place, that messes with the code (or code selection process).

    The average user – well, I will risk saying *no* user – will be really interested in making the distinction between the two types of “bad” employee.

    Instead, those users that trust the HPH site do this because they trust the entity as a whole, and for the reasons you pointed out earlier. If there is going to be an incident, the whole site will suffer from it from the outsider’s point of view – not just the “database department”.

  10. By the way, according to the National Institute of Standards and Technology (NIST), the password “George, Luis and Peter”, if chosen by the user, has merely 38 bits of entropy.

    reference: Table A.1 of

  11. I’m unable to add something to the conversation, but that’s a great post, that raised many interesting thoughts in my mind. :)

    The first thing I’ve thought is: is it possible to develop an “host-proof application framework” to build applications on it?
    Maybe something like that could help spreading those concepts.

    I mean… “Rails” was the internal framework of Basecamp… could “Proof” be the internal framework of PassPack? ;)

  12. @anonymous

    The algorithm used by Passpack to estimate the entropy of a key is based on the Keepass quality measurer. Keepass is a great password manager. It is open source and there is an entire community devoted to its development. So I “trust” their approach to entropy estimation.

    Shannon’s approach assumes a character base of 27 and the use of the English language only. NIST extended the character base assumption to 94 characters, but maintains the English language assumption. Passpack allows the full range of the Unicode character set (about 100,000 characters) and is used by people of many different languages.

    So suppose that a hacker (internal, i.e. the “bad employee” you like a lot, or external) obtains encrypted data: he cannot know if the user chose a key in English, in Chinese, in Arabic or some other UTF8 lingual mash-up.

    Vice versa, to defend against a hacker that knows the target user’s language, to obtain an effective quality measure we would need to adapt the algorithm to the user’s preferred language, choosing a different base set of chars. It is clear that if an English speaking person uses Chinese chars a dictionary attack becomes very hard, but if a Chinese person uses Chinese chars… it’s normal.

    So what is needed is an approach that doesn’t take a specific language as the reference for calculating the entropy, but a more generic one… or, in an ideal world, an algorithm that is contextualized to the single user.

    I think that Keepass’ approach is a good base, so I chose it for Passpack. Other systems can choose other algorithms.

    But this is not the central point of the discussion. Dictionary attacks are not exclusive to HPH – they are common to all systems including stand-alone apps and operating systems.

    Suppose that there are two systems, A and B, that implement the same level of security, they are audited by the same security authority, they use the same connection provider, etc. All of their assets are identical, but System A provides a HPH application and System B does not.

    Do you think that this makes a difference, or no?

    I have managed both HPH and standard systems, so I know that the answer is yes.

  13. @Folletto Malefico
    I’d love to see that happen. For starters, we’ve released a library on Google Code for those who want to build their own Host-Proof Hosting application.

    Did you have a specific project in mind?

  14. @sullof & @anonymous
    You’re both focusing on some very specific security and technical details.

    What about more general data privacy implications?

    Standard (non-HPH) systems maintain a database of users’ data, that data can be accessed, viewed or indexed at anytime — as a normal procedure.

    With HPH, that is exponentially harder to do, especially en-masse operations like data mining, and would likely require some sort of illegal action.

    So while HPH may not be the silver bullet for all privacy woes, I think it’s an excellent starting point.


  15. Ryan Michael

    A few comments, ordered according to the post:

    2) “To understand if the process is correctly implemented, you need only analyze the client-side Javascript code.”

    Technically, I agree that this is true. Unfortunately in actual practice it is unlikely that users will be capable or willing to inspect the javascript of the client code each time they use a site. This means that either the user must trust the application provider (as suggested earlier) or the user must trust some 3rd party’s analysis of the codebase.

    If the user is forced to trust the provider, I don’t see what the point of all the encryption is. In this security model the only role encryption plays is to secure the data on the provider’s servers in case the servers are compromised. There is still NO guarantee that the provider won’t quietly change their code to forward decryption keys. In other words, this Host-proof-hosting paradigm only works (in a pragmatic sense) if there is some way of publishing 3rd party code review and verifying that the downloaded code matches the reviewed code.

    “If you trust an application, by default you are trusting the providers behind it.”

    Partially. The reason open-source cryptography and security software is preferred is that it can be peer-reviewed. This means that users are not forced to trust *only* the application provider; they can trust the thousands of eyeballs reviewing the code on their behalf. The problem of verifiable 3rd-party review applies to this example as well, and is a problem that could use addressing.

    3) “By loading only the necessary code on-demand, we can optimize the user experience and overall speed of the application. ”

    I don’t see that this is in conflict with the idea of preventing code changes, so long as the initial download contains checksums for any additional code which is downloaded. Again, this is an issue of ensuring that the code running client-side can be reviewed and users can have some confidence that if the code they download matches *some* checksum, that the application works as it is intended to.

    4) “But… I connect to Clipperz and am greeted with a box that tells me I am connected from Rome, and a week ago I connected from Milan, etc.”

    From what I can tell, Clipperz usernames are hashed before being sent to the server, so all the server knows is access times, the amount of data downloaded and the IP it was sent to. I can’t imagine any client-server model that had access to less data. The argument that IP addresses can be used to identify a user’s location ignores the fact that *really* paranoid users can use TOR and eliminate any relationship between their IP and their actual location.

    In conclusion, I think that it is absolutely necessary to establish some distributed framework for code review and authentication. This extends to package managers for OS’s as well, as they are susceptible to many of the same exploits Passpack is. You’re right that it’s dangerous to view technology as a panacea, but technology can be used to enable communities to work together.

  16. @Ryan Michael
    Hello. Thanks for the comments. Two things:

    …it is absolutely necessary to establish some distributed framework for code review and authentication.

    I agree, and would love for that to happen. But in order to accomplish this the delivery would need to be checked at every delivery of the code. That’s where the call for a “better browser” came into play with the recent Clipperz/Stallman call to action. It’s not immediately practical. But I wonder if there could be other ways to solve the issue that don’t require building a new browser.

    they can trust the thousands of eyeballs reviewing the code on their behalf.

    In theory yes. I just finished reading a post by Savio Rodrigues where he hits on this topic. He examines the case of a Springwise security flaw, and raises the question:

    Two key benefits of OSS are the ability to read and understand the code we use and that “many eyes scouring the code” makes the product more secure.

    Considering the millions of downloads of the Spring Framework, should we have expected someone to discover these security holes earlier? Or do developers use what the next guy/gal is using, trusting that “someone” has done the due diligence?

    It’s a valid question.

  17. @Ryan

    Most people don’t use TOR. Not because they are not paranoid, but because they don’t realize that the IP address can identify them.

    I would like to see an international regulation of the matter. Europe, perhaps, will move towards something useful by asking that IP addresses be treated as personal information –

  18. Ryan Michael

    “In order to accomplish this the delivery would need to be checked at every delivery of the code.”

    Or cached locally, but I see your point and I agree that there is no ideal solution which exists right now. I think some of these problems (verifying checksums at every delivery) are likely simple problems to solve with browser plugins or trusted authenticating proxy servers.

    I also completely agree with you that as OSS moves further into the mainstream, we’re going to start seeing these types of oversights coming to light. I think it was less than a month ago they realized that *OpenSSL* had a serious flaw in its PRNG. I guess this is a value judgment, but my opinion is that I would rather take my chances on the OSS community than on a proprietary vendor; I don’t see how closing a project’s source code makes the project *more* secure, if anything it seems like a disincentive to the developer to invest in substantial security reviews.

    I’ve proposed this elsewhere before, I think the OSS community could benefit greatly from a centralized organization similar to SourceForge or FreshMeat dedicated to tracking software packages and hosting code reviews. Some type of system where reviewers or organizations sign checksums to attest that they have looked over the code and not found any potential problems. Reviewers could be volunteers or be paid by patronage. Users could then decide which reviewers they choose to trust (or at least see who has reviewed a package they’re interested in). Such a system would require little to no trust in service/application providers (especially in Host-Proof-Hosting scenarios). I guess everyone has a bright idea about this though…

    “Most people don’t use TOR… I would like to see an international regulation of the matter”

    That would be nice, but I’m not going to hold my breath. Regardless, without a TOR-like system, I don’t see any possible way for a client-server paradigm to operate without IP addresses. Maybe that’s your point.

    Thanks for the responses guys – this is a really interesting topic to me.

  19. Pingback: The Blog That Goes Ping » Blog Archive » Clipperz review
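
The 38-bit figure quoted in comment 10 can be reproduced with the rule-of-thumb estimate from Appendix A of NIST SP 800-63 for user-chosen passwords. The sketch below is a minimal reading of that rule, with a function name chosen here for illustration. It covers only the per-character schedule and the optional 6-bit composition bonus, it leaves out the dictionary-check bonus, and it is unrelated to the Keepass-based measure that Passpack uses.

```python
def nist_entropy_bits(password: str, composition_bonus: bool = False) -> float:
    """Estimate entropy of a user-chosen password per NIST SP 800-63 Appendix A.

    Schedule: 4 bits for the first character, 2 bits each for characters 2-8,
    1.5 bits each for characters 9-20, and 1 bit for each character after that.
    """
    bits = 0.0
    for position in range(1, len(password) + 1):
        if position == 1:
            bits += 4.0
        elif position <= 8:
            bits += 2.0
        elif position <= 20:
            bits += 1.5
        else:
            bits += 1.0
    if composition_bonus:
        # Applies only when rules force both upper case and non-alphabetic characters.
        bits += 6.0
    return bits


print(nist_entropy_bits("George, Luis and Peter"))  # 38.0
```

The phrase has 22 characters, so the estimate is 4 + 7 × 2 + 12 × 1.5 + 2 × 1 = 38 bits. The estimate depends only on length, which is exactly why it disagrees so sharply with a measure that rewards a large character set.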
