Auth all the things!?! Identity technologies to prevent a privacy decline

"AI is the path of the dark bots. AI leads to fake, fake leads to auth, auth leads to surveilling." --Yoda

The advances in generative AI technologies have been nothing short of remarkable. From language processing to image and video synthesis, these technologies are opening new possibilities that not too long ago felt like science fiction. However, with this innovation comes a significant challenge: how to counter malicious use of AI by threat actors. As the technology improves and becomes more widely available, the quantity and quality of spear phishing, deep fakes, misinformation and other malicious activities increase. It is already nearly impossible to distinguish between human-made and AI-generated content. The renowned New Yorker internet dog cartoon could use an update: “On the internet, nobody knows you are an AI-powered dog”.

An AI-powered dog, generated by the Bing Image Creator

It is safe to assume that this situation will lead to a push for increased authentication across our online experiences, to verify the authenticity, integrity, and origin of content, and to verify the humanness of participants. The cryptographic toolbox is full of techniques to help with this, but continuous and inescapable authentication of people and content using conventional schemes would leave a digital wake, leading to an unprecedented decrease in privacy while facilitating generalized and automated surveillance. This is not how we interact in real life, and it is not how we have learned and expect to interact online, especially given recent privacy and data minimization trends in legislation and in many industry services; no one wants to live in a panopticon.

Privacy-enhancing identity technologies

Thankfully, we are making good progress in developing privacy-enhancing technologies that could help balance security and privacy in user and data authentication systems. We have been developing such technologies at Microsoft Research (MSR) for a long time, and many are reaching a maturity level that makes them ready for standardization and wide adoption.

Selective disclosure of attributes

One important property in privacy-preserving identity systems is minimization of the data presented in a particular interaction. Identity credentials typically contain multiple attributes (a.k.a. claims), and we don’t necessarily need to disclose all of them in every presentation. This is not really a problem when presenting our paper credentials offline: the person checking my ID at the bar to see if I’m allowed to drink doesn’t have a photographic memory and won’t remember my address. Presenting an equivalent credential online, however, could easily result in data misuse, even with well-intentioned recipients.¹ Federated protocols (such as SAML and OpenID/OAuth) aim to minimize the disclosed information by requesting only the required attributes on demand from an issuer, but this reduces the user’s autonomy and privacy vis-a-vis the issuer (while also increasing the load on the issuer’s system). There are cryptographic mechanisms to selectively present a subset of certified attributes encoded in a long-lived token, while preserving the integrity of the issuer’s signature. One hash-based mechanism has notably made its way into the ISO mobile driver's license (mDL) standard. The technique is also being standardized for JSON Web Tokens (JWT), the most popular ID token format today, as Selective Disclosure JWT (SD-JWT) in the IETF OAuth working group; you can experiment with the concept with my TypeScript library implementation.
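To make the hash-based idea concrete, here is a minimal sketch using Node's built-in `crypto` module. It is not the mDL or SD-JWT wire format: the `Disclosure` type, the `issue` and `verifyPresentation` functions, the JSON encoding, and the choice of Ed25519 are all assumptions made for illustration. The issuer signs only salted digests of the attributes, so the user can later reveal a subset without invalidating the signature.

```typescript
import { createHash, generateKeyPairSync, randomBytes, sign, verify, KeyObject } from "node:crypto";

// Hypothetical shape of one salted attribute (not a standardized format).
type Disclosure = { salt: string; name: string; value: string };

// Digest of one salted attribute; the random salt prevents guessing hidden values.
function digest(d: Disclosure): string {
  return createHash("sha256").update(JSON.stringify([d.salt, d.name, d.value])).digest("hex");
}

// Issuer: salt every attribute and sign the sorted list of digests, not the values.
function issue(attrs: Record<string, string>, privateKey: KeyObject) {
  const disclosures: Disclosure[] = Object.entries(attrs).map(([name, value]) => ({
    salt: randomBytes(16).toString("hex"),
    name,
    value,
  }));
  const digests = disclosures.map(digest).sort();
  const signature = sign(null, Buffer.from(JSON.stringify(digests)), privateKey);
  return { digests, signature, disclosures };
}

// Verifier: check the issuer signature over the digests, then check that each
// revealed attribute hashes to one of the signed digests.
function verifyPresentation(
  digests: string[],
  signature: Buffer,
  revealed: Disclosure[],
  publicKey: KeyObject,
): boolean {
  const signatureOk = verify(null, Buffer.from(JSON.stringify(digests)), publicKey, signature);
  return signatureOk && revealed.every((d) => digests.includes(digest(d)));
}

// Usage: the user holds all disclosures but reveals only the birthdate.
const { publicKey, privateKey } = generateKeyPairSync("ed25519");
const cred = issue({ name: "Alice", birthdate: "1990-01-01", address: "1 Main St" }, privateKey);
const revealed = cred.disclosures.filter((d) => d.name === "birthdate");
const ok = verifyPresentation(cred.digests, cred.signature, revealed, publicKey);
console.log("presentation accepted:", ok);
```

Note that the digest list and the signature are identical in every presentation of the same credential, so this sketch minimizes disclosed data but does nothing to hide that two presentations came from the same credential.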

Unlinkability

Another important and perhaps more subtle property is the unlinkability between the issuance and presentation of credentials, to prevent undesired tracking of users' activities. Conventional signatures create unique identifiers that can be used as correlation handles to link user activities. There are two mechanisms to break this bond: you can randomize either the issuance or the presentation of the credential using special cryptographic schemes.

One of the most promising signature schemes for randomizing presentations is Boneh-Boyen-Shacham (BBS). We are developing the specification in the Decentralized Identity Foundation BBS signatures working group. We at MSR have a TypeScript Node implementation of the algorithm that has just been updated to implement the second draft submitted to the IETF. I’m excited about future improvements to the current specification following the results of this Eurocrypt paper by University of Washington’s Stefano Tessaro and Chenzhi Zhu, who analyzed the specification and proposed some optimizations.

Techniques for randomizing credentials at issuance include so-called blind signatures. These have also made their way into standardized protocols, for example Privacy Pass, which issues anonymous tokens that users can redeem instead of answering CAPTCHAs.
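As an illustration of how blinding breaks the link between issuance and presentation, here is a toy sketch of textbook RSA blind signatures. It is not the construction specified for Privacy Pass: the tiny key, the function names, and the lack of hashing and padding are all simplifications assumed for readability, and none of it is safe for real use. The issuer signs a value it cannot read, and the user removes the blinding factor to obtain an ordinary signature on the original message.

```typescript
// Toy RSA key (p = 61, q = 53): far too small for real use, illustration only.
const modulus = 3233n;
const publicExp = 17n;
const privateExp = 2753n;

// Square-and-multiply modular exponentiation.
function modPow(base: bigint, exp: bigint, mod: bigint): bigint {
  let result = 1n;
  base %= mod;
  while (exp > 0n) {
    if ((exp & 1n) === 1n) result = (result * base) % mod;
    base = (base * base) % mod;
    exp >>= 1n;
  }
  return result;
}

// Modular inverse via the extended Euclidean algorithm.
function modInverse(a: bigint, mod: bigint): bigint {
  let [oldR, r] = [a % mod, mod];
  let [oldS, s] = [1n, 0n];
  while (r !== 0n) {
    const q = oldR / r;
    [oldR, r] = [r, oldR - q * r];
    [oldS, s] = [s, oldS - q * s];
  }
  if (oldR !== 1n) throw new Error("value is not invertible");
  return ((oldS % mod) + mod) % mod;
}

// User: hide the message with a blinding factor before sending it to the issuer.
function blind(message: bigint, factor: bigint): bigint {
  return (message * modPow(factor, publicExp, modulus)) % modulus;
}

// Issuer: sign the blinded value without learning the message.
function signBlinded(blinded: bigint): bigint {
  return modPow(blinded, privateExp, modulus);
}

// User: remove the blinding factor to get a signature on the original message.
function unblind(blindSignature: bigint, factor: bigint): bigint {
  return (blindSignature * modInverse(factor, modulus)) % modulus;
}

// Anyone: verify the signature with the issuer's public key.
function verifySignature(message: bigint, signature: bigint): boolean {
  return modPow(signature, publicExp, modulus) === message;
}

// Usage: in practice the message would be a hash of a random token and the
// blinding factor would be chosen at random; fixed values keep the demo repeatable.
const message = 1234n;
const factor = 7n;
const blinded = blind(message, factor);
const signature = unblind(signBlinded(blinded), factor);
console.log("signature valid:", verifySignature(message, signature));
```

The issuer only ever sees `blinded`, so when the token and its signature are redeemed later, the issuer cannot tell which issuance they came from.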

New U-Prove release

One pioneering technology that we developed at MSR is U-Prove, which has a long academic history and has been prototyped and piloted in various systems. U-Prove provides both unlinkability of signatures and selective disclosure of attributes,² and unlike some more recent cryptographic schemes, it uses simple, easy-to-implement mathematical building blocks, which provides high performance.³

I’m happy to announce the release of a new U-Prove TypeScript library and a new JSON framework, which make it easier to integrate the technology in web applications. The framework can be used directly to realize various privacy-preserving scenarios, or be integrated in higher-level frameworks such as JSON Web Proofs (JWP) and Verifiable Credentials (VC). I’ll cover the U-Prove features and interesting use cases in future posts.

Parting thoughts

I’m very excited by the promise of the novel AI systems, and I’m optimistic that these emerging cryptographic tools will help balance security and privacy in the face of the coming misuse of AI. Now let’s get to work and start building…

AI tools were used to draft part of this text. No AI was harmed in the making of this post, at least not to my knowledge.

Footnotes

¹ The data could be hacked, leaked, stored forever in databases and logs, or sold to third parties.
² U-Prove supports more powerful disclosure techniques, such as proving a property of an attribute instead of disclosing it (e.g., proving that my name isn’t listed in a revocation list, or that I’m over 21 without disclosing my date of birth). See the U-Prove extensions paper for details.
³ U-Prove uses standard elliptic curves, by default the NIST prime curves used in ECDSA.