Non-anonymous Anonymity

I have been working on a kind-of-directory-kind-of-SSO project for the past three months. I was working quite hard and had no time to follow the identity buzz. Just a few days ago I found an entry in Kim Cameron's blog that features a recording of his conversation with Craig Burton. One of the topics was anonymity, especially the question of whether anonymity is an empty set of claims or not. After a while it came to me that the question is all wrong. More exactly, "anonymity" itself is all wrong.

First of all, we usually see anonymity as a boolean quality. You are either anonymous or you are not. You cannot be "a bit more" anonymous or have "quite a big" anonymity. But if you see anonymity as a boolean value, you must first define the "world" that it operates on. Some researchers call this the anonymity set. The exact definition may be found here:

Pfitzmann, A., Kohntopp, M.: Anonymity, Unobservability, Pseudonymity, and Identity Management: A Proposal for Terminology. Designing Privacy Enhancing Technologies, International Workshop on Design Issues in Anonymity and Unobservability, 2000.

The anonymity set is the collection of all possible subjects that an observer has to choose from. For example, when evaluating the anonymity of a single HTTP access, the anonymity set may be "all IP-addressed devices" or "all devices accessing the Internet from a single proxy server". In the former case the HTTP access is not anonymous, as it is identified by an IP address (the fact that the address may not uniquely identify the client does not really matter). In the latter case the access may be anonymous.

Alternatively, you may define anonymity as a quantitative value, measured by the size of the smallest applicable anonymity set. That way you may be "very anonymous" or "just a little bit anonymous". But in that case there's a new question: how much anonymity is enough?
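To make the quantitative view a bit more concrete, here is a minimal Python sketch. Everything in it is an assumption made up for illustration: the tiny population, the attribute names (ip, city) and the function names. It only shows that the "amount" of anonymity is the number of subjects consistent with what the observer knows, and that the number depends entirely on which set you start from.

```python
from collections import Counter

# Hypothetical population; each record lists attributes an observer might learn.
POPULATION = [
    {"user": "u1", "ip": "10.0.0.1", "city": "city-a"},
    {"user": "u2", "ip": "10.0.0.1", "city": "city-a"},
    {"user": "u3", "ip": "10.0.0.1", "city": "city-b"},
    {"user": "u4", "ip": "10.0.0.2", "city": "city-b"},
]


def anonymity_set(population, observed):
    """Return the subjects consistent with everything the observer knows."""
    return [
        person for person in population
        if all(person.get(key) == value for key, value in observed.items())
    ]


def smallest_anonymity_set(population, attributes):
    """Size of the smallest group of subjects sharing the same attribute values."""
    groups = Counter(
        tuple(person.get(attr) for attr in attributes) for person in population
    )
    return min(groups.values())


# Behind the shared address three users are indistinguishable.
print(len(anonymity_set(POPULATION, {"ip": "10.0.0.1"})))
# Learning the city as well shrinks the set.
print(len(anonymity_set(POPULATION, {"ip": "10.0.0.1", "city": "city-a"})))
# Worst case over the whole population when only the address is observed.
print(smallest_anonymity_set(POPULATION, ["ip"]))
```

The sketch deliberately has no threshold for "anonymous enough". Choosing that number is exactly the open question.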

One way or another, talking about anonymity without defining the anonymity set has no point. And I think that defining the anonymity set for the Internet may not be that easy, and the definition will probably be very dynamic anyway. Will we see the anonymity set as a collection of all Internet-enabled devices? Will we also include devices hidden behind masking proxies? Or will we see it as a set of physical users, so that it does not matter if a device is identified as long as a user is not? And if I can identify, for example, the user's location (city), is the user still anonymous or not?

I think that "anonymity" and "identity" are two extremes of a quite broad and multi-dimensional identification spectrum. And I also think that these two extremes cannot be reached in practice. But that may be a topic for following posts.

Maybe we should abandon the words "anonymity" and "identity" completely, as they may be very misleading, especially when building practical systems.