Identity Garbage

Recent discussions focus on whether push or pull is the right model for future identity management. Impractical standards are being revived. Everybody is discussing the technology, the future, the visions. There is almost no discussion about the most difficult current problem of identity management: data.

There is (at least) one critical problem with the implementation of single sign-on, identity federation, provisioning to the cloud and other fancy buzzwords. The problem is the user database. It is not that difficult to deliver the information that someone should have access somewhere using whatever push, pull, standard or proprietary method, as long as you have that information. The reality is that enterprises do not have that information in a usable form. It is almost always distributed across several data stores, usually in incompatible formats; it is often inconsistent and sometimes even contradictory. And it is far from complete. Many provisioning cases are resolved by non-algorithmic methods, e.g. a manager or security officer deciding whether the request "looks valid enough" to be approved. The current situation is best described as semi-organized chaos.

How could anyone build an automated, Internet-scale, cloud-enabled and standards-based identity management mechanism on top of that? Hardly. Such a project will most likely fail. But it will waste a lot of time and money before it fails.

The first step is to consolidate the data. Build a consistent user database, align policies, design business processes that can support 80% of cases with 20% of effort. It is naïve to expect that everything could be automated, therefore prepare for a reasonable number of exceptions and human interactions from the very beginning. Single sign-on, identity federation, support for the cloud (whether push or pull) and even the standards will not provide any considerable help in that. What is needed is mostly the manual work of security staff, business people and engineers.

What can help is a well-designed and well-deployed provisioning system. Contrary to popular belief, a provisioning system is not really about provisioning. Yes, provisioning is an important part of the system, but other aspects are in fact much more important. A provisioning system can take data from several sources, convert them to a common format and merge them, thereby creating a unified database. It can compare and correlate data among several systems, thereby detecting inconsistencies. It supports workflow and human interaction to clean up the data and supplement missing information, both during the initial migration and (most importantly) during day-to-day operation.
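
To make the convert, merge and correlate steps a bit more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not how any particular provisioning product works: the two sources (an HR export and a directory), their field names, the `Identity` record and the choice of the lower-cased e-mail address as the correlation key are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Identity:
    """Common format that every source record is converted to (hypothetical)."""
    key: str          # correlation key: lower-cased e-mail (an assumption)
    full_name: str
    department: str


def from_hr(row: dict) -> Identity:
    # Hypothetical HR export: separate name fields, upper-case e-mail.
    return Identity(
        key=row["EMAIL"].strip().lower(),
        full_name=f'{row["FIRST"]} {row["LAST"]}',
        department=row["DEPT"],
    )


def from_directory(entry: dict) -> Identity:
    # Hypothetical directory entry with LDAP-like attribute names.
    return Identity(
        key=entry["mail"].strip().lower(),
        full_name=entry["cn"],
        department=entry["ou"],
    )


def reconcile(hr, directory):
    """Correlate two sources; return clean records and cases for human review."""
    hr_by_key = {i.key: i for i in map(from_hr, hr)}
    dir_by_key = {i.key: i for i in map(from_directory, directory)}

    merged, review = {}, []
    for key in sorted(hr_by_key.keys() | dir_by_key.keys()):
        a, b = hr_by_key.get(key), dir_by_key.get(key)
        if a is None:
            review.append((key, "account without HR record"))
        elif b is None:
            review.append((key, "HR record without account"))
        elif a != b:
            review.append((key, "conflicting attributes"))
        else:
            merged[key] = a   # both sources agree, safe to consolidate
    return merged, review


hr_rows = [
    {"EMAIL": "ANN@EXAMPLE.COM", "FIRST": "Ann", "LAST": "Lee", "DEPT": "Sales"},
    {"EMAIL": "BOB@EXAMPLE.COM", "FIRST": "Bob", "LAST": "Ray", "DEPT": "IT"},
]
directory_entries = [
    {"mail": "ann@example.com", "cn": "Ann Lee", "ou": "Sales"},
    {"mail": "bob@example.com", "cn": "Bob Ray", "ou": "Finance"},
    {"mail": "eve@example.com", "cn": "Eve Fox", "ou": "IT"},
]

merged, review = reconcile(hr_rows, directory_entries)
```

In this toy data set only Ann ends up in the consolidated database. Bob (conflicting department) and Eve (an account with no HR record) land in the review list, which is exactly the part that a workflow would hand over to a human.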

A reasonable identity consolidation project that includes a decent provisioning system is a necessary prerequisite for any other identity-related activity. It is a shame that engineers forget the "Garbage In, Garbage Out" phrase that was popular a few decades ago. If the data are bad, any system built on top of such data can only be worse.