Smart Match
09 Mar 2016
In identity management there is a class of petty issues that appear and reappear all the time. Even though these issues are easy to understand, they are tricky to eliminate completely and they often have very nasty consequences. These seemingly unimportant issues frequently result in nights spent resolving a total breakdown of the IDM system. What is this devil that kills sleep and keeps engineers away from their families? It is the demon of case insensitivity and his friends.
It works like this: an IDM system often maps its own usernames to the usernames used by the target application. The IDM usernames are often clean and pretty alphanumeric lowercase strings such as 'foobar'. Many applications are perfectly happy with that. But there are some notorious systems that insist that lowercase is no good and that uppercase is the only proper case. So they silently transform 'foobar' to 'FOOBAR'. Now, these are two different strings for an IDM system, and that's where the trouble begins. The solution of a naïve IDM system is to always treat the username as case-insensitive. But that won't work. There are systems that strictly insist on case sensitivity. E.g. for a UNIX system 'foobar', 'fooBAR' and 'FOOBAR' are three very different identifiers. An IDM system that fails to recognize this gets into even worse trouble, e.g. it is quite easy to end up with mass duplication of UNIX accounts. The naïve solution will not work here.
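The failure mode is easy to reproduce. The following is a minimal sketch, not midPoint code: `find_account` and its `case_sensitive` flag are hypothetical names, used only to illustrate that the comparison has to be chosen per target system.

```python
def find_account(accounts, username, case_sensitive):
    """Return the stored account name that corresponds to `username`, or None.

    `case_sensitive` describes the target system, not the IDM system.
    (Hypothetical helper for illustration only.)
    """
    for existing in accounts:
        if case_sensitive:
            if existing == username:
                return existing
        elif existing.casefold() == username.casefold():
            return existing
    return None


# A system that silently upper-cases usernames when the account is created.
uppercasing_system = ["FOOBAR"]
# A UNIX-like system where all three names are distinct accounts.
unix_system = ["foobar", "fooBAR", "FOOBAR"]

# Exact comparison misses the existing account, so an IDM would create a duplicate.
missed = find_account(uppercasing_system, "foobar", case_sensitive=True)
# Case-insensitive comparison finds it.
found = find_account(uppercasing_system, "foobar", case_sensitive=False)
# On the UNIX-like system the relaxed comparison picks the wrong account.
wrong = find_account(unix_system, "FOOBAR", case_sensitive=False)
# Only the exact comparison identifies the right one.
right = find_account(unix_system, "FOOBAR", case_sensitive=True)
```

Neither setting is correct for both systems, which is why a single global switch cannot solve the problem.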
It gets even more complicated. E.g. the LDAP distinguished name (DN) is quite a slippery beast. It is (usually, but not always) case-insensitive. But it also has an internal structure that tolerates white space. E.g. 'cn=foobar, dc=example, dc=com' and 'cn=foobar,dc=example,dc=com' are equivalent. There are similar rules for other formats too. E.g. the URIs 'http://example.com/%7Efoobar' and 'http://example.com/~foobar' are equivalent. Obviously, there is no simple solution to this little problem with a nasty head. And really bad things are bound to happen if the IDM system fails to recognize that two identifiers are in fact the same. What is even worse, these issues are often overlooked at the beginning, when the IDM system is tested and deployed. It is only when the system is filled with data and real operation begins that disaster strikes.
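To make the DN case concrete, here is a deliberately simplified normalization sketch. It assumes a case-insensitive DN and ignores escaped commas, multi-valued RDNs and attribute-specific matching, so it is an illustration of the idea and not a complete DN parser. The function name `normalize_dn` is made up for this example.

```python
def normalize_dn(dn):
    """Simplified DN normalization (illustrative assumption, not a full parser).

    Trims white space around each RDN and around '=', and lower-cases
    both the attribute name and the value.
    """
    parts = []
    for rdn in dn.split(","):
        attr, _, value = rdn.partition("=")
        parts.append(f"{attr.strip().lower()}={value.strip().lower()}")
    return ",".join(parts)


a = normalize_dn("cn=foobar, dc=example, dc=com")
b = normalize_dn("CN=foobar,DC=example,DC=com")

# Both spellings collapse to the same normalized string.
same = a == b
```

Comparing normalized forms instead of raw strings is what lets the two spellings be recognized as one identifier.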
MidPoint has had a solution to these issues for many years already. It is called matching rules. Simply speaking, matching rules are little algorithms that compare values. These algorithms can be attached to individual attributes. Then midPoint knows that 'foobar' and 'FOOBAR' are in fact the same thing. This makes the operation of midPoint reliable even if the connected applications are doing crazy things with the data. The matching rules can also normalize the value, so midPoint can do efficient large-scale searches and matching. These are very useful little thingies. And they are essential for the reliable operation of any IDM system.
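The idea can be sketched in a few lines. This is not midPoint's implementation or API: the `MatchingRule` class, the rule names and the attribute names below are all hypothetical. The sketch only shows the two jobs described above, comparing values and normalizing them so that lookups can use a plain index.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class MatchingRule:
    """A named normalization function; two values match if they normalize equally."""
    name: str
    normalize: Callable[[str], str]

    def match(self, a: str, b: str) -> bool:
        return self.normalize(a) == self.normalize(b)


# Two illustrative rules (names are assumptions, not midPoint identifiers).
EXACT = MatchingRule("exact", lambda v: v)
CASE_IGNORE = MatchingRule("caseIgnore", lambda v: v.casefold())

# Rules are attached per attribute (hypothetical attribute names).
attribute_rules = {"login": CASE_IGNORE, "homeDirectory": EXACT}


def build_index(values, rule):
    """Map each normalized value to its original, so lookups are one dict access."""
    return {rule.normalize(v): v for v in values}


login_rule = attribute_rules["login"]
index = build_index(["FOOBAR", "ALICE"], login_rule)

# The lookup key is normalized with the same rule as the stored values.
hit = index.get(login_rule.normalize("foobar"))
```

Storing the normalized form means a search does not have to run the comparison against every stored value, which is what makes large-scale matching efficient.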
So, what is so interesting about all of this if there is already a solution that has worked for several years? Well, quite a lot. The matching rules are not very easy to configure. They are the little things that the engineer always forgets to configure until all these huge chunks of data are migrated into the IDM system and the duplicates slowly (but persistently) start to appear. But at that point it is quite late to configure the matching rules, as the data needs to be re-normalized and re-evaluated. This made the use of matching rules quite tricky.
The curious part about this story is that many systems can actually tell that the value of a certain attribute is case-insensitive, or that it is a DN or a UUID. And an identity connector can easily detect that. Yet the original Identity Connector Framework developed by Sun Microsystems gave the connector absolutely no way to pass that information on to the IDM system. This was simply insane. This insanity has just been fixed in the ConnId project. And it is already supported in the midPoint development code. Smart connectors can detect value subtypes and midPoint will automatically determine matching rules based on that. No need for explicit configuration. This is one more nasty, tricky thing that is going to be eliminated in midPoint 3.3.1 and 3.4. And this is how midPoint continually improves its practical usability and deployment efficiency. MidPoint is indeed built to make engineers' lives easier.
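Conceptually, the automatic selection is a lookup from the subtype reported by the connector to a matching rule. The sketch below is only an illustration of that idea: the `Subtype` values, the rule names and the attribute names are hypothetical, not the actual ConnId or midPoint identifiers, and the choice to let explicit configuration override the detected subtype is an assumption made for the example.

```python
from enum import Enum


class Subtype(Enum):
    """Value subtypes a connector might report (hypothetical names)."""
    CASE_IGNORE = "case-insensitive string"
    LDAP_DN = "LDAP distinguished name"
    UUID = "UUID"


DEFAULT_RULE = "exact"

# Hypothetical mapping from reported subtype to matching rule name.
RULE_FOR_SUBTYPE = {
    Subtype.CASE_IGNORE: "caseIgnore",
    Subtype.LDAP_DN: "distinguishedName",
    Subtype.UUID: "uuid",
}


def pick_matching_rule(reported_subtype, configured_rule=None):
    """Choose a rule name for one attribute.

    Assumption for this sketch: an explicitly configured rule wins,
    otherwise the subtype reported by the connector decides.
    """
    if configured_rule is not None:
        return configured_rule
    return RULE_FOR_SUBTYPE.get(reported_subtype, DEFAULT_RULE)


# Schema as a smart connector might report it; None means no subtype was reported.
schema = {
    "uid": Subtype.CASE_IGNORE,
    "entryDN": Subtype.LDAP_DN,
    "description": None,
}

rules = {attr: pick_matching_rule(subtype) for attr, subtype in schema.items()}
```

With a mapping like this, an attribute gets a sensible matching rule from the moment the schema is read, before any data is migrated.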
(Reposted from the Evolveum blog)