SCIMming the Surface
13 Apr 2012
SCIM seems to be a new specification with the ambition to succeed where SPML has failed. The SCIM effort seems to be more realistic and practical, yet it still struggles with issues similar to those of SPML. As an architect of midPoint I'm looking at SCIM from the point of view of a potential implementer and also partially as a researcher. Here is a list of issues that immediately struck me when I was reading the core schema specification of SCIM:
- There is an externalId attribute for user. It may seem like a single attribute but it is not. In fact "The Service Provider MUST always interpret the externalId as scoped to the Service Consumer's tenant". This means that the provider needs to store one value for every client. This is extra state that has nothing to do with the provider itself. It is a transfer of the client's responsibility to the server, a wrong application of the separation of concerns principle. It is not even made optional. Therefore it will complicate all server deployments, regardless of whether it is necessary or not.
- It looks like changing a user's userName is not supported. Having two persistent identifiers for users (id and userName) seems quite limiting. Also, username changes are very common. If the username is based on familyName, it changes after most weddings for approximately half of the population.
- The familyName and givenName attributes have culture-neutral names. This is a nice take from FOAF. But middleName is not that good. It imposes an "American" point of view on the schema. Maybe "additionalName" would be more appropriate.
- User has displayName and also name/formatted attributes. It seems like these two are used for the same purpose? Or maybe it is displayName and userName? It looks like SCIM is following LDAP and SAP anti-patterns where users have just too many names to choose from. It is perhaps good for the entity that displays them, but terrible for the one that needs to manage that. The protocol should be more balanced in this aspect.
- User has nickName as a top-level attribute. But isn't the "name" complex attribute a better place for nickName? Especially considering the fact that a nickname is frequently formatted as a part of the full name.
- Does profileUrl represent an application profile maintained by the application that is being provisioned? Or some other external profile? Should "profileUrl" be multivalued? The specification is not clear about that.
- User has title and userType in the core schema. But these seem to better fit into the "enterprise" extension.
- User has phoneNumbers, but no canonical phone number format is specified. This limits the usability of the specification, especially in telco environments.
- The type meta-attribute in the multi-valued attributes is a plain string. This is prone to conflicts, especially in ims and similar "open" attributes. URL instead of plain string may be a better choice.
- And probably the most important one: both groups and roles can be considered entitlements. The groups attribute is read-only, but it can be manipulated through the Group Resource. Should such a group also appear in the entitlements user attribute? If it cannot, then the correct name of that attribute should rather be "otherEntitlements" and the specification should make that clear. If it can, then we have a redundancy: a group can be manipulated both through the Group Resource and through the "entitlements" attribute. Similarly for roles. The specification does not say whether a role should appear only in "roles" and not in "entitlements", or whether it can appear in both. The "SCIM Group Schema" also defines that roles may be represented as groups, which adds to the confusion. Ignoring the complexity of entitlement management was one of the time bombs in SPML. Now it is a time bomb silently ticking in SCIM.
- The members attribute in group does not scale. Groups with thousands of members are very difficult to manage in this way. And it is typical that a group such as "Generic Employee" has more members than that, not to mention telcos. This is one of the common problems in LDAP and also in SCIM. There is also a corollary: creating a user as a member of a group requires two operations: add user, modify group. This complicates the implementation in case the second operation fails. Should a provisioning system report that as a failure or a success? The user is created but not assigned to a group. A good provisioning system should handle that, but how many good provisioning systems are out there?
- Minor issue: canonical types for members are capitalized while other types start with a lowercase letter.
- Shouldn't authenticationSchema be outside of the SCIM core schema? The schema defines no transport protocol and the authentication types clearly depend on transport protocols. Maybe a binding specification is a better place for the authenticationSchema definition?
- Resource schema has a name attribute that obviously points to the object type. But it seems to be a plain string. The namespace is not obvious here. If any SCIM extension adds new object types (which seems likely) this may be very confusing. A URL may be a better choice here.
- User has userName, displayName and name/formatted. Group has displayName. Resource has name as a string. It is confusing. It is also quite inconsistent and makes it difficult to support a uniform representation of "objects" in the underlying SCIM implementation.
- Resource Schema has description, but user and group do not? A description may come in handy in any object.
- Can endpoint in the Resource Schema be only relative? Or may it be absolute? Base URL is not a good concept, especially when it comes to different representations of the schema (e.g. see here).
- The attributes/type definition in Resource Schema does not specify whether the value is a URL, a QName or a plain string. If it is a plain string (which seems to be the case judging by the examples), how should that string be mapped to XSD QNames? Are only XSD data types possible? The specification says "SHOULD not", not "MUST NOT", therefore an extension mechanism should be specified here.
- The multiValuedAttributeChildName attribute and the associated way of representing data in XML seem to add to the redundancy of the data format. Strictly speaking, this one is also quite specific to XML and should not be in the generic core schema.
- When defining an attribute in a resource schema, how is attribute schema used? Is it a namespace that applies to attribute type? Or to the attribute name? Attribute has schema and sub-attribute does not? Why? Does it inherit it from parent? The specification should make that clear.
- Sub-attributes cannot have sub-attributes?
- If the type is mandatory for every multi-valued attribute (it is "hardcoded" in the core schema specification), is there any point in defining it explicitly in all the resource schemas?
- I have noticed a meta attribute in the examples, but it looks like it is not defined in the specification.
- Is ordering of multi-value attribute values significant?
- Critical problem in JSON-like representations: as there are no namespaces in JSON, naming conflicts can happen. If two schemas define an "employeeNumber" attribute, one as a string and the other as a number, such schemas cannot be used together. Is this a known limitation of SCIM?
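To make the externalId item above more concrete, here is a minimal sketch of the extra state a provider would have to keep if externalId is scoped per tenant. The `UserStore` class, its method names, the tenant names and the sample values are all hypothetical and are not taken from the SCIM specification.

```python
class UserStore:
    """Hypothetical provider-side store; names are illustrative, not from SCIM."""

    def __init__(self):
        self.users = {}         # user id -> the provider's own user data
        self.external_ids = {}  # (tenant, user id) -> that tenant's externalId

    def add_user(self, user_id, user_name):
        self.users[user_id] = {"userName": user_name}

    def set_external_id(self, tenant, user_id, external_id):
        if user_id not in self.users:
            raise KeyError(user_id)
        self.external_ids[(tenant, user_id)] = external_id

    def render(self, tenant, user_id):
        """Build the view of a user as seen by one tenant."""
        view = {"id": user_id, **self.users[user_id]}
        external_id = self.external_ids.get((tenant, user_id))
        if external_id is not None:
            view["externalId"] = external_id
        return view


store = UserStore()
store.add_user("2819c223", "bjensen")
store.set_external_id("tenant-a", "2819c223", "A-701984")
store.set_external_id("tenant-b", "2819c223", "hr:42")
print(store.render("tenant-a", "2819c223"))
print(store.render("tenant-b", "2819c223"))
```

The point of the sketch is only that the second dictionary grows with the number of clients, not with anything the provider itself cares about.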
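The group membership item above notes that creating a user as a member of a group takes two operations, and that the second one may fail. The following is a minimal sketch of how a provisioning client might report that outcome explicitly rather than collapsing it into success or failure. The `InMemoryProvider` class, its methods and the `ProvisioningResult` values are hypothetical stand-ins, not a SCIM client API.

```python
from enum import Enum


class ProvisioningResult(Enum):
    SUCCESS = "success"
    PARTIAL = "partial"  # user created, group membership not applied
    FAILURE = "failure"


class InMemoryProvider:
    """Hypothetical stand-in for a remote service; not a real SCIM client."""

    def __init__(self, fail_group_update=False):
        self.users = {}
        self.groups = {"employees": set()}
        self.fail_group_update = fail_group_update

    def create_user(self, user_name):
        user_id = f"u{len(self.users) + 1}"
        self.users[user_id] = user_name
        return user_id

    def add_member(self, group, user_id):
        if self.fail_group_update:
            raise RuntimeError("group update failed")
        self.groups[group].add(user_id)


def provision(provider, user_name, group):
    """Create a user, then add it to a group; report partial outcomes."""
    try:
        user_id = provider.create_user(user_name)
    except Exception:
        return ProvisioningResult.FAILURE, None
    try:
        provider.add_member(group, user_id)
    except Exception:
        # The user exists but is not in the group: neither a clean success
        # nor a clean failure, so the caller can retry or compensate.
        return ProvisioningResult.PARTIAL, user_id
    return ProvisioningResult.SUCCESS, user_id


ok = provision(InMemoryProvider(), "jack", "employees")
partial = provision(InMemoryProvider(fail_group_update=True), "jack", "employees")
print(ok, partial)
```

Returning the user id together with a PARTIAL result lets the caller retry only the group modification instead of re-creating the user.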
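The JSON naming conflict from the last item can be illustrated with a minimal sketch. The two schema dictionaries, their URNs and the `find_conflicts` helper are hypothetical and do not come from the SCIM specification; they only show how two extensions that end up in the same flat JSON object can disagree on the type of `employeeNumber`.

```python
# Hypothetical schema descriptions: attribute name -> declared type.
# The URNs and the helper below are illustrative assumptions, not part of SCIM.
HR_SCHEMA = {
    "urn": "urn:example:hr:1.0",
    "attributes": {"employeeNumber": "string", "department": "string"},
}
PAYROLL_SCHEMA = {
    "urn": "urn:example:payroll:1.0",
    "attributes": {"employeeNumber": "integer", "costCenter": "string"},
}


def find_conflicts(*schemas):
    """Return attribute names that are declared with different types."""
    seen = {}       # attribute name -> (type, urn of the first declaring schema)
    conflicts = {}  # attribute name -> list of (urn, type) declarations
    for schema in schemas:
        for name, attr_type in schema["attributes"].items():
            if name not in seen:
                seen[name] = (attr_type, schema["urn"])
            elif seen[name][0] != attr_type:
                first_type, first_urn = seen[name]
                conflicts.setdefault(name, [(first_urn, first_type)])
                conflicts[name].append((schema["urn"], attr_type))
    return conflicts


conflicts = find_conflicts(HR_SCHEMA, PAYROLL_SCHEMA)
for name, declarations in conflicts.items():
    print(name, declarations)
```

Without a namespace on the attribute name itself, a check like this can only detect the clash; it cannot resolve it.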
Overall I perceive SCIM as an effort in a very early stage of development. It is also a typical example of the premature standardization anti-pattern. That anti-pattern is seen way too often and has given us marvels of software engineering such as CORBA and the WS-* stack. I hope that the authors of SCIM will try to correct the obvious problems of the specification and focus on proving that it works before going any further. The only reasonable way to go is: working software first, standards second. If tried the other way, the result will be yet another incarnation of SPML. You know what the most delicious piece of the SPML specification is? The SPMLv2 schemas do not pass even a simple XSD validation. I hope SCIM will not repeat such mistakes.