The tedious truth is that most IdM projects must begin with a phase of data matching and cleaning. Before you can start to automate management of identities, you need a predictable data set around which to base your rules.



Many organizations today, whether due to changes in IT personnel, company mergers, variable naming conventions or lack of guidance on handling resignations, have existing user account bases that can only be described as a mess. This document covers some of the methods you can use with ILM/FIM Sync to get through that first project phase.

Unfortunately there are no magic bullets here – eye-straining, brain-numbing trawls through long lists of unmatched accounts cannot usually be avoided. What you can do is try to extract quality data, and construct your lists as helpfully as possible, so they can be targeted at the people whose eyeballs and brains are most likely to give the best response.

It must be added that this is a very big topic, and the methods you choose will depend on the data you are faced with, and the aims of your project.

Joins

Really, it’s all about joins. Once existing accounts are joined to their correct data source (such as matching an HR record to an AD account) you can begin to flow updates. Once you have a clear idea who does and does not have an account in a target directory, you can begin to make provisioning and deprovisioning decisions.

But be careful – your joins must be reliable! 
The last thing you want is to kill the credibility of your fledgling IdM project by updating someone’s account with another person’s details or, even worse, deleting their account because you thought they’d left!

Your eventual aim is a simple Join Rule

When you first start data matching, you will use a lot of different join rules, and you will make joins manually and with CSV files (more on that below). But keep this in mind:

Any join made manually, or with extra effort, should be considered temporary.

There are various situations in ILM which can only be reliably rectified by a clear-out and re-import of a connector space. Always plan for this. Ideally, when you re-import a connector space, you will have a single, direct join rule which effortlessly re-joins all your objects.

And to achieve this we use...

Breadcrumbing

Once the join is verified you should export a uniquely identifying attribute, such as an employee number, to the target directory. After that a simple “employeeID = employeeID” type join rule is all you will need.

 

 

Sometimes you are faced with an import-only system. For either technical or political reasons you are not able to export the breadcrumb attribute.
There are a couple of things you can do:

  1. Import the DN or other identifying attribute from the target directory into an attribute on the Metaverse object. But be warned, you could still lose these joins if you had to repopulate your Metaverse.
  2. To be on the safe side you should also “save” these joins by exporting an identifying pair somewhere else – for example using a database or text file MA, as pictured below.

 

 

Understanding Join Rules

There are some key points to note about Join Rules:

  1. Join rules operate on the CS object. It takes one CS object and attempts to find a match among all Metaverse objects of the correct type, even if they are already joined.
  2. Components within one rule are AND-ed together – they must all match.
  3. Multiple rules are evaluated top down, so put your strongest rules at the top.
  4. While you can offer variations on a CS object's attribute using an Advanced Join Rule, you have to find an exact match with an attribute already on a Metaverse object. There are no “StartsWith” or “Contains” comparisons.

Resolve Join

At the bottom of the Join Rule configuration you will see a check box “Use rules extension to resolve”. Here you can link to code you write under the ResolveJoinSearch subroutine in the MA extension.

The Resolve rule is used when ILM finds multiple possible matches in the Metvserve (including already-joined objects).  All the possible matches into the collection rgmventry and your job, in the code, is to check through them, looking for the best possible match. If your code finds an ideal match you return the index number of the object in imventry, and set the value of ResolveJoinSearch to true.

The following example only joins if a single, unjoined Metaverse object was found.

Public Function ResolveJoinSearch(ByVal joinCriteriaName As String, ByVal csentry As CSEntry, ByVal rgmventry() As MVEntry, ByRef imventry As Integer, ByRef MVObjectType As String) As Boolean Implements IMASynchronization.ResolveJoinSearch
 Select Case joinCriteriaName
  Case "Resolve_NotYetJoined"
   If rgmventry.Length = 1 AndAlso _
   rgmventry(0).ConnectedMAs("AD").Connectors.Count = 0 Then
    imventry = 0
    Return True
   Else
    Return False
   End If
 End Select
End Function

Advanced Join Rules

A simple join rule directly matches a connector space attribute to a Metaverse attribute:

  Connector Space   Metaverse  
  givenName = Kathryn   FirstName = Kathryn  
  sn = Bigalow   Lastname = Bigalow  

With an Advanced Join Rule you construct a list of possible values with which to find an exact match in the Metaverse:

  Connector Space   Metaverse  
  givenName = Kathryn
givenName = Kate
givenName = Kathy
  FirstName = Kate  
  sn = Bigalow   Lastname = Bigalow  

Sometimes you can use code rules to make your list of possible matches (eg., presenting a phone number in different formats); other times you have to use long look-up lists of possible variations (try genealogy websites for name-variation lists).

The following example uses a lookup file of aliases, where each line has the possible variations on a name.  If a match to the first name is found on the connector space object, the whole line is added to the possible values to search for in the Metaverse.

Elizabeth,Liz,Beth,Betty
David,Dave,Davey
Jerome,Jérôme
...

 

Public Class MAExtensionObject
 Implements IMASynchronization

 Dim fileAliases As System.IO.StreamReader
 Dim arrAliases As String()
 Dim i As Integer

 Public Sub Initialize() Implements IMASynchronization.Initialize
  fileAliases = New System.IO.StreamReader("C:\aliases.txt", System.Text.Encoding.Default)
  i = 0
  While Not fileAliases.EndOfStream
   ReDim Preserve arrAliases(i)
   arrAliases(i) = fileAliases.ReadLine
   i = i + 1
  End While
  fileAliases.Close()
 End Sub

 Public Sub MapAttributesForJoin(ByVal FlowRuleName As String, ByVal csentry As CSEntry, ByRef values As ValueCollection) Implements IMASynchronization.MapAttributesForJoin
  Select FlowRuleName
   Case "Join_aliases"
   Dim aliasList As String
   Dim value As String
   For Each aliasList In arrAliases
    If aliasList.Contains(csentry("givenName").Value) Then
     For Each value In aliasList.Split(",".ToCharArray)
      values.Add(value)
     Next
    End If
   Next
  End Select
 End Sub
End Class

Using CSV Files

A lot of the difficult account matching will probably be done outside ILM. It is therefore useful to be able to “export” lists of possible matches, and later, “import” the joins from a CSV file.

Be careful when writing to files from extension code – the DLL doesn’t unload for five minutes after the MA run completes, which means you may have to wait for it to finish writing to the file. If you’re in a hurry, recompiling the code will force the DLL to finish writing to the file.

Export Possible Matches to a CSV file

You can use the Resolve rule to export possible matches to a CSV file. The following example resolves the join rule “sn | Direct | lastname”. As a match on the last name alone is too weak for an immediate join, we just write the possible matches to the text file.

Public Class MAExtensionObject Implements IMASynchronization

 Dim fileMatches As System.IO.StreamWriter

 Public Sub Initialize() Implements IMASynchronization.Initialize
  fileMatches = New System.IO.StreamWriter("C:\possible matches.txt", System.Text.Encoding.Default)
 End Sub

 Public Sub Terminate() Implements IMASynchronization.Terminate
  fileMatches.Close()
 End Sub

 Public Function ResolveJoinSearch(ByVal joinCriteriaName As String, ByVal csentry As CSEntry, ByVal rgmventry() As MVEntry, ByRef imventry As Integer, ByRef MVObjectType As String) As Boolean Implements IMASynchronization.ResolveJoinSearch
  Select Case joinCriteriaName
   Case "Resolve_Lastname"
    Dim MAName As String = csentry.MA.Name
    Dim mvobject As MVEntry
    Dim cFirstname, mFirstname As String
    If csentry("givenName").IsPresent Then 
     cFirstname = csentry("givenName").StringValue 
    Else 
     cFirstname = "UNKNOWN"
    End If
    If mvobject("firstname").IsPresent Then 
     mFirstname = mvobject("firstname").StringValue 
    Else 
     mFirstname = "UNKNOWN"
    End If

    For Each mvobject In rgmventry
     If mvobject.ConnectedMAs(MAName).Connectors.Count = 0 Then
      fileMatches.WriteLine(csentry("sn").StringValue & ";" _
      & cFirstname & ";" _
      & mvobject("lastname").StringValue & ";" _
      & mFirstname)
     End If
    Next
    Return False
  End Select
 End Function
End Class

You will need to adapt this code of course. Firstly, you probably want to export a lot more identifying information in your CSV file – department, email address, dn ... whatever helps. Next, it can really help to supplement your possible matches with a probability score. This is where you do a series of tests and add points- the more points, the higher the chance of the match. For example:

  • Names similar*          +1
  • Department the same +1
  • City the same            +1

*Some tips for testing if names are similar:

  • Strip out all spaces, dashes and hyphens then compare;
  • Check if one string is contained in the other (so that “Sally-Anne” gets a point for “Sally”);
  • Use a function which compares string similarity (search “Soundex” and “Levenshtein distance”).

 

Join from CSV

You can use an Advanced Join Rule to “import” joins from a CSV file.

Firstly, our CSV file must be constructed like this:

CS_identifier;MV_identifier

For example, if we are trying to match an AD account against Metaverse objects imported from the HR system, we populate the CSV with the AD DN and the employeeID:

CN=Fred Bloggs,OU=User,OU=MyOrg,DC=mydomain,DC=com;0012988

Next we create the Advanced Join Rule which will look up the csobject’s DN in the text file, but use the employeeID to search the Metaverse.

Note that you can’t actually use the DN in the join rule – but that’s ok, just use any attribute that definitely exists. Eg.,

sAMAccountName | Rules extension – Join_CSV | employeeID

And now for the code :

Imports Microsoft.MetadirectoryServices

Public Class MAExtensionObject Implements IMASynchronization
 Dim joins As String()
 Dim i As Integer

 Public Sub Initialize() Implements IMASynchronization.Initialize
  Dim fileJoins As System.IO.StreamReader
  Dim strLine As String

  'Open the csv file and read into an array
  fileJoins = New System.IO.StreamReader("C:\joins.csv", System.Text.Encoding.Default)
  i = 0
  While Not fileJoins.EndOfStream
   ReDim Preserve joins(i)
   joins(i) = fileJoins.ReadLine
   i = i + 1
  End While
  fileJoins.Close()
 End Sub

 Public Sub MapAttributesForJoin(ByVal FlowRuleName As String, ByVal csentry As CSEntry, ByRef values As ValueCollection) Implements IMASynchronization.MapAttributesForJoin
  Select FlowRuleName
   Case "Join_CSV"
    'If the csentry DN is found in the joins array, then
    'use the paired employeeID to search the Metaverse.
    For i = 0 To joins.Length - 1
     If joins(i).Contains(csentry.DN.ToString) Then
      values.Add(joins(i).Split(";")(1))
     End If
    Next
  End Select
 End Sub

 

Reporting

People will ask you questions like “How many people have you joined in system X but not in Y?”, “How sure are you that the joins are correct?”, “Which department has the most unidentified accounts?” It’s best to be prepared for these sorts of questions.

Querying the Metaverse

Once data is in the Metaverse it is a simple matter to access it for reporting, either by exporting it into a reporting table, or by directly querying the underlying tables (as long as you’re careful to do it when ILM is idle, or else use NOLOCK).

So consider this: During the data cleaning phase, import all identifying attributes into the Metaverse from all sources.

For example: You’ve made a join between a user in AD and an HR record. Under normal operations you would consider HR as the master source for the name, and you would only flow it from there. You wouldn’t bother importing the name attributes from AD – in fact you’re more likely to be overwriting them with export flow rules.

  HR       Metaverse       AD  
  Lastname = Powells-Brown   ->   lastname = Powells-Brown   ->   sn = Powells-Brown
  Firstname = Joanna   ->   firstname = Joanna   ->   givenName = Joanna  

However, during the data matching phase, you’re probably not ready to start overwriting attributes, and the information about current values can be very important in your verification and reporting.

  HR       Metaverse       AD  
  Lastname = Powells-Brown   ->   HR_lastname = Powells-Brown
AD_lastname = Powells
  <-   sn = Powells  
  Firstname = Joanna   ->   HR_firstname = Joanna
AD_firstname = Jo
  <-   givenName = Jo  

Now, if you have a look at the mms_metaverse table in the ILM database, you will see how simple it is to query the progress of your joins, and also to judge on what criteria the joins were made.

Some example queries…

/* HR person with no join to AD */ 

select HR_lastname, HR_firstname, HR_employeeid from mms_metaverse 
with (nolock)

where AD_dn is null
 /* HR person with join to AD */ 

select HR_lastname, HR_firstname, HR_employeeid, AD_lastname,AD_firstname,AD_dn from mms_metaverse 
with (nolock)

where AD_dn is not null

 

Caution Caution
As mentioned above you need to be careful when directly querying the Metaverse tables.
If your system is already in production, and you happen to be adding in a new data source, then you may be better off employing a SQL MA to export the data you’re interested in to another table, where you can query it as much as you like without risk of locking errors.

 

Querying the Connector Space

Unfortunately it is not so simple to query the connector space to, for example, report on that state of your disconnected objects, but here are a few ways you can export the data:

SQL query - The CS table holds data differently to the Metaverse tables.
It is possible to query for disconnectors in this way, however you will only be able to retrieve the CN of objects – which may not be sufficient to identify them.

select cs.rdn from dbo.mms_connectorspace cs 
with (nolock)
join dbo.mms_management_agent ma with (nolock)
on cs.ma_id = ma.ma_id left outer join dbo.mms_csmv_link mv with (nolock)
on mv.cs_object_id = cs.object_id where ma.ma_name = 'My MA' and mv.mv_object_id is null and cs.connector_state = 0

CSExport - The command-line utility csexport.exe, found in the <ILM program>\bin folder, will allow you to dump connector space objects to an XML file.

Search Connector Space

Report directly from the data source - Once an object in the data source has been correctly identified you will ideally export a unique attribute out to it.
It may then be possible to identify the non-joined objects as those which don’t possess this attribute.

 

Project to a different object type - ILM processes joins before projections, so it is fairly simple to project all non-joined objects to a different Metaverse object type – for example, one called “disconnectors”.
This may help in reporting on an overall status direct from the Metaverse tables.
Note however that to make these objects once again available for joins they will have to be disconnected from the MVExtension provisioning code.

 

Advanced Tips and Tricks 

Join to Multi-value Attribute

Sometimes you may need to join to a value in a multi-value attribute. An example is searching through all the proxyAddresses for a match against a single email address.

When the multi-valued attribute exists in the connector space, and the single valued attribute is in the Metaverse, this is very easily accomplished. Just use an Advanced Join Rule to break the multi-valued attribute down into the values list used by the join rule.

Public Sub MapAttributesForJoin(ByVal FlowRuleName As String, ByVal csentry As CSEntry, ByRef values As ValueCollection) Implements IMASynchronization.MapAttributesForJoin
 Select FlowRuleName
  Case "Join_proxyAddresses"
   Dim alias As String
   For Each alias In csentry("givenName").Values
    values.Add(alias)
   Next
 End Select
 End Sub

However, when the multi-valued attribute has already been imported into the Metaverse you will have a problem. While you can join on a multi-valued attribute, you have to join on the whole thing. There is no way that you can match one value out of a multi-valued Metaverse attribute against a single-valued connector space attribute.

Some possible options:

  • Import the multi-valued attribute into a series of single-valued attributes in the Metaverse, eg., proxy1, proxy2, proxy3 …
  • Use a different Metaverse object type to do the project and join the other way around.

Multiple Possibilities in the Connector Space

All the joining techniques so far have been based around a single connector space object, with one or more possible matches in the Metaverse. But what do you do if you want to work the other way around, where there are multiple possible matches in the connector space for a single Metaverse object, and you want to pick the best one? Unfortunately this is not straight-forward. There is no way to offer a selection of connector space objects in a join rule, as joins always work on a single connector space object at a time. You can judge the merits of the current CS object, but you can’t tell if there’s a better one coming up. One way around this is to do everything with CSV files.

  1. Use the ResolveJoinSearch to write possible joins to a CSV, but don’t actually join anything,
  2. Do your matching outside ILM, then
  3. Use a CSV with an Advanced Join Rule to make the joins.

Too much data

If you have an enormous number of accounts, and different data sources to trawl through, you may be best off doing your data matching outside of ILM, and then just using the CSV join rule above to make the joins.

Check out the Fuzzy Lookup Transformation from the Enterprise version of SQL SSIS for help here.

 

Take-Home Thoughts

  • Complex join rules are a means to an end – not the end itself.
  • Breadcrumbing is essential for automation.
  • When matching on weak rules (eg., surname only) then verify the match another way.
  • You can’t do it all in ILM. CSV, Excel and fuzzy lookup algorithms will also help, but an element of by-hand matching is inevitable.
  • Get the matching and breadcrumbing sorted out before you start flowing and provisioning. This will make for a happier project, stake holders, users and YOU!

 

Extra Reading