With so many software vendors and frameworks, it is often not worth spending time on developing custom software. Often time buying, customizing or integrating with off-the shelf products or libraries is a cost effective solution. That's what I thought when I first time heard of the problem, let's just buy a Red Gate product. It turned out that we need to build our own and here is why.

Problem

Suppose there is a monolithic legacy WPF/SQL Server application. The SQL Server databases are not exclusively used by that app, there are also services of different kinds, internal line of business applications, externally faced web portals, ERP data pulls. A lot of stuff. Certain columns in the databases are critically important. They are vital for important financial calculations and reporting. When values in those columns get out of wack, things can go really bad. Multiple people and teams make changes, but for the most part the code is monolithic and database is shared. It's not a completely unusual architecture, I have seen it in more than one place. What we need it this:

  1. Identify and document critical and important database columns and categorize columns ("these are Portfolio related columns")
  2. Track dependency among those column and other system components which include stored procedures, tables and C# code.
  3. Warn developers when software changes that are being made will be affecting critical database columns

Essentially we want to

  1. Discover all SQL Server data objects and their dependencies and create a data model that can be queried
  2. Augment data model with custom attributes and additional relationships

Here is where additional relationships come into play. Suppose we have a list of important columns. There are stored procedures that touch (update) those columns. We want to establish dependency between stored procedures and the columns so that if we find stored procedure call in the source code, we can query our data model and determine if that stored procedure touches important columns and if it does, take actions (warn developer, enforce code review, prevent deployment and so on). The core of it is to have some system that can answer the questions if this stored procedure or columns access can wreck havoc.

It turns out that establishing a link between stored procedures and columns is tough. I am not aware of any tools on the market that do that. I think it is impossible do reliably automatically, in a tool. We can parse out TSQL with SQL DOM and get syntax tree that can be analyzed, but this will work in only simple one-table update cases.

Solution

My data dictionary or dependency tracker prototype has two parts - data collection and visualization / editing. The data is collected by using SMO and then added to editable digraph. UI is fairly simple Angular single-page app which looks as follows.

Once a table is selected, the related tables are shown. The dependencies can searched both upstream or downstream:

data-dictionary-table

Additional custom attributes can be added to any column:

data-dictionary-column

There is an ability to add columns to a stored procedure:

data-dictionary-sproc

Once a stored procedure is associated with a column, this stored procedure shows up in column dependencies:

data-dictionary-sproc-dependency

After graph is customized, it can be saved to database. The new database metadata pull will add new SQL Server objects but won't change graph customization(custom column properties, comments and relationships).

What this all allows to do is to have a repository of SQL Server objects that can be queried through an API and be integrated with another tools. See source code here.