Metadata Parser

An application developed to help regain control of large data storage areas

As data storage grows in size, retaining control of the data stored within it becomes increasingly difficult. Over a prolonged period the structure of the data, including folders and files, evolves; as businesses change, legacy data is often left behind and forgotten.

There are two types of tools currently available to support mapping data stored on large storage areas such as file servers. The first is standalone tools, which are easy to deploy and generally effective but often struggle once the data reaches terabyte scale. The second is enterprise-scale file management applications, which provide vast functionality when deployed across a business but are expensive and require considerable installation and management effort.

Recognising a development opportunity, I designed and wrote a C# application that collects targeted file metadata at an average speed of 1 TB per hour. The application is optimised for collection speed alone. Use cases have included parsing 60 million files from 65 TB of data in 4 days.
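
As a minimal sketch only, and not the application's actual implementation, the C# below shows one way a speed-focused collection pass can be structured: a parallel walk of a directory tree that records just a handful of targeted metadata fields without ever opening file contents. The `FileRecord` shape and the `MetadataScanner` name are hypothetical.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Threading.Tasks;

// Hypothetical record holding only the targeted metadata fields;
// keeping it small is part of what keeps a speed-focused scan cheap.
public record FileRecord(string Path, long SizeBytes, DateTime LastWriteUtc, string Extension);

public static class MetadataScanner
{
    public static ConcurrentBag<FileRecord> Scan(string root)
    {
        var results = new ConcurrentBag<FileRecord>();

        var options = new EnumerationOptions
        {
            RecurseSubdirectories = true,
            IgnoreInaccessible = true   // skip folders the scanning account cannot read
        };

        // Enumerate lazily and fan the work out across cores. Only file
        // metadata is read; file contents are never opened.
        Parallel.ForEach(
            Directory.EnumerateFiles(root, "*", options),
            path =>
            {
                var info = new FileInfo(path);
                results.Add(new FileRecord(
                    info.FullName,
                    info.Length,
                    info.LastWriteTimeUtc,
                    info.Extension));
            });

        return results;
    }

    public static void Main(string[] args)
    {
        var records = Scan(args.Length > 0 ? args[0] : ".");
        Console.WriteLine($"Collected metadata for {records.Count:N0} files.");
    }
}
```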

Once a scan has completed, the data can either be queried immediately or, depending on requirements, passed to visualisation software for an interactive review of the collected data.
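
For instance, a quick ad hoc query over the collected records might summarise storage use by file extension, often the first question a data owner asks. The snippet below is a hypothetical LINQ example; the `FileRecord` type matches the earlier sketch and is an assumption, not the application's actual schema.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Same hypothetical record as in the collection sketch above.
public record FileRecord(string Path, long SizeBytes, DateTime LastWriteUtc, string Extension);

public static class ScanQueries
{
    // Print the top file extensions by total size across the scan results.
    public static void TopExtensionsBySize(IEnumerable<FileRecord> records, int top = 10)
    {
        var rows = records
            .GroupBy(r => r.Extension.ToLowerInvariant())
            .Select(g => new
            {
                Extension = string.IsNullOrEmpty(g.Key) ? "(none)" : g.Key,
                Files = g.Count(),
                TotalGB = g.Sum(r => r.SizeBytes) / 1e9
            })
            .OrderByDescending(x => x.TotalGB)
            .Take(top);

        foreach (var row in rows)
            Console.WriteLine($"{row.Extension,-10} {row.Files,10:N0} files {row.TotalGB,10:F2} GB");
    }
}
```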

Such a review empowers data owners, within a short space of time, to regain control of large data storage areas and to identify appropriate further actions where required.