Data Transformation

We improved turnaround time and enhanced efficiency for a leading US publisher using an intelligent data acquisition and
content management system.

This case study describes how we helped a client in the publishing business with their data transformation services by minimizing data acquisition cycle time, reducing data-to-information turnaround time, enhancing production efficiency and optimizing costs. This enabled them to reassign their core resources from routine production activities to critical business tasks and to develop new products.

The client, a publishing company based in the US, is a leading provider of compliance information, training and in-depth data. They publish books in hard copy and also deliver information through digital channels such as websites and mobile apps. The information they publish focuses primarily on sectors closely regulated by the federal government, including associations and nonprofits, federal, state and local governments, financial institutions, etc. This requires them to pull data from various publicly available online data sources and transform it into structured information before publishing it.

Data transformation services inherently face a number of challenges throughout the processing lifecycle, related to data volume, format inconsistency, data errors, etc. As the client ventured into new information areas, these challenges only multiplied. When we engaged with the client, they faced the following key challenges, among many others:

  • Scaling the collection of information from myriad data sources with disparate data formats and converting it to a common structured format
  • Composing the information for delivery through different channels. Their book composition was taking 1 to 2 months to complete, with heavy manual intervention to ensure the correctness of the information.
  • Mechanisms to keep the information up to date

We initiated a business process reengineering (BPR) activity to understand their current process, the various data sources and the different delivery systems. The BPR initiative resulted not only in improved processes but also in the development of a number of software applications, which eventually increased their operational efficiency and optimized costs:

  • An intelligent and comprehensive data acquisition and content management system was developed, which enabled capturing data from different sources and storing it in a structured format. A robust application and data architecture was developed to adapt easily to new data formats. The solution was built on Microsoft technology, including MVC, C#.NET, SQL Server and Azure.
  • A framework was established to allow partial manual intervention to check for correctness before committing information to the data repository. The data verification workflow ensured the accuracy of information and also enabled the client to manage the productivity and efficiency of remote editorial resources.
  • An intuitive composition system was developed using XML and XSL-FO to generate PDFs. This replaced the legacy publishing mechanism.
  • Robotic Process Automation was implemented using web scraping techniques to monitor online data sources for changes in information, thus reducing manual intervention.
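
As a rough illustration of the source-monitoring idea in the last point, the sketch below flags data sources whose content has changed since the previous run by comparing content fingerprints. It is written in Python for brevity rather than in the .NET stack the solution used, and the function names (fingerprint, detect_changes), the SHA-256 hashing and the whitespace normalization are all assumptions made for this example, not a description of the delivered implementation. Fetching the pages is left out; the sketch assumes the page text has already been retrieved.

```python
import hashlib
import re
from typing import Dict, List


def fingerprint(page_text: str) -> str:
    """Return a SHA-256 digest of the page text with whitespace collapsed.

    Collapsing whitespace keeps purely cosmetic layout changes from
    being reported as content changes (an assumption of this sketch).
    """
    normalized = re.sub(r"\s+", " ", page_text).strip()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def detect_changes(known: Dict[str, str], fetched: Dict[str, str]) -> List[str]:
    """Compare freshly fetched pages against stored fingerprints.

    known   maps source URL -> fingerprint stored by the previous run
            (updated in place with the new fingerprints).
    fetched maps source URL -> page text retrieved in this run.
    Returns the sorted URLs whose content is new or has changed.
    """
    changed = []
    for url, text in fetched.items():
        digest = fingerprint(text)
        if known.get(url) != digest:
            changed.append(url)
            known[url] = digest
    return sorted(changed)
```

In a setup like this, only the sources returned by detect_changes would need to be re-extracted and routed to editorial verification, which is one way such monitoring can cut down on manual checks.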


Key resources involvement in production activity reduced by more than


Data errors reduced by more than


Book publishing time reduced by more than


Manual interventions for data cleansing and verification reduced by more than


Scalability increased, enabling the client to innovate and introduce new products