It’s data. Handle with care

Shilpa Dhobale
4 min read · May 26, 2021

Garbage In, Garbage Out.

That’s what we hear whenever we talk about data. But what happens if the input is good and the processing turns it into garbage?

Let me share a recent experience:

My spouse was taking out an insurance policy and (I am tempted to say obviously) my name was listed as the nominee. He entered the basic details, such as our address, in the online application form. We then had a telephone interview about our health status and so on. Within a few days, we received the hard copy of the policy document: a huge booklet printed in extremely fine print.

[I am not aware of the regulatory guidelines, but shouldn’t a soft copy have been enough?]

The booklet, i.e. the policy document, included the application form as-is, the terms and conditions, and the policy details.

I wondered why they keep several copies of the same data.

This could be because multiple systems each handle a different aspect of the onboarding journey. Whether the need to store the same data again is genuine or driven by legacy systems, all versions of the data should stay in sync and remain identical.

This expectation went for a toss when I saw that, instead of my actual Date of Birth, my husband’s date of birth had been printed against my name. Before I could ask my husband whether he had entered the data properly, he showed me what he had given in the application form (glad he hadn’t forgotten my birthday :) ) and what had been captured in the system.

Ouch, the duplication failed!!

As we often say, our eyes see what we want to see, and we focus on what is important to us.

My husband is a banker working in retail mortgages. I assume that whenever he looks at a document there are some basic data points he verifies, and DoB is one of them. Hence he noticed the error. My eyes are trained to notice the (in)consistency and apparent soundness of data. So when I saw the same DoB for both of us, I checked further:

  • Is the age also the same? Yes. [OK, so the age is computed by the system from the DoB.]
  • Is the gender the same? No. [Thankfully. This means the data was re-captured after the original entry, possibly manually, and the DoB could be a typo or a data-entry error.]

The birthday enthusiast and age tracker of our home, my little one, was asleep; otherwise she would have been the first to spot these issues.

Just out of curiosity, I decided to go through the policy document to see what had been done and how. More intriguing facts:

  • My details are included as Spouse and as Nominee.
  • When I am in the role of Spouse, we both share the same DoB (as a spouse you share more than just life)
  • When I am the Nominee, I get to retain my original DoB. I deduced this because the re-captured information does not show the DoB, but it does show my age, and that is my actual age.
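The contradiction between these two sections is something a system could catch on its own: an age captured in one place should agree with the DoB captured in another copy of the same person’s record. A minimal Python sketch, assuming both copies are available as plain values; the function names and sample dates are hypothetical, not taken from any insurer’s system.

```python
from datetime import date


def age_on(dob: date, as_of: date) -> int:
    """Completed years of age on the as_of date."""
    years = as_of.year - dob.year
    # One year less if the birthday has not yet come round in the as_of year.
    if (as_of.month, as_of.day) < (dob.month, dob.day):
        years -= 1
    return years


def age_matches_dob(stated_age: int, dob: date, as_of: date) -> bool:
    """True if an age captured in one place agrees with a DoB captured in another."""
    return age_on(dob, as_of) == stated_age


# Hypothetical values: the Spouse section shows a DoB, the Nominee section an age.
spouse_section_dob = date(1980, 3, 15)
nominee_section_age = 36
print(age_matches_dob(nominee_section_age, spouse_section_dob, as_of=date(2021, 5, 26)))  # False
```

A mismatch does not say which copy is wrong, only that the copies disagree and someone should look.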

There is more. The DoB format is inconsistent throughout:

  • In the application form, it is DD/MM/YYYY
  • In policy details, it is DD-MM-YYYY
  • In policy summary, it is DD-MMM-YYYY
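Different display formats are harmless as long as every copy parses to the same calendar date; the point is to check that they do. A minimal Python sketch, assuming these three formats are the only ones in play (DD-MMM-YYYY is read with `%b`, which expects English month abbreviations such as `Mar` under Python’s default locale); the section names and the date are made up.

```python
from datetime import date, datetime
from typing import Mapping

# The three DoB formats seen in the policy document.
DOB_FORMATS = ("%d/%m/%Y", "%d-%m-%Y", "%d-%b-%Y")


def parse_dob(text: str) -> date:
    """Parse a DoB written in any of the known formats."""
    for fmt in DOB_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue  # try the next format
    raise ValueError(f"unrecognised DoB format: {text!r}")


def copies_in_sync(copies: Mapping[str, str]) -> bool:
    """True if every copy of the DoB refers to the same calendar date."""
    return len({parse_dob(value) for value in copies.values()}) == 1


copies = {
    "application form": "15/03/1985",
    "policy details": "15-03-1985",
    "policy summary": "15-Mar-1985",
}
print(copies_in_sync(copies))  # True
```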

It is possible for two individuals to have the same DoB, but too much similarity deserves a check, especially when both people’s data is entered in the same form. It might be good practice to have a rule that flags when the same value is provided for a critical data point whose values are generally expected to differ. This would alert the person responsible for customer onboarding.
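Such a rule could be little more than a field-by-field comparison between two people on the same form, raising a warning rather than rejecting the form, since a shared DoB can be genuine. A minimal Python sketch; treating only DoB as "critical and expected to differ" is an assumption here, and the field names are hypothetical.

```python
from typing import Iterable, List, Mapping, Optional, Tuple

# Hypothetical: critical fields normally expected to differ between two people.
CRITICAL_DISTINCT_FIELDS = ("dob",)


def suspicious_duplicates(
    person_a: Mapping[str, Optional[str]],
    person_b: Mapping[str, Optional[str]],
    fields: Iterable[str] = CRITICAL_DISTINCT_FIELDS,
) -> List[Tuple[str, str]]:
    """Return (field, value) pairs where both people carry the same value."""
    flags = []
    for field in fields:
        value_a, value_b = person_a.get(field), person_b.get(field)
        if value_a is not None and value_a == value_b:
            flags.append((field, value_a))
    return flags


proposer = {"name": "Proposer", "gender": "M", "dob": "02/07/1982"}
spouse = {"name": "Spouse", "gender": "F", "dob": "02/07/1982"}
for field, value in suspicious_duplicates(proposer, spouse):
    print(f"Please verify: proposer and spouse share {field} = {value}")
```

The output is a prompt for the person doing the onboarding, who can confirm the value or correct it before it is copied into other systems.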

I have had many first-hand experiences where basic data quality checks (basic, in my view) were missing. There is always a trade-off between performance, turnaround time (TAT) and validations, but for critical items, quality cannot be compromised.

In the products and projects I have worked on, we have at times gone a step further to ensure clean data at the source. We had to overcome the “perception” that we were collecting additional data and doing too many validations, but our approach stood the test of time.

To quote one example: for our GST solution, we defined a list of data points for each invoice/transaction that our clients had to send to our platform. We validate the data and then prepare it as per the GST system’s requirements. For every transaction, we ask for the nature of the customer/counterparty, and the expected value is either registered or unregistered. If it is registered, the GSTIN becomes mandatory.

Now, this may appear unnecessary: why collect this information when it can be derived by checking whether a GSTIN is provided?

Still, we went ahead with our approach. The nature of the customer was not, by itself, the critical item; what mattered was ensuring that the data had been properly recorded at the source (ERP/accounting systems). Back in 2017, when GST was newly implemented, registrations were still being obtained, some customers had more than one registration, and so on. On the reporting front, transactions with registered customers are declared invoice-wise, while those with unregistered customers are declared in aggregate.
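The reporting rule is why a wrongly classified counterparty matters: the classification decides which bucket a transaction lands in. A simplified Python sketch, assuming each transaction is a dict with hypothetical `nature`, `invoice_no` and `taxable_value` keys; real returns carry many more fields, and the unregistered aggregate is reduced here to a single total.

```python
from decimal import Decimal
from typing import List, Tuple

Transaction = dict  # hypothetical shape: {"nature", "invoice_no", "taxable_value"}


def split_for_reporting(txns: List[Transaction]) -> Tuple[List[Transaction], Decimal]:
    """Registered: reported invoice-wise. Unregistered: reported as an aggregate."""
    invoice_wise = [t for t in txns if t["nature"] == "registered"]
    # Any other nature is ignored here; validation is assumed to have rejected it.
    unregistered_total = sum(
        (t["taxable_value"] for t in txns if t["nature"] == "unregistered"),
        Decimal("0"),
    )
    return invoice_wise, unregistered_total


txns = [
    {"nature": "registered", "invoice_no": "INV-001", "taxable_value": Decimal("1000.00")},
    {"nature": "unregistered", "invoice_no": "INV-002", "taxable_value": Decimal("250.00")},
    {"nature": "unregistered", "invoice_no": "INV-003", "taxable_value": Decimal("150.00")},
]
invoice_wise, total = split_for_reporting(txns)
print([t["invoice_no"] for t in invoice_wise], total)  # ['INV-001'] 400.00
```

Had INV-001 been recorded as unregistered, it would silently disappear into the aggregate, which is exactly the kind of cross reporting that a properly captured nature field helps prevent.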

This approach helped our clients make sure they were correctly identifying their parties, obtaining their GSTINs, and avoiding cross reporting. That is what we had aimed for: a mechanism to ensure accurate data at the source.
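A check along these lines could enforce the rule when data is received. This is a minimal Python sketch, not the actual platform code: the nature must be one of the two expected values, and a registered counterparty must carry a GSTIN. The final rule, flagging a GSTIN sent for a counterparty declared unregistered, is an illustrative assumption of how the two fields can cross-check each other. The GSTIN test is only a simplified shape check (15 upper-case letters or digits); the full format has more structure, including a check character, which is not validated here.

```python
import re
from typing import List, Optional

# Simplified shape check only: 15 upper-case letters or digits.
# The full GSTIN structure (state code, PAN, check character) is not validated.
GSTIN_SHAPE = re.compile(r"[0-9A-Z]{15}")


def validate_counterparty(nature: str, gstin: Optional[str]) -> List[str]:
    """Return a list of problems; an empty list means the record passes."""
    errors = []
    if nature not in ("registered", "unregistered"):
        errors.append(f"nature must be 'registered' or 'unregistered', got {nature!r}")
    elif nature == "registered":
        if not gstin:
            errors.append("GSTIN is mandatory for a registered counterparty")
        elif not GSTIN_SHAPE.fullmatch(gstin):
            errors.append(f"GSTIN {gstin!r} is not 15 upper-case letters or digits")
    elif gstin:
        # Illustrative cross-check: declared unregistered, yet a GSTIN was sent.
        errors.append("GSTIN supplied for a counterparty declared unregistered")
    return errors


print(validate_counterparty("registered", None))
print(validate_counterparty("unregistered", "27ABCDE1234F1Z5"))  # made-up GSTIN
```

Returning a list of problems rather than stopping at the first one lets the client fix everything in the source system in one pass.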

To cut a long story short, data quality is of paramount importance. No second thoughts.

When defining data needs and how data will be collected, exchanged and used, build in the checks that ensure clean data is captured and used at every stage.


Shilpa Dhobale

Thinker, Reader, Blogger | Budding poet and artist | An emotional and enthusiastic being | Mother of a lovely daughter, who keeps me on my toes all the time.