In the business updates post I shared in August this year, I mentioned that we hadn’t had any large tech issues for a couple of years. Unfortunately, we have had two episodes in quick succession in the last two months, affecting between 5% and 20% of our active customers. As a broker, we have multiple external dependencies.
The issues on Nov 6th and Dec 4th were triggered by edge cases in our external dependencies. This is no excuse, and I understand that, as a platform, we are responsible for all the issues you face. But I wanted to share with you what went wrong and what we are doing about it.
The Nov 6th issue was caused by an unscheduled update to the anti-malware monitoring service from our EMS vendor, which started throttling our servers. You can check the detailed RCA here. Yesterday’s issue appears to have been caused by an unusually large number of customer password reset requests, which led to login issues.
On Monday morning, the system that notifies users of logins from new geographical locations based on IP addresses sent out an unexpectedly large number of alerts. We discovered that this was the result of an increase in the geo-location accuracy of the IP/geo-location database that we use. A routine update of this database happened over the weekend.
We believe this led to a large influx of password reset requests from confused users, putting a strain on our login systems and resulting in login failures. We will share a detailed RCA on the disclosure page soon. We have now put in place fixes to ensure that cases like these don’t affect our platform in the future.
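To illustrate the failure mode described above, here is a minimal Python sketch of how a more precise IP/geo-location database can trigger new-location alerts on its own. It is an illustration only, not the production alerting system or the fix that was deployed; the class names, the lookup tables, the user name, and the sample IP address are all hypothetical.

```python
class NaiveMonitor:
    """Stores the last resolved location string for each user."""

    def __init__(self, geo_db):
        self.geo_db = geo_db          # hypothetical IP -> location table
        self.last_location = {}       # user -> location string

    def should_alert(self, user, ip):
        location = self.geo_db.get(ip, "unknown")
        previous = self.last_location.get(user)
        self.last_location[user] = location
        # Alerts whenever the stored string differs, even if only the
        # database's precision changed and the user never moved.
        return previous is not None and location != previous


class ReResolvingMonitor:
    """Stores the last IP and resolves both IPs with the current table."""

    def __init__(self, geo_db):
        self.geo_db = geo_db
        self.last_ip = {}             # user -> IP address

    def should_alert(self, user, ip):
        previous_ip = self.last_ip.get(user)
        self.last_ip[user] = ip
        if previous_ip is None:
            return False              # first login: nothing to compare
        # Both lookups use the same table version, so a table update
        # alone cannot make an unchanged IP look like a new location.
        current = self.geo_db.get(ip, "unknown")
        previous = self.geo_db.get(previous_ip, "unknown")
        return current != previous


# Hypothetical data: the same IP resolves more precisely after an update.
ip = "203.0.113.7"
old_db = {ip: "Region A"}
new_db = {ip: "City B, Region A"}

naive = NaiveMonitor(old_db)
naive.should_alert("asha", ip)        # first login, no alert
naive.geo_db = new_db                 # routine database update
naive_alert = naive.should_alert("asha", ip)

robust = ReResolvingMonitor(old_db)
robust.should_alert("asha", ip)       # first login, no alert
robust.geo_db = new_db                # same update
robust_alert = robust.should_alert("asha", ip)

print(naive_alert, robust_alert)
```

In this sketch the naive monitor flags the unchanged IP after the update, while the re-resolving one does not. Comparing IPs through a single table version is just one possible guard; capping the number of alerts sent after a database refresh would be another.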