We rolled out a milestone release on Thursday, November 19th. It seemed to go pretty smoothly...
Within an hour or so, we noticed some "503" errors. (We know. You noticed them, too.) "Maybe we're under an extreme load and dropping connections?" Nope. We checked our system… and while there was a heavy load, it wasn't anything we can't normally handle with ease.
First we did some research. Then we ran some tests. Then we analyzed data.
"At this point, I was pretty sure that we had characterized the problem correctly. Solving the problem is a completely different story, however. This is where the entire team really swung into action." Explains Wes Mitchell, VP of Engineering. "Some people started running stress tests on internal systems to try to reproduce the problem. Others started analyzing log files to try and figure out which requests were taking the longest. Other people looked at database queries to figure out if some were taking way longer than expected."
We have a very committed team with a bias towards action, and we consistently operate with a sense of urgency. Debugging a problem like this on a heavily loaded live system is extremely difficult and frustrating. It's easy to try to mask the problem by, for example, just putting in more hardware… But that's not how we do things around here.
Given the merchant/customer focus that is a critical part of who we are, the pressure to avoid a devastating hit for our customers was extremely high. Our team kicked into high gear, with aggressive contributions and teamwork, to find, address, and permanently fix the issue.
"I can't say enough about how the team handled this challenge. Lots of people worked a lot over a period of seven days. We progressively narrowed down the problem to a single request. We went over all the changes we had introduced with a fine-tooth comb. We tested various features by turning things on and off one at a time. We reviewed the code again, and again, and again." Said Wes.
We're a lean team of fewer than 40 hard-working (unusually cool) people, and we're passionate about our company and the merchants we serve. Because we're such a small, invested team, we communicate constantly and pull together like a family, whether we're celebrating or struggling.
Wes continued, "Here's what I'm most proud of: my folks don't give up. They take it personally. They think things through. They all worked together. There was no blaming, no egos out of place. Everybody helped. They went after something really hard, and they beat it. It doesn't get any better than that."
Faced with a major technical problem, our engineering team actually became more effective as they rallied to find the solution. Not one of the engineers considered giving up or passing responsibility to someone else. They approached it as an opportunity to look at the problem in innovative ways. They showed amazing commitment and focus, and that commitment paid off when they found the solution.
Director of HR Juanita Lott noted, "We celebrated the fix as a company… and in the dozens of congratulatory emails that followed that day, not one member of the team focused on their personal contribution… they all called out the contributions of their peers. I love this place."
We have a secret little motto in the MerchantCircle office: GSD. It stands for "Get Stuff Done." We literally have it scribbled across whiteboards, and it's occasionally posted to our Facebook statuses. We even text it to each other for a little extra encouragement, and as a reminder that we're on the same team.
Founder Ben Smith said it best, "Our team shares a common passion to grow our business; we don't have time for politics or lip service. We GSD. Respect. Teamwork. Innovation. We value merchants above all and do whatever it takes to provide them with the best products and service. I'm proud of our talented engineering team for actively demonstrating our core values as they quickly resolved this issue."
We recognize that no one is above the occasional engineering glitch. But because over 1.4 million small business owners count on MerchantCircle, our team of perfectionists knows the most important things we can do when issues arise are to fix them fast and fix them right… And having great take-out delivered doesn't hurt.