I’ve been at Pinterest since around December 2014 and have been loving it on the web platform team. A few months ago (June-ish?) we had our first reorg ever and I found myself, for the first time, on a new team. Our team had had some minor churn but nothing quite as large as this. Focusing on performance has been a pretty interesting change of pace.
The first order of business was to figure out how to make sure our site doesn’t get slower before developers release code into production. Turns out the standard operating procedure had been:

1. Notice that production numbers regressed, using extremely variance-riddled statistics.
2. Try to identify the deploy that caused it (anywhere from 1 to 6 suspicious deploys, each full of 10-20 commits, thanks to the variance of the metrics).
3. Do a tedious bisect, or just plain guess which commit it could have been.
4. Try to find an owner and convince them to fix or revert the change.
So instead of detecting regressions 1-14 days after they happen, why not try to detect them as soon as the code goes in? The system goes like this:
We deploy roughly twice a day, so unless a developer puts in a bad change right before a deploy, we can catch regressions (and have) well before the code ever goes out.
I really need to come up with a better name than "The Performance Regression Framework." Anyway, it uses AWS EC2 instances to serve the production build candidate of the site, and uses BuildKite to spawn GhostJS-driven, Linux-based Firefox browser processes to hit the site with. This combination of high-end device and high-end network connection on the test-runner side doesn’t make for a good approximation of our p90 users, but I found that as long as we run enough tests that the variance isn’t too bad, the tests remain meaningful.
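The "run enough tests" part can be sketched in a few lines. This is a hypothetical simplification, not the actual framework: collect many timing samples per build, compare medians (which are far less jumpy than means), and flag the candidate only when it exceeds the baseline by some tolerance. The function names and the 5% tolerance are my own assumptions for illustration.

```javascript
// Median is more robust to outlier samples than the mean.
function median(samples) {
  const sorted = [...samples].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Flag a regression when the candidate's median timing is more than
// `tolerance` (fractional) slower than the baseline's median timing.
function isRegression(baselineSamples, candidateSamples, tolerance = 0.05) {
  const base = median(baselineSamples);
  const cand = median(candidateSamples);
  return (cand - base) / base > tolerance;
}

// e.g. isRegression([100, 102, 98, 101], [120, 118, 122, 119]) flags a
// ~19% slowdown, while run-to-run noise of a few percent passes.
```

The tolerance is the knob: too tight and noise pages you constantly, too loose and small regressions pile up unnoticed.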
A future feature will be to throttle the network, which we should be able to do with an interface to Chrome and its DevTools. This will help us detect regressions that cause the app to be network-bound. I’m also exploring using non-headless Puppeteer, which should give us easier access to the Chrome API.
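With Puppeteer, the throttling piece looks roughly like this. This is a sketch of the general approach rather than anything we’ve shipped: attach a raw Chrome DevTools Protocol session to the page and send `Network.emulateNetworkConditions`. The throughput and latency numbers below are made-up values approximating a "fast 3G" connection, not our real profile.

```javascript
// Hypothetical throttling profile (assumed numbers, roughly "fast 3G").
const FAST_3G = {
  offline: false,
  latency: 150, // added round-trip latency in ms
  downloadThroughput: (1.6 * 1024 * 1024) / 8, // bytes/sec
  uploadThroughput: (750 * 1024) / 8, // bytes/sec
};

// `page` is a Puppeteer Page; attach a CDP session and throttle before
// navigating, so the whole page load happens under the slow profile.
async function throttledVisit(page, url) {
  const client = await page.target().createCDPSession();
  await client.send('Network.emulateNetworkConditions', FAST_3G);
  return page.goto(url, { waitUntil: 'networkidle0' });
}
```

Throttling at the protocol level like this keeps the test machines fast (so builds stay quick) while the measured page loads behave like they’re on a slow connection.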
Anyway, nothing about the framework is really coupled to the tools I chose. They’re just separate components that work together to achieve the goal quickly, given what we had and the limitations of our current app platform:
I’m happy to say that after a few months with this framework in place, we’ve detected and caught a number of regressions. It’s tough at first to trust the numbers, but after a while it proves itself out. I’m glossing over some of the bugs that we found, as well as the whole instrumentation framework that even spits out these numbers. I’m also glossing over another tool my team built, called the investigation framework (we should really come up with a better name), which automatically runs a git bisect when a regression does happen: it builds every single commit inside the offending deploy (could be 10-20 commits!) and runs the whole test suite against each one until it finds the commit that caused the regression.
Yeah, that tool is actually awesome. It takes the machines a few hours to narrow it down sometimes. Imagine a human doing that work!
Anyway, maybe I’ll write more about the instrumentation framework. I’m currently rewriting it, which I also think is interesting. Yay more things to write about.
COPYRIGHT © 2019 BY JESSICA CHAN