Capacity estimation for compute heavy web applications

Compute Heavy Applications

Web applications that primarily perform expensive computations with minimal or no I/O calls are classified as compute heavy. For example: given a user and their shopping cart, compute the total cart value; or, given a user and a set of recommended products, score the products using an ML model and rank them.

4 physical cores handling 4 concurrent users. Here, concurrency == parallelism

Concurrency != Parallelism

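We often confuse concurrency with parallelism. Concurrency is the number of requests in flight at the same time; parallelism is the number of requests actually executing at the same instant, which is bounded by the number of cores. In our Ranking Service example, if we had allocated 100K cores to support a concurrency of 100K, we would have used a parallelism of 100K, by virtue of having 100K cores, to support that concurrency. While this is perfectly acceptable, it may not be optimal.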

2 physical cores handling 4 concurrent users. Here, concurrency != parallelism
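To make the difference between the two pictures concrete, here is a minimal sketch of what happens to latency when concurrency exceeds parallelism. It assumes a deliberately simplified model that is not from the original article: every request needs the same amount of CPU time, each core runs one request to completion before picking up the next, and there is no scheduling overhead. The function name `worst_case_latency_ms` and the 100 ms per-request cost are hypothetical, chosen only for illustration.

```python
import math


def worst_case_latency_ms(concurrent_requests: int, cores: int,
                          cpu_ms_per_request: float) -> float:
    """Latency of the last request to finish under a simplified batch model.

    Assumption (illustrative only): identical requests, one request per core
    at a time, run to completion, no context-switch or queueing overhead.
    """
    if cores <= 0:
        raise ValueError("cores must be positive")
    if concurrent_requests <= 0:
        return 0.0
    # Requests are served in "waves" of at most `cores` requests each.
    waves = math.ceil(concurrent_requests / cores)
    return waves * cpu_ms_per_request


# 4 users on 4 cores: concurrency == parallelism, everyone is served at once.
four_cores = worst_case_latency_ms(4, 4, 100.0)

# 4 users on 2 cores: concurrency != parallelism, the last two users wait
# for the first two to finish.
two_cores = worst_case_latency_ms(4, 2, 100.0)

print(four_cores, two_cores)
```

Under this model, halving the cores doubles the worst-case latency while the concurrency stays at 4. Whether that trade-off is acceptable depends on the latency budget, which is why fewer cores than concurrent users can still be a reasonable choice.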

Load Distribution

In our Ranking Service example, based on our load test, we can support a peak concurrency of 100K with 70K physical cores at a max (or 99th percentile) latency of 500 ms. What if the peak load lasts for only 1 hour a day, and for the remaining 23 hours the concurrency never exceeds 50K? This means that for 23 hours a day, a large share of our cores will be idle. This is never a good thing, since we will be paying for that compute unnecessarily.
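A rough back-of-the-envelope calculation shows how much compute goes unused in this scenario. The sketch below assumes, purely for illustration, that the number of cores needed scales linearly with concurrency (70K cores per 100K concurrent users). The load test in the article does not establish this, so treat the result as an estimate. It also treats the 23 off-peak hours as running at exactly 50K, the stated upper bound, so the idle figure is a lower bound.

```python
PROVISIONED_CORES = 70_000   # from the load test: supports 100K concurrency
PEAK_CONCURRENCY = 100_000

# One entry per hour of the day: 1 peak hour, 23 hours at the 50K ceiling.
hourly_concurrency = [100_000] + [50_000] * 23


def cores_needed(concurrency: int) -> int:
    """Cores required at a given concurrency, ASSUMING linear scaling."""
    # Integer ceiling division avoids floating-point rounding surprises.
    return -(-concurrency * PROVISIONED_CORES // PEAK_CONCURRENCY)


idle_core_hours = sum(PROVISIONED_CORES - cores_needed(c)
                      for c in hourly_concurrency)
total_core_hours = PROVISIONED_CORES * len(hourly_concurrency)
idle_fraction = idle_core_hours / total_core_hours

print(f"idle core-hours per day: {idle_core_hours:,}")
print(f"idle fraction: {idle_fraction:.1%}")
```

Under these assumptions, at least 805,000 of the 1,680,000 core-hours provisioned each day (roughly 48%) are paid for but never used.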
