Designing Infrastructure for SlackerNews Launch

Background #

On Monday, I’m officially launching SlackerNews, the best of HackerNews, powered by artificial intelligence.

In this post I describe the steps I took to ensure the burst of activity on Monday morning would be handled gracefully.

Goal #

Support up to 5,000 users clicking once every 10 seconds, with a median load time below 1 second.

5,000*6 = 30,000 pages/min -> 500 pages/sec.

Median load time: <1 second
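
The arithmetic behind the goal can be checked in a few lines of Python. This is only a sketch: the function name `target_pages_per_sec` is illustrative, and it assumes clicks are spread evenly over time rather than arriving in bursts.

```python
def target_pages_per_sec(users: int, seconds_between_clicks: float) -> float:
    """Steady-state page rate if every user loads one page per interval."""
    return users / seconds_between_clicks


# Figures from the goal above: 5,000 users, one click every 10 seconds.
pages_per_sec = target_pages_per_sec(users=5000, seconds_between_clicks=10)
pages_per_min = pages_per_sec * 60

print(f"{pages_per_min:,.0f} pages/min -> {pages_per_sec:.0f} pages/sec")
# prints "30,000 pages/min -> 500 pages/sec"
```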

Tools #

Test Details #

Test Actions #

Simply load the front page
2016-07-30 12_29_16-Settings.png

Settings #

Constant load
2016-07-30 12_29_40-Settings.png
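
For readers without a load testing service, a constant-load test of this shape can be roughly approximated in Python. This is a minimal sketch, not the tool used for the results in this post: the names `run_constant_load` and `fetch_page` are hypothetical, "constant load" is assumed to mean a fixed number of concurrent workers, and unlike a browser it downloads only the HTML, not the CSS, JavaScript, or images.

```python
import statistics
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def fetch_page(url: str) -> None:
    """Download the page body; raises on network or HTTP errors."""
    with urllib.request.urlopen(url, timeout=30) as response:
        response.read()


def timed(fetch: Callable[[str], None], url: str) -> float:
    """Return the wall-clock seconds one fetch took."""
    start = time.perf_counter()
    fetch(url)
    return time.perf_counter() - start


def run_constant_load(url, workers, requests_per_worker, fetch=fetch_page):
    """Run `workers` threads, each loading the page back to back."""
    def worker(_):
        return [timed(fetch, url) for _ in range(requests_per_worker)]

    started = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        per_worker = list(pool.map(worker, range(workers)))
    elapsed = time.perf_counter() - started

    latencies = [t for batch in per_worker for t in batch]
    return {
        "requests": len(latencies),
        "median_seconds": statistics.median(latencies),
        "requests_per_sec": len(latencies) / elapsed,
    }


# Example call (placeholder URL; point it at a server you own):
# print(run_constant_load("http://localhost:8080/", workers=50, requests_per_worker=20))
```

The `fetch` parameter is injectable so the measurement logic can be exercised without touching the network. The median is reported rather than the mean because the goal is stated as a median load time.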

Original Hardware #

Specs #

Virtual Machine on Dedicated Hardware
CPU: Intel Xeon E5606 @2.13GHz
OS: Windows Server 2012R2 Standard

Overall Results #

Looks like we need a new strategy for the minified bundles that MVC 4 includes by default.
2016-07-30 12_31_00-Settings.png

Avg page load times #

The server saturated: response times climbed from 7 seconds to 35 seconds per response.
2016-07-30 12_20_51-Cortana.png

Throughput #

Far below the target of 500 pages/sec: only 20 requests/sec.
2016-07-30 12_23_30-Settings.png

New Hardware #

Specs #

AWS EC2 c4.large Instance:
CPU: 2 vCPUs, Intel Xeon E5-2666 v3 (Haswell)
RAM: 3.75 GB
OS: Windows Server 2012R2

Overall Results #

2016-07-30 13_46_14-Photos.png

Avg page load times #

205 ms is amazing. This should load very well even on mobile.
2016-07-30 13_46_59-Photos.png

Throughput #

Excellent throughput. Reducing the number of network requests, by combining or offloading static content, should enable even higher page throughput.
2016-07-30 13_47_23-Photos.png
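
The link between request count and page throughput can be sketched as below. Every number here is hypothetical and not taken from the test results, and the model assumes every request costs the server about the same, which is a simplification (static files are usually cheaper than rendered pages).

```python
def pages_per_sec(requests_per_sec: float, requests_per_page: int) -> float:
    """Pages served per second when each page view costs several requests."""
    return requests_per_sec / requests_per_page


# Hypothetical capacity: the server sustains 1,200 requests/sec in total.
server_capacity = 1200.0

# Fewer requests per page (via bundling or a CDN) means more pages per second.
for requests_per_page in (12, 6, 3):
    rate = pages_per_sec(server_capacity, requests_per_page)
    print(f"{requests_per_page} requests/page -> {rate:.0f} pages/sec")
```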

Server Stats #

Very low CPU usage. The CPU looks over-provisioned for this load.
2016-07-30 13_45_34-Photos.png

Summary #

Although it’s always good to have a healthy margin for PR events, we would probably be better served by reducing the size of the instance and consolidating or offloading static content (CSS, JavaScript, images, fonts).

