pinterest

Less than a year ago, I wrote a post on the different DB solutions available that are ‘cloud-ready’. For those who are still struggling with the decision of which database solution to use and want some real-life examples, I recommend reading this recent post on Pinterest’s infrastructure architecture.

Key notes from the article:

Choice of DB: MySQL + Redis over MongoDB or Cassandra
Reason for DB: Sharding over Clustering – because of maturity and ease of hire
Number of DBs: 88 Master + 88 Slaves (cc2.8xlarge = $2.700 per Hour = $1944 per Month per Server = $342,144 per month for just MySQL)

Interesting points:

  • Algorithm for placing data is very simple. The main reason. Has a SPOF, but it’s half a page of code rather than a very complicated Cluster Manager. After the first day it will work or won’t work.
  • Can’t perform most joins.
  • Lost all transaction capabilities. A write to one database may fail when a write to another succeeds.
  • Reports require running queries on all shards and then perform all the aggregation yourself.
  • Joins are performed in the application layer.
  • When the Pin table went to a billion rows the indexes ran out of memory and they were swapping to disk.
  • If your project will have a few TBs of data then you should shard as soon as possible.
  • Architecture is doing the right thing when growth can be handled by adding more of the same stuff. You want to be able to scale by throwing money at the problem by throwing more boxes at the problem as you need them. If you are architecture can do that, then you’re golden.

My thoughts:

Firstly, thank you Pinterest for showing us your infrastructure architecture. With so many choices of DBs available (NoSQL vs SQL vs NewSQL), it is very insightful and helpful for others in a position to choose which DB strategy path to follow. After analysing the post, I feel that for a small team / startup and for future proofing, a fully managed DB is a better option. Small teams don’t usually have the necessary man power to maintain and branch out new MySQL shards. What they should do instead, is to concentrate on delivering quality software and if traffic increases, to have the flexibility to push a button and let a DaaS take care of the scaling ala DynamoDB or Instaclustr or MongoHQ.

Their decision to shard MySQL meant that:

a) they can horizontally scale
b) they can run flexible (but limited) queries

however, sharding MySQL had drawbacks too:

a) you can no longer have JOINS – one of the main strong points of having SQL
b) no longer fully transactional – again, another strong point for choosing SQL over NoSQL
c) operational burden for manual sharding – your team needs to be 100% on top of data size as well as when the need to re-shard

It does feel to me that there is quite a lot to work with (though they have the man power to do so 44 engineers). But then again, they’ve done it and proved it to work – so surely it’s can be considered a winning strategy?

Thoughts?

@munwaikong

Infrastructure

ShowCaster’s success is down to its ability to scale up and down and meet the varying demands of live events – sometimes during short periods of time (typically around an hour) we will receive bursts of hundreds of thousands of users interacting with our platform concurrently (eg. when promoted on a mainstream TV channel).

We’ve just completed a project rebuilding our auto-scaling infrastructure – consisting of 2 weekends creating ShowCaster Tools. ShowCaster Tools keep track of the health and usage of our servers as well as demand-based auto-scaling, wrapped in a simple GUI .

Let’s get technical

In order to automate this process, we needed to integrate our system with Rackspace Cloud APIs. The brief was to create a simple web interface for anyone in the team to use (both techies and non-techies). Seeing as we (the techies) are always up for challenges and exploring the world of languages & tech, we thought that this would be a great opportunity for us to try a different tech stack from our usual Java.

Ruby as a replacement for our perl scripts

We currently write all our scripts using Perl. So far it has done the work without issues. However, everyone who has used Perl knows that it’s not the easiest to learn. Every time a new developer joins our team, it takes a while until they fully understand the language and how to make the most out of it.

Ruby on Rails as the MVC framework for the front-end

At the same time we were learning Ruby for the server side interactions, we thought that the best tool to create the web GUI was Ruby on Rails. This is one of the most famous MVC frameworks out there for web development. We always have read that the the different helpers offered by Rails together with Ruby increases the productivity of any web development so we decided it was the right time to give it a try.

The architecture

One of the first things we realized when we started to architect the application was that there is no direct support for multithreading within Ruby on Rails. There are some “hacks” but none of them were appealing to us. For this reason, we ended up splitting our application into two main components. The first component was entirely implemented with Ruby and is in charge of all the business logic of the application. Things like:

  • Multithreading
  • Rackspace APIs interactions
  • Server monitoring and management

The other component was built with Ruby on Rails and looks after the user interface. This provides two main benefits:

  • Split the logic from the view (MVC)
  • If we decide that Ruby on Rails is not for us, we can easily bin it and start from scratch with another framework

The conclusion: Ruby yes, Rails not so much

After creating ShowCaster tools, Ruby has impressed us – especially for its SPEED. Other reasons:

a) The number of lines of Ruby code is significantly less – meaning a faster development cycle, less bugs and easier maintenance resulting in increased productivity.

b) We liked the simplicity of Ruby when integrating with other libraries – thanks to their valuable gems. We installed the Rackspace Cloud gems for their load balancers and servers and within 5 minutes we were ready to make requests to their APIs.

Ruby on Rails provided a good implementation of the MVC pattern, awesome helpers to speed up the entire development, and we easily understood how using it would improve long term productivity.

Working with the Rails ActiveRecord helper was a great experience compared to similar frameworks – we’re very familiar with Hibernate for Java and ActiveRecord was far simpler, better and faster in terms of data definition and usage than Hibernate.

In order to take full advantage of Rails, you really need to make complete use of all their helpers and tools – but in most of our user interfaces these helpers and tools wouldn’t be applicable, losing some of the potential of Rails.

Based on this experience, we’ll use Ruby to replace all our Perl scripts over time, providing us with benefits in the long term.  Rails won’t replace the web development tools used on our core products but it will definitely be considered on future projects where the potential of Rails can be used to its maximum.

@isaacmartinj

Development Infrastructure ShowCaster