pinterest

Less than a year ago, I wrote a post on the different DB solutions available that are ‘cloud-ready’. For those who are still struggling with the decision of which database solution to use and want some real-life examples, I recommend reading this recent post on Pinterest’s infrastructure architecture.

Key notes from the article:

Choice of DB: MySQL + Redis over MongoDB or Cassandra
Reason for DB: Sharding over Clustering – because of maturity and ease of hire
Number of DBs: 88 Master + 88 Slaves (cc2.8xlarge = $2.700 per Hour = $1944 per Month per Server = $342,144 per month for just MySQL)

Interesting points:

  • Algorithm for placing data is very simple. The main reason. Has a SPOF, but it’s half a page of code rather than a very complicated Cluster Manager. After the first day it will work or won’t work.
  • Can’t perform most joins.
  • Lost all transaction capabilities. A write to one database may fail when a write to another succeeds.
  • Reports require running queries on all shards and then perform all the aggregation yourself.
  • Joins are performed in the application layer.
  • When the Pin table went to a billion rows the indexes ran out of memory and they were swapping to disk.
  • If your project will have a few TBs of data then you should shard as soon as possible.
  • Architecture is doing the right thing when growth can be handled by adding more of the same stuff. You want to be able to scale by throwing money at the problem by throwing more boxes at the problem as you need them. If you are architecture can do that, then you’re golden.

My thoughts:

Firstly, thank you Pinterest for showing us your infrastructure architecture. With so many choices of DBs available (NoSQL vs SQL vs NewSQL), it is very insightful and helpful for others in a position to choose which DB strategy path to follow. After analysing the post, I feel that for a small team / startup and for future proofing, a fully managed DB is a better option. Small teams don’t usually have the necessary man power to maintain and branch out new MySQL shards. What they should do instead, is to concentrate on delivering quality software and if traffic increases, to have the flexibility to push a button and let a DaaS take care of the scaling ala DynamoDB or Instaclustr or MongoHQ.

Their decision to shard MySQL meant that:

a) they can horizontally scale
b) they can run flexible (but limited) queries

however, sharding MySQL had drawbacks too:

a) you can no longer have JOINS – one of the main strong points of having SQL
b) no longer fully transactional – again, another strong point for choosing SQL over NoSQL
c) operational burden for manual sharding – your team needs to be 100% on top of data size as well as when the need to re-shard

It does feel to me that there is quite a lot to work with (though they have the man power to do so 44 engineers). But then again, they’ve done it and proved it to work – so surely it’s can be considered a winning strategy?

Thoughts?

@munwaikong

Infrastructure

Preface – This is the first post of our new engineering blog so drop us a comment and tell us how we’re doing and what you think.

Let’s begin.

Here at Orca Digital, we love tech. From software, to gadgets and designs – we love it all. Technology moves fast in this day and age but the sad news is: loving and keeping up with new tech is not always as easy as it sounds. Often it takes a lot of time, energy, (and money) to acquire those new features that your friends (or competitors – same thing really) already have. This is especially true for when you have to migrate your existing infrastructure to the cloud. Although cloud computing technology has proven to be a real success story for some companies, it also has some drawbacks. This is the story of Orca Digital’s journey to the Cloud.

Part 1: Understanding the way the Cloud moves

We needed to understand the ins’ and outs’ of the Cloud before we embarked on our journey. Knowing the benefits of a cloud IaaS (Infrastructure as a Service) is one thing, but knowing how the Cloud differs from a traditional hosted infrastructure will help us decide whether it makes for a suitable solution. Understanding the differences will also provide us with the ability to tune the Cloud to fit our specific needs (and also to squeeze as much out of it as possible). For example having a physical server with a quad core processor will not deliver the same amount of resource or processing behaviour as a virtual machine which has been allocated quad core’s worth of processing power. Even though there are many Cloud providers out there, each Cloud provider’s infrastructure will slightly differ from one another. After some basic research, we decided to concentrate on 2 specific providers: Amazon AWS’s EC2 and Rackspace Cloud.

Part 2: Choosing the type of Cloud (providers)

Comparing Cloud providers can be quite challenging. There are many cloud providers out there that you can choose from – Amazon AWS, Rackspace Cloud, Google AppEngine, Heroku, and many more. We had decided to narrow our choices down to Amazon AWS and Rackspace Cloud because: (A) Amazon’s popularity, recommendations as well as leadership in the industry, (B) we were already existing Rackspace customers. On top of that, they both had competitive pricing and had data centres in the EU as well as the US (which is key since we are based in the UK). Even though Amazon has the advantage of being around for longer and having quite a substantial amount of big and small sized clients, Rackspace had a managed service level, which for a relatively small monthly price, provides support for your virtual machines – should anything go wrong at any time.

If we were to judge the two IaaS providers by just reading tech docs or opinions on forums and blogs, it would have been a very difficult choice to make. Therefore, we thought that the best way to compare the two was to put them both to the test. We signed up with accounts for both, and began to install a few of our web-apps onto them. During this process, we made sure we took note of any noticeable differences in the two solutions – from timings to complete certain tasks, to the discovery of implementation hurdles, and even frustrations and/or appraisal from our migration engineers. After a couple days of hacking around, we had the answer (well at least a pretty good idea of the two).

The research showed that Amazon had the upper hand. AWS’s EC2 infrastructure provided many more features and functionalities that were not available on the Rackspace Cloud infrastructure at the time. Features like Elastic IP addresses and flexible billing were missing from the Rackspace Cloud infrastructure – though they promised it will be on the roadmap. That is why, we had decided to go with Rackspace. (“wait, what? huh???” Now now, calm down – there are reasons for this) To begin with – we are an existing client of Rackspace – i.e. our dedicated hosted solution. Since the Rackers were already familiar with our infrastructure and solution, their input and advice on the migration plans were very valuable to us. With their help and expertise, we were able to come up with a solution that was tailor made for our platform, suited for the Cloud. Another major factor was support. Whilst Amazon EC2 is all about self-serve and APIs and tech docs and what not, Rackspace has a support level (what they call a managed support service). Since the engineering mob @ Orca Digital is currently made up of a team of 8, being able to rely on a support team 24/7 to help manage your servers is great. This meant we could worry less about infrastructure, and concentrate on making good software. The last and most important reason for choosing Rackspace was that we had dedicated servers that couldn’t be moved to the cloud. These applications rely on low latency internal network communications, which meant that the cloud servers running the web-apps will need to be on the same network as the dedicated physical machines (or physically close to). Rackspace were able to provide us with a solution that ‘connected’ our dedicated machines (on our internal private network) with the cloud applications (on the Rackspace Cloud internal private network) via RackConnect – which meant that we can achieve a hybrid solution, which provides scalability whilst meeting the needs of our low latency internal network dedicated machines / applications.

Part 3: Taking the first leap

As you can see, comparing the cloud providers needed more than just a simple checklist comparison. You need to understand what you need from your applications / platform and what you can get out of each Cloud provider. Only then will you be able to choose the right provider for you. Now that we have established which provider is suitable for us, it is time to begin the migration process. But before we got ahead of ourselves, there is one final test to perform before taking the plunge.

Migrate your test environment (or build one in the cloud if you don’t have one)

The best way to ensure that your migration will go smoothly is to do a dry run. Yes – your devs and engineers are going to moan when certain things are not working or running like they used to be, but what’s worse? – that, or your entire production platform going haywire and your boss / investors (or even worse, your end users and customers) shouting at you? We surely didn’t want the latter and so we decided to migrate our test environment over first and run it for 1 month. Our test environment was configured to be an exact replicate of our production services – with the exception of a test / sandboxed database. That way, we had 1 month to iron out all the creases that we found with our platform running on the Cloud.

Part 4: Cloudify

After our successful test run, we set a date in the calendar, co-ordinated with our Rackers, and we went for it. I will not go through step-by-step the entire migration process as it is too long and boring, but to summarise, the entire migration period took 2 weeks – which considering the amount of things we had to migrate and test, was quite an achievement. We are now running the latest versions of everything (linux, apache, php, java, mysql, clusters, etc…) and enjoying the benefits of running a hybrid platform on the Cloud and on our dedicated hosted servers.

@munwaikong

Infrastructure