In today’s world of software engineering, traditional relational databases (RDBMS) like MySQL or PostgreSQL are no longer the ‘de facto’ choice for a database system. Since the rise in popularity of cloud computing, NoSQL databases have come to play an important part in cloud data architecture. Cloud servers no longer guarantee dedicated performance (disk I/O / CPU / memory) or 99.99% uptime, so the best approach to designing software for the cloud is to design the application to expect failures. This is a problem, because the database is usually the living heart and soul of any data-driven application: without its data, the application is crippled.

Because cloud computing provides the ability to scale server resources up and down easily and quickly, the database design will also need to respond to changes in traffic load and scale accordingly. Most databases have replication features, with which you can set up a master-slave network of databases to help ease the load of a high-read application. But what about a high-write application – would you need to consider sharding?

The key point to take away from this post is that although there are many databases to choose from (MySQL, PostgreSQL, Redis, Riak, MongoDB, CouchDB, HBase, Cassandra, Neo4j, etc.), there is no such thing as the ‘best database for the cloud’. In order to choose the ‘best’ database, you must first identify the needs of your application. If you are familiar with the CAP Theorem (Consistency, Availability, Partition Tolerance), different databases are designed for different combinations of these properties. Although there are many blog posts on the interweb comparing the various databases, you should not base your choice solely on those results. For example, just because big corporations like Twitter and Facebook use HBase for some of their products doesn’t mean that you should use HBase in your design. Perhaps a less complex setup is key, and therefore a database like CouchDB is more suitable. So below are a few of the key questions you should try to answer. They should help you narrow down the choices so that you can then focus on the details of each database and ultimately choose the ‘best’ one for your application.

Is your application read or write heavy? or both?

  • Read-Heavy – [DBs with a replication feature, i.e. almost all] most databases provide master-slave (or even master-master) replication. Replication will help handle the load of read-heavy applications.
  • Write-Heavy – [DBs with a sharding feature, i.e. MongoDB, HBase, Cassandra, Riak] whilst you can have a master-slave setup, all writes (i.e. inserts / updates / deletes) will be directed to the master. In order to take some of the load off the master, you need sharding.
  • Both – [DBs with both replication and sharding features, i.e. MongoDB, HBase, Cassandra, Riak]
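The write-heavy case above can be sketched in a few lines: a shard key decides which node takes each write, so no single master absorbs the whole write load. This is a purely illustrative hash-based router – not any particular database's implementation (MongoDB, for instance, uses range-based or hashed shard keys, and Cassandra uses a partitioner):

```javascript
// Illustrative only: route a write to one of N shards based on its key.
function shardFor(key, shardCount) {
  // Simple djb2-style string hash; real databases use stronger
  // hashing or range partitioning.
  var hash = 5381;
  for (var i = 0; i < key.length; i++) {
    hash = ((hash * 33) + key.charCodeAt(i)) >>> 0;
  }
  return hash % shardCount;
}

// Writes for different keys land on different shards.
console.log(shardFor("user:1001", 4)); // one of 0..3, always the same for this key
console.log(shardFor("user:1002", 4)); // one of 0..3
```

The important property is that routing is deterministic: the same key always maps to the same shard, so reads know where to look.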

Do you have an ops team to help with the complex setup / management of db clusters?

  • Databases like MySQL and CouchDB are very easy to get started with on a single server. They provide easy-to-use GUI / admin tools that you can experiment with. Others like HBase, Cassandra and MongoDB will require more planning and architecture design to get an optimized setup.

Does your data need guaranteed durability?

  • Databases like MongoDB and Redis are known for their blazing speed because they first store values in memory, which then gets flushed to disk periodically. However, as a trade-off for that speed, there is a risk of data loss if the database fails before a flush completes.

How big is your data?

  • Databases like Cassandra and HBase are designed for ‘Big Data’ from the ground up. However, the ability to handle huge datasets comes at a cost: complexity.

What is your primary goal and what does your application dataset resemble?

  • Are you building a write-log type system, a read-cache reference type system, or a write-analyse analytics type system? Does your application naturally fall under a key-value (Redis, Riak), document-oriented (MongoDB, CouchDB), relational (MySQL, PostgreSQL), columnar (Cassandra, HBase) or graph (Neo4j) data model?

Do you need features like map-reduce / secondary indices / REST interface / views or stored procedures?

  • Some databases provide a subset of features that others don’t. For example, if you need a feature like secondary indices, you might choose MongoDB over CouchDB.
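As a rough illustration of what a secondary index buys you (hypothetical names, not any particular database's API): it is a map from a non-primary field back to primary keys, so queries on that field become a lookup instead of a full collection scan:

```javascript
// A tiny collection keyed by primary id.
var docs = {
  "id1": { name: "mun", city: "london" },
  "id2": { name: "bob", city: "paris" },
  "id3": { name: "eve", city: "london" }
};

// Build a secondary index on one field: field value -> list of primary keys.
function buildIndex(collection, field) {
  var index = {};
  Object.keys(collection).forEach(function (id) {
    var value = collection[id][field];
    (index[value] = index[value] || []).push(id);
  });
  return index;
}

var cityIndex = buildIndex(docs, "city");
console.log(cityIndex["london"]); // ["id1", "id3"] – no full scan needed
```

Databases maintain such structures incrementally on every write, which is why adding indices speeds up reads but slows down writes.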

There are many other questions you should ask yourself, but the bottom line is: there is no ‘right or wrong’ database, only ‘more suitable and less suitable’ ones. In fact, you don’t even have to choose just one – a combination of multiple databases could well yield the best result.



Ok, I will admit it. I have recently fallen in love with JavaScript. So much so that I went back and refactored a whole bunch of code and made it all pretty and easy for others to read, learn and use.

As part of my love for JavaScript, today I would like to share some of the things I have done – in the hope that it might be useful to you in the future.

1) Encoder.js – A simple webcam encoder / browser-based stream publisher

Encoder.js is a simple JS API that configures a lightweight AS3 Flash application and streams your webcam to a streaming server of your choice. Below is an example of how easy it is to get started with just a few lines of code:

<script type="text/javascript" src=""></script>
<script type="text/javascript">
	// NB: the initialisation call below is a reconstruction – check the
	// library source for the exact entry point and method names.
	var encoder = new Encoder({
		// The div to insert the webcam preview
		container: '#holder',
		// Setting the preview width
		previewWidth: 800,
		// Setting the preview height
		previewHeight: 480,
		// Setting the Encoder configuration
		config: {
			streamServer: "rtmp://",
			streamName: "example",
			bitrate: 800,
			videoWidth: 640,
			videoHeight: 360,
			fps: 15,
			keyFrameInterval: 30
		},
		// Setting up event listeners
		events: {
			onReady: function() {
				// Auto-start streaming once the encoder is ready
				encoder.start();
			}
		}
	});
</script>
There is still room for improvement (like full documentation of the JS API) but you can have a play with it by adding

<script type="text/javascript" src=""></script>

to your html file.

2) Log.js – Simple Javascript Logger / Debugger

Log.js is a simple, lightweight ‘library’ that I find very useful and that should be included in any JavaScript-enabled web-based project. It is not as full-featured as the Log4X libraries (i.e. Log4JS) but it does the job. Note that this logger relies on the ‘console’ object being present – hence it works on modern browsers like Chrome and Safari, as well as with plugins like Firebug for Firefox.

// How to configure the logging level
Log.configure("info"); // Logging Levels [VERBOSE, DEBUG, INFO, WARN, ERROR]

// How to log to console
Log.verbose("tag", "This is the verbose log"); // This won't be displayed
Log.debug("tag", "This is the debug log"); // This won't be displayed"tag", "This is the info log"); // Prints "This is the info log"
Log.warn("tag", "This is the warn log"); // Prints "This is the warn log"
Log.error("tag", "This is the error log"); // Prints "This is the error log"

// How to filter the logs for a specific TAG

Download the source code of the library here. (Only 120 lines non-minified)

3) Data.js – Data / Storage Helper Library

Data.js is another simple and lightweight ‘library’ that could be useful if you would like to store key-JSON (an extension of plain-text key-value) pairs either temporarily or permanently for your web application. It provides two different types of data store: ‘TempStore’ and ‘PermStore’. The TempStore stores your JSON data (with your key of choice) temporarily – i.e. if the user refreshes the page, the data is lost. This is useful for storing temporary information from AJAX calls to be referenced later. The PermStore stores your JSON data ‘permanently’ – using HTML5’s localStorage. Data is only lost if the user ‘resets’ their browser or clears any browser data and cache.

Another neat thing about this library is that it provides an ‘expiry’ feature, whereby, if you would like, you can expire the stored data after a given period of time.

Let’s have a look at how to use it:

// Store a response from an ajax call (using jQuery)
$.ajax({
	url: "webservice.php",
	dataType: "json",
	success: function(data) {
		// Store the response JSON object temporarily
		Data.TempStore.set("response", data);
	}
});

// Store your own JSON data object

var data = {
	"name" : "mun",
	"age" : 26,
	"location" : "london"
};

// Store the data in the PermStore 'forever'
Data.PermStore.set("myself", data, Data.EXPIRES_NEVER);

// Read the data from the PermStore
Data.PermStore.get("myself");		// prints {"name" : "mun","age" : 26,"location" : "london"}

// Store the data in the PermStore until 30 mins later (i.e. 1800000 millis)
Data.PermStore.set("myself", data, Data.EXPIRES_(1800000));

// Read the data from the TempStore
Data.TempStore.get("myself");		// prints undefined – since 'myself' is stored in the PermStore instead of the TempStore

Download the source code of the library here.

So hopefully you will find these simple libraries useful to you as well. Enjoy and as always, feel free to leave a comment of what you think about these libraries.




If you are a developer dealing with iframes and cookies, then you would probably know this, but if not, then here is something to think about:

If website A ( is hosting an iframe whose URL is website B (, and website B attempts to write a cookie – either via HTTP headers or JavaScript – it will work on Chrome and Firefox but not Safari (and IE, actually).

*Note: The scenario described above is only true for the default settings of each browser. Unlike Chrome and Firefox, Safari by default sets its privacy setting to ‘Block cookies from third parties and advertisers’. Obviously, setting your privacy settings to ‘always accept cookies’ will mean that you won’t encounter the problem described above.

There are solutions / hacks that bypass Safari’s privacy settings (like posting the iframe to itself) but sooner or later these loopholes will be fixed and closed by Apple. Perhaps this doesn’t affect you, since your application doesn’t rely on cookies (say, for example, a static HTML page or image) – however for the rest of us, this behaviour is bad news.

But what about HTML5’s localStorage? Why don’t we use that instead? Well, if the use of cookies is controlled by your application – i.e. you need to store “remember_me=true” locally on the client browser – then localStorage would be a good place to do this. However, that doesn’t solve all the issues, as there are other types of cookies that are transparent to your application. Let’s look at 2 examples.
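For the application-controlled case, the localStorage fallback can be sketched as a tiny storage shim that probes whether cookies actually persist and uses localStorage otherwise. This is illustrative, not production code – the backends are injected so the cookie-blocked case can be simulated outside a browser:

```javascript
// Build a store that prefers cookies but falls back when writes don't stick.
function makeStore(cookieJar, localStore) {
  // Probe whether the cookie backend actually persists anything
  cookieJar.set("__probe", "1");
  var cookiesWork = cookieJar.get("__probe") === "1";
  var backend = cookiesWork ? cookieJar : localStore;
  return {
    set: function (k, v) { backend.set(k, v); },
    get: function (k) { return backend.get(k); }
  };
}

// Simulate Safari in a third-party iframe: cookie writes are silently dropped
var blockedJar = { set: function () {}, get: function () { return undefined; } };
var memoryStore = (function () {
  var data = {};
  return {
    set: function (k, v) { data[k] = v; },
    get: function (k) { return data[k]; }
  };
})();

var safeStore = makeStore(blockedJar, memoryStore);
safeStore.set("remember_me", "true");
console.log(safeStore.get("remember_me")); // "true" – served by the fallback
```

In a real page the two backends would wrap document.cookie and window.localStorage respectively; the probe-then-fallback pattern is the point, not these stub objects.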

1) Session cookies – If you’re using a dynamic programming language that supports sessions (like JSP / PHP / ASP / etc.) then the server will most likely insert a session cookie seamlessly on your behalf (i.e. a JSESSIONID or PHPSESSID).

2) Load balancer cookies – Most load balancers (physical or cloud-based) use cookies to track and manage a persistent connection. The load balancer will seamlessly add an extra cookie to the HTTP response (i.e. X-Mapping-abcdefg) which is used to determine which server to route all future requests to.

Putting the above into context, if your session-dependent web application is spread across multiple servers via a load balancer and is situated within an iframe in another domain, then any new Safari user visiting that site for the first time will have:

a) a different session id each time – since the session id cookie was blocked by Safari’s default privacy setting

b) a different server serving its requests each time – since the load balancer’s x-mapping cookie was blocked by Safari’s default privacy setting

This will then lead to unexpected behaviour like being logged out repeatedly, or session data inconsistency (i.e. the data held on different servers differs). So how do we go about fixing this issue? Well, there are some workarounds available on the interweb. However, a workaround is only a short-term solution and will work only until the Almighty Apple decides to shut it down. Therefore the best solution is to know the limitations of Safari (and its iOS WebKit browser cousins) and then architect your applications to expect the behaviour where cookies cannot be stored. For example, use localStorage where applicable, or don’t depend on the use of cookies, or don’t use iframes, or even better – inform users that Safari’s not supported. :)

Let us know what you think of Safari and its default privacy settings.



A couple of weeks ago, our hosting provider Rackspace kindly organised a race day event for the lucky few at Orca. Check out some of the pictures and video below. If you want to organise some office fun which involves cars, I would recommend this. Pictures and videos were taken at the Northampton Raceway.



Preface – This is the first post of our new engineering blog so drop us a comment and tell us how we’re doing and what you think.

Let’s begin.

Here at Orca Digital, we love tech. From software to gadgets and design – we love it all. Technology moves fast in this day and age, but the sad news is: loving and keeping up with new tech is not always as easy as it sounds. Often it takes a lot of time, energy (and money) to acquire those new features that your friends (or competitors – same thing, really) already have. This is especially true when you have to migrate your existing infrastructure to the cloud. Although cloud computing technology has proven to be a real success story for some companies, it also has some drawbacks. This is the story of Orca Digital’s journey to the Cloud.

Part 1: Understanding the way the Cloud moves

We needed to understand the ins and outs of the Cloud before we embarked on our journey. Knowing the benefits of a cloud IaaS (Infrastructure as a Service) is one thing, but knowing how the Cloud differs from a traditional hosted infrastructure would help us decide whether it made for a suitable solution. Understanding the differences would also give us the ability to tune the Cloud to fit our specific needs (and to squeeze as much out of it as possible). For example, having a physical server with a quad-core processor will not deliver the same amount of resource or the same processing behaviour as a virtual machine that has been allocated a quad core’s worth of processing power. And even though there are many Cloud providers out there, each provider’s infrastructure differs slightly from the others. After some basic research, we decided to concentrate on 2 specific providers: Amazon AWS’s EC2 and Rackspace Cloud.

Part 2: Choosing the type of Cloud (providers)

Comparing Cloud providers can be quite challenging. There are many cloud providers out there to choose from – Amazon AWS, Rackspace Cloud, Google AppEngine, Heroku, and many more. We had decided to narrow our choices down to Amazon AWS and Rackspace Cloud because: (A) of Amazon’s popularity, recommendations and leadership in the industry, and (B) we were already existing Rackspace customers. On top of that, they both had competitive pricing and data centres in the EU as well as the US (which is key, since we are based in the UK). Even though Amazon had the advantage of having been around for longer and having quite a substantial number of big and small clients, Rackspace had a managed service level which, for a relatively small monthly price, provides support for your virtual machines – should anything go wrong at any time.

If we were to judge the two IaaS providers by just reading tech docs or opinions on forums and blogs, it would have been a very difficult choice to make. Therefore, we thought the best way to compare the two was to put them both to the test. We signed up for accounts with both, and began to install a few of our web apps onto them. During this process, we made sure we took note of any noticeable differences between the two solutions – from timings to complete certain tasks, to the discovery of implementation hurdles, and even frustrations and/or praise from our migration engineers. After a couple of days of hacking around, we had the answer (well, at least a pretty good idea of the two).

The research showed that Amazon had the upper hand. AWS’s EC2 infrastructure provided many features and functionalities that were not available on the Rackspace Cloud infrastructure at the time. Features like Elastic IP addresses and flexible billing were missing from Rackspace Cloud – though they promised these were on the roadmap. Despite that, we decided to go with Rackspace. (“Wait, what? Huh???” Now now, calm down – there are reasons for this.) To begin with, we were an existing client of Rackspace – i.e. our dedicated hosted solution. Since the Rackers were already familiar with our infrastructure and solution, their input and advice on the migration plans were very valuable to us. With their help and expertise, we were able to come up with a solution that was tailor-made for our platform and suited to the Cloud. Another major factor was support. Whilst Amazon EC2 is all about self-serve, APIs, tech docs and what not, Rackspace has a support level (what they call a managed support service). Since the engineering mob @ Orca Digital is currently a team of 8, being able to rely on a support team 24/7 to help manage your servers is great. It meant we could worry less about infrastructure and concentrate on making good software. The last and most important reason for choosing Rackspace was that we had dedicated servers that couldn’t be moved to the cloud. These applications rely on low-latency internal network communications, which meant that the cloud servers running the web apps needed to be on the same network as the dedicated physical machines (or physically close to them). Rackspace were able to provide us with a solution that ‘connected’ our dedicated machines (on our internal private network) with the cloud applications (on the Rackspace Cloud internal private network) via RackConnect – which meant we could achieve a hybrid solution that provides scalability whilst meeting the needs of our low-latency, dedicated internal-network machines and applications.

Part 3: Taking the first leap

As you can see, comparing the cloud providers needed more than just a simple checklist comparison. You need to understand what you need from your applications / platform and what you can get out of each Cloud provider. Only then will you be able to choose the right provider for you. Now that we had established which provider was suitable for us, it was time to begin the migration process. But before we got ahead of ourselves, there was one final test to perform before taking the plunge.

Migrate your test environment (or build one in the cloud if you don’t have one)

The best way to ensure that your migration will go smoothly is to do a dry run. Yes – your devs and engineers are going to moan when certain things are not working or running like they used to, but what’s worse – that, or your entire production platform going haywire and your boss / investors (or even worse, your end users and customers) shouting at you? We surely didn’t want the latter, so we decided to migrate our test environment over first and run it for a month. Our test environment was configured to be an exact replica of our production services – with the exception of a test / sandboxed database. That way, we had a month to iron out all the creases that we found with our platform running on the Cloud.

Part 4: Cloudify

After our successful test run, we set a date in the calendar, coordinated with our Rackers, and went for it. I will not go step-by-step through the entire migration process as it is too long and boring, but to summarise: the entire migration took 2 weeks – which, considering the amount of things we had to migrate and test, was quite an achievement. We are now running the latest versions of everything (Linux, Apache, PHP, Java, MySQL, clusters, etc.) and enjoying the benefits of running a hybrid platform on the Cloud and on our dedicated hosted servers.