Managed Auto-Failover with Load Balancing

Hard truth: no single server is immune to downtime. Sooner or later, a hardware, network, or service failure will hit you hard. Having contracted and worked with several data centers in the US and EU, we can sympathize and confirm that the threat of downtime is imminent and real. So real that it incurs dollar losses that often exceed the cost of hosting severalfold. If you're a business owner, you know very well how much it costs you to be offline. And if you're a system administrator, you need more than two hands to count the man-hours it took you to recover a downed system.

Sympathy aside, you are here because you need a sound solution yesterday. You have come to the right place! Here is the solution you've been needing all along -- fully managed -- no fluff and no puff! So what is it? Failover? Load balancing? Replication? Redundancy? Shared nothing? Below are the elements that make up our meticulous cross-datacenter auto-failover and globally load-balanced solution.

Global Anycast Software-Defined Load Balancing

This is a global anycast software-defined load balancer with a single IP address fronting your Website. It is NOT physically tied to one device or one instance, which would be a single point of failure. It survives network and physical failure of any individual load-balancing node. This is how Google does it. In fact, we leverage Google's Global Anycast Cloud configuration to bring you a robust, feature-rich, and resilient network. The load balancer determines the location of the visitor and picks the closest available and healthy backend to handle the request. Our implementation comes with both auto-failover and load balancing across two geographically independent regions: the US East and West Coasts.
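The "closest available and healthy backend" rule can be sketched in a few lines of Python. This is only an illustration of the selection logic, not how the load balancer is implemented: real anycast routing is decided at the network layer, and the backend names, coordinates, and the `pick_backend` helper below are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Backend:
    name: str      # hypothetical region label
    lat: float     # illustrative coordinates, not real datacenter locations
    lon: float
    healthy: bool  # result of the most recent health check


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometers."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def pick_backend(visitor_lat, visitor_lon, backends):
    """Return the nearest backend that is currently healthy."""
    healthy = [b for b in backends if b.healthy]
    if not healthy:
        raise RuntimeError("no healthy backend available")
    return min(healthy, key=lambda b: haversine_km(visitor_lat, visitor_lon, b.lat, b.lon))


# Two assumed backends, one per coast.
BACKENDS = [
    Backend("us-east", 39.0, -77.5, healthy=True),
    Backend("us-west", 45.6, -121.2, healthy=True),
]
```

A visitor near New York would be sent to `us-east`. If `us-east` were marked unhealthy, the same call would return `us-west` instead, which is the failover behavior described above.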

The distance between the backend regions is over 2,000 miles, so a major adverse event or an act of God in one region will not affect the health of your Website. A loss of power, network connectivity, or hardware components in one region will NOT result in your Website being down. Because the load balancer is front-facing, it mitigates many attacks such as SYN flooding, port exhaustion, and IP fragment floods.
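To see why a second, independent region matters so much, here is a small back-of-the-envelope sketch. It assumes that regions fail independently of each other and uses a made-up per-region availability figure; neither the figure nor the function names come from our service terms.

```python
HOURS_PER_YEAR = 24 * 365


def combined_availability(per_region: float, regions: int = 2) -> float:
    """Availability of N regions when the site is up if at least one region is up.

    Assumes region failures are statistically independent.
    """
    return 1.0 - (1.0 - per_region) ** regions


def expected_downtime_hours(availability: float) -> float:
    """Expected hours of downtime per year for a given availability."""
    return (1.0 - availability) * HOURS_PER_YEAR
```

With a hypothetical 99% availability per region, a single region would be expected to be down roughly 87.6 hours a year. Two independent regions bring the combined figure to about 99.99%, or under one hour a year.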

Database Replica with Auto-Failover

Every single point of failure needs to be addressed. The database is a critical one for dynamic websites, so we ensure replication and unattended auto-failover are standard. Our pattern of building upon the best tools and platforms continues: we leverage Amazon RDS and make MariaDB/MySQL available to you. Replication and auto-failover are fully managed by us for you. There's no need for you to keep track of replication logs, monitoring, or recovery. As far as you're concerned, it is all seamless.

All you need to concern yourself with is the DB connection and credentials, which remain the same in good and bad times. We have opted for enhanced availability and durability. It's synchronous physical replication across multiple availability zones (AZ). In the event that a zone fails due to hardware, software, or network malfunction, your Website continues to function.
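Since the connection details stay the same, the only thing an application may want to handle is a connection that drops while a failover is in progress. A minimal sketch of a reconnect loop with exponential backoff follows. The `connect` callable, the retry counts, and the use of `ConnectionError` are assumptions for illustration; a real database driver raises its own exception types.

```python
import time


def connect_with_retry(connect, attempts=5, base_delay=0.5, sleep=time.sleep):
    """Call connect() until it succeeds or the attempts run out.

    connect    -- hypothetical zero-argument function that opens a DB connection
    attempts   -- maximum number of tries
    base_delay -- first wait in seconds; doubled after every failed try
    sleep      -- injectable for testing
    """
    last_error = None
    for attempt in range(attempts):
        try:
            return connect()
        except ConnectionError as exc:
            last_error = exc
            # Wait before the next try, but not after the final one.
            if attempt < attempts - 1:
                sleep(base_delay * (2 ** attempt))
    raise last_error
```

The same host name and credentials are used on every attempt; the loop simply gives the failover time to complete.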

Independently-Synchronized Data Store

A shared-nothing architecture is key to scalability and resilience. A shared volume, network filesystem, or shared directory is often a recipe for disaster, as has been proven over and over again. And so we share nothing between the backends. Each of the two Web servers sees a local, up-to-date file store that can be read from, written to, and updated.

Local stores are NVMe-powered, so you get the best performance possible. Our sub-second data-store synchronization keeps your files consistent across regions.
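If you ever want to convince yourself that two file stores hold the same content, a content-hash comparison is one simple way to do it. The sketch below is a verification aid only and says nothing about how our synchronization works internally; the function names and directory arguments are hypothetical.

```python
import hashlib
from pathlib import Path


def tree_digests(root):
    """Map each file's path (relative to root) to the SHA-256 of its content."""
    root = Path(root)
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            relative = path.relative_to(root).as_posix()
            digests[relative] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def out_of_sync(store_a, store_b):
    """Return the relative paths that are missing or different between two stores."""
    a, b = tree_digests(store_a), tree_digests(store_b)
    return sorted(p for p in a.keys() | b.keys() if a.get(p) != b.get(p))
```

An empty list means both stores contain the same files with identical content.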

Web Performance and Content Distribution

Performance is never an afterthought. This is why our solution comes fully integrated with a complete Content Distribution Network (CDN). And, again, we only pick the most reliable and performant CDN: Google's Cloud CDN. We use Google's globally distributed edge points to cache HTTP(S) content at the locations closest to your users.

But performance fundamentals are crucial for server-side processing, especially when dynamic content is involved, so a CDN-only approach doesn't cut it. The backend servers are powered by NVMe drives in a RAID-10 configuration. The compute node is powered by the latest Intel multi-CPU, multi-core server processors on server hardware. Services are configured and fine-tuned to process requests as efficiently as possible. Varnish is one click away should you need to further accelerate and cache pages and static assets in memory.

What is so special about this solution?

The Web is rife with firms that offer failover and/or load balancing or some kind of clustered solution. But upon close inspection, the offering is often merely a VPS on a stick running on a dedicated server claiming to be part of a cloud. Our solution is different, and we've laid out the technical details so you can independently research the subject. Our quest for a truly uptime-conscious solution was born out of the need to care for our clients' online businesses. And we, UNIXy, are putting our reputation on the line.

Caveats and Limitations

For the sake of transparency and honesty, we must set the right expectations. There are situations that auto-failover and global load balancing were not designed to mitigate. Although we have covered the infrastructure failure scenarios described above, we cannot assume responsibility for changes you make that break your Website. For example, an upgrade of your Website's code or database can easily result in an error. Until you fix the error or roll back to a previously working version, your Website will show errors. We'd be more than happy to advise on implementing a change management procedure to minimize the impact of such inadvertent changes on your production environment.

To maximize uptime, an organization needs to adopt best practices and change management procedures. These are guidelines for properly introducing changes to your production environment. Either way, this is a truly fully managed solution: we're here to help should you just want things to work and not fiddle with them. There are other situations outside our control that may seem obvious but that we state anyway for the sake of clarity and transparency:
Client-code software bug
Client-initiated code change
Accidental client-deletion of files
Large-scale DDoS attack
Failure to follow procedures
Billing account issues

