Managed Auto-Failover with Load Balancing
Hard truth: no single server is immune to downtime. Sooner or later, a hardware, network, or service failure will hit you,
and it will hit hard. Having contracted and worked with several data centers in the US and EU, we sympathize, and we can
confirm that the threat of downtime is imminent and real. So real that it incurs dollar losses that often exceed the cost of
hosting several times over. If you're a business owner, you know very well how much it costs you to be offline. And if you're
a system administrator, you need more than two hands to count the man-hours it took you to recover a downed system.
Sympathy aside, you are here because you need a sound solution yesterday. You have come to the right place! Here is
the solution you've been needing all along: fully managed, with no fluff and no puff! So what is it? Failover?
Load balancing? Replication? Redundancy? Shared nothing? It is all of the above. Below are the elements that make up our
meticulously engineered cross-datacenter auto-failover and globally load-balanced solution.
Global Anycast Software-Defined Load Balancing
This is a global anycast, software-defined load balancer with a single IP address front-ending your Website. It is NOT
physically tied to one device or one instance, either of which is prone to failure, so it survives network and physical
failures that would take down a conventional load balancer. This is how Google does it. In fact, we leverage Google's
global anycast cloud infrastructure to bring you a robust, feature-rich, and resilient network. The load balancer determines
the location of the visitor and picks the closest available, healthy backend to handle the request. Our implementation
provides both auto-failover and load balancing across two geographically independent regions on the US east and west coasts.
The backend regions are over 2,000 miles apart, so a major adverse event or an act of God in one region will not affect the
health of your Website. A loss of power, network connectivity, or hardware components in one region will NOT take your
Website down. Because the load balancer sits in front of your servers and receives all traffic first, it mitigates many
attacks, such as SYN floods, port exhaustion, and IP fragment floods.
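To make "closest available and healthy backend" concrete, here is a minimal sketch of the selection rule in Python. It is a simplification, not Google's actual algorithm (which also weighs latency and capacity): it assumes proximity is plain great-circle distance, and the region names and coordinates (`us-east`, `us-west`) are hypothetical. Failover falls out of the same rule, because an unhealthy backend is simply never a candidate.

```python
import math
from dataclasses import dataclass


@dataclass
class Backend:
    name: str      # hypothetical region label, e.g. "us-east"
    lat: float     # approximate latitude of the region (hypothetical)
    lon: float     # approximate longitude of the region (hypothetical)
    healthy: bool  # result of the most recent health check


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points on Earth, in kilometres."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def pick_backend(backends: list[Backend], client_lat: float, client_lon: float) -> Backend:
    """Return the nearest backend that passed its health check."""
    healthy = [b for b in backends if b.healthy]  # unhealthy backends are skipped: this is the failover
    if not healthy:
        raise RuntimeError("no healthy backend available")
    return min(healthy, key=lambda b: haversine_km(client_lat, client_lon, b.lat, b.lon))


if __name__ == "__main__":
    east = Backend("us-east", 38.9, -77.0, healthy=True)
    west = Backend("us-west", 45.6, -121.2, healthy=True)
    print(pick_backend([east, west], 40.7, -74.0).name)  # visitor near New York -> us-east
    east.healthy = False                                  # simulate an east-coast outage
    print(pick_backend([east, west], 40.7, -74.0).name)  # same visitor now -> us-west
```

With these hypothetical coordinates the two regions are roughly 3,700 km (about 2,300 miles) apart, in line with the separation described above.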
Database Replica with Auto-Failover
Every single point of failure needs to be addressed. The database is a critical one for dynamic websites, so replication
and unattended auto-failover come standard. Our pattern of building upon the best tools and platforms continues: we leverage
AWS RDS and make MariaDB/MySQL available to you. Replication and auto-failover are fully managed by us, for you. There's no
need for you to keep track of replication logs, monitoring, or recovery. As far as you're concerned, it is all seamless.
All you need to concern yourself with is the DB connection endpoint and credentials, which remain the same in good times and
bad. We have opted for enhanced availability and durability through synchronous physical replication across multiple
availability zones (AZs). If a zone fails due to a hardware, software, or network malfunction, the database fails over to
another zone and your Website continues to function.
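Because the endpoint and credentials do not change, the only thing application code typically needs during a failover is to reconnect to the same address. The sketch below shows one common way to do that, a retry loop with exponential backoff. The function name `connect_with_retry` and its default attempt count and delays are illustrative assumptions, not part of any RDS API; tune them to how long your application can tolerate waiting.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def connect_with_retry(
    connect: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError,),
    attempts: int = 6,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call connect() until it succeeds, doubling the wait after each failure.

    The endpoint name stays the same during a managed failover, so retrying
    the very same call eventually reaches the newly promoted database.
    """
    for attempt in range(attempts):
        try:
            return connect()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error to the caller
            sleep(base_delay * 2 ** attempt)  # waits 1 s, 2 s, 4 s, ... by default
    raise ValueError("attempts must be at least 1")
```

With a MySQL driver such as PyMySQL, you would pass a lambda wrapping `pymysql.connect(...)` with your unchanged host, user, and password, together with `retry_on=(pymysql.err.OperationalError,)`. The `sleep` parameter is injectable only so the backoff can be tested without actually waiting.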
Independently-Synchronized Data Store
A shared-nothing architecture is key to scalability and resilience. A shared volume, network filesystem, or shared directory
is often a recipe for disaster, as has been proven over and over again. And so we share nothing between the backends. Each of
the two Web servers sees a local, up-to-date file store that can be read from, written to, and updated.
The local stores are NVMe-powered, so you get the best performance possible. Our sub-second data-store
synchronization keeps your files consistent across regions.
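The synchronization mechanism itself is not specified, so the following is a purely hypothetical sketch of one common shared-nothing technique: each node keeps a manifest mapping file paths to a modification time and content hash, and a sync pass compares the two manifests to decide which files to copy in which direction, with the most recent write winning on conflict. The names `Manifest` and `plan_sync` are illustrative, and deletions are deliberately not handled.

```python
from typing import Dict, List, Tuple

# Hypothetical manifest format: path -> (modification time, content hash)
Manifest = Dict[str, Tuple[float, str]]


def plan_sync(east: Manifest, west: Manifest) -> Tuple[List[str], List[str]]:
    """Return (paths to copy east->west, paths to copy west->east), last writer wins."""
    to_west: List[str] = []
    to_east: List[str] = []
    for path in sorted(set(east) | set(west)):
        e, w = east.get(path), west.get(path)
        if w is None:            # file exists only on the east node
            to_west.append(path)
        elif e is None:          # file exists only on the west node
            to_east.append(path)
        elif e[1] != w[1]:       # contents differ: the newer copy wins
            if e[0] >= w[0]:
                to_west.append(path)
            else:
                to_east.append(path)
    return to_west, to_east
```

A real implementation would also need to propagate deletions (for example with tombstone entries) and watch for changes continuously to reach sub-second lag; this sketch only shows the comparison step.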
Web Performance and Content Distribution
Performance is never an afterthought. This is why our solution comes fully integrated with a complete
Content Distribution Network (CDN). And, once again, we would only pick the most reliable and performant CDN,
and that's Google Cloud CDN. We use Google's globally distributed edge points to cache HTTP(S) content at
the locations closest to your users.
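Whether an edge location caches a response depends largely on the caching headers the origin sends. As a minimal sketch, assuming the CDN is set to honor origin `Cache-Control` headers (Google Cloud CDN calls this the `USE_ORIGIN_HEADERS` cache mode), a backend might mark static assets as publicly cacheable and keep dynamic pages out of shared caches. The extension list and the one-day `max-age` are illustrative assumptions, not recommended values.

```python
import os

# Illustrative set of file types treated as static, cacheable assets
STATIC_EXTENSIONS = {".css", ".js", ".png", ".jpg", ".jpeg", ".gif", ".svg", ".woff2"}


def cache_control_for(path: str) -> str:
    """Pick a Cache-Control value for a request path."""
    ext = os.path.splitext(path)[1].lower()
    if ext in STATIC_EXTENSIONS:
        # Shared caches such as CDN edges may store this for one day
        return "public, max-age=86400"
    # Dynamic pages: only the visitor's browser may store them, and must revalidate
    return "private, no-cache"
```

Under this policy, a stylesheet like `/assets/app.css` is served from the edge, while a page like `/cart` always reaches the backend servers.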
But performance fundamentals are crucial for server-side processing, especially when dynamic content is
involved, because such content is generated per request and often cannot be served from a cache. So a CDN-only approach
doesn't cut it. The backend servers are powered by NVMe drives in a RAID-10 configuration. Each compute node runs on
server-grade hardware with the latest multi-CPU, multi-core Intel server processors. Services are configured and fine-tuned
to process requests as efficiently as possible. Varnish is one click away should you need to further accelerate and cache
pages and static assets in memory.
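RAID-10 stripes data across mirrored pairs of drives, trading half of the raw capacity for both speed and redundancy. The short sketch below captures the two properties that matter here: usable capacity, and which combinations of failed drives the array survives. It assumes drives are paired in order, (0, 1), (2, 3), and so on; actual pairing depends on the controller or software RAID layout, and the drive sizes in the usage note are hypothetical.

```python
def raid10_usable_tb(drive_count: int, drive_tb: float) -> float:
    """Usable capacity of RAID-10: half the raw capacity, since every block is mirrored."""
    if drive_count < 4 or drive_count % 2:
        raise ValueError("RAID-10 needs an even number of drives, at least 4")
    return (drive_count // 2) * drive_tb


def raid10_survives(failed: set[int], drive_count: int) -> bool:
    """True if the array is still readable after the given drives fail.

    Assumes drives (0, 1), (2, 3), ... form the mirror pairs; data is lost
    only when both drives of the same pair have failed.
    """
    return all(not {i, i + 1} <= failed for i in range(0, drive_count, 2))
```

For example, four hypothetical 2 TB NVMe drives yield 4 TB of usable space, and the array keeps running after losing drives 0 and 2 (one from each pair), but not after losing drives 0 and 1 (both halves of one mirror).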
What is so special about this solution?
The Web is rife with firms that offer failover and/or load balancing or some kind of clustered solution. But upon close
inspection, many turn out to be merely a VPS on a stick, running on a dedicated server while claiming to be part of a cloud.
Our solution is different, and we've laid out the technical details so you can independently research the subject. Our quest
for a truly uptime-conscious solution was born out of the need to care for our clients' online businesses. And we, UNIXy, are
putting our reputation on the line.
Caveats and Limitations
For the sake of transparency and honesty, we must set the right expectations. There are situations that auto-failover and
global load balancing were not designed to mitigate. Although we have covered every infrastructure failure scenario we can
anticipate, we cannot assume responsibility for changes you make that break your Website. For example, an upgrade of your
Website's code or database may well result in an error. Because such a change is replicated everywhere your Website runs,
failover cannot route around it: until you fix the error or roll back to a previously working version, your Website will show
errors. We'd be more than happy to advise on implementing a change management procedure to minimize the impact of such
inadvertent changes on your production environment.
In order to maximize uptime, an organization needs to adopt best practices and change management procedures: guidelines for
properly introducing changes to a production environment. Other situations are outside our control as well; they may seem
obvious, but we state them below for the sake of clarity and transparency. Either way, this is a truly fully managed solution:
we're here to help should you just want things to work without having to fiddle with them.
Software bugs in client code
Code changes initiated by the client
Accidental deletion of files by the client
Large-scale DDoS attacks
Failure to follow procedures
Billing account issues