Developers, especially web developers at startups, have shared a huge amount of knowledge over the years about the best, most reliable, and most scalable ways to run apps and databases in production. This knowledge lives on High Scalability, StackOverflow, Hacker News, and on thousands of engineering blogs and podcasts.
Even though these best practices are widely known and easy to learn about, most users don’t employ them. Unfortunately, the best practices are often difficult and time-consuming to set up and manage, especially today, when most companies have an ever-expanding set of microservices and databases.
Best practices are often developed and publicized by the biggest, best-funded companies, which can afford to, and in many cases have to, spend the time and money required to discover and implement new and better ways to use technology.
Many of the most valuable and hardest-to-implement best practices relate to databases. Databases should, wherever possible, be highly available, fault tolerant, and able to fail over automatically while maintaining data consistency. We include database appliances in Flynn designed to do just that.
In the case of relational databases, database instances should be set up using synchronous replication. This means that when a write comes in from a client, it isn’t confirmed until the write has been copied to a database replica. That way, data stays consistent even if a database instance fails. If you imagine that write representing a record of a payment or purchase, you get a sense of how dangerous inconsistent databases are.
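To make the idea concrete, here is a minimal in-memory sketch of the confirmation rule described above. It is a toy model, not how MariaDB, MySQL, or Flynn's appliances implement replication: the `Primary`, `Replica`, and `ReplicaUnavailable` names are hypothetical and exist only for illustration.

```python
class ReplicaUnavailable(Exception):
    """Raised when the replica cannot acknowledge a write (hypothetical)."""


class Replica:
    """Toy replica: stores copied records in an in-memory list."""

    def __init__(self):
        self.log = []
        self.online = True

    def apply(self, record):
        # A real replica would persist the record before acknowledging.
        if not self.online:
            raise ReplicaUnavailable("replica did not acknowledge the write")
        self.log.append(record)


class Primary:
    """Toy primary that confirms a write only after the replica has it."""

    def __init__(self, replica):
        self.log = []
        self.replica = replica

    def write(self, record):
        # Copy the record to the replica first. If this raises, the write
        # is never confirmed to the client and is not recorded locally.
        self.replica.apply(record)
        self.log.append(record)
        return True  # confirmation returned to the client
```

In this sketch, a call such as `Primary(Replica()).write("payment:42")` returns a confirmation only once both logs contain the record. If the replica is offline, the call raises instead of confirming, so a client is never told that a payment was stored when only one copy of it exists.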
While testing our MariaDB/MySQL database appliance recently, we discovered evidence that most users who set up MariaDB on their own aren’t taking advantage of the built-in tunable replication capabilities to prioritize data consistency. The bug we found would cripple a database cluster if the user had tuned the built-in semi-synchronous replication for consistency. This bug would have been apparent to any user who employed best practices for relational databases. Google’s internal version of MySQL, for example, fixed this bug more than two years ago. The fact that it had not been fixed in master suggests that very few users of MariaDB are optimizing their clusters for data consistency.
We think everyone deserves Google-quality infrastructure, but we understand that it just takes too much time and effort to first learn about and then set up many complex systems. That’s why we built Flynn.
We try to employ the best known practices in every element of Flynn:
- All Flynn components are highly available.
- Application deploys on Flynn have zero downtime.
- It’s easy to roll back broken application deploys.
- Full cluster backups are available with a single command.
- Every component and application process runs inside its own container with separate resource limits.
Flynn is designed to give everyone the most reliable, resilient infrastructure in the easiest-to-use package possible. Flynn will continue to evolve with the state of the art and industry best practices, so your cluster gets more powerful and reliable with each release. We will continue to work with upstream vendors and communities to make sure the technologies we rely on, such as the databases in our appliances, also live up to those standards.