5 Limitations of MySQL with Big Data

MySQL is a widely used open-source relational database management system (RDBMS) and an excellent solution for many applications, including web-scale applications. But the MySQL architecture has limitations when it comes to big data analytics.

A closer look at the strengths and weaknesses of MySQL reveals five use cases where the RDBMS, powerful though it is, can benefit from the many features of Apache Ignite.

1. Delivering Hot Data

In large applications, the data cache stored in RAM can grow very large and be subjected to thousands or even millions of requests per second. MySQL does not have a strong in-memory storage engine, and because it was not designed for very high concurrency, bottlenecks can degrade performance. MySQL also carries relatively high per-query overhead, so it cannot deliver optimal speed for this kind of workload.

It is common to use either Memcached or Redis as an external cache for hot data, eliminating the overhead of SQL parsing and transactions. But maintaining an external cache is difficult because you risk reading data that is stale. It is also possible to use internal scalability improvements in MySQL, such as the thread pool, available either as a MySQL Enterprise feature or as an open-source implementation. But such features typically help many queries run concurrently without improving the performance of any individual query.

A better solution is the Apache® Ignite™ in-memory computing platform, which includes an in-memory data grid and an in-memory SQL grid. Together they let you create an in-memory data store for MySQL that partitions and caches data in memory and scales it across a distributed cluster. Unlike products such as Redis and Memcached, which are primarily in-memory key-value stores, Apache Ignite offers many additional features that are highly valuable for companies moving to in-memory computing, including ANSI-99 SQL, distributed ACID-compliant transactions, and advanced compute and streaming capabilities.
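
As a rough illustration, the Java sketch below creates a partitioned in-memory cache with Ignite's native API and serves reads and writes directly from RAM. The cache name "hotSessions", the key and value types, and the sample data are illustrative assumptions, and the snippet presumes the ignite-core library is on the classpath.

```java
// A minimal sketch of caching hot data in Apache Ignite (Java API).
// The cache name and sample session data are illustrative only.
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.CacheMode;
import org.apache.ignite.configuration.CacheConfiguration;

public class HotDataCacheSketch {
    public static void main(String[] args) {
        // Start (or join) an Ignite node embedded in this JVM.
        try (Ignite ignite = Ignition.start()) {
            CacheConfiguration<Long, String> cfg = new CacheConfiguration<>("hotSessions");
            cfg.setCacheMode(CacheMode.PARTITIONED); // data is spread across all cluster nodes
            cfg.setBackups(1);                       // keep one backup copy of each partition

            IgniteCache<Long, String> sessions = ignite.getOrCreateCache(cfg);

            // Reads and writes are served from RAM, avoiding SQL parsing overhead.
            sessions.put(42L, "user-42-session-state");
            String state = sessions.get(42L);
            System.out.println("Cached session: " + state);
        }
    }
}
```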

2. Dealing with Highly Volatile Data

If thousands of updates per second are applied to a single database row (for example, flash online sales of high-demand concert tickets), it is crucial that the value remain exact at every moment. MySQL is designed around full transactional semantics, supports long transactions, and relies on disk-based log durability, so it is not well suited to this type of highly volatile data.

To some extent, good data design within MySQL can address this issue: splitting the counter over several rows reduces contention, as sketched below. Careful tuning of the MySQL installation can also yield up to 10x better performance than a stock configuration. Parallel replication, another historical weak point, has been addressed in MySQL 5.7 and by solutions such as Percona XtraDB Cluster. Some users even move frequently updated data to Memcached or Redis and periodically synchronize it back to the RDBMS.
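
As a rough illustration of the counter-splitting idea, the JDBC sketch below spreads increments of one logical counter over several MySQL rows and sums them at read time. The counter_slots table, its columns, the slot count of 16, and the connection details are hypothetical, and MySQL Connector/J is assumed to be on the classpath.

```java
// A minimal sketch of splitting a hot counter over several rows, using plain JDBC.
// The counter_slots table and connection details are illustrative assumptions.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.util.concurrent.ThreadLocalRandom;

public class SplitCounterSketch {
    private static final int SLOTS = 16; // more slots means less row-level contention

    // Assumed schema (illustrative):
    //   CREATE TABLE counter_slots (slot INT PRIMARY KEY, value BIGINT NOT NULL);
    //   pre-populated with rows 0..15, each starting at 0.

    public static void increment(Connection conn) throws Exception {
        int slot = ThreadLocalRandom.current().nextInt(SLOTS); // spread updates over rows
        try (PreparedStatement ps = conn.prepareStatement(
                "UPDATE counter_slots SET value = value + 1 WHERE slot = ?")) {
            ps.setInt(1, slot);
            ps.executeUpdate();
        }
    }

    public static long read(Connection conn) throws Exception {
        try (PreparedStatement ps = conn.prepareStatement(
                "SELECT SUM(value) FROM counter_slots");
             ResultSet rs = ps.executeQuery()) {
            rs.next();
            return rs.getLong(1); // the logical counter is the sum over all slots
        }
    }

    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/demo", "user", "password")) {
            increment(conn);
            System.out.println("Current count: " + read(conn));
        }
    }
}
```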

A better solution is to use Apache Ignite, which automatically distributes data across all nodes in a cluster. Replication between nodes and clusters is configurable and happens automatically, and Apache Ignite can be tuned to provide the level of consistency you need. Once the data is in Ignite, you can use its ANSI-99 distributed SQL engine and ACID-compliant transactions to work with the data much as you would in MySQL.
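
The Java sketch below shows what such a transactional update might look like. The cache name "tickets" and the ticket counter are illustrative, and the snippet assumes an embedded Ignite node started with default settings and ignite-core on the classpath.

```java
// A minimal sketch of an ACID-compliant update on highly volatile data in Apache Ignite.
// The cache name and ticket counter are illustrative only.
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.CacheAtomicityMode;
import org.apache.ignite.configuration.CacheConfiguration;
import org.apache.ignite.transactions.Transaction;
import org.apache.ignite.transactions.TransactionConcurrency;
import org.apache.ignite.transactions.TransactionIsolation;

public class VolatileCounterSketch {
    public static void main(String[] args) {
        try (Ignite ignite = Ignition.start()) {
            CacheConfiguration<String, Long> cfg = new CacheConfiguration<>("tickets");
            cfg.setAtomicityMode(CacheAtomicityMode.TRANSACTIONAL); // enable ACID transactions

            IgniteCache<String, Long> tickets = ignite.getOrCreateCache(cfg);
            tickets.put("concert-42-remaining", 10_000L);

            // Pessimistic, repeatable-read transaction: the read locks the entry,
            // so concurrent decrements cannot interleave and the value stays exact.
            try (Transaction tx = ignite.transactions().txStart(
                    TransactionConcurrency.PESSIMISTIC, TransactionIsolation.REPEATABLE_READ)) {
                long remaining = tickets.get("concert-42-remaining");
                if (remaining > 0)
                    tickets.put("concert-42-remaining", remaining - 1);
                tx.commit();
            }
        }
    }
}
```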

3. Avoiding MySQL Scalability Limitations

MySQL was originally designed as a single-node system, not with the modern data center in mind. Today's largest MySQL installations cannot scale as a single system and must rely on sharding, that is, splitting a data set over multiple nodes or instances. However, most MySQL sharding solutions are manual and make application code more complex, and the performance gain is lost when queries must access data across multiple shards.

Tools such as Vitess (a MySQL sharding framework released by YouTube) or ProxySQL are often used to help implement sharding, and data stores such as Memcached, Redis, MongoDB, and Cassandra are sometimes deployed in front of MySQL to offload it.

Apache Ignite was built from the ground up as a high-performance, highly scalable distributed in-memory computing platform. Each node can use as much CPU and memory as its hardware provides, nodes are automatically load balanced, and data is automatically distributed across all nodes in a cluster, so manual sharding is unnecessary.
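
The Java sketch below illustrates this: a partitioned cache spreads keys across whatever nodes are in the cluster, and the affinity function, not application code, decides where each key lives. The cache name "accounts" and the keys are illustrative assumptions, and ignite-core is assumed to be on the classpath.

```java
// A minimal sketch of Ignite's automatic partitioning, in place of manual sharding.
// The cache name and account keys are illustrative only.
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.CacheMode;
import org.apache.ignite.cache.affinity.Affinity;
import org.apache.ignite.cluster.ClusterNode;
import org.apache.ignite.configuration.CacheConfiguration;

public class AutoShardingSketch {
    public static void main(String[] args) {
        try (Ignite ignite = Ignition.start()) {
            CacheConfiguration<Long, String> cfg = new CacheConfiguration<>("accounts");
            cfg.setCacheMode(CacheMode.PARTITIONED); // keys are hashed across the whole cluster
            cfg.setBackups(1);                       // each partition has one backup copy

            IgniteCache<Long, String> accounts = ignite.getOrCreateCache(cfg);
            for (long id = 1; id <= 1_000; id++)
                accounts.put(id, "account-" + id);

            // The affinity function decides which node owns each key; applications
            // never hard-code shard locations the way manual MySQL sharding requires.
            Affinity<Long> affinity = ignite.affinity("accounts");
            ClusterNode owner = affinity.mapKeyToNode(42L);
            System.out.println("Key 42 lives on node: " + owner.id());
        }
    }
}
```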

4. Providing Analytics

MySQL was not designed for running complicated queries against massive data volumes that require crunching through data at huge scale. The MySQL optimizer is quite limited, executing a single query at a time on a single thread; a given MySQL query can neither scale across multiple CPU cores in a single system nor execute as a distributed query across multiple nodes.

MySQL offers no comprehensive solution for robust, large-scale data processing, so external solutions such as Hadoop® and Apache® Spark™ are often used alongside it. Vertica and ClickHouse have also emerged as worthy analytics solutions.

Here too, Apache Ignite is a better solution. It integrates easily with Hadoop and Spark, complementing them with in-memory processing to achieve significantly better performance and scale. The Apache Ignite in-memory SQL grid is highly optimized and easily tuned to execute high-performance SQL queries, and it provides JDBC/ODBC connectivity to virtually any analytics tool.
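
For example, a JDBC client can run a distributed SQL query against the cluster with nothing Ignite-specific beyond the driver URL. The sketch below assumes an Ignite node listening on the default thin-client port (10800) on localhost and a hypothetical SALES table; both are illustrative assumptions.

```java
// A minimal sketch of querying the Ignite SQL grid over the JDBC thin driver.
// The SALES table and the query itself are illustrative only.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class AnalyticsJdbcSketch {
    public static void main(String[] args) throws Exception {
        // The Ignite JDBC thin driver ships with ignite-core.
        Class.forName("org.apache.ignite.IgniteJdbcThinDriver");

        try (Connection conn = DriverManager.getConnection("jdbc:ignite:thin://127.0.0.1/");
             Statement stmt = conn.createStatement()) {
            // The query is executed in parallel across the cluster, with each node
            // scanning only its own in-memory partitions.
            try (ResultSet rs = stmt.executeQuery(
                    "SELECT region, SUM(amount) FROM SALES GROUP BY region")) {
                while (rs.next())
                    System.out.println(rs.getString(1) + " -> " + rs.getDouble(2));
            }
        }
    }
}
```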


5. Powering Full Text Searches at Scale

MySQL can handle basic full text searches. However, because it cannot parallelize a search across cores or nodes, searches do not scale well as data volumes increase.

For small-scale search applications, InnoDB full-text indexes, first available in MySQL 5.6, can help. When the search spans more than a few gigabytes, however, a dedicated search solution such as Elasticsearch, Apache Solr, Crate.io, or Sphinx Search makes sense.

Or you can use Apache Ignite, which supports full-text queries backed by Lucene indexing. Text searches are distributed across the cluster and run in parallel against data stored in memory, which results in high-performance, highly scalable full text search.
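
As a rough sketch, the Java example below marks a field for Lucene indexing and runs a distributed text query against it. The Article class, cache name, and search term are illustrative only, and the snippet assumes ignite-core and ignite-indexing are on the classpath.

```java
// A minimal sketch of Ignite's Lucene-backed full-text queries.
// The Article class and the sample data are illustrative only.
import java.util.List;
import javax.cache.Cache;
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.query.TextQuery;
import org.apache.ignite.cache.query.annotations.QueryTextField;
import org.apache.ignite.configuration.CacheConfiguration;

public class FullTextSearchSketch {
    // Fields annotated with @QueryTextField are indexed by Lucene on each node.
    public static class Article {
        @QueryTextField
        private final String body;

        public Article(String body) { this.body = body; }

        @Override public String toString() { return body; }
    }

    public static void main(String[] args) {
        try (Ignite ignite = Ignition.start()) {
            CacheConfiguration<Long, Article> cfg = new CacheConfiguration<>("articles");
            cfg.setIndexedTypes(Long.class, Article.class); // build the index for Article

            IgniteCache<Long, Article> articles = ignite.getOrCreateCache(cfg);
            articles.put(1L, new Article("In-memory computing with distributed SQL"));
            articles.put(2L, new Article("Tuning disk-based storage engines"));

            // The text query fans out to every node and searches local Lucene indexes in parallel.
            TextQuery<Long, Article> query = new TextQuery<>(Article.class, "distributed");
            List<Cache.Entry<Long, Article>> hits = articles.query(query).getAll();
            hits.forEach(e -> System.out.println(e.getKey() + ": " + e.getValue()));
        }
    }
}
```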

Two Major Trends

Two major trends have converged to change the landscape of large-scale data processing. Together they are forcing a wholesale change in the database world.

  • Real-time data processing is increasingly being deployed by enterprise organizations of all types, leading them to in-memory databases and distributed architectures.
  • Memory prices have dropped significantly in the past few years: server-class RAM cost about $4 per GB in 2022, down 50% from $8 per GB just three years earlier, while server-class disk storage remains in the 10¢-per-GB range. The decline in RAM prices relative to disk storage makes in-memory processing and analytics appealing to enterprises for a growing number of high-value workloads.

In-memory platforms are the most sophisticated, efficient, and well-architected distributed systems in existence. The largest implementations of Apache Ignite rival the largest installations of Cassandra and Hadoop, spanning several thousand nodes across multiple clusters while providing unrivaled performance and scalability.

All of this means in-memory computing platforms can provide the scalability you need to overcome MySQL limitations. At some point, in the not-too-distant future, advances in memory technology may lead us to a world of in-memory storage as well, with disks serving only as historical backups.