Introducing Apache Ignite, an open source dark horse...
Ignite is increasingly underpinning production workloads
There are plenty of Apache Software Foundation projects powering enterprise-scale applications, but Apache Ignite feels like a dark horse: a powerful high-performance database and more, underpinning workflows at some of the world’s largest companies – and on the cusp of a major new release (caution: breaking changes incoming...)
Judging from the lack of noise about Ignite and the low profile of GridGain Systems – which donated the project to the Apache Software Foundation in 2014 and remains its primary contributor – the IT world may be missing a trick.
Here are The Stack’s quick-and-dirty answers to two questions: what is Apache Ignite, and how is it used?
So, what is Apache Ignite?
Ignite is a distributed database management system designed for high-performance computing but also used to underpin enterprise applications. At its simplest level it is a RAM-first distributed cache.
Apache Ignite can be used to power in-memory applications – as a cache, an in-memory database, or a data grid sitting between an application and third-party databases. It uses RAM as its caching and storage layer, allowing it to ingest and process complex datasets in parallel and at far higher speeds than traditional databases backed only by disk storage. Visibility into Apache Ignite nodes is available from monitoring vendors such as Datadog.
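As a rough illustration of that cache-style usage, the minimal Java sketch below starts a single Ignite node and reads and writes a distributed key-value cache. The cache name and values are purely illustrative and not drawn from any particular deployment.

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;

public class CacheQuickStart {
    public static void main(String[] args) {
        // Start a local Ignite node with the default configuration.
        try (Ignite ignite = Ignition.start()) {
            // Get or create a distributed key-value cache; "myCache" is an illustrative name.
            IgniteCache<Integer, String> cache = ignite.getOrCreateCache("myCache");

            // Writes and reads hit RAM first, spread across however many nodes join the cluster.
            cache.put(1, "Hello, Ignite");
            System.out.println(cache.get(1));
        }
    }
}
```

Run a second copy of the same program on another machine pointed at the same cluster and both nodes share the cache – which is the “data grid” behaviour described above.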
It offers very low latency and very fast read performance – reportedly up to 1,000x better than purely disk-based systems (although Ignite has native persistence and can use disk storage, particularly for large datasets that don’t fit in memory). It also scales horizontally, meaning Ignite can handle extremely large datasets while remaining resilient and fault-tolerant. Ignite offers integration with existing databases and data stores, along with SQL and ACID transactions – with some limitations. These are set to be addressed in Ignite 3.0, about which more below.
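To give a flavour of that SQL support, here is a hedged sketch using Ignite’s SqlFieldsQuery API to create a table, insert a row and read it back. The table and column names are invented for the example rather than taken from any real schema.

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.query.SqlFieldsQuery;

public class SqlQuickStart {
    public static void main(String[] args) {
        try (Ignite ignite = Ignition.start()) {
            // Any cache handle can run SQL; tables live in the cluster-wide SQL schema.
            IgniteCache<?, ?> cache = ignite.getOrCreateCache("default");

            // Create a table and insert a row via standard SQL DDL/DML.
            cache.query(new SqlFieldsQuery(
                "CREATE TABLE IF NOT EXISTS city (id INT PRIMARY KEY, name VARCHAR)")).getAll();
            cache.query(new SqlFieldsQuery(
                "INSERT INTO city (id, name) VALUES (?, ?)").setArgs(1, "London")).getAll();

            // Read it back through the same SQL API.
            System.out.println(cache.query(
                new SqlFieldsQuery("SELECT name FROM city WHERE id = ?").setArgs(1)).getAll());
        }
    }
}
```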
Headline users of Apache Ignite include Netflix, Microsoft, Apple, Bloomberg and PayPal.
User experience is somewhat mixed and the learning curve remains steep, but the community (278 committers on GitHub, where the Apache 2.0-licensed project has 4.2k stars and 1.8k forks) is working to improve UX.
How is Apache Ignite used?
Enterprises use Ignite when they need speed, and often when they want to build predictive machine learning models without worrying about costly data transfers. Among its many uses, Apache Ignite can be deployed as a high-performance compute cluster, turning a group of commodity machines or a cloud environment into a distributed supercomputer of interconnected Ignite nodes that process records in memory and, via APIs for data- and compute-intensive calculations, reduce network utilisation.
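As a minimal sketch of that compute-cluster idea, the example below uses Ignite’s compute API to broadcast a closure to every node in the cluster; the printed message simply stands in for real data- or compute-intensive work.

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;

public class ComputeBroadcast {
    public static void main(String[] args) {
        // Start (or join) a cluster node, then fan a closure out to every node in the cluster.
        try (Ignite ignite = Ignition.start()) {
            ignite.compute().broadcast(() ->
                System.out.println("Processing a slice of the work on this node"));
        }
    }
}
```

Because each node runs the closure against the data it already holds in memory, results can be computed where the data lives instead of shipping records over the network.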
During last month’s Ignite Summit, Expedia developers Bhanu Choudhary and Rohit Goel explained how they used Ignite as an in-memory cache for flight-search data, which was previously held and queried in Apache Cassandra: “Flights search is traditionally a read-heavy system with peak search traffic going regularly in the range of thousands of transactions per second. There is a latency overhead associated with each request which goes all the way down to the supplier. So, it is natural to use some sort of short duration caching of requests to provide the best experiences to our users,” said Goel and other Expedia developers in a blog post.
Even though Cassandra was fast, it wasn’t fast enough.
“The overall response time of the flight search service was more than 3 seconds to display results from the cache to the user. This is not ideal for any cached response and somewhat defeats the purpose of caching.”
Along with making other improvements, which shaved about 30% from the query time, Expedia turned to Ignite, deploying it as a cache for Cassandra. Through Ignite and some careful optimisations, Expedia was able to reduce response latency to less than 150ms – a small fraction of the time these queries had taken before.
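Expedia’s talk doesn’t publish its configuration, but the kind of short-duration caching Goel describes can be sketched in Ignite with an expiry policy. The cache name, key format and five-minute TTL below are assumptions for illustration only, not Expedia’s actual setup.

```java
import java.util.concurrent.TimeUnit;

import javax.cache.expiry.CreatedExpiryPolicy;
import javax.cache.expiry.Duration;

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.CacheConfiguration;

public class ShortLivedCache {
    public static void main(String[] args) {
        try (Ignite ignite = Ignition.start()) {
            // Entries expire five minutes after creation -- short-duration caching of
            // search responses. Cache name, key format and TTL are illustrative.
            CacheConfiguration<String, String> cfg = new CacheConfiguration<>("flightSearchCache");
            cfg.setExpiryPolicyFactory(
                CreatedExpiryPolicy.factoryOf(new Duration(TimeUnit.MINUTES, 5)));

            IgniteCache<String, String> cache = ignite.getOrCreateCache(cfg);
            cache.put("LHR-JFK|2024-07-01", "{ \"itineraries\": [] }");
            System.out.println(cache.get("LHR-JFK|2024-07-01"));
        }
    }
}
```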
See also: The Big Interview: Expedia Group CTO Rathi Murthy
Motorola Solutions, which provides communications equipment to emergency services, law enforcement and other critical sectors, also uses Apache Ignite – in this case to facilitate fast connections between users of its Unified Communications platform. Motorola operates the platform with 99.999% availability, meaning permitted downtime is just over five minutes a year.
Harish Negalaguli, senior principal architect for advanced technology at Motorola Solutions, told the Ignite Summit that, with Ignite, the firm had gone from an Active/Standby model limited to two data centres to an Active/Active model with an unlimited number of clusters and data centres.
“The in-memory aspect of the database is very important. We need to have a certain SLA to meet on the call setup time that requires very low latency, so we cannot afford to have the data in cold storage or hard disk or SSD – we need a solution where any required data is immediately available in memory,” said Negalaguli.
Motorola now uses GridGain’s commercial Ignite implementation as a distributed in-memory cache, he added: “It provides local cache to access the memory, and any optimisation required for the application. It also allowed us to port our existing SQL application, and it allowed us to scale horizontally, instead of the vertical scalability problem we had with our previous model.
“So we can scale it horizontally in our database layer, and that automatically provided reliability scaling – and most importantly multi-site data replication.”
Negalaguli said the GridGain Ignite implementation not only allows scaling beyond two data centres, but also selective replication: “It's not like everything in data centre one needs to be replicated to data centre two or three and all the data centres.”
Other use cases discussed at the Ignite Summit included quantitative analytics, payment card processing, and exposing legacy mainframe data. The full conference playlist is available here.
What is Apache Ignite’s roadmap?
The current stable Ignite release is 2.13, but GridGain and the Ignite community are hard at work on the next major version of Ignite: 3.0. In progress since 2020, 3.0 is a major revision of the Ignite codebase – which will, however, break compatibility with previous versions. The primary reason for this break is to rid the project of technical debt, and to improve and simplify the Ignite experience for new users, according to Nikita Ivanov, CTO and co-founder of GridGain, and one of the creators of Ignite. He told the Ignite Summit the move to 3.0 would take the project back to its roots, and cited his early inspiration for Ignite.
“I was pretty [well] versed in high performance computing existing technologies, and they had this tinge of, you have to have a PhD to use something like this. And that really bothered me, because fundamentally the ideas behind HPC were fairly simple – it's your typical parallel processing – and I wanted to marry this with in-memory computing,” said Ivanov.
“I fundamentally believe we need to renew focus on the first-touch basics, and this is a very key phrase; we need to make sure that not only can inexperienced users really benefit from Ignite, but people who just downloaded Ignite and wanted to play with it in the evening to just see the examples – they have to be efficient and productive.”
Key changes in Ignite 3.0 include a schema-first approach, which will eliminate a lot of confusion and unpredictable behaviour; dynamic configuration, eliminating the need to restart a cluster when configuration changes; and a top-level SQL API, which will improve compatibility with SQL use cases.
Currently Ignite 3.0 is in alpha – with Alpha 5 released in June 2022. Valentin Kulichenko, project management committee member for Ignite, and previously director of project management at GridGain, told the Ignite Summit the aim was to have general availability of Apache Ignite 3.0 by the end of 2022.
Given Ignite offers a potential solution to the problem of handling big data at high speed with high reliability – and is under active development – now might be a very good time to investigate the potential of the platform.