GridGain recently published a micro-learning unit on GridGain University that covers the main strategies for loading data into Apache Ignite: initial or regular batch loads from files, database loading, real-time streaming, and ETL (Extract, Transform, Load).
Here is an overview of the key strategies. The full 9-minute video is available on GridGain University.
Initial / Regular Batch Load from Files
Flat files such as CSV and TSV, along with columnar files such as Parquet, are commonly used to load batch data into an Ignite cluster. This strategy suits scenarios like loading the previous day's activity for reporting or loading static reference data. Text formats like CSV are portable across operating systems, but watch for character encoding and code page differences: a file written in one encoding and read as another will corrupt any non-ASCII characters.
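The encoding caution above is easy to hit in practice. Here is a minimal Java sketch, not GridGain's or Ignite's own loader. It parses a hypothetical two-column `id,city` CSV into a `HashMap` that stands in for an Ignite cache and decodes the same UTF-8 bytes twice: once with the correct charset and once as ISO-8859-1. The class name `CsvBatchLoad` and the file layout are illustrative assumptions. In a real Ignite load, each parsed row would more typically be passed to an `IgniteDataStreamer` via `addData`.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

public class CsvBatchLoad {
    // Reads "id,city" rows (first line is a header) into a map standing in for an Ignite cache.
    // The charset is an explicit parameter instead of the platform default.
    static Map<Integer, String> load(InputStream in, Charset charset) {
        Map<Integer, String> cache = new HashMap<>();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            reader.readLine(); // skip the header row
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.isBlank()) continue;        // tolerate empty lines
                String[] parts = line.split(",", 2); // naive split: no quoted fields
                cache.put(Integer.parseInt(parts[0].trim()), parts[1].trim());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return cache;
    }

    public static void main(String[] args) {
        // Simulated file exported as UTF-8; the city name contains a u with umlaut.
        byte[] file = "id,city\n1,Z\u00fcrich\n2,Oslo\n".getBytes(StandardCharsets.UTF_8);

        Map<Integer, String> right = load(new ByteArrayInputStream(file), StandardCharsets.UTF_8);
        Map<Integer, String> wrong = load(new ByteArrayInputStream(file), StandardCharsets.ISO_8859_1);

        System.out.println(right.get(1).equals("Z\u00fcrich")); // true
        System.out.println(wrong.get(1).equals("Z\u00fcrich")); // false: two UTF-8 bytes became two chars
    }
}
```

The mis-decoded value is one character longer because the two-byte UTF-8 sequence is read as two Latin-1 characters. The naive `split` does not handle quoted fields, so a production CSV loader should use a proper parser.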
Database Loading
Database loading is common when the system of record is an RDBMS or a NoSQL database. Apache Ignite can integrate directly with an RDBMS through JDBC drivers, which lets you load an entire table into Ignite. It also supports NoSQL databases such as Apache Cassandra through cache store implementations, and you can write a custom cache store for other sources. Data can also be loaded from third-party applications via API calls, either as an initial batch load or as incremental loads.
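Ignite's RDBMS integration is built on its `CacheStore` interface, and `IgniteCache.loadCache(...)` is the usual entry point for pulling a table into a cache. The difference between an initial batch load and incremental loads can be sketched without a database, assuming the source table carries a last-modified timestamp. The Java below (a hypothetical `IncrementalLoad` class and `Row` record, with a `HashMap` standing in for the cache) uses a simple high-watermark technique: the first run copies every row, and later runs copy only rows changed since the previous run.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class IncrementalLoad {
    // Hypothetical source row: primary key, value, last-modified time in epoch millis.
    record Row(long id, String value, long updatedAt) {}

    private final Map<Long, String> cache = new HashMap<>(); // stands in for an Ignite cache
    private long watermark = Long.MIN_VALUE;                  // newest updatedAt loaded so far

    // Copies every row newer than the watermark, then advances the watermark.
    // On the first call nothing has been loaded yet, so this acts as the initial batch load.
    int load(List<Row> source) {
        int loaded = 0;
        long newest = watermark;
        for (Row r : source) {
            if (r.updatedAt() > watermark) {
                cache.put(r.id(), r.value());
                newest = Math.max(newest, r.updatedAt());
                loaded++;
            }
        }
        watermark = newest;
        return loaded;
    }

    Map<Long, String> cache() {
        return cache;
    }

    public static void main(String[] args) {
        List<Row> table = new ArrayList<>(List.of(new Row(1, "a", 100), new Row(2, "b", 200)));
        IncrementalLoad loader = new IncrementalLoad();
        System.out.println(loader.load(table)); // 2: initial batch load copies every row

        table.set(0, new Row(1, "a2", 300)); // row 1 updated at the source
        table.add(new Row(3, "c", 250));     // row 3 inserted
        System.out.println(loader.load(table)); // 2: only rows 1 and 3 are newer than 200
        System.out.println(loader.load(table)); // 0: nothing changed since the last run
    }
}
```

A strict `>` comparison can miss a row that is committed later with the same timestamp as the watermark, so production incremental loads usually need a more careful cursor.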
Real-time Streaming
Real-time streaming is an event-driven approach in which data is pushed into Ignite as events occur. It suits data generated by live events, such as stock price changes or IoT device activity. Streaming data is generated continuously and processed incrementally, a pattern common in big data systems.
Apache Kafka is a widely used distributed event streaming platform that can be connected to Ignite with the Ignite Kafka streamer. Other streaming technologies that work with Ignite include Apache Spark, Apache Camel, Apache Storm, Apache Flink, Apache Pulsar, JMS (Java Message Service), RabbitMQ, and IoT devices that communicate over MQTT.
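The batching idea behind streaming ingestion can be sketched in plain Java. `BatchingStreamer` below is a hypothetical toy that only mirrors the shape of Ignite's `IgniteDataStreamer` (`addData`, `flush`, `close`): each live event, such as a Kafka record or a price tick, is buffered and written to the target in batches rather than one write at a time. It does not model how Ignite routes data across nodes. Unlike the real streamer, whose `allowOverwrite` setting defaults to `false`, this toy always overwrites existing keys so the latest tick wins.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy stand-in for a data streamer (hypothetical, not Ignite's implementation):
// events are buffered and written to the target in batches.
public class BatchingStreamer<K, V> implements AutoCloseable {
    private final Map<K, V> target; // stands in for an Ignite cache
    private final int batchSize;
    private final List<Map.Entry<K, V>> buffer = new ArrayList<>();
    private int flushes = 0;

    public BatchingStreamer(Map<K, V> target, int batchSize) {
        if (batchSize <= 0) throw new IllegalArgumentException("batchSize must be positive");
        this.target = target;
        this.batchSize = batchSize;
    }

    // Called once per live event; Map.entry rejects null keys and values.
    public void addData(K key, V value) {
        buffer.add(Map.entry(key, value));
        if (buffer.size() >= batchSize) flush();
    }

    // Writes everything buffered so far in arrival order, so the latest value per key wins.
    public void flush() {
        if (buffer.isEmpty()) return;
        for (Map.Entry<K, V> e : buffer) target.put(e.getKey(), e.getValue());
        buffer.clear();
        flushes++;
    }

    public int flushCount() {
        return flushes;
    }

    // Flushes the tail of the stream so buffered events are not lost.
    @Override
    public void close() {
        flush();
    }

    public static void main(String[] args) {
        Map<String, Double> prices = new HashMap<>();
        String[] symbols = {"AAA", "BBB", "AAA", "CCC", "BBB"};
        double[] ticks = {10.0, 20.0, 10.5, 30.0, 19.5};
        try (BatchingStreamer<String, Double> streamer = new BatchingStreamer<>(prices, 3)) {
            for (int i = 0; i < symbols.length; i++) streamer.addData(symbols[i], ticks[i]);
            System.out.println(prices.size()); // 2: the last two ticks are still buffered
        }
        System.out.println(prices.size()); // 3: close() flushed the remainder
    }
}
```

With a batch size of 3, the first three ticks are written together and the remaining two stay buffered until `close()`, which is why closing a streamer (for example with try-with-resources) matters.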
ETL (Extract, Transform, Load)
ETL is the practice of extracting data from a source, transforming it, and loading it into a target system. ETL can be implemented with external tools and libraries or directly in Apache Ignite. It is useful for batch data that must be transformed before it is loaded into Ignite. ETL also applies in an event-driven architecture, where the Ignite DataStreamer can transform data in real time as it arrives. Custom data sources without a built-in integration can feed records through the DataStreamer API, which inserts data into an Ignite cache efficiently.
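A batch ETL pass can be sketched as three small steps. In the Java below, the raw `symbol;price` records, the `Trade` record, and the validation rules are all hypothetical: extraction is a `Stream<String>`, the transform step normalizes symbols and converts prices to integer cents (dropping malformed rows), and the load step collects into a map standing in for an Ignite cache. For the event-driven variant, Ignite lets you attach transformation logic to the data streamer itself through a `StreamReceiver` such as `StreamTransformer`, set with `IgniteDataStreamer.receiver(...)`.

```java
import java.math.BigDecimal;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class EtlSketch {
    // Target value type after transformation (hypothetical schema).
    record Trade(String symbol, long priceCents) {}

    // Transform: parse, validate and normalize one raw "symbol;price" record.
    // Malformed rows, and prices with more than two decimal places, are dropped.
    static Optional<Trade> transform(String raw) {
        String[] fields = raw.split(";");
        if (fields.length != 2) return Optional.empty();
        String symbol = fields[0].trim().toUpperCase(Locale.ROOT);
        if (symbol.isEmpty()) return Optional.empty();
        try {
            long cents = new BigDecimal(fields[1].trim()).movePointRight(2).longValueExact();
            return Optional.of(new Trade(symbol, cents));
        } catch (NumberFormatException | ArithmeticException e) {
            return Optional.empty();
        }
    }

    // Extract -> transform -> load. The returned map stands in for an Ignite cache;
    // if a symbol appears twice, the later record wins.
    static Map<String, Trade> run(Stream<String> extracted) {
        return extracted
                .map(EtlSketch::transform)
                .flatMap(Optional::stream)
                .collect(Collectors.toMap(Trade::symbol, t -> t, (older, newer) -> newer));
    }

    public static void main(String[] args) {
        Map<String, Trade> cache = run(Stream.of("aaa; 12.50", "bbb;7", "broken", "ccc;1.234"));
        System.out.println(cache.size());                  // 2
        System.out.println(cache.get("AAA").priceCents()); // 1250
    }
}
```

Here `"broken"` fails the field count and `"ccc;1.234"` is rejected for sub-cent precision, so only two trades are loaded.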
Watch the 9-minute micro-learning unit to understand these strategies in more depth and choose the best approach for loading data into Apache Ignite based on your requirements. Whether you are batch loading from files, integrating with databases, using real-time streaming, or applying ETL, Apache Ignite offers flexible, scalable options for loading and processing data.