We are delighted to see that the Flink community has announced the availability of Apache Flink 1.0. This release is one of the largest in Flink's history, with about 64 contributors resolving more than 450 JIRA issues. Most importantly, it marks the beginning of the 1.x.y series, which guarantees backwards compatibility across all minor releases going forward. We see this release as the most important milestone in the project since Flink graduated from the Apache Incubator one year ago: it (1) validates Flink's production-readiness, and (2) significantly pushes the envelope in stream processing with features that are unique in the open source world.
Production readiness
While the needs of production users vary widely, Flink now covers the essentials needed to back typical production data applications with a smooth operational experience.
- Backwards compatibility: Flink 1.0 removes the hurdle of having to change application code when upgrading to a new Flink version. This is huge for production users who want to keep their business logic and applications intact while seamlessly benefiting from new patches in Flink.
- Operational features: Flink now boasts advanced monitoring capabilities (this release adds backpressure monitoring, checkpoint statistics, and the ability to submit jobs via the web interface). This release also adds savepoints, an essential feature (and unique in the open source world) that allows users to pause and resume applications without compromising result correctness and continuity.
- Battle-tested: Flink is now in production use at large tech companies as well as Fortune Global 500 companies. A team at Twitter recently clocked Flink at 15 million events per second on a moderately sized cluster.
- Integrated: Flink has always been integrated with the most popular open source tools, such as Hadoop (HDFS, YARN), Kafka (this release adds full support for Kafka 0.9), HBase, and others. Flink also features compatibility packages and runners, so that it can be used as an execution engine for programs written in MapReduce, Apache Storm, Cascading, and Apache Beam (incubating).
The Flink community has also put out a roadmap for future releases, which includes SQL on data sets and data streams, more CEP functionality, on-the-wire encryption and other security features, dynamic scaling of running programs, integration with Apache Mesos, and YARN elasticity.
Pushing the state of open source streaming forward
Flink has long pioneered several concepts in the open source data streaming world. The previous release (Flink 0.10) introduced flexible windows and triggers, window joins, and support for out-of-order streams. The release before that (Flink 0.9) was the first to offer low-overhead exactly-once guarantees for state updates, using a novel checkpointing algorithm.
The community continues to push the envelope with unique features that make it easier to both program and administer Flink applications. For example, the 1.0 release of Flink includes the first version of a CEP library. Complex Event Processing (CEP) is one of the oldest and most important use cases for stream processing. The new CEP functionality in Flink lets you detect complex patterns in event streams using a distributed general-purpose stream processor instead of a specialized CEP system.
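To give a feel for the kind of problem CEP solves, here is a toy, self-contained sketch of pattern detection over an event stream. It is not Flink's CEP API (Flink expresses patterns declaratively and evaluates them in a distributed fashion); the event shape, function name, and "repeated failure" pattern are illustrative assumptions.

```python
# Toy illustration of a CEP-style pattern: alert when the same user
# produces `threshold` consecutive "fail" events. NOT the Flink CEP API.
from collections import defaultdict

def detect_repeated_failures(events, threshold=2):
    """Return user IDs whose consecutive-'fail' streak reaches `threshold`."""
    streaks = defaultdict(int)  # per-user count of consecutive failures
    alerts = []
    for user, outcome in events:
        if outcome == "fail":
            streaks[user] += 1
            if streaks[user] == threshold:
                alerts.append(user)  # pattern matched for this user
        else:
            streaks[user] = 0  # a success resets the streak
    return alerts

events = [("alice", "fail"), ("bob", "ok"), ("alice", "fail"), ("bob", "fail")]
print(detect_repeated_failures(events))  # ['alice']
```

A CEP engine generalizes this idea: patterns such as "A followed by B within a time window" are declared once, and the engine keeps the per-key matching state for you, at scale.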
Another great user-facing feature that makes it much easier to administer running Flink jobs is savepoints, which we have written about before. Savepoints make it possible to seamlessly upgrade application code or Flink versions, migrate applications across clusters, perform cluster maintenance, and run A/B tests and what-if simulations, all while maintaining the consistency of the produced results.
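In practice, the savepoint workflow is driven from Flink's command-line client. A rough sketch (the job ID, savepoint path, and jar name below are placeholders, not values from this release):

```shell
# Trigger a savepoint for a running job; the command prints the
# path where the savepoint was stored.
bin/flink savepoint <jobID>

# Cancel the job once the savepoint is safely stored.
bin/flink cancel <jobID>

# Resume the (possibly updated) application from the savepoint.
bin/flink run -s <savepointPath> <application.jar>
```

Because the savepoint captures a consistent snapshot of the application's state, the resumed job picks up exactly where the old one left off.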
If you are looking for a stream processing solution, download Apache Flink 1.0.0 and check out the documentation. Feedback through the Flink mailing lists is, as always, very welcome. We look forward to working with the community on the next Flink releases across the 1.x.y series!