Flink Forward 2015 was the inaugural conference for the Apache Flink community and took place at the beautiful Kulturbrauerei in Berlin, a former brewery turned into a fantastic event space. Overall, we are both delighted and overwhelmed by the success of the conference, and by how rapidly the Flink community is growing and connecting.
Around 250 participants had the opportunity to attend a total of 33 technical talks (organized in two parallel sessions), as well as to participate in free Flink trainings. The talks were selected by a program committee of five Flink PMC members: Marton Balassi, Stephan Ewen, Vasia Kalavri, Henry Saputra, and Kostas Tzoumas.
The slides of all the talks are available online. Video recordings will be added soon.
Flink Forward featured two keynotes, one by Kostas Tzoumas and Stephan Ewen from data Artisans, and one by William Vambenepe from Google. Both keynotes emphasized that stream processing is more than just “fast data”: it is a new paradigm for programming data-intensive applications, one that embraces the unbounded and continuous nature of data as it is produced in the real world. Instead of ignoring continuous data or managing it implicitly with batch frameworks or hybrid (lambda) architectures, developers can now use modern stream processing frameworks to get timely answers from their data. Since the value of data is directly correlated with its freshness, a streaming-first infrastructure increases the actual value of the insight we get from our data signals. Both keynotes asserted that stream processing technology has now matured, citing Apache Flink (with an emphasis on the upcoming Flink 0.10 version) and Google Cloud Dataflow, two systems that have a lot in common and whose communities are cooperating closely to provide compatibility between the two frameworks.
The movement towards stream processing was very visible at the conference, with several talks dedicated to how companies run Apache Flink in their production environments for real-time data processing, as well as talks on how the Flink framework treats streaming data internally:
Mohamed Amine Abdessemed (Bouygues Telecom): Real-time data integration with Flink & Apache Kafka
Ignacio Mulas Viela (Ericsson): Applying Kappa architecture in the telecom industry
Anwar Rizal (Amadeus): Implementing Streaming Decision Tree Using Approximative Algorithms in Flink
Christian Kreuzfeld (Otto Group): Static vs Dynamic Stream Processing
Alexander Kolb (Otto Group): Flink? Yet another streaming framework?
Marton Balassi (Hungarian Academy of Sciences): Stateful Stream processing
Till Rohrmann (data Artisans): Fault Tolerance and Recovery of Flink Jobs
Aljoscha Krettek (data Artisans): Notions of Time – How Apache Flink Handles Time and Windows
Assaf Araki (Intel): Real Time Analytics at Scale – Smart Data Pipes for the Internet of Things
Matthias Sax (HU Berlin): A tale of Squirrels and Storms
Maximilian Michels (data Artisans): Google Cloud Dataflow on top of Apache Flink
Ufuk Celebi (data Artisans): Stream and Batch Processing in One System — Apache Flink’s Streaming Data Flow Engine
Albert Bifet (Huawei): Apache SAMOA: Mining Big Data Streams with Apache Flink
Stream processing is, of course, only one of the things that people are doing with Flink. Michael Häusler from ResearchGate (rightly) argued that batch is not dead. He shared ResearchGate’s methodology for choosing a framework that makes simple things easy: compare how candidate solutions handle the same simple task. Other talks that focused on the evaluation of different systems and on performance were:
Fabian Hueske (data Artisans): Cascading on Apache Flink
Slim Baltagi (Capital One): Flink and Spark: Similarities and Differences
Dongwon Kim (POSTECH): A comparative performance evaluation of Flink
Christopher Hillman (University of Dundee): Beyond MapReduce, Scientific data processing in real-time
Vyacheslav Zholudev (ResearchGate): Flink – a convenient abstraction layer for YARN?
Other focus areas of the conference were machine learning, interactive analytics, graph processing, and the integration of Flink with other pieces of the Big Data infrastructure:
Vasia Kalavri (KTH): Automatic Detection of Web Trackers at Telefonica Research
Mikio Braun (Zalando): Procedural Programming vs. Data Flow
Martin Junghans (University of Leipzig): Gradoop: Scalable Graph Analytics with Apache Flink
Moon soo Lee (NFLabs): Data science lifecycle with Apache Flink and Apache Zeppelin
Sebastian Schelter (TU Berlin): Declarative Machine Learning with the Samsara DSL
Stefano Bortoli & Flavio Pompermaier (OKKAM): A Semantic Big Data Companion
Kamal Hakimzadeh (KTH): Karamel – Reproducing distributed systems and experiments on cloud
Jim Dowling (SICS): Interactive Flink Analytics with Hopsworks and Apache Zeppelin
Nam-Luc Tran (Euranova): Stale Synchronous Parallel Iterations on Flink
Romeo Kienzler and Simon Laws (IBM): Apache Flink Cluster Deployment on Docker using Docker-Compose
Suneel Marthi (Red Hat): BigPetStore: A Comprehensive Blueprint for Apache Flink
Marc Schwering (MongoDB): Using Flink with MongoDB to Enhance Relevancy in Personalization
The training sessions were very well attended, with hands-on training on Flink’s DataStream, DataSet, and Gelly APIs. As always, you can access the latest data Artisans training material for Apache Flink online.
Of course, not everything about Flink Forward 2015 was perfect. As this was the first installment of the conference, glitches like spotty internet or mix-ups with registration did happen at times, but they were quickly resolved. A more important issue that we noticed is the lack of diversity in the community, especially the very small percentage of women among speakers and attendees. While this is an issue in the tech community at large and not specific to the Flink community, we would like to start an open discussion about it as early as possible and see how we can improve in the future.
What next?
Browse all slides and videos of Flink Forward 2015 here, and see other reviews of the conference here and here. See you at Flink Forward 2016!