Flink’s major version releases occur every few months, and there’s a constant stream of activity as new features are merged to the Flink master branch in between releases. Keeping an eye on what’s going into Flink’s master is one of the best ways to stay up-to-date on new work that hasn’t yet made it into an official release.
So we’re going to try something new this year: a “Flink Master Monthly” blog post where we highlight a selection of features that have been merged into Flink’s master branch during the past month. If you’re interested in trying out these features once they’re in master, you can certainly do so, keeping in mind that they won’t have been fully tested until they go through the official release process.
If you’d like to see a full list of newly-merged features from a given time period, Git is your friend. You can simply run the following:
git shortlog -e --since="01 Jan 2018" --before="01 Feb 2018"
The raw list offers quite a lot to sift through, so here’s our brief summary.
Improvements to Flink’s deployment and process model (FLIP-6): FLIP-6, a major rework of Flink’s deployment and process model in order to improve integration with YARN, Mesos, and container managers (e.g. Docker & Kubernetes), is nearing completion. This FLIP will clear the way for oft-requested features such as dynamic scaling, among other things. The following issues were merged in January during the home stretch for FLIP-6.
- [FLINK-7903] [tests] Add flip6 build profile
- [FLINK-7904] Enable Flip6 build profile on Travis
- [FLINK-8453] [flip6] Add ArchivedExecutionGraphStore to Dispatcher
- [FLINK-8299] [flip6] Retrieve JobExecutionResult after job submission
- [FLINK-7720] [checkpoints] Centralize creation of backends and state related resources
- [FLINK-5823] Store Checkpoint Root Metadata in StateBackend (not in HA custom store)
- [FLINK-8531] Support separation of “Exclusive”, “Shared” and “Task owned” state
- [FLINK-7520][network] let our Buffer class extend from netty’s buffer class
- [FLINK-7427][network] integrate PartitionRequestProtocol into NettyProtocol
- [FLINK-8375][network] Remove unnecessary synchronization
- [FLINK-7406][network] Implement Netty receiver incoming pipeline for credit-based
- [FLINK-7416][network] Implement Netty receiver outgoing pipeline for credit-based
- [FLINK-7468][network] Implement sender backlog logic for credit-based
- [FLINK-8490] Allow custom Docker parameters for Docker tasks on Mesos
While windowed inner joins were added in the last release, Flink now also supports all types of windowed outer equi-joins [FLINK-7797]. Queries like the one shown below allow for joining in a bounded time range in both event-time and processing-time.
SELECT Table1.uid, Table2.event
FROM Table1 LEFT OUTER JOIN Table2
  ON Table1.uid = Table2.uid
  AND Table1.rowtime BETWEEN Table2.rowtime - INTERVAL '10' SECOND AND Table2.rowtime + INTERVAL '1' HOUR
For cases where two streams should not be joined within a bounded interval, we also merged non-windowed inner joins [FLINK-6094]. They allow for full-history matching, as is common in many standard SQL statements.
SELECT Table1.uid, Table2.name
FROM Table1 JOIN Table2
  ON Table1.uid = Table2.uid
In the past, it was sometimes cumbersome to convert a DataStream or DataSet into a Table. We improved this by allowing the schema to be defined either by field name or by field index [FLINK-8203].
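To make the time bounds of the windowed LEFT OUTER JOIN shown earlier more concrete, here is a minimal plain-Python sketch of its semantics on in-memory rows. It is not Flink code and says nothing about how Flink executes the join on streams; the function name `windowed_left_outer_join` and the dict-based row layout are assumptions made purely for illustration.

```python
from datetime import datetime, timedelta


def windowed_left_outer_join(table1, table2,
                             lower=timedelta(seconds=10),
                             upper=timedelta(hours=1)):
    """Illustrative (hypothetical) emulation of the time-bounded outer join.

    table1: list of dicts with keys "uid" and "rowtime"
    table2: list of dicts with keys "uid", "event" and "rowtime"
    Returns a list of (uid, event) tuples; event is None when no row of
    table2 matches, mirroring the NULL padding of a LEFT OUTER JOIN.
    """
    result = []
    for left in table1:
        # Same condition as the SQL query: equal uid, and Table1.rowtime
        # BETWEEN Table2.rowtime - 10 seconds AND Table2.rowtime + 1 hour.
        matches = [
            right for right in table2
            if left["uid"] == right["uid"]
            and right["rowtime"] - lower <= left["rowtime"] <= right["rowtime"] + upper
        ]
        if matches:
            result.extend((left["uid"], right["event"]) for right in matches)
        else:
            # Outer join: keep the left row even without a partner.
            result.append((left["uid"], None))
    return result


if __name__ == "__main__":
    t0 = datetime(2018, 1, 1, 12, 0, 0)
    table1 = [{"uid": 1, "rowtime": t0}, {"uid": 2, "rowtime": t0}]
    table2 = [
        {"uid": 1, "event": "click", "rowtime": t0 + timedelta(seconds=5)},
        {"uid": 2, "event": "view", "rowtime": t0 + timedelta(seconds=30)},
    ]
    print(windowed_left_outer_join(table1, table2))
```

In this sample data, the row with uid 1 falls inside the interval and is joined, while the row with uid 2 arrives more than 10 seconds before its Table2 counterpart and is therefore emitted with an empty right side.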
Flink also now supports more SQL auxiliary functions such as MD5, SHA1, SHA256, BIN, LPAD, and RPAD [FLINK-6810].
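For readers unfamiliar with these functions, the following plain-Python sketch approximates what each one computes. It is an illustration only, not Flink's implementation: the helper names are hypothetical, and edge cases such as NULL inputs, negative numbers passed to BIN, or an empty padding string are deliberately left out and may behave differently in Flink.

```python
import hashlib


def md5(s):
    """Hex-encoded MD5 digest of a string."""
    return hashlib.md5(s.encode("utf-8")).hexdigest()


def sha1(s):
    """Hex-encoded SHA-1 digest of a string."""
    return hashlib.sha1(s.encode("utf-8")).hexdigest()


def sha256(s):
    """Hex-encoded SHA-256 digest of a string."""
    return hashlib.sha256(s.encode("utf-8")).hexdigest()


def bin_(n):
    """Binary string representation of a non-negative integer."""
    return format(n, "b")


def lpad(s, length, pad):
    """Left-pad s with pad up to length; truncate s if it is already longer.

    Assumes pad is a non-empty string.
    """
    if length <= len(s):
        return s[:length]
    fill = length - len(s)
    return (pad * fill)[:fill] + s


def rpad(s, length, pad):
    """Right-pad s with pad up to length; truncate s if it is already longer.

    Assumes pad is a non-empty string.
    """
    if length <= len(s):
        return s[:length]
    fill = length - len(s)
    return s + (pad * fill)[:fill]


if __name__ == "__main__":
    print(md5("abc"))
    print(bin_(4))
    print(lpad("hi", 4, "??"), rpad("hi", 4, "??"))
```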
For upcoming features such as a SQL Client that enables executing Flink jobs in a non-programmatic way, we made efforts to define table sources in a consistent, string property-based way [FLINK-8240]. These efforts are not yet complete, but they will make it easier to retrieve tables from external systems and will make table source discovery more modular.
Ecosystem integrations: OpenStack provides software for creating public and private clouds on pools of resources. Swift support was motivated by a user who runs Flink on OpenStack and wanted to use Swift, OpenStack’s S3-like filesystem, for checkpoint and savepoint storage without Hadoop dependencies.
- [FLINK-8432] Add support for OpenStack’s Swift filesystem
- [FLINK-6590][docs] Integrate generated tables into documentation
The post Apache Flink® Master Branch Monthly: New in Flink in January 2018 appeared first on data Artisans.