PyFlink Starter Archetype

A minimal path from article to working scaffold

A simple scaffold and agent prompt for readers who want to move from the PyFlink article into a small, Python-first streaming project without guessing the first few structural decisions.

Why This Exists

The hardest part of trying a new streaming stack is often not the API. It is deciding what the first non-chaotic project shape should look like. This starter is intentionally small and biased toward learning the runtime model early.

Suggested Structure

pyflink-starter/
  README.md
  pyproject.toml
  flink_jobs/
    __init__.py
    job.py
    transforms.py
    model_logic.py
  tests/
    test_transforms.py
  docker/
    Dockerfile
  conf/
    local.env
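As a sketch of why transforms.py is split out: keeping transforms as plain functions with no Flink imports means tests/test_transforms.py can run without a cluster. The function names below are illustrative, not prescribed by the archetype:

```python
# flink_jobs/transforms.py (illustrative): plain functions, no Flink imports,
# so they stay unit-testable without a running cluster.

def parse_event(line: str) -> tuple:
    """Parse a 'key,value' line into a (key, int_value) pair."""
    key, value = line.split(",", 1)
    return key, int(value)


def add_counts(a: tuple, b: tuple) -> tuple:
    """Combine two (key, count) pairs that share the same key."""
    return a[0], a[1] + b[1]
```

The job module then wires these into the stream topology, while the test module exercises them directly.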

Local Setup Commands

python3 -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade pip
python -m pip install apache-flink==2.2.0

# optional: verify the install
python -c "import pyflink; print('pyflink ok')"

At the time of writing, the stable PyFlink installation docs require Python 3.9, 3.10, 3.11, or 3.12.

Local Runtime Option

docker network create flink-net

docker run -d --name jobmanager \
  --network flink-net \
  -p 8081:8081 \
  -e JOB_MANAGER_RPC_ADDRESS=jobmanager \
  flink:2.2 jobmanager

docker run -d --name taskmanager \
  --network flink-net \
  -e JOB_MANAGER_RPC_ADDRESS=jobmanager \
  flink:2.2 taskmanager

This is the quickest way to get a local Flink runtime without pretending the cluster side does not exist.

What To Build First

  • Start with one small streaming job that reads a source, applies stateful logic, and emits a sink.
  • Keep the model logic replaceable so you can compare native Python execution with a service boundary later.
  • Prove local packaging, dependency shipping, and replay behavior before chasing throughput.
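A minimal shape for that first job might look like the following sketch. It assumes apache-flink is installed and a local JVM is available; the event format and names are made up for illustration, and the pyflink import is deferred so the pure function stays testable without Flink:

```python
# flink_jobs/job.py (illustrative sketch)

def parse_event(line: str) -> tuple:
    """'key,value' -> (key, int_value); kept pure so it is unit-testable."""
    key, value = line.split(",", 1)
    return key, int(value)


def main() -> None:
    # Imported lazily so the pure function above is importable
    # even without a Flink installation.
    from pyflink.datastream import StreamExecutionEnvironment

    env = StreamExecutionEnvironment.get_execution_environment()
    env.set_parallelism(1)

    # Source: a fixed collection stands in for a real connector at first.
    events = env.from_collection(["a,1", "a,2", "b,5"])

    (events
        .map(parse_event)
        .key_by(lambda kv: kv[0])                  # stateful: keyed stream
        .reduce(lambda a, b: (a[0], a[1] + b[1]))  # running total per key
        .print())                                  # sink: stdout for now

    env.execute("starter-job")


if __name__ == "__main__":
    main()
```

Source, stateful logic, and sink are each one line here, which makes it obvious what to swap when a real connector arrives.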

First Job Command

python flink_jobs/job.py

# once you move to a real cluster, keep dependency shipping explicit
# and treat Python files / requirements / connector JARs as part of the job

The dependency-management docs matter early because PyFlink becomes operationally confusing exactly when teams treat packaging as an afterthought.
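One way to keep shipping explicit is to centralize it in a single function. The three methods used below exist on PyFlink's StreamExecutionEnvironment, but the file paths are hypothetical placeholders:

```python
def configure_dependencies(env) -> None:
    """Make every runtime dependency of the job explicit in one place.

    `env` is a PyFlink StreamExecutionEnvironment; the paths below are
    hypothetical placeholders for this sketch.
    """
    # Connector JARs are runtime dependencies too, not an afterthought.
    env.add_jars("file:///opt/jars/example-connector.jar")
    # Ship the project's own Python modules with the job.
    env.add_python_file("flink_jobs/transforms.py")
    # Third-party Python deps, resolved for the cluster side.
    env.set_python_requirements("requirements.txt")
```

Calling this from the job's entry point means the full dependency surface is readable in one screen, instead of being scattered across CLI flags and tribal knowledge.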

What The Archetype Should Prove

  • A small DataStream job can run locally without magical hidden state.
  • Python dependencies are explicit and can be shipped deliberately.
  • Connector JARs are treated as runtime dependencies, not forgotten later.
  • The model logic can stay native Python at first, but the boundary can still be changed later.
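One way to keep that last point honest is to hide the model behind a plain injected callable, so the job wiring never knows whether scoring is native Python or a remote call. The names here are illustrative:

```python
from typing import Callable, Dict


class ScoreEvent:
    """Map-style operator that delegates scoring to an injected callable."""

    def __init__(self, scorer: Callable[[Dict], float]):
        self.scorer = scorer

    def __call__(self, event: Dict) -> Dict:
        return {**event, "score": self.scorer(event)}


def native_score(event: Dict) -> float:
    """Native-Python placeholder model. Swapping to a service boundary
    later means replacing this with, say, an HTTP call, without touching
    ScoreEvent or the stream topology."""
    return float(event.get("value", 0)) * 0.1
```

Changing the execution boundary then means passing a different scorer, not rewriting the job.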

Agent Prompt

Use the pyflink article as context and create a minimal PyFlink starter project.

Requirements:
- Python-first project layout
- one small DataStream job
- local runnable setup
- clear dependency management
- one testable transform module
- notes on where Java/JAR dependencies still enter the picture
- include a Docker-based local runtime option
- include a README with exact commands
- include one path for native Python model logic and one note on how to swap to a service boundary later

Do not optimize for scale yet.
Optimize for clarity, packaging sanity, and understanding the runtime boundary.

Useful Tooling

  • Python venv to keep the client-side environment isolated.
  • Docker to make the cluster side visible early instead of delaying it.
  • PyCharm / IntelliJ plus the official PyFlink debugging flow if you want local or remote Python UDF debugging.
  • Kafka only when your first example actually needs it; otherwise start with a smaller source/sink path and add connector JARs later.