3 Telegram Channels Worth Following for Production Data Engineering
I run a small Telegram channel about production scraping (t.me/scraping_ai). It is paused for content right now — too few subscribers to justify writing for an empty room — but I keep reading the broader data-engineering corner of Telegram every day. Over the last six months three channels kept earning my attention while most ad-driven “scraping” or “AI” lists faded. Here they are, with the kind of detail I wish a list like this gave me before I subscribed.
1. @dataeng — distributed systems with operational depth
t.me/dataeng · ~4 400 subscribers · English · channel format
Run by @adilkhash. The framing is “data engineering & distributed systems” but in practice it lives at the boundary where storage layouts, query engines and orchestration meet — exactly the layer I keep colliding with when I try to keep an Apify-based scraper pipeline from collapsing under its own retry storm.
Why it stays useful:
- Posts are long-form essays, not link dumps. When a piece references a paper or a system (Iceberg, Snowflake’s micro-partitioning, etc.), it explains the trade-off rather than just naming the technology.
- The author is a practitioner — he has run real workloads, so the failure modes he highlights are the ones you actually meet, not the textbook ones.
- The cadence is moderate (1–2 posts per week), so the channel never adds noise to the inbox.
If your work touches Trino, dbt, Iceberg, or Spark on top of object storage, this is the single channel I’d keep over the loudest “data influencer” feeds.
2. @apache_airflow — orchestration, no fluff
t.me/apache_airflow · ~770 subscribers · channel format
Smaller and tighter than the previous one. The focus is exactly what the handle says: Apache Airflow as the orchestrator. No “10 hot AI tools” posts, no recycled Medium articles.
Why I follow it even though I don’t run Airflow at the moment:
- A long tail of my Apify actors are eventually going to be wrapped as Airflow PythonOperator or BashOperator tasks for clients who already standardised on Airflow. The patterns posted here (DAG-level retries with exponential backoff, sensor anti-patterns, the difference between depends_on_past=True and wait_for_downstream) translate directly to scraper schedules that need to survive overnight outages.
- The volume of posts is low enough that you can read each one carefully. That matters when the topic is orchestration: missing one detail about catchup or backfill is the difference between a clean run and a re-execution storm.
- The signal-to-noise ratio is the highest in this list. Almost no promotional content.
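To make the retry pattern concrete, here is a minimal sketch of the kind of default_args dict one might hand to a scraper DAG. The keys (retries, retry_delay, retry_exponential_backoff, max_retry_delay, depends_on_past, wait_for_downstream) are standard Airflow operator arguments; the values, the SCRAPER_DEFAULT_ARGS name and both helper functions are my own illustrative assumptions, not something taken from the channel. The helper uses a simplified doubling schedule only to show the shape of the curve; Airflow's own calculation differs in detail (it adds jitter, as far as I know). The block deliberately avoids importing Airflow so it runs anywhere.

```python
from __future__ import annotations

from datetime import timedelta

# Hypothetical defaults for a scraper DAG. The keys are real Airflow
# operator arguments; the values are illustrative assumptions.
SCRAPER_DEFAULT_ARGS = {
    "retries": 5,
    "retry_delay": timedelta(minutes=2),
    "retry_exponential_backoff": True,
    "max_retry_delay": timedelta(minutes=30),
    # Run a task only if its own previous scheduled run succeeded or was skipped.
    "depends_on_past": True,
    # Stricter variant: also wait for the tasks immediately downstream of the
    # previous run. Left off here; enabling it implies depends_on_past.
    "wait_for_downstream": False,
}


def capped_backoff(base: timedelta, attempt: int, cap: timedelta) -> timedelta:
    """Simplified schedule: base * 2**(attempt - 1), never longer than cap.

    Not Airflow's exact formula; it only illustrates how the waits grow.
    """
    if attempt < 1:
        raise ValueError("attempt is 1-based")
    return min(base * (2 ** (attempt - 1)), cap)


def retry_schedule(args: dict) -> list[timedelta]:
    """Return the wait before each retry implied by the simplified schedule."""
    return [
        capped_backoff(args["retry_delay"], attempt, args["max_retry_delay"])
        for attempt in range(1, args["retries"] + 1)
    ]


if __name__ == "__main__":
    for attempt, wait in enumerate(retry_schedule(SCRAPER_DEFAULT_ARGS), start=1):
        print(f"retry {attempt}: wait {wait}")
```

With these illustrative values the five waits come to 2, 4, 8, 16 and 30 minutes, roughly an hour of patience in total. Whether that is enough to ride out an overnight outage depends on the source being scraped, so treat the numbers as a starting point.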
A reasonable rule of thumb: if you orchestrate any long-running data job — scraping, ETL, training pipelines — the patterns published here pay for themselves within a quarter.
3. @bigdata_en — adjacent audience, different mechanic
t.me/bigdata_en · ~1 040 members · supergroup (chat) format
This is intentionally on the list even though it isn’t a broadcast channel. It is a chat where data scientists, ML engineers and big-data folks discuss whatever has come up in their week. The reason I read it is exactly that adjacency: scraping pipelines feed feature stores, training corpora and evaluation sets — and the people on the receiving side of that data have a different sense of what “good ingestion” means than the engineers who write the scrapers.
Specifics worth knowing before subscribing:
- It is a chat with active moderation (the title literally mentions an antispam bot). Lurking is fine; dropping links without context is not. Read for a few days before contributing.
- The same admin runs a Russian sibling group with around 5 400 members, which means the moderation philosophy and the topic boundaries are battle-tested across two communities.
- For a scraping engineer the value is asymmetric: you learn what downstream consumers complain about (drift in schema, late-arriving rows, encoding mistakes that survive serialization) — and you can fix it in your scraper one release earlier than you would have otherwise.
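The three complaints above can be checked on the scraper side before a record ever leaves the pipeline. What follows is a rough sketch under stated assumptions: the EXPECTED_FIELDS schema, the 24-hour lateness threshold, the mojibake marker list and the check_record function are all hypothetical and would need adapting to a real dataset. The encoding check in particular is a crude heuristic that only catches a few common UTF-8-read-as-Latin-1 patterns.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone

# Hypothetical schema for one scraped record: field name -> expected type.
EXPECTED_FIELDS = {"url": str, "title": str, "price": float, "event_time": datetime}

# A few byte sequences that typically appear when UTF-8 text was decoded
# with the wrong codec, plus the Unicode replacement character.
MOJIBAKE_MARKERS = ("Ã©", "Ã¼", "â€™", "â€œ", "\ufffd")

# Assumed threshold: rows older than this at scrape time count as late.
MAX_LATENESS = timedelta(hours=24)


def check_record(record: dict, scraped_at: datetime) -> list[str]:
    """Return a list of human-readable problems; an empty list means clean."""
    problems = []

    # Schema drift: fields that vanished, appeared, or changed type.
    missing = EXPECTED_FIELDS.keys() - record.keys()
    extra = record.keys() - EXPECTED_FIELDS.keys()
    if missing:
        problems.append(f"schema drift: missing {sorted(missing)}")
    if extra:
        problems.append(f"schema drift: unexpected {sorted(extra)}")
    for name, expected in EXPECTED_FIELDS.items():
        if name in record and not isinstance(record[name], expected):
            problems.append(f"schema drift: {name} is {type(record[name]).__name__}")

    # Encoding mistakes that survived serialization.
    for name, value in record.items():
        if isinstance(value, str) and any(m in value for m in MOJIBAKE_MARKERS):
            problems.append(f"encoding: {name} looks like mojibake")

    # Late-arriving rows: event happened long before it was scraped.
    event_time = record.get("event_time")
    if isinstance(event_time, datetime) and scraped_at - event_time > MAX_LATENESS:
        problems.append("late-arriving row")

    return problems


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    sample = {"url": "https://example.com/x", "title": "CafÃ©", "price": "9.5",
              "event_time": now - timedelta(days=3)}
    for problem in check_record(sample, now):
        print(problem)
```

Both timestamps are assumed to be timezone-aware; mixing aware and naive datetimes would raise a TypeError in the subtraction, which is arguably the right failure for a pipeline that has lost track of its time zones.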
If your scrapers feed any kind of analytics or ML pipeline, treat this as your “downstream complaint stream” and adjust upstream accordingly.
What I deliberately left off the list
- Anything that is paid promotion dressed as a recommendation. Several “best Telegram channels for data engineers” lists circulate every few months. Most are sponsored placements. I have nothing against paid promotion when it is disclosed, but it has no place in a post titled “channels I read”.
- Channels that mostly forward each other. A surprising amount of Telegram data content is just the same five posts cycling through fifteen channels. The three above publish their own material.
- Russian-only channels. Some of them are excellent — but the audience for this blog is mostly English-speaking, and a list you can’t act on doesn’t help you.
How I read these (a small workflow note)
I keep these three muted in Telegram and check the channel list once in the morning while waiting for the kettle. Anything genuinely interesting goes into a “later” pin; anything immediately actionable gets a one-line note in my work journal. Total time: under five minutes per day. The point of these lists is not to add another inbox to triage; it is to make sure the few signals you would actually act on don’t get lost in the broader feed.
If you have a fourth channel that fits this bar — operational depth, low promotional content, on-topic for production data work — I’d genuinely like to know. Email is in the footer.