Airflow XCom
An XCom record is uniquely identified and isolated by a combination of metadata attributes:
dag_id: The identifier of the containing workflow.
task_id: The specific task that generated the data.
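The identification scheme above can be sketched with a small in-memory model. This is not Airflow's implementation: `XComKey` and `InMemoryXComStore` are hypothetical names used only for illustration. The `run_id` field and the `key` field (defaulting to `return_value`) are not in the list above; they are included on the assumption that a record also has to be separated per run and per named value.

```python
from typing import Any, Dict, NamedTuple


class XComKey(NamedTuple):
    """Hypothetical composite key modelling how a record is identified."""
    dag_id: str                 # the containing workflow
    task_id: str                # the task that produced the value
    run_id: str                 # assumption: separates runs of the same DAG
    key: str = "return_value"   # assumption: name of the stored value


class InMemoryXComStore:
    """Toy stand-in for the XCom table; not part of Airflow."""

    def __init__(self) -> None:
        self._rows: Dict[XComKey, Any] = {}

    def push(self, dag_id: str, task_id: str, run_id: str, value: Any,
             key: str = "return_value") -> None:
        self._rows[XComKey(dag_id, task_id, run_id, key)] = value

    def pull(self, dag_id: str, task_id: str, run_id: str,
             key: str = "return_value") -> Any:
        # Returns None when no record matches the full composite key.
        return self._rows.get(XComKey(dag_id, task_id, run_id, key))


store = InMemoryXComStore()
store.push("etl", "extract", "run_1", {"user_id": 42})
store.push("etl", "extract", "run_2", {"user_id": 7})
store.push("reporting", "extract", "run_1", {"user_id": 99})

same_task = store.pull("etl", "extract", "run_1")
other_dag = store.pull("reporting", "extract", "run_1")
missing = store.pull("etl", "transform", "run_1")
```

Because every attribute is part of the key, two tasks with the same `task_id` in different DAGs never see each other's values, and a lookup for a task that pushed nothing comes back empty.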
XComs are stored in Airflow's metadata database (SQLAlchemy-backed). While this makes data retrieval seamless, it also introduces a critical constraint: XComs are not designed for "big data." Because the information lives in the relational database, passing multi-gigabyte DataFrames can lead to severe performance degradation or database crashes.

Evolution: From Manual to TaskFlow

In earlier versions of Airflow, users had to explicitly call airflow xcom
from airflow.decorators import task

@task
def extract():
    # The returned dict is auto-pushed as an XCom under the key 'return_value'
    return {"user_id": 42, "name": "Alice"}
@task
def process_data(data_dict):
    # Airflow automatically pulls the XCom from extract_data here
    print(f"Processing {data_dict['user']}")
    return data_dict['value'] * 2
In Airflow, tasks run in separate contexts—often on different workers or at different times. While this isolation improves fault tolerance and scalability, it raises a challenge: how can one task pass a value (e.g., a file path, a row count, or a model accuracy score) to a downstream task?
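One common answer, in line with the file-path example above, is to hand over a small reference rather than the payload itself. The sketch below uses only the standard library and plain functions in place of real tasks; `extract_to_file` and `load_from_file` are hypothetical helpers, and the local temporary directory stands in for whatever shared storage both workers can reach.

```python
import json
import os
import tempfile


def extract_to_file(records, directory):
    """Write the full payload to storage and return only its path."""
    path = os.path.join(directory, "extract.json")
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(records, fh)
    return path  # this short string is what would travel between tasks


def load_from_file(path):
    """Downstream side: resolve the reference back into the payload."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


with tempfile.TemporaryDirectory() as tmp:
    records = [{"id": i, "value": i * 2} for i in range(10_000)]
    ref = extract_to_file(records, tmp)

    ref_size = len(json.dumps(ref))        # bytes handed to the next task
    payload_size = os.path.getsize(ref)    # bytes kept out of the database

    total = sum(r["value"] for r in load_from_file(ref))
```

The value passed downstream stays a few dozen bytes no matter how large the extracted data grows, which keeps the metadata database out of the data path.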
Standard database records do not auto-delete. Set up a periodic maintenance DAG that runs a clean-up query against the xcom database table to purge entries older than your history retention policy (e.g., 30 days).
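The clean-up query might look roughly like the sketch below. It runs against an in-memory SQLite table that only imitates the xcom table: the column layout, the `timestamp` column name, and the `purge_old_xcoms` helper are assumptions for illustration, so check the actual schema of your Airflow version before adapting it. Recent Airflow releases also provide an `airflow db clean` command that may cover the same need.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

RETENTION_DAYS = 30  # example retention policy from the text


def purge_old_xcoms(conn, now, retention_days=RETENTION_DAYS):
    """Delete rows older than the retention window; return how many were removed."""
    cutoff = (now - timedelta(days=retention_days)).isoformat()
    cur = conn.execute("DELETE FROM xcom WHERE timestamp < ?", (cutoff,))
    conn.commit()
    return cur.rowcount


# Simplified stand-in for the real table (hypothetical columns).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE xcom (dag_id TEXT, task_id TEXT, key TEXT, value TEXT, timestamp TEXT)"
)

now = datetime(2024, 6, 30, tzinfo=timezone.utc)
rows = [
    ("etl", "extract", "return_value", "42", (now - timedelta(days=45)).isoformat()),
    ("etl", "extract", "return_value", "43", (now - timedelta(days=5)).isoformat()),
]
conn.executemany("INSERT INTO xcom VALUES (?, ?, ?, ?, ?)", rows)

deleted = purge_old_xcoms(conn, now)
remaining = conn.execute("SELECT COUNT(*) FROM xcom").fetchone()[0]
```

Passing `now` in as a parameter keeps the cutoff reproducible; inside a scheduled maintenance DAG it would more naturally come from the run's own logical date.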