Comments (3)
We have an operator that transfers files from S3 to Hive, as well as a Hive to MySQL operator, which is unfortunately missing from the documentation. With the hooks already in place, an S3 to Postgres operator should not be too difficult to write. Here is an example of how our S3 to Hive operator works:
from collections import OrderedDict
from airflow.operators import S3ToHiveTransfer

S3ToHiveTransfer(
    task_id='s3_to_hive',
    s3_key='s3://bucket-name/user_list_{{ ds }}.tsv',
    field_dict=OrderedDict([
        ("user_id", "BIGINT"),
        ("first_name", "STRING"),
        ("last_name", "STRING"),
        ("registered_at", "TIMESTAMP"),
    ]),
    hive_table='{{ params.db_name }}.user_list_{{ ds_nodash }}',
    create=True,
    recreate=True,
    delimiter='\t',
    s3_conn_id='s3_connection_name',
    dag=dag)
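To make the role of `field_dict` concrete, here is a minimal sketch of how an ordered mapping of column names to Hive types could be rendered into a `CREATE TABLE` statement. The helper `build_create_table` is hypothetical and is not the operator's actual implementation; it only illustrates why the mapping is ordered: the columns have to line up with the fields in the delimited file.

```python
from collections import OrderedDict


def build_create_table(table, field_dict, delimiter="\t"):
    """Hypothetical helper: render Hive DDL from an ordered name -> type mapping."""
    # Column order follows the insertion order of field_dict.
    columns = ",\n    ".join(
        f"{name} {hive_type}" for name, hive_type in field_dict.items()
    )
    # Write the delimiter as an escape sequence (a tab becomes the two characters \t).
    escaped = delimiter.encode("unicode_escape").decode()
    return (
        f"CREATE TABLE IF NOT EXISTS {table} (\n    {columns}\n)\n"
        "ROW FORMAT DELIMITED\n"
        f"FIELDS TERMINATED BY '{escaped}'\n"
        "STORED AS TEXTFILE"
    )


fields = OrderedDict([
    ("user_id", "BIGINT"),
    ("first_name", "STRING"),
    ("last_name", "STRING"),
    ("registered_at", "TIMESTAMP"),
])
ddl = build_create_table("my_db.user_list", fields)
print(ddl)
```

The table name `my_db.user_list` is a placeholder; in the example above the real name is produced by templating at run time.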
Usually, the transfer is a separate operation from any transform and has its own operator.
I hope this helps.
Best,
Arthur
Agreed, the current tutorial is really just focused on the mechanics of Airflow, with very foobar-y examples. I didn't want to write a pipeline that was too stack-specific (MySQL / Hive / ...) and wanted to make sure it would work for anyone, regardless of the stack they might have.
Maybe using a SqliteOperator to do some analytics on some data scraped from the Internet would be a good example. It could be interesting to rewrite the Luigi example for comparison :)
But yeah, it's on the TODO list.
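As a rough idea of what a stack-neutral example could look like, here is a minimal sketch that uses only Python's standard `sqlite3` module and no Airflow operators. It keeps the transfer step (loading raw rows) separate from the transform step (an analytics query), as described earlier in the thread. The sample data, the table name `users_raw`, and the functions `transfer` and `transform` are all made up for illustration; in a real pipeline each function would presumably map to its own task.

```python
import csv
import io
import sqlite3

# Made-up sample rows standing in for a scraped TSV file.
SAMPLE_TSV = "user_id\tcountry\n1\tFR\n2\tUS\n3\tFR\n"


def transfer(conn, tsv_text):
    """Transfer step: copy the raw rows into a staging table unchanged."""
    rows = list(csv.reader(io.StringIO(tsv_text), delimiter="\t"))[1:]  # skip header
    conn.execute(
        "CREATE TABLE IF NOT EXISTS users_raw (user_id INTEGER, country TEXT)"
    )
    conn.executemany("INSERT INTO users_raw VALUES (?, ?)", rows)
    conn.commit()


def transform(conn):
    """Transform step: a separate query that aggregates the staged rows."""
    cursor = conn.execute(
        "SELECT country, COUNT(*) FROM users_raw GROUP BY country ORDER BY country"
    )
    return cursor.fetchall()


conn = sqlite3.connect(":memory:")
transfer(conn, SAMPLE_TSV)
counts = transform(conn)
print(counts)
```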
Cool. I'll close this for now.