The system is made of programs that can, but don't have to, connect to each other.
Before hammering a nail, get the right-sized hammer.
Data has to be gathered, enriched, analysed and stored.
Apache Kafka, Apache Spark, Apache Hive and Redis seem like the right tools for the job.
Here is how I came to that conclusion.
- a.k.a. Kafka Producer
- transports data from a CSV file to Kafka
- Python
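A minimal sketch of what this producer could look like, assuming the kafka-python package, a broker on `localhost:9092`, and a topic name `raw-data` (all of these are assumptions, not details from the source). The CSV-to-message step uses only the standard library:

```python
import csv
import json


def row_to_message(row):
    """Serialize one CSV row (as a dict) into a JSON-encoded Kafka message."""
    return json.dumps(row, sort_keys=True).encode("utf-8")


def produce(csv_path, send):
    """Read csv_path and hand each encoded row to `send` (e.g. producer.send)."""
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            send(row_to_message(row))


# Against a real broker this would be (requires the kafka-python package):
# from kafka import KafkaProducer
# producer = KafkaProducer(bootstrap_servers="localhost:9092")
# produce("data.csv", lambda msg: producer.send("raw-data", msg))
```

Keeping `produce` decoupled from the Kafka client via the `send` callback makes the CSV handling testable without a broker.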
- a.k.a. Kafka Consumer
- transports data from Kafka to Apache Hive
- Java and Maven
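The consumer itself is written in Java, but its poll-and-store loop can be sketched language-agnostically. Here it is in Python, with a plain callable standing in for `KafkaConsumer.poll` and an in-memory list standing in for the Hive table; both hooks are hypothetical stand-ins, not part of the real component:

```python
import json


def consume(poll, hive_insert, max_batches=1):
    """Poll batches of raw Kafka messages and insert each decoded record.

    `poll` stands in for the Kafka client's poll call, `hive_insert`
    for an INSERT into a Hive table -- both are hypothetical hooks here.
    """
    for _ in range(max_batches):
        for raw in poll():
            record = json.loads(raw.decode("utf-8"))
            hive_insert(record)


# Demonstration with stand-ins:
messages = [json.dumps({"id": 1}).encode(), json.dumps({"id": 2}).encode()]
table = []  # plays the role of the Hive table
consume(lambda: messages, table.append)
```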
- takes data from Apache Hive and Redis
- combines data into a single object
- sends objects into Apache Spark for analysis
- stores results back into Apache Hive
- Java and Maven
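This component is also Java, but its join step can be sketched briefly. The sketch below is in Python with hypothetical records; how key collisions are resolved (here, the Hive value wins) and the Spark hand-off are assumptions, not details from the source:

```python
def combine(hive_row, redis_row):
    """Merge a Hive row with its Redis counterpart into a single object.

    On key collisions the Hive value wins -- an assumption for this sketch.
    """
    merged = dict(redis_row)
    merged.update(hive_row)
    return merged


def send_to_spark(objects):
    """Placeholder for handing combined objects to Spark for analysis;
    the real step needs a live cluster and is not shown here."""
    return list(objects)


# Example with hypothetical records:
hive_row = {"id": 1, "name": "sensor-a"}
redis_row = {"id": 1, "last_seen": "2020-01-01"}
combined = combine(hive_row, redis_row)
```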