Projects with this topic
End-to-end AWS data lake pipeline for fleet telemetry data using S3, Spark, and Athena. Includes partitioned Parquet ETL, vehicle safety analytics, and SQL queries for overspeed and harsh braking detection.
The SalesStream dashboard is an application for monitoring and analyzing revenue data in real time. It uses Apache Spark and Apache Kafka to process financial data as it arrives, giving companies up-to-date insight into their revenue streams.
Execute Hadoop and Spark applications on the BigData@Polito cluster with a single command
Stack Exchange releases "data dumps" of all its publicly available content roughly every three months via archive.org.
This project is both an example and a framework for building ETL pipelines over this data with Apache Spark and Java.