Build a Data Pipeline with Big Data Service and Oracle Analytics Cloud

About This Workshop
In this workshop, you'll be guided through the essential steps of building a Data Pipeline using Oracle Big Data Service (BDS). The journey will have you wearing multiple hats, each offering a unique perspective on data handling.

First, as a Data Engineer, you'll tackle the task of cleansing and transforming raw data from Oracle Object Storage. Using Spark, a key feature of our Big Data Service, you'll refine this data and store it in a specific format in an Object Storage bucket. This data will then be placed in an external Hive table. The dataset for this exercise is the well-known Taxi Data Set.

Next, you'll step into the shoes of a Data Scientist. Here, you'll work with a Jupyter notebook, also a part of our service. Your focus will be on building and saving a data model.
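Building and saving a model in the notebook might look like the following sketch. The workshop does not specify the algorithm or library, so scikit-learn with joblib persistence is an assumption here, and the taxi-style features are made up for illustration.

```python
# Illustrative model-building sketch for the Data Scientist step.
# scikit-learn and joblib are assumptions; the feature values are made up.
import joblib
from sklearn.linear_model import LinearRegression

# Hypothetical training data: predict fare from trip distance
X = [[1.0], [2.0], [3.0], [4.0]]   # trip_distance (miles)
y = [6.0, 9.0, 12.0, 15.0]         # fare_amount (dollars)

model = LinearRegression()
model.fit(X, y)

# Save the fitted model so it can be reloaded later in the notebook
joblib.dump(model, "taxi_fare_model.joblib")

# Reload and predict to confirm the round trip works
restored = joblib.load("taxi_fare_model.joblib")
print(round(restored.predict([[5.0]])[0], 1))  # -> 18.0
```

Saving the model to a file is what lets later notebook sessions, or downstream jobs, reuse it without retraining.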

Finally, you'll wear the hat of a Business Analyst and visualize the data using Oracle Analytics Cloud.

This workshop is designed to give you hands-on experience with the various facets of data processing and analysis, preparing you for the challenges of Big Data.

Workshop Info

6 hours
  • Lab 1: Set Up the BDS Environment
  • Lab 2: Create a BDS Hadoop Cluster
  • Lab 3: Access a BDS Utility Node Using a Public IP Address
  • Lab 4: Use Ambari and Hue to Access a BDS Cluster
  • Lab 5: Create a Hadoop Administrator User
  • Lab 6: Cleanse Data and Upload to Hive Using Python (PySpark)
  • Lab 7: Run Machine Learning and Data Science Code Using JupyterHub
  • Lab 8: Visualize data using Oracle Analytics Cloud
  • Lab 9: Clean up Resources Used in this Workshop (Optional)

Prerequisites

  • Familiarity with databases is desirable, but not required
  • Some understanding of cloud and database terms is helpful
  • Familiarity with Oracle Cloud Infrastructure (OCI) is helpful
