Taming Big Data with Apache Spark and Python – Hands On!

Created by Sundog Education by Frank Kane, Frank Kane

What Will I Learn?

  • Frame big data analysis problems as Spark problems
  • Use Amazon’s Elastic MapReduce service to run your job on a cluster with Hadoop YARN
  • Install and run Apache Spark on a desktop computer or on a cluster
  • Use Spark’s Resilient Distributed Datasets to process and analyze large data sets across many CPUs
  • Implement iterative algorithms such as breadth-first search using Spark
  • Use the MLLib machine learning library to answer common data mining questions
  • Understand how Spark SQL lets you work with structured data
  • Understand how Spark Streaming lets you process continuous streams of data in real time
  • Tune and troubleshoot large jobs running on a cluster
  • Share information between nodes on a Spark cluster using broadcast variables and accumulators
  • Understand how the GraphX library helps with network analysis problems


Requirements

  • Access to a personal computer. This course uses Windows, but the sample code will work fine on Linux as well.
  • Some prior programming or scripting experience. Python experience will help a lot, but you can pick it up as we go.


New! Updated for Spark 2.0.0

“Big data” analysis is a hot and highly valuable skill – and this course will teach you the hottest technology in big data: Apache Spark. Employers including Amazon, eBay, NASA JPL, and Yahoo all use Spark to quickly extract meaning from massive data sets across a fault-tolerant Hadoop cluster. You’ll learn those same techniques, using your own Windows system right at home. It’s easier than you might think.

Learn and master the art of framing data analysis problems as Spark problems through over 15 hands-on examples, and then scale them up to run on cloud computing services in this course. You’ll be learning from an ex-engineer and senior manager from Amazon and IMDb.
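To give a flavor of what “framing a problem as a Spark problem” looks like, here is a minimal PySpark sketch in the spirit of the course’s early examples. The script and the file name book.txt are illustrative assumptions, not taken from the course materials:

    # A minimal word-count sketch: map raw text into (word, 1) pairs,
    # then aggregate the counts across the cluster with reduceByKey.
    from pyspark import SparkConf, SparkContext

    conf = SparkConf().setMaster("local").setAppName("WordCount")
    sc = SparkContext(conf=conf)

    lines = sc.textFile("book.txt")                    # one RDD element per line
    words = lines.flatMap(lambda line: line.split())   # transform lines into words
    counts = words.map(lambda w: (w.lower(), 1)) \
                  .reduceByKey(lambda a, b: a + b)     # sum the counts per word

    # Print the ten most frequent words.
    for word, count in counts.takeOrdered(10, key=lambda x: -x[1]):
        print(word, count)

    sc.stop()

Every analysis in the course follows this same shape: load data into an RDD, transform it step by step, then run an action to collect a result.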

  • Learn the concepts of Spark’s Resilient Distributed Datastores
  • Develop and run Spark jobs quickly using Python
  • Translate complex analysis problems into iterative or multi-stage Spark scripts
  • Scale up to larger data sets using Amazon’s Elastic MapReduce service
  • Understand how Hadoop YARN distributes Spark across computing clusters
  • Learn about other Spark technologies, like Spark SQL, Spark Streaming, and GraphX

By the end of this course, you’ll be running code that analyzes gigabytes’ worth of information – in the cloud – in a matter of minutes.

This course uses the familiar Python programming language; if you’d rather use Scala to get the best performance out of Spark, see my “Apache Spark with Scala – Hands On with Big Data” course instead.

We’ll have some fun along the way. You’ll get warmed up with some simple examples of using Spark to analyze movie ratings data and text in a book. Once you’ve got the basics under your belt, we’ll move to some more complex and interesting tasks. We’ll use a million movie ratings to find movies that are similar to each other, and you might even discover some new movies you like in the process! We’ll analyze a social graph of superheroes, and learn who the most “popular” superhero is – and develop a system to find “degrees of separation” between superheroes. Are all Marvel superheroes within a few degrees of being connected to The Incredible Hulk? You’ll discover the answer.
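As a rough sketch of the kind of warm-up described above, a histogram of movie ratings over MovieLens-style data might look like this. The ml-100k/u.data path and its tab-separated user/movie/rating/timestamp layout are assumptions about the data set, not course code:

    # Count how many times each star rating appears in the ratings file.
    from pyspark import SparkConf, SparkContext

    conf = SparkConf().setMaster("local").setAppName("RatingsHistogram")
    sc = SparkContext(conf=conf)

    lines = sc.textFile("ml-100k/u.data")
    ratings = lines.map(lambda line: line.split("\t")[2])  # third field = rating
    result = ratings.countByValue()                        # action: {rating: count}

    for rating, count in sorted(result.items()):
        print(rating, count)

    sc.stop()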

This course is very hands-on; you’ll spend most of your time following along with the instructor as we write, analyze, and run real code together – both on your own system, and in the cloud using Amazon’s Elastic MapReduce service. Five hours of video content is included, with over 15 real examples of increasing complexity you can build, run, and study yourself. Move through them at your own pace, on your own schedule. The course wraps up with an overview of other Spark-based technologies, including Spark SQL, Spark Streaming, and GraphX.
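For a small taste of Spark SQL, one of the technologies covered in that wrap-up, here is a minimal sketch using the SparkSession API introduced in Spark 2.0. The people.json file and its name/age schema are hypothetical:

    # Load semi-structured JSON into a DataFrame and query it with plain SQL.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("SparkSQLTaste").getOrCreate()

    people = spark.read.json("people.json")      # infer schema from the JSON records
    people.createOrReplaceTempView("people")     # expose the DataFrame as a SQL table

    teens = spark.sql("SELECT name FROM people WHERE age BETWEEN 13 AND 19")
    teens.show()

    spark.stop()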

Enjoy the course!

Who is the target audience?

  • People with some software development background who want to learn the hottest technology in big data analysis will want to check this out. This course focuses on Spark from a software development standpoint; we introduce some machine learning and data mining concepts along the way, but that’s not the focus. If you want to learn how to use Spark to carve up huge data sets and extract meaning from them, then this course is for you.
  • If you’ve never written a computer program or a script before, this course isn’t for you – yet. I’d suggest starting with a Python course first if programming is new to you.
  • If your software development job involves, or will involve, processing large amounts of data, you need to know about Spark.
  • If you’re training for a new career in data science or big data, Spark is an important part of it.

Size: 1.41 GB

Content retrieved from: https://www.udemy.com/taming-big-data-with-apache-spark-hands-on/.

