Dean Wampler, Ph.D., is a member of the Office of the CTO and the Architect for Big Data Products and Services at Typesafe. He uses Scala and Functional Programming to build Big Data systems using Spark, Mesos, Hadoop, the Typesafe Reactive Platform, and other tools. Dean is the author or co-author of three O’Reilly books on Scala, Functional Programming, and Hive. He contributes to several open source projects (including Spark) and he co-organizes and speaks at many technology conferences and Chicago-based user groups.
YOW! 2015 Brisbane
Scala and the JVM as a Big Data Platform: Lessons from the Spark Project
TALK
Apache Spark is implemented in Scala, and its user-facing Scala API is very similar to Scala’s own collections API. The power and concision of this API are bringing many developers to Scala.
On the other hand, while the JVM is an excellent, general-purpose platform for scalable computing, its management of objects is suboptimal for high-performance data crunching. Hence, the Spark project has recently started a project called “Tungsten” to build internal optimizations based on custom data layouts, manual memory management (both on-heap and off-heap), etc.
Using these and other examples from the Spark project, this talk discusses the strengths and weaknesses of Scala and the JVM for Big Data.
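As a rough illustration of the API similarity the abstract mentions, here is a minimal word-count sketch written against plain Scala collections, with a comparable Spark RDD pipeline shown in the trailing comment. It is not taken from the talk or its slides. The object name `WordCountSketch`, the sample sentences, the `sc` SparkContext, and the `input.txt` path are all assumptions made for the example.

```scala
// Word count using only the Scala standard library collections.
// WordCountSketch is a hypothetical name chosen for this illustration.
object WordCountSketch {

  // Split lines into lowercase words, drop empty tokens, and count occurrences.
  def wordCount(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.toLowerCase.split("""\W+"""))  // one line -> many words
      .filter(_.nonEmpty)                        // ignore empty tokens
      .groupBy(identity)                         // word -> all its occurrences
      .map { case (word, occurrences) => (word, occurrences.size) }

  def main(args: Array[String]): Unit = {
    val lines = Seq("to be or not to be", "that is the question")
    wordCount(lines).toSeq.sortBy(_._1).foreach {
      case (word, count) => println(s"$word: $count")
    }
  }
}

// A comparable Spark RDD pipeline, assuming an existing SparkContext `sc`
// and a hypothetical input file. It needs a Spark dependency to run, so it
// is left as a comment:
//
//   sc.textFile("input.txt")
//     .flatMap(_.toLowerCase.split("""\W+"""))
//     .filter(_.nonEmpty)
//     .map(word => (word, 1))
//     .reduceByKey(_ + _)
```

The first steps (`flatMap`, `filter`) read the same in both versions. The main visible difference is the aggregation: the collections version groups in memory with `groupBy`, while the RDD version uses `reduceByKey` to combine counts across the cluster.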
Spark Crash Course
This hands-on tutorial introduces you to Apache Spark, the distributed data processing engine for batch and event stream processing, SQL queries, graph processing, and machine learning.