By Padma Priya Chitturi
- Use Apache Spark for data processing with these hands-on recipes
- Implement end-to-end, large-scale data analysis better than ever before
- Work with robust libraries such as MLlib, SciPy, NumPy, and Pandas to gain insights from your data
Spark has emerged as the most promising big data analytics engine for data science professionals. The true power and value of Apache Spark lies in its ability to execute data science tasks with speed and accuracy. Spark's selling point is that it combines ETL, batch analytics, real-time stream analysis, machine learning, graph processing, and visualizations. It lets you tackle the complexities that come with raw unstructured data sets with ease.
This guide will get you comfortable and confident performing data science tasks with Spark. You will learn about implementations including distributed deep learning, numerical computing, and scalable machine learning. You will be shown effective solutions to challenging concepts in data science using Spark's data science libraries such as MLlib, Pandas, NumPy, SciPy, and more. These simple and efficient recipes will help you implement algorithms and optimize your work.
What you will learn
- Explore the topics of data mining, text mining, Natural Language Processing, information retrieval, and machine learning.
- Solve real-world analytical problems with large data sets.
- Address data science challenges with analytical tools on a distributed system such as Spark (well suited to iterative algorithms), which offers in-memory processing and more flexibility for data analysis at scale.
- Get hands-on experience with algorithms such as classification, regression, and recommendation on real datasets using the Spark MLlib package.
- Learn about numerical and scientific computing using NumPy and SciPy on Spark.
- Use Predictive Model Markup Language (PMML) in Spark for statistical data mining models.
About the Author
Padma Priya Chitturi is Analytics Lead at Fractal Analytics Pvt Ltd and has over five years of experience in big data processing. Currently, she is part of capability development at Fractal and responsible for solution development for analytical problems across multiple business domains at large scale. Prior to this, she worked on an airline product on a real-time processing platform serving a million user requests per second at Amadeus Software Labs. She has worked on implementing large-scale deep networks (Jeffrey Dean's work at Google Brain) for image classification on the big data platform Spark. She works closely with big data technologies such as Spark, Storm, Cassandra, and Hadoop. She was an open source contributor to Apache Storm.
Table of Contents
- Big Data Analytics with Spark
- Tricky Statistics with Spark
- Data Analysis with Spark
- Clustering, Classification, and Regression
- Working with Spark MLlib
- NLP with Spark
- Working with Sparkling Water - H2O
- Data Visualization with Spark
- Deep Learning on Spark
- Working with SparkR
Read or Download Apache Spark for Data Science Cookbook PDF
Best data modeling & design books
Information Quality: The Accuracy Dimension is about assessing the quality of corporate data and improving its accuracy using the data profiling method. Corporate data is increasingly important as companies continue to find new ways to use it. Likewise, improving the accuracy of data in information systems is fast becoming a major goal as companies realize how much it affects their bottom line.
David Gould's acclaimed first book, Complete Maya Programming: An Extensive Guide to MEL and the C++ API, provides artists and programmers with a deep understanding of how Maya works and how it can be enhanced and customized through programming. In his new book, David offers a gentle, intuitive introduction to the core ideas of computer graphics.
Designing Sorting Networks: A New Paradigm provides an in-depth guide to maximizing the efficiency of sorting networks, and uses 0/1 cases, partially ordered sets, and Hasse diagrams to closely analyze their behavior in an easy, intuitive manner. This book also outlines new ideas and techniques for designing faster sorting networks using Sortnet, and illustrates how these techniques were used to design faster 12-key and 18-key sorting networks through a series of case studies.
This Festschrift volume is published in honor of Professor Paul G. Spirakis on the occasion of his 60th birthday. It celebrates his significant contributions to computer science as an eminent, gifted, and influential researcher and a most visionary thought leader, with a great talent for inspiring and guiding young researchers.
- Scientific Computing with Python 3
- Modeling Reality: How Computers Mirror Life
- Writing and Querying MapReduce Views in CouchDB: Tools for Data Analysts
- Building Data Warehouse
- Model Based Process Control: Proceedings of the IFAC Workshop, Atlanta, Georgia, USA, 13-14 June, 1988 (IFAC Workshop Series)
Additional resources for Apache Spark for Data Science Cookbook
Apache Spark for Data Science Cookbook by Padma Priya Chitturi