Andreas Francois Vermeulen

  • Understand the industrialization of machine learning (ML) and take the first steps toward identifying and generating the transformational disruptors of artificial intelligence (AI). You will learn to apply ML to data lakes in various industries, supplying data professionals with the advanced skills required to handle the future of data engineering and data science. Data lakes currently generated by worldwide industrialized business activities are projected to reach 35 zettabytes (ZB) as the Fourth Industrial Revolution produces an exponential increase of volume, velocity, variety, variability, veracity, visualization, and value. Industrialization of ML evolves from AI and the study of pattern recognition against the increasingly unstructured resources stored in data lakes. Industrial Machine Learning supplies advanced yet practical examples in different industries, including finance, public safety, health care, transportation, manufacturing, supply chain, 3D printing, education, research, and data science. The book covers supervised learning, unsupervised learning, reinforcement learning, evolutionary computing principles, soft robotics disruptors, and hard robotics disruptors.


    What You Will Learn

    Generate and identify transformational disruptors of artificial intelligence (AI)
    Understand the field of machine learning (ML) and apply it to handle big data and process the data lakes in your environment
    Hone the skills required to handle the future of data engineering and data science

    Who This Book Is For
    Intermediate to expert level professionals in the fields of data science, data engineering, machine learning, and data management

  • Learn how to build a data science technology stack and perform good data science with repeatable methods. You will learn how to turn data lakes into business assets.
    The data science technology stack demonstrated in Practical Data Science is built from components in general use in the industry. Data scientist Andreas Vermeulen demonstrates in detail how to build and provision a technology stack to yield repeatable results. He shows you how to apply practical methods to extract actionable business knowledge from data lakes consisting of data from a polyglot of data types and dimensions.
    What You'll Learn
    Become fluent in the essential concepts and terminology of data science and data engineering
    Build and use a technology stack that meets industry criteria
    Master the methods for retrieving actionable business knowledge
    Coordinate the handling of polyglot data types in a data lake for repeatable results
    Who This Book Is For
    Data scientists and data engineers who are required to convert data from a data lake into actionable knowledge for their business, and students who aspire to be data scientists and data engineers

  • Dive into the world of SQL on Hadoop and get the most out of your Hive data warehouses. This book is your go-to resource for using Hive: authors Scott Shaw, Ankur Gupta, David Kjerrumgaard, and Andreas Francois Vermeulen take you through learning HiveQL, the SQL-like language specific to Hive, to analyze, export, and massage the data stored across your Hadoop environment. From deploying Hive on your hardware or virtual machine and setting up its initial configuration to learning how Hive interacts with Hadoop, MapReduce, Tez, and other big data technologies, Practical Hive gives you a detailed treatment of the software. In addition, this book discusses the value of open source software, Hive performance tuning, and how to leverage semi-structured and unstructured data.
    What You Will Learn
    Install and configure Hive for new and existing datasets
    Perform DDL operations
    Execute efficient DML operations
    Use tables, partitions, buckets, and user-defined functions
    Discover performance tuning tips and Hive best practices
    Who This Book Is For
    Developers, companies, and professionals who deal with large amounts of data and could use software that can efficiently manage large volumes of input. It is assumed that readers have the ability to work with SQL. 
