"Big Data" technologies

 Big Data programming model and technologies



Tools:

Anaconda -
  open source (free).
  Most popular at 33.4% in 2018.
  Has R and Python versions.
  IDEs: Jupyter, RStudio, Spyder, and JupyterLab  
  Editors: Jupyter, RStudio, Spyder, and Visual Studio Code
  Platforms: Linux, macOS, Windows
  Visualize your data: Matplotlib, Bokeh, Datashader, and Holoviews
  Machine learning & deep learning models: scikit-learn, TensorFlow, H2O, and Theano
  Analyze: Dask, NumPy, pandas, and Numba

Keras is a high-level wrapper on top of TensorFlow.

================================================================

R

48.5% in 2018.

R is "a language for statisticians built by statisticians." Open source (free).

ggplot2
SparkR provides bindings to run Spark from R.

Disads:
  1) Hard to be productive in R (if no prior Matlab, SAS, or Octave experience).
  2) Limited for more general-purpose programming.

================================================================

Python

open source (free).
Python is still the leader: 65.6% in 2018.

Advantages:
1) Interpreted language. Hence, the coded program does not need any compilation.
2) Dynamically defines variable types.
3) Concise: programs need less code, which makes the language more approachable for users.
4) Strongly typed: values of different types are not mixed implicitly, so conversions must be written explicitly.
5) Portable, extendable, and scalable.
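
Points 2 and 4 can be seen together in a short sketch. The variable names here are illustrative only and not from any library: a name can be rebound to a value of another type, but mixing types in one operation fails until you convert explicitly.

```python
# Dynamic typing: the same name can be rebound to values of different types.
value = 42
first_type = type(value).__name__    # type is decided at run time
value = "42"
second_type = type(value).__name__

# Strong typing: Python refuses to add a str and an int implicitly.
try:
    result = value + 1
except TypeError:
    # An explicit conversion is required before the addition works.
    result = int(value) + 1

print(first_type, second_type, result)
```

No compilation step is involved; the file runs directly under the interpreter.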

Good at Natural Language Processing (NLP), with the classic NLTK, topic modeling with Gensim, and
the blazing-fast and accurate spaCy. Good at neural networks, with Theano (4.9% in 2018) and
TensorFlow (29.9% in 2018); scikit-learn (24.4% in 2018) for machine learning,
and NumPy and pandas for data analysis.

Jupyter/IPython is a web-based notebook server that lets you mix code, plots,
and anything else in a shareable logbook format.

One of Python's killer features is the read-evaluate-print loop (REPL), a concept found in almost
all languages, including both Scala and R.
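
To make the read/evaluate/print/loop cycle concrete, here is a toy sketch in Python. The function `run_repl` is a hypothetical helper written for this note and is not how the real Python interpreter is implemented; it reads from a list instead of the keyboard so it can be run as a script.

```python
def run_repl(lines, env=None):
    """Toy read-evaluate-print loop over an iterable of source lines."""
    env = {} if env is None else env
    outputs = []
    for line in lines:                   # Read the next line of input
        try:
            value = eval(line, env)      # Evaluate it as an expression
        except SyntaxError:
            exec(line, env)              # Statements (e.g. assignments) go through exec
            value = None
        if value is not None:
            outputs.append(repr(value))
            print(repr(value))           # Print the result
    return outputs                       # Loop ends when the input is exhausted


outputs = run_repl(["x = 20", "x + 22", "'big' + ' data'"])
```

Only use `eval`/`exec` like this on input you trust.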

As opposed to R, Python is a traditional object-oriented language.

Disads: 1) Requires correct indentation, since whitespace is significant in your code.
        2) Must wait for features, unlike Scala or Java.

================================================================

Scala (Hybrid)

Advantages:
1) General-purpose language with a concise and expressive design. Hence, it is less verbose.
2) Supports both OOP and functional programming in individual ways.
3) Interoperable with Java libraries.
4) Portable. One can write Scala source code and then run it on the JVM as compiled Java bytecode.
5) Can compile to JavaScript (via Scala.js). Hence, you can use Scala to write web apps.
6) Checks types at compile time. Hence, developers can catch the bugs at compile time and can
escape many production issues.

Scala builds on Java: compiled code runs on the Java Virtual Machine (JVM) platform.
Scala is a marriage of the functional and object-oriented paradigms. It is popular in the financial
world and at companies that process very large amounts of data, often in a massively distributed
fashion (such as Twitter and LinkedIn). It is the language that drives both Spark and Kafka.

As it runs in the JVM, it immediately gets access to the Java ecosystem for free, but it also has a
wide variety of "native" libraries for handling data at scale (in particular Twitter's Algebird and
Summingbird). Like Python and R, it includes a very handy REPL for interactive development.

Disads: 1) Confusing syntax. 2) Slow compiler.

================================================================

HiveQL

HiveQL is a query language for sending instructions to Apache Hive. It works on top of Apache
Hadoop or other distributed storage platforms such as Amazon's S3 file system. Based on SQL.
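
Because HiveQL is based on SQL, the general shape of a query can be tried without a cluster. The sketch below is an assumption-laden stand-in: it uses Python's built-in `sqlite3`, not Hive, and the table `page_views` and its columns are invented for illustration. A real Hive table would live on distributed storage and the query would run as a distributed job.

```python
import sqlite3

# In-memory SQLite database standing in for a Hive table (illustration only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE page_views (page TEXT, views INTEGER)")
conn.executemany(
    "INSERT INTO page_views VALUES (?, ?)",
    [("home", 3), ("docs", 5), ("home", 4)],
)

# A SELECT / GROUP BY / ORDER BY aggregate, the SQL style that HiveQL follows.
query = """
    SELECT page, SUM(views) AS total_views
    FROM page_views
    GROUP BY page
    ORDER BY total_views DESC
"""
rows = conn.execute(query).fetchall()
print(rows)
conn.close()
```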

================================================================

Pig Latin

Hadoop-oriented, open source system.  Pig Latin is the language layer of the Apache Pig platform,
which is used to create Hadoop MapReduce jobs which sort and apply mathematical functions to large,
distributed datasets.
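
The kind of job described above (group records, apply a mathematical function, sort) can be sketched on a single machine in plain Python. This is not Pig Latin and not Hadoop; the function names `map_phase`, `shuffle`, and `reduce_phase` and the sample records are hypothetical, chosen only to mirror the MapReduce stages.

```python
from collections import defaultdict


def map_phase(records):
    """Emit (key, value) pairs, one per input record."""
    for city, reading in records:
        yield city, reading


def shuffle(pairs):
    """Group all values that share a key, as the framework would between stages."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Apply a mathematical function (here the mean) to each group."""
    return {key: sum(values) / len(values) for key, values in groups.items()}


records = [("nyc", 10.0), ("sf", 25.0), ("nyc", 30.0)]
averages = reduce_phase(shuffle(map_phase(records)))

# Final sort step: highest average first.
ranked = sorted(averages.items(), key=lambda kv: kv[1], reverse=True)
print(ranked)
```

On a real cluster each phase would run in parallel across many machines, with the shuffle moving data between them.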

================================================================

SQL  (39.6% in 2018)

SQL, including Spark SQL, and SQL to Hadoop tools
