Apache Sqoop efficiently transfers bulk data between Apache Hadoop and structured datastores such as relational databases. Sqoop helps offload certain tasks (such as ETL processing) from the enterprise data warehouse (EDW) to Hadoop, where they can run efficiently at a much lower cost. Sqoop can also be used to extract data from Hadoop and export it into external structured datastores. Sqoop works with relational databases such as Teradata, Netezza, Oracle, MySQL, Postgres, and HSQLDB.
Sqoop is a tool designed to transfer data between Hadoop and relational database servers. It is used to import data from relational databases such as MySQL and Oracle into HDFS, and to export data from HDFS back to relational databases. This is a brief tutorial that explains how to use Sqoop in the Hadoop ecosystem.
With Sqoop, you can import data from a relational database system or a mainframe into HDFS. The input to the import process is either a database table or mainframe datasets. For databases, Sqoop will read the table row-by-row into HDFS.
For mainframe datasets, Sqoop will read records from each mainframe dataset into HDFS. The output of this import process is a set of files containing a copy of the imported table or datasets. The import process is performed in parallel. For this reason, the output will be in multiple files. These files may be delimited text files (for example, with commas or tabs separating each field), or binary Avro or SequenceFiles containing serialized record data.
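Because a parallel import leaves its result spread over several files, anything that consumes the data has to read all of them. The following is a minimal sketch of that idea, assuming comma-delimited text output that has been copied to a local directory. The part-m-NNNNN file names, the sample rows, and the helper read_import_dir are illustrative assumptions, not something taken from this text.

```python
import csv
import tempfile
from pathlib import Path


def read_import_dir(directory, delimiter=","):
    """Read every part file in `directory` and return all rows as lists of strings."""
    rows = []
    # Sort the file names so the combined result is deterministic;
    # each part file holds one slice of the imported table.
    for part in sorted(Path(directory).glob("part-*")):
        with part.open(newline="") as handle:
            rows.extend(csv.reader(handle, delimiter=delimiter))
    return rows


# Simulate the output of a two-task import (file names and rows are made up).
demo_dir = Path(tempfile.mkdtemp())
(demo_dir / "part-m-00000").write_text("1,alice,engineering\n2,bob,sales\n")
(demo_dir / "part-m-00001").write_text("3,carol,finance\n")

all_rows = read_import_dir(demo_dir)
print(len(all_rows))
```

For tab-separated output, the same helper can be called with delimiter="\t".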
Sqoop includes some other commands which allow you to inspect the database you are working with. For example, you can list the available database schemas (with the sqoop-list-databases tool) and tables within a schema (with the sqoop-list-tables tool). Sqoop also includes a primitive SQL execution shell (the sqoop-eval tool).
Most aspects of the import, code generation, and export processes can be customized. For databases, you can control the specific row range or columns imported. You can specify particular delimiters and escape characters for the file-based representation of the data, as well as the file format used. You can also control the class or package names used in generated code. Subsequent sections of this document explain how to specify these and other arguments to Sqoop.
Sqoop is a collection of related tools. To use Sqoop, you specify the tool you want to use and the arguments that control the tool.
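The "tool plus arguments" shape of a Sqoop invocation can be sketched by assembling a command line programmatically. The snippet below only builds and prints the argument list; it does not run Sqoop. The helper build_sqoop_command, the JDBC URL, the table name, and the target directory are hypothetical placeholders, and the options shown are import arguments for the connection string, table, column subset, row filter, field delimiter, output directory, and degree of parallelism.

```python
import shlex


def build_sqoop_command(tool, options):
    """Return an argv list of the form: sqoop <tool> --name value ..."""
    argv = ["sqoop", tool]
    for name, value in options.items():
        argv.append("--" + name)
        argv.append(str(value))
    return argv


import_cmd = build_sqoop_command(
    "import",
    {
        "connect": "jdbc:mysql://db.example.com/corp",  # placeholder JDBC URL
        "table": "employees",                           # placeholder table name
        "columns": "id,name,dept",                      # restrict imported columns
        "where": "id > 1000",                           # restrict imported rows
        "fields-terminated-by": "\\t",                  # tab-delimited output
        "target-dir": "/data/employees",                # placeholder HDFS path
        "num-mappers": 4,                               # number of parallel tasks
    },
)

# shlex.join quotes values containing spaces so the line can be pasted into a shell.
print(shlex.join(import_cmd))
```

Keeping the command as a list rather than a single string avoids quoting mistakes if it is later passed to a process launcher.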
If Sqoop is compiled from its own source, you can run Sqoop without a formal installation process by running the bin/sqoop program.
SINGA is an Apache Incubating project for developing an open source machine learning library. It provides a flexible architecture for scalable distributed training, is extensible to run over a wide range of hardware, and has a focus on health-care applications.
SINGA was initiated by the DB System Group at the National University of Singapore in 2014, in collaboration with the database group of Zhejiang University.
SINGA is a general distributed deep learning platform for training big deep learning models over large datasets. It is designed with an intuitive programming model based on the layer abstraction. A variety of popular deep learning models are supported, namely feed-forward models, including convolutional neural networks (CNNs); energy models, such as the restricted Boltzmann machine (RBM); and recurrent neural networks (RNNs). Many built-in layers are provided for users. The SINGA architecture is sufficiently flexible to run synchronous, asynchronous, and hybrid training frameworks. SINGA also supports different neural net partitioning schemes to parallelize the training of large models, namely partitioning on the batch dimension, partitioning on the feature dimension, and hybrid partitioning.
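The difference between partitioning on the batch dimension and on the feature dimension can be illustrated with a toy mini-batch stored as a list of rows. This is a plain-Python sketch of the two ideas only, not SINGA code; the function names and the even-split policy are assumptions made for illustration.

```python
def split_evenly(items, parts):
    """Split a list into `parts` contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(len(items), parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + base + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def partition_on_batch(batch, workers):
    """Each worker gets a subset of the rows (samples), with all features."""
    return split_evenly(batch, workers)


def partition_on_feature(batch, workers):
    """Each worker gets every row, but only a slice of the columns (features)."""
    column_ids = split_evenly(list(range(len(batch[0]))), workers)
    return [[[row[c] for c in cols] for row in batch] for cols in column_ids]


# A toy mini-batch: 4 samples with 6 features each.
batch = [[s * 10 + f for f in range(6)] for s in range(4)]

by_batch = partition_on_batch(batch, 2)
by_feature = partition_on_feature(batch, 2)

# Print (rows per worker, features per row) for each scheme.
print([len(p) for p in by_batch], [len(p[0]) for p in by_batch])
print([len(p) for p in by_feature], [len(p[0]) for p in by_feature])
```

A hybrid scheme would combine the two, for example by splitting the rows across groups of workers and then splitting the columns within each group.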
Another goal is to make SINGA easy to use. It is non-trivial for programmers to develop and train models with deep and complex structures. Distributed training further increases the burden on programmers, who must also handle data and model partitioning and network communication. Hence it is essential to provide an easy-to-use programming model so that users can implement their deep learning models and algorithms without much awareness of the underlying distributed platform.
Apache Velocity is a Java-based template engine that provides a template language to reference objects defined in Java code. There has been a public dispute between the Apache Velocity developers and the Spring developers over why Velocity is not supported in the Spring framework.
Velocity is an open source, Java-based templating engine designed to be used as the view component in an MVC architecture, and it provides an alternative to existing technologies such as JSP.
Velocity can be used to generate XML files, SQL, PostScript, and most other text-based formats. The core class of Velocity is the VelocityEngine.
It orchestrates the whole process of reading, parsing, and generating content using a data model and a Velocity template.
Here are the steps to follow in any typical Velocity application:
Initialize the Velocity engine
Read the template
Put the data model in a context object
Merge the template with the context data and render the view
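The four steps above can be sketched with a small stand-in written in Python. This is an analogy only and does not use Velocity or its API: ToyEngine is a hypothetical class, and the rendering is done by Python's standard string.Template, which happens to use $name placeholders as well. In a real Velocity application the engine role is played by VelocityEngine.

```python
from string import Template


class ToyEngine:
    """Hypothetical stand-in for a template engine; this is not Velocity's API."""

    def __init__(self):
        self.templates = {}

    def register(self, name, text):
        """Store template text under a name (stands in for a template file)."""
        self.templates[name] = text

    def get_template(self, name):
        """Look up the text and wrap it in a string.Template ready for merging."""
        return Template(self.templates[name])


# 1. Initialize the engine.
engine = ToyEngine()
engine.register("greeting", "Hello, $name! You have $count new messages.")

# 2. Read the template.
template = engine.get_template("greeting")

# 3. Put the data model in a context object.
context = {"name": "Ada", "count": 3}

# 4. Merge the template with the context data and render the view.
rendered = template.substitute(context)
print(rendered)
```

The point of the separation is that the template text and the data model are prepared independently and only meet in the final merge step.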
Velocity Template Language (VTL) provides a simple, clean way of incorporating dynamic content in a web page by using VTL references.
A VTL reference in a Velocity template starts with a $ and is used to get the value associated with that reference.
VTL also provides a set of directives that can be used to manipulate the output of the Java code. These directives start with #.