What is Apache Hive?

The Apache Hive ™ data warehouse software facilitates reading, writing, and managing large datasets residing in distributed storage using SQL. It helps in projecting structure on data already present in a storage.

Data Types supported in Apache Hive

Apache Hive supports following data types:

  • Numeric Types: TINYINT, SMALLINT, INT, BIGINT, FLOAT

  • Date/Time Types: TIMESTAMP, DATE, INTERVAL

  • String Types: STRING, VARCHAR, CHAR

  • Misc Types: BOOLEAN, BINARY

  • Complex Types: arrays, maps, structs, union

Numeric Types in Apache Hive

Numeric Types in Apache Hive are:

  • TINYINT: (1-byte signed integer, from -128 to 127)

  • SMALLINT: (2-byte signed integer, from -32,768 to 32,767)

  • INT/INTEGER: (4-byte signed integer, from -2,147,483,648 to 2,147,483,647)

  • BIGINT: (8-byte signed integer, from -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807)

Float Types in Apache Hive

Float Types in Apache Hive are:

  • FLOAT (4-byte single precision floating point number)

  • DOUBLE (8-byte double precision floating point number)

  • DOUBLE PRECISION (alias for DOUBLE, only available starting with Hive 2.2.0)

  • DECIMAL Introduced in Hive 0.11.0 with a precision of 38 digits

  • NUMERIC (same as DECIMAL, starting with Hive 3.0.0)

Date/Time Types in Apache Hive

Date/Time Types in Apache Hive are:

  • TIMESTAMP: Only available starting with Hive 0.8.0

  • DATE: Only available starting with Hive 0.12.0

  • INTERVAL: Only available starting with Hive 1.2.0

String Types in Apache Hive

String Types in Apache Hive are:

  • STRING

  • VARCHAR: Only available starting with Hive 0.12.0

  • CHAR: Only available starting with Hive 0.13.0

Misc Types in Apache Hive

Misc Types in Apache Hive are:

  • BOOLEAN

  • BINARY: Only available starting with Hive 0.8.0

Complex Types in Apache Hive

Complex Types in Apache Hive are:

  • arrays: ARRAY<data_type> (Note: negative values and non-constant expressions are allowed as of Hive 0.14.)

  • maps: MAP<primitive_type, data_type> (Note: negative values and non-constant expressions are allowed as of Hive 0.14.)

  • structs: STRUCT<col_name : data_type [COMMENT col_comment], …​>

  • union: UNIONTYPE<data_type, data_type, …​> (Note: Only available starting with Hive 0.7.0.)

Hive Data Definition Language (DDL) Statements

HiveQL DDL statements are as follows:

  • CREATE DATABASE/SCHEMA, TABLE, VIEW, FUNCTION, INDEX

  • DROP DATABASE/SCHEMA, TABLE, VIEW, INDEX

  • TRUNCATE TABLE

  • ALTER DATABASE/SCHEMA, TABLE, VIEW

Hive Data Definition Language (DDL) Statements contd.

HiveQL DDL statements are as follows:

  • MSCK REPAIR TABLE (or ALTER TABLE RECOVER PARTITIONS)

  • SHOW DATABASES/SCHEMAS, TABLES, TBLPROPERTIES, VIEWS, PARTITIONS, FUNCTIONS, INDEX[ES], COLUMNS, CREATE TABLE

  • DESCRIBE DATABASE/SCHEMA, table_name, view_name, materialized_view_name

Hive Data Manipulation Language (DML)

Hive supports following DML statements:

  • LOAD

  • INSERT

  • UPDATE

  • DELETE

  • MERGE

Hive Data Retrieval

A SELECT statement can be part of a union query or a subquery of another query.

  • table_reference indicates the input to the query.

  • It can be a regular table, a view, a join construct or a subquery.

  • Table names and column names are case insensitive.

Hive SELECT query options

Hive supports following in SELECT query:

  • GROUP BY

  • SORT/ORDER/CLUSTER/DISTRIBUTE BY

  • JOIN

  • UNION

  • TABLESAMPLE

  • Subqueries

  • Virtual Columns

Further Sources

Refer official documents on Apache Hive here: