Sync API

To maintain complete backwards compatibility with the original HappyBase and to ease upgrading, this library includes a synchronous version of the API. It is autogenerated from the async API at import time, so the two cannot diverge.

The library can be accessed one of two ways:

  1. Via the aiohappybase.sync subpackage
  2. The happybase.py module (which simply imports everything from 1)

Note

If you have both HappyBase and AIOHappyBase installed in the same environment, HappyBase will typically be picked when you import happybase (packages take precedence over modules with the same name), so having both installed is not advised.

To ensure you always get the sync version of AIOHappyBase, it is best to use import aiohappybase.sync as happybase if you wish to use the happybase name. The happybase.py module is really only to smooth the transition.

In the sync version, all async methods have been converted to synchronous equivalents. Here is an example from the user guide, which differs only in the removal of the async/await keywords:

from aiohappybase.sync import Connection

with Connection('somehost') as connection:
    table = connection.create_table('mytable', {
        'cf1': dict(max_versions=10),
        'cf2': dict(max_versions=1, block_cache_enabled=False),
        'cf3': dict(),  # use defaults
    })

    table.put(b'row-key-1', {b'cf:col1': b'value1', b'cf:col2': b'value2'})
    table.put(b'row-key-2', {b'cf:col1': b'value1', b'cf:col2': b'value2'})

    rows = table.rows([b'row-key-1', b'row-key-2'])
    for key, data in rows:
        print(key, data)

Connection

class aiohappybase.sync.Connection(host: str = 'localhost', port: int = 9090, timeout: int = None, autoconnect: bool = True, table_prefix: AnyStr = None, table_prefix_separator: AnyStr = b'_', compat: str = '0.98', transport: str = 'buffered', protocol: str = 'binary', client: str = 'socket', **client_kwargs)

Connection to an HBase Thrift server.

The host and port arguments specify the host name and TCP port of the HBase Thrift server to connect to. If omitted or None, a connection to the default port on localhost is made. If specified, the timeout argument specifies the socket timeout in milliseconds.

If autoconnect is True the connection is made directly during initialization. Otherwise a context manager should be used (with Connection…) or Connection.open() must be called explicitly before first use. Note that due to limitations in the Python async framework, a RuntimeError will be raised if it is used inside of a running asyncio event loop.

The optional table_prefix and table_prefix_separator arguments specify a prefix and a separator string to be prepended to all table names, e.g. when Connection.table() is invoked. For example, if table_prefix is myproject, all tables will have names like myproject_XYZ.

The optional compat argument sets the compatibility level for this connection. Older HBase versions have slightly different Thrift interfaces, and using the wrong protocol can lead to crashes caused by communication errors, so make sure to use the correct one. This value can be either the string 0.90, 0.92, 0.94, 0.96, or 0.98 (the default).

The optional transport argument specifies the Thrift transport mode to use. Supported values for this argument are buffered (the default) and framed. Make sure to choose the right one, since otherwise you might see non-obvious connection errors or program hangs when making a connection. HBase versions before 0.94 always use the buffered transport. Starting with HBase 0.94, the Thrift server optionally uses a framed transport, depending on the argument passed to the hbase-daemon.sh start thrift command. The default -threadpool mode uses the buffered transport; the -hsha, -nonblocking, and -threadedselector modes use the framed transport.

The optional protocol argument specifies the Thrift transport protocol to use. Supported values for this argument are binary (the default) and compact. Make sure to choose the right one, since otherwise you might see non-obvious connection errors or program hangs when making a connection. TCompactProtocol is a more compact binary format that is typically more efficient to process as well. TBinaryProtocol is the default protocol that AIOHappyBase uses.

The optional client argument specifies the type of Thrift client to use. Supported values for this argument are socket (the default) and http. Make sure to choose the right one, since otherwise you might see non-obvious connection errors or program hangs when making a connection. To check which client you should use, refer to the hbase.regionserver.thrift.http setting. If it is true use http, otherwise use socket.

New in version 1.4.0: client argument

New in version 0.9: protocol argument

New in version 0.5: timeout argument

New in version 0.4: table_prefix_separator argument

New in version 0.4: support for framed Thrift transports

Parameters:
  • host – The host to connect to
  • port – The port to connect to
  • timeout – The socket timeout in milliseconds (optional)
  • autoconnect – Whether the connection should be opened directly
  • table_prefix – Prefix used to construct table names (optional)
  • table_prefix_separator – Separator used for table_prefix
  • compat – Compatibility mode (optional)
  • transport – Thrift transport mode (optional)
  • protocol – Thrift protocol mode (optional)
  • client – Thrift client mode (optional)
  • client_kwargs – Extra keyword arguments for make_client(). See the ThriftPy2 documentation for more information.
close() → None

Close the underlying client to the HBase instance. This method can be safely called more than once. Note that the client is destroyed after it is closed which will cause errors to occur if it is used again before reopening. The Connection can be reopened by calling open() again.

compact_table(name: AnyStr, major: bool = False) → None

Compact the specified table.

Parameters:
  • name (str) – The table name
  • major (bool) – Whether to perform a major compaction.
create_table(name: AnyStr, families: Dict[str, Dict[str, Any]]) → aiohappybase.sync.table.Table

Create a table.

Parameters:
  • name – The table name
  • families – The name and options for each column family
Returns:

The created table instance

The families argument is a dictionary mapping column family names to a dictionary containing the options for this column family, e.g.

families = {
    'cf1': dict(max_versions=10),
    'cf2': dict(max_versions=1, block_cache_enabled=False),
    'cf3': dict(),  # use defaults
}
connection.create_table('mytable', families)

These options correspond to the ColumnDescriptor structure in the Thrift API, but note that the names should be provided in Python style, not in camel case notation, e.g. time_to_live, not timeToLive. The following options are supported:

  • max_versions (int)
  • compression (str)
  • in_memory (bool)
  • bloom_filter_type (str)
  • bloom_filter_vector_size (int)
  • bloom_filter_nb_hashes (int)
  • block_cache_enabled (bool)
  • time_to_live (int)
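As a sketch, a column family using several of these options (the names and values are illustrative, and compression requires the chosen codec to be available server-side):

```python
families = {
    'cf1': dict(
        max_versions=5,
        compression='snappy',      # assumes Snappy is enabled on the server
        in_memory=True,
        block_cache_enabled=True,
        time_to_live=86400,        # expire cells after one day (in seconds)
    ),
}
# connection.create_table('mytable', families)  # requires an open Connection
```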
delete_table(name: AnyStr, disable: bool = False) → None

Delete the specified table.

New in version 0.5: disable argument

In HBase, a table always needs to be disabled before it can be deleted. If the disable argument is True, this method first disables the table if it wasn’t already and then deletes it.

Parameters:
  • name – The table name
  • disable – Whether to first disable the table if needed
disable_table(name: AnyStr) → None

Disable the specified table.

Parameters:name – The table name
enable_table(name: AnyStr) → None

Enable the specified table.

Parameters:name – The table name
is_table_enabled(name: AnyStr) → bool

Return whether the specified table is enabled.

Parameters:name (str) – The table name
Returns:whether the table is enabled
Return type:bool
open() → None

Create and open the underlying client to the HBase instance. This method can safely be called more than once.

table(name: AnyStr, use_prefix: bool = True) → aiohappybase.sync.table.Table

Return a table object.

Returns an aiohappybase.sync.Table instance for the table named name. This does not result in a round-trip to the server, and the table is not checked for existence.

The optional use_prefix argument specifies whether the table prefix (if any) is prepended to the specified name. Set this to False if you want to use a table that resides in another ‘prefix namespace’, e.g. a table from a ‘friendly’ application co-hosted on the same HBase instance. See the table_prefix argument to the Connection constructor for more information.

Parameters:
  • name – the name of the table
  • use_prefix – whether to use the table prefix (if any)
Returns:

Table instance

tables() → List[bytes]

Return a list of table names available in this HBase instance.

If a table_prefix was set for this Connection, only tables that have the specified prefix will be listed.

Returns:The table names

Table

class aiohappybase.sync.Table(name: bytes, connection: Connection)

HBase table abstraction class.

This class cannot be instantiated directly; use Connection.table() instead.

append(row: bytes, data: Dict[bytes, bytes], include_timestamp: bool = False) → Union[Dict[bytes, bytes], Dict[bytes, Tuple[bytes, int]]]

Append data to an existing row.

  • This function is only available when using HBase 0.98 (or up).

The data argument behaves just like it does in put() except that instead of replacing the current values, they are appended to the end. If a specified cell doesn’t exist, then the result is the same as calling put() for that cell.

Parameters:
  • row – the row key
  • data – data to append
  • include_timestamp – include timestamps with the values?
Returns:

Updated cell values like the output of row()

batch(timestamp: int = None, batch_size: int = None, transaction: bool = False, wal: bool = True) → aiohappybase.sync.batch.Batch

Create a new batch operation for this table.

This method returns a new Batch instance that can be used for mass data manipulation. The timestamp argument applies to all puts and deletes on the batch.

If given, the batch_size argument specifies the maximum batch size after which the batch should send the mutations to the server. By default this is unbounded.

The transaction argument specifies whether the returned Batch instance should act in a transaction-like manner when used as context manager in a with block of code. The transaction flag cannot be used in combination with batch_size.

The wal argument determines whether mutations should be written to the HBase Write Ahead Log (WAL). This flag can only be used with recent HBase versions. If specified, it provides a default for all the put and delete operations on this batch. This default value can be overridden for individual operations using the wal argument to Batch.put() and Batch.delete().

New in version 0.7: wal argument

Parameters:
  • transaction – whether this batch should behave like a transaction (only useful when used as a context manager)
  • batch_size – batch size (optional)
  • timestamp – timestamp (optional)
  • wal – whether to write to the WAL (optional)
Returns:

Batch instance

cells(row: bytes, column: bytes, versions: int = None, timestamp: int = None, include_timestamp: bool = False) → Union[List[bytes], List[Tuple[bytes, int]]]

Retrieve multiple versions of a single cell from the table.

This method retrieves multiple versions of a cell (if any).

The versions argument defines how many cell versions to retrieve at most.

The timestamp and include_timestamp arguments behave exactly the same as for row().

Parameters:
  • row – the row key
  • column – the column name
  • versions – the maximum number of versions to retrieve
  • timestamp – timestamp (optional)
  • include_timestamp – whether timestamps are returned
Returns:

cell values

column_family_names() → List[bytes]

Retrieve the column family names for this table

counter_dec(row: bytes, column: bytes, value: int = 1) → int

Atomically decrement (or increment) a counter column.

This method is a shortcut for calling Table.counter_inc() with the value negated.

Returns:counter value after decrementing
counter_get(row: bytes, column: bytes) → int

Retrieve the current value of a counter column.

This method retrieves the current value of a counter column. If the counter column does not exist, this function initialises it to 0.

Note that application code should never store an incremented or decremented counter value directly; use the atomic Table.counter_inc() and Table.counter_dec() methods for that.

Parameters:
  • row – the row key
  • column – the column name
Returns:

counter value

counter_inc(row: bytes, column: bytes, value: int = 1) → int

Atomically increment (or decrement) a counter column.

This method atomically increments or decrements a counter column in the row specified by row. The value argument specifies how much the counter should be incremented (for positive values) or decremented (for negative values). If the counter column did not exist, it is automatically initialised to 0 before incrementing it.

Parameters:
  • row – the row key
  • column – the column name
  • value – the amount to increment or decrement by (optional)
Returns:

counter value after incrementing

counter_set(row: bytes, column: bytes, value: int = 0) → None

Set a counter column to a specific value.

This method stores a 64-bit signed integer value in the specified column.

Note that application code should never store an incremented or decremented counter value directly; use the atomic Table.counter_inc() and Table.counter_dec() methods for that.

Parameters:
  • row – the row key
  • column – the column name
  • value – the counter value to set
delete(row: bytes, columns: Iterable[bytes] = None, timestamp: int = None, wal: bool = True) → None

Delete data from the table.

This method deletes all columns for the row specified by row, or only some columns if the columns argument is specified.

Note that, in many situations, batch() is a more appropriate method to manipulate data.

New in version 0.7: wal argument

Parameters:
  • row – the row key
  • columns – list of columns (optional)
  • timestamp – timestamp (optional)
  • wal – whether to write to the WAL (optional)
families() → Dict[bytes, Dict[str, Any]]

Retrieve the column families for this table.

Returns:Mapping from column family name to settings dict
put(row: bytes, data: Dict[bytes, bytes], timestamp: int = None, wal: bool = True) → None

Store data in the table.

This method stores the data in the data argument for the row specified by row. The data argument is a dictionary that maps columns to values. Column names must include a family and qualifier part, e.g. b'cf:col', though the qualifier part may be the empty string, e.g. b'cf:'.

Note that, in many situations, batch() is a more appropriate method to manipulate data.

New in version 0.7: wal argument

Parameters:
  • row – the row key
  • data – the data to store
  • timestamp – timestamp (optional)
  • wal – whether to write to the WAL (optional)
regions() → List[Dict[str, Any]]

Retrieve the regions for this table.

Returns:regions for this table
row(row: bytes, columns: Iterable[bytes] = None, timestamp: int = None, include_timestamp: bool = False) → Union[Dict[bytes, bytes], Dict[bytes, Tuple[bytes, int]]]

Retrieve a single row of data.

This method retrieves the row with the row key specified in the row argument and returns the columns and values for this row as a dictionary.

The row argument is the row key of the row. If the columns argument is specified, only the values for these columns will be returned instead of all available columns. The columns argument should be a list or tuple containing byte strings. Each name can be a column family, such as b'cf1' or b'cf1:' (the trailing colon is not required), or a column family with a qualifier, such as b'cf1:col1'.

If specified, the timestamp argument specifies the maximum version that results may have. The include_timestamp argument specifies whether cells are returned as single values or as (value, timestamp) tuples.

Parameters:
  • row – the row key
  • columns – list of columns (optional)
  • timestamp – timestamp (optional)
  • include_timestamp – whether timestamps are returned
Returns:

Mapping of columns (both qualifier and family) to values

rows(rows: List[bytes], columns: Iterable[bytes] = None, timestamp: int = None, include_timestamp: bool = False) → List[Tuple[bytes, Union[Dict[bytes, bytes], Dict[bytes, Tuple[bytes, int]]]]]

Retrieve multiple rows of data.

This method retrieves the rows with the row keys specified in the rows argument, which should be a list (or tuple) of row keys. The return value is a list of (row_key, row_dict) tuples.

The columns, timestamp and include_timestamp arguments behave exactly the same as for row().

Parameters:
  • rows – list of row keys
  • columns – list of columns (optional)
  • timestamp – timestamp (optional)
  • include_timestamp – whether timestamps are returned
Returns:

List of mappings (columns to values)

scan(row_start: bytes = None, row_stop: bytes = None, row_prefix: bytes = None, columns: Iterable[bytes] = None, filter: bytes = None, timestamp: int = None, include_timestamp: bool = False, batch_size: int = 1000, scan_batching: int = None, limit: int = None, sorted_columns: bool = False, reverse: bool = False) → Generator[Tuple[bytes, Dict[bytes, bytes]], None, None]

Create a scanner for data in the table.

This method returns an iterable that can be used for looping over the matching rows. Scanners can be created in two ways:

  • The row_start and row_stop arguments specify the row keys where the scanner should start and stop. It does not matter whether the table contains any rows with the specified keys: the first row after row_start will be the first result, and the last row before row_stop will be the last result. Note that the start of the range is inclusive, while the end is exclusive.

    Both row_start and row_stop can be None to specify the start and the end of the table respectively. If both are omitted, a full table scan is done. Note that this usually results in severe performance problems.

  • Alternatively, if row_prefix is specified, only rows with row keys matching the prefix will be returned. If given, row_start and row_stop cannot be used.

The columns, timestamp and include_timestamp arguments behave exactly the same as for row().

The filter argument may be a filter string that will be applied at the server by the region servers.

If limit is given, at most limit results will be returned.

The batch_size argument specifies how many results should be retrieved per batch when retrieving results from the scanner. Only set this to a low value (or even 1) if your data is large, since a low batch size results in added round-trips to the server.

The optional scan_batching is for advanced usage only; it translates to Scan.setBatching() at the Java side (inside the Thrift server). By setting this value rows may be split into partial rows, so result rows may be incomplete, and the number of results returned by the scanner may no longer correspond to the number of rows matched by the scan.

If sorted_columns is True, the columns in the rows returned by this scanner will be retrieved in sorted order, and the data will be stored in OrderedDict instances.

If reverse is True, the scanner will perform the scan in reverse. This means that row_start must be lexicographically after row_stop. Note that the start of the range is inclusive, while the end is exclusive just as in the forward scan.

Compatibility notes:

  • The filter argument is only available when using HBase 0.92 (or up). In HBase 0.90 compatibility mode, specifying a filter raises an exception.
  • The sorted_columns argument is only available when using HBase 0.96 (or up).
  • The reverse argument is only available when using HBase 0.98 (or up).

New in version 1.1.0: reverse argument

New in version 0.8: sorted_columns argument

New in version 0.8: scan_batching argument

Parameters:
  • row_start – the row key to start at (inclusive)
  • row_stop – the row key to stop at (exclusive)
  • row_prefix – a prefix of the row key that must match
  • columns – list of columns (optional)
  • filter – a filter string (optional)
  • timestamp – timestamp (optional)
  • include_timestamp – whether timestamps are returned
  • batch_size – batch size for retrieving results
  • scan_batching – server-side scan batching (optional)
  • limit – max number of rows to return
  • sorted_columns – whether to return sorted columns
  • reverse – whether to perform scan in reverse
Returns:

generator yielding the rows matching the scan

Return type:

iterable of (row_key, row_data) tuples

Batch

class aiohappybase.sync.Batch(table: Table, timestamp: int = None, batch_size: int = None, transaction: bool = False, wal: bool = True)

Batch mutation class.

This class cannot be instantiated directly; use Table.batch() instead.

Initialise a new Batch instance.

close() → None

Finalize the batch and make sure all tasks are completed.

counter_dec(row: bytes, column: bytes, value: int = 1) → None

Atomically decrement (or increment) a counter column.

See Table.counter_dec() for parameter details. Note that this method cannot return the current value because the change is buffered until sent to the server.

counter_inc(row: bytes, column: bytes, value: int = 1) → None

Atomically increment (or decrement) a counter column.

See Table.counter_inc() for parameter details. Note that this method cannot return the current value because the change is buffered until sent to the server.

delete(row: bytes, columns: Iterable[bytes] = None, wal: bool = None) → None

Delete data from the table.

See Table.delete() for a description of the row, columns, and wal arguments. The wal argument should normally not be used; its only use is to override the batch-wide value passed to Table.batch().

put(row: bytes, data: Dict[bytes, bytes], wal: bool = None) → None

Store data in the table.

See Table.put() for a description of the row, data, and wal arguments. The wal argument should normally not be used; its only use is to override the batch-wide value passed to Table.batch().

send() → None

Send the batch to the server.

Connection pool

class aiohappybase.sync.ConnectionPool(size: int, **kwargs)

Thread-safe connection pool.

New in version 0.5.

A connection pool maintains multiple open connections and hands one out whenever a thread asks for one. When the thread is done with it, the connection is returned to the pool and becomes available to other threads.

If a thread nests calls to connection(), it will get the same connection back.

The size argument specifies how many connections this pool manages. Additional keyword arguments are passed unmodified to the Connection constructor, with the exception of the autoconnect argument, since maintaining connections is the task of the pool.

Parameters:
  • size (int) – the maximum number of concurrently open connections
  • kwargs – keyword arguments for Connection
QUEUE_TYPE

alias of queue.LifoQueue

close()

Clean up all pool connections and delete the queue.

connection(timeout: numbers.Real = None) → aiohappybase.sync.connection.Connection

Obtain a connection from the pool.

This method must be used as a context manager, i.e. with Python’s with block. Example:

with pool.connection() as connection:
    pass  # do something with the connection

If timeout is specified, this is the number of seconds to wait for a connection to become available before NoConnectionsAvailable is raised. If omitted, this method waits forever for a connection to become available.

Parameters:timeout – number of seconds to wait (optional)
Returns:active connection from the pool
class aiohappybase.sync.NoConnectionsAvailable

Exception raised when no connections are available.

This happens if a timeout was specified when obtaining a connection, and no connection became available within the specified timeout.

New in version 0.5.
