Introduction
============
**LibHashSet** is a simple *hash set* implementation for C99. It uses open addressing and double hashing.
At this time, the *only* types of elements supported are `uint32_t` and `uint64_t`.
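Double hashing derives two hash values, `h1` and `h2`, from each value and examines slot `(h1 + i * h2) mod capacity` for `i = 0, 1, 2, …` until a suitable slot is found. The following is a minimal sketch of such a probe sequence. It is *not* LibHashSet's actual code: the mixing function `sketch_mix()` (a splitmix64-style finalizer) and its constants are arbitrary choices made for illustration. The sketch assumes a power-of-two capacity and forces the step `h2` to be odd. An odd step is coprime to the capacity, so the first `capacity` probes visit every slot exactly once.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative only: NOT LibHashSet's actual hash functions. */
static uint64_t sketch_mix(uint64_t x)
{
    /* splitmix64-style finalizer */
    x ^= x >> 30;
    x *= 0xBF58476D1CE4E5B9ULL;
    x ^= x >> 27;
    x *= 0x94D049BB133111EBULL;
    x ^= x >> 31;
    return x;
}

/* Index of the i-th probe for `value` in a table whose capacity is a
   power of two. The step is forced to be odd, hence coprime to the
   capacity, so probes 0 .. capacity-1 visit every slot exactly once. */
static size_t sketch_probe(const uint64_t value, const size_t i, const size_t capacity)
{
    const uint64_t h1 = sketch_mix(value);
    const uint64_t h2 = sketch_mix(h1 ^ 0x9E3779B97F4A7C15ULL) | 1U;
    return (size_t)((h1 + (uint64_t)i * h2) & (uint64_t)(capacity - 1U));
}
```

For example, with a capacity of 16, `sketch_probe(42U, i, 16U)` for `i = 0 … 15` yields 16 distinct slot indices.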
This hash set implementation has been tested to *efficiently* handle several billion items 😏

Getting Started
===============
Here is a simple example of how to use LibHashSet in your application:
```C
#include <hash_set.h>
#include <errno.h>
#include <inttypes.h>
#include <stdio.h>
#include <stdlib.h>

/* Placeholder item source, for this example only (NOT part of LibHashSet).
   Replace these two functions with your application's own data source. */
static uint64_t next_item = 0U;
static int have_more_items(void) { return next_item < 1000U; }
static uint64_t get_next_item(void) { return (next_item++) * 0x9E3779B97F4A7C15ULL; }

int main(void)
{
    uint64_t value;
    uintptr_t cursor = 0U; /* iterator state, must start at zero */

    /* create new hash set instance (zero and -1.0 select the defaults) */
    hash_set64_t *const hash_set = hash_set_create64(0U, -1.0);
    if (!hash_set)
    {
        fputs("Allocation has failed!\n", stderr);
        return EXIT_FAILURE;
    }

    /* add a number of items to the hash set, the set will grow as needed */
    puts("Inserting items, please wait...");
    while (have_more_items())
    {
        const errno_t error = hash_set_insert64(hash_set, get_next_item());
        if (error && (error != EEXIST)) /* EEXIST: item was already present */
        {
            fprintf(stderr, "Insert operation has failed! (error: %d)\n", error);
            hash_set_destroy64(hash_set); /* avoid leaking the instance */
            return EXIT_FAILURE;
        }
    }
    puts("Done.\n");

    /* print total number of items in the hash set */
    printf("Total number of items: %zu\n\n", hash_set_size64(hash_set));

    /* print all items in the set (in no particular order) */
    while (hash_set_iterate64(hash_set, &cursor, &value) == 0)
    {
        printf("Item: %016" PRIX64 "\n", value);
    }

    /* destroy the hash set, when it is no longer needed! */
    hash_set_destroy64(hash_set);
    return EXIT_SUCCESS;
}
```
API Reference
=============

This section describes the LibHashSet programming interface, as declared in the `<hash_set.h>` header file.

LibHashSet supports sets containing values of type `uint32_t` or `uint64_t`. For each value type, separate functions are provided. The functions for `uint32_t`- and `uint64_t`-based hash sets can be distinguished by the suffix `…32` and `…64`, respectively. In the following, the functions are described in their "generic" (`value_t`) form.

***Note:*** On Microsoft Windows, when using LibHashSet as a "shared" library (DLL), the macro `HASHSET_DLL` must be defined *before* including `<hash_set.h>`! This is **not** required or allowed when using the "static" library.

Types
-----
### hash_set_t
A `struct` that represents a hash set instance. Instances can be allocated and de-allocated via the [hash_set_create()](#hash_set_create) and [hash_set_destroy()](#hash_set_destroy) functions, respectively.
***Note:*** Application code shall treat this `struct` as opaque!
```C
typedef struct _hash_set hash_set_t;
```
Globals
-------
### Version information
The *major*, *minor* and *patch* version of the LibHashSet library:
```C
extern const uint16_t HASHSET_VERSION_MAJOR;
extern const uint16_t HASHSET_VERSION_MINOR;
extern const uint16_t HASHSET_VERSION_PATCH;
```
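The version globals allow an application to check at runtime which library version it is actually running against, in particular when LibHashSet is used as a shared library (DLL). The helper below is a hypothetical sketch: `sketch_version_at_least()` is *not* part of LibHashSet. It compares a *major*/*minor*/*patch* triple against a required minimum, most significant component first.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper (NOT part of LibHashSet): returns true if the version
   (major.minor.patch) is greater than or equal to the required version. */
static bool sketch_version_at_least(const uint16_t major, const uint16_t minor, const uint16_t patch,
                                    const uint16_t req_major, const uint16_t req_minor, const uint16_t req_patch)
{
    if (major != req_major)
    {
        return major > req_major;
    }
    if (minor != req_minor)
    {
        return minor > req_minor;
    }
    return patch >= req_patch;
}
```

A call such as `sketch_version_at_least(HASHSET_VERSION_MAJOR, HASHSET_VERSION_MINOR, HASHSET_VERSION_PATCH, 1U, 0U, 0U)` would then check for at least version 1.0.0. That version is only a placeholder for whatever minimum the application requires.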
### Build information
The build *date* and *time* of the LibHashSet library:
```C
extern const char *const HASHSET_BUILD_DATE;
extern const char *const HASHSET_BUILD_TIME;
```
Functions
---------
### hash_set_create()
Allocates a new hash set instance. The new hash set instance is empty initially.
```C
hash_set_t *hash_set_create(
    const size_t initial_capacity,
    const double load_factor
);
```
#### Parameters
* `initial_capacity`
The initial capacity of the hash set (number of values). The given value will be rounded up to the next power of two. If the number of values (keys) to be inserted into the hash set can be estimated beforehand, then the initial capacity should be adjusted accordingly to avoid unnecessary re-allocations. In any case, the hash set will be able to grow dynamically as needed. If this parameter is set to *zero*, the *default* initial capacity (8192) is used.
* `load_factor`
The load factor to be applied to the hash set. The given value will be clipped to the **0.1** to **1.0** range. Generally, the default load factor (0.8) offers a good trade-off between performance and memory usage. Higher values decrease the memory overhead, but may increase the time required for insert/lookup operations when the hash set is almost completely filled. If this parameter is less than or equal to *zero*, the *default* load factor is used.
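The two parameter rules above can be summarized in code. This is only a sketch of the documented behavior, under some assumptions. The helper names are hypothetical. "Rounded to the next power of two" is read as rounding *up*, so a value that already is a power of two is kept. Returning `0` on overflow is this sketch's own choice. LibHashSet's actual implementation may differ.

```c
#include <stddef.h>
#include <stdint.h>

/* Defaults as documented above; the helper functions themselves are
   hypothetical and NOT part of LibHashSet. */
#define SKETCH_DEFAULT_CAPACITY ((size_t)8192U)
#define SKETCH_DEFAULT_LOAD_FACTOR 0.8

/* Zero selects the default; otherwise round up to a power of two.
   Returns 0 if the result would not fit into a size_t. */
static size_t sketch_effective_capacity(const size_t initial_capacity)
{
    size_t capacity = 1U;
    if (initial_capacity == 0U)
    {
        return SKETCH_DEFAULT_CAPACITY;
    }
    while (capacity < initial_capacity)
    {
        if (capacity > (SIZE_MAX >> 1))
        {
            return 0U; /* doubling again would overflow */
        }
        capacity <<= 1;
    }
    return capacity;
}

/* Values <= 0 select the default; everything else is clipped to [0.1, 1.0]. */
static double sketch_effective_load_factor(const double load_factor)
{
    if (load_factor <= 0.0)
    {
        return SKETCH_DEFAULT_LOAD_FACTOR;
    }
    if (load_factor < 0.1)
    {
        return 0.1;
    }
    return (load_factor > 1.0) ? 1.0 : load_factor;
}
```

For example, `sketch_effective_capacity(1000U)` yields 1024, and `sketch_effective_load_factor(-1.0)` yields the default of 0.8. This is why the Getting Started example passes `0U` and `-1.0` to select the defaults.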
#### Return value
On success, this function returns a pointer to a new hash set instance. On error, a `NULL` pointer is returned.
***Note:*** To avoid a memory leak, the returned pointer must be de-allocated by the application using the [hash_set_destroy()](#hash_set_destroy) function, as soon as the instance is *not* needed anymore!
### hash_set_destroy()
De-allocates an existing hash set instance. All items in the hash set are discarded.
```C
void hash_set_destroy(
    hash_set_t *instance
);
```
#### Parameters
* `instance`
A pointer to the hash set instance that is to be destroyed, as returned by the [hash_set_create()](#hash_set_create) function.
***Note:*** The given pointer is *invalidated* by this function, and it **must not** be used afterwards!
### hash_set_insert()
Tries to insert the given value into the hash set. The operation fails if the set already contains the given value.
***Note:*** If the value is actually inserted, then the hash set *may* need to grow.
```C
errno_t hash_set_insert(
    hash_set_t *const instance,
    const value_t value
);
```
#### Parameters
* `instance`
A pointer to the hash set instance to be modified, as returned by the [hash_set_create()](#hash_set_create) function.
* `value`
The value (key) to be inserted into the hash set.
#### Return value
On success, this function returns *zero*. On error, the appropriate error code is returned. Possible error codes include:
* `EINVAL`
An invalid argument was given, e.g. `instance` was set to `NULL`.
* `EEXIST`
The given value (key) was *not* inserted into the hash set (again), because that value was already present.
* `ENOMEM`
The value could *not* be inserted, because the required amount of memory could *not* be allocated.
* `EFAULT`
Something else went wrong. This usually indicates an internal error and is *not* supposed to happen.
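To illustrate how an insert operation in an open-addressing table can detect duplicates (`EEXIST`), here is a deliberately simplified sketch. It is *not* LibHashSet's implementation. The table has a fixed capacity of 8 and never grows; it returns `ENOMEM` when full instead. The hash functions are arbitrary. The probe loop skips over *deleted* slots while searching, because the value may still be stored further along its probe sequence. It stops at the first *unused* slot and stores the new value in the first free (unused or deleted) slot it passed.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model, NOT LibHashSet's internals: fixed capacity, no growth,
   arbitrary hash functions. */
#define SKETCH_CAPACITY 8U /* must be a power of two */

enum { SLOT_UNUSED = 0, SLOT_VALID, SLOT_DELETED };

typedef struct
{
    uint64_t values[SKETCH_CAPACITY];
    unsigned char status[SKETCH_CAPACITY]; /* SLOT_UNUSED, SLOT_VALID or SLOT_DELETED */
}
sketch_set_t;

/* i-th probe position (double hashing, odd step => all slots are visited) */
static size_t sketch_slot(const uint64_t value, const size_t i)
{
    const uint64_t h1 = value * 0x9E3779B97F4A7C15ULL;
    const uint64_t h2 = (value * 0xC2B2AE3D27D4EB4FULL) | 1U;
    return (size_t)((h1 + (uint64_t)i * h2) & (SKETCH_CAPACITY - 1U));
}

/* Returns 0 on success, EEXIST if the value is already present, or ENOMEM
   if no free slot is left (a real hash set would grow at this point). */
static int sketch_insert(sketch_set_t *const set, const uint64_t value)
{
    size_t i, target = SKETCH_CAPACITY; /* SKETCH_CAPACITY means "no free slot seen yet" */
    for (i = 0U; i < SKETCH_CAPACITY; ++i)
    {
        const size_t k = sketch_slot(value, i);
        if (set->status[k] == SLOT_VALID)
        {
            if (set->values[k] == value)
            {
                return EEXIST; /* duplicate found on the probe sequence */
            }
        }
        else
        {
            if (target == SKETCH_CAPACITY)
            {
                target = k; /* remember the first free slot */
            }
            if (set->status[k] == SLOT_UNUSED)
            {
                break; /* an unused slot ends the probe sequence */
            }
        }
    }
    if (target == SKETCH_CAPACITY)
    {
        return ENOMEM;
    }
    set->values[target] = value;
    set->status[target] = SLOT_VALID;
    return 0;
}
```

Inserting the same value twice returns `EEXIST` the second time. Once all eight slots hold distinct values, any further insert returns `ENOMEM` in this toy model.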
### hash_set_remove()
Tries to remove the given value from the hash set. The operation fails if the set does *not* contain the given value.
***Note:*** If the value is actually removed, then the hash set *may* shrink.
```C
errno_t hash_set_remove(
    hash_set_t *const instance,
    const value_t value
);
```
#### Parameters
* `instance`
A pointer to the hash set instance to be modified, as returned by the [hash_set_create()](#hash_set_create) function.
* `value`
The value (key) to be removed from the hash set.
#### Return value
On success, this function returns *zero*. On error, the appropriate error code is returned. Possible error codes include:
* `EINVAL`
An invalid argument was given, e.g. `instance` was set to `NULL`.
* `ENOENT`
The given value (key) could *not* be removed from the hash set, because *no* such value was present.
* `EFAULT`
Something else went wrong. This usually indicates an internal error and is *not* supposed to happen.
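The remove operation, together with the lookup it relies on, can be sketched as follows. This is a simplified, fixed-capacity model with hypothetical names, *not* LibHashSet's internal code. A removed value is only *marked* as deleted. Turning its slot back into an unused one would end the probe sequence early, so values that had been placed further along that sequence could no longer be found. The same lookup also illustrates the `ENOENT` result of `hash_set_contains()`.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model, NOT LibHashSet's internals: fixed capacity, arbitrary hashes. */
#define SKETCH_CAPACITY 8U /* must be a power of two */

enum { SLOT_UNUSED = 0, SLOT_VALID, SLOT_DELETED };

typedef struct
{
    uint64_t values[SKETCH_CAPACITY];
    unsigned char status[SKETCH_CAPACITY];
}
sketch_set_t;

/* i-th probe position (double hashing, odd step => all slots are visited) */
static size_t sketch_slot(const uint64_t value, const size_t i)
{
    const uint64_t h1 = value * 0x9E3779B97F4A7C15ULL;
    const uint64_t h2 = (value * 0xC2B2AE3D27D4EB4FULL) | 1U;
    return (size_t)((h1 + (uint64_t)i * h2) & (SKETCH_CAPACITY - 1U));
}

/* Returns the slot holding `value`, or SKETCH_CAPACITY if it is absent. */
static size_t sketch_find(const sketch_set_t *const set, const uint64_t value)
{
    size_t i;
    for (i = 0U; i < SKETCH_CAPACITY; ++i)
    {
        const size_t k = sketch_slot(value, i);
        if (set->status[k] == SLOT_UNUSED)
        {
            break; /* end of the probe sequence: value is absent */
        }
        if ((set->status[k] == SLOT_VALID) && (set->values[k] == value))
        {
            return k;
        }
        /* deleted slots and other values are skipped; the search goes on */
    }
    return SKETCH_CAPACITY;
}

static int sketch_contains(const sketch_set_t *const set, const uint64_t value)
{
    return (sketch_find(set, value) < SKETCH_CAPACITY) ? 0 : ENOENT;
}

static int sketch_remove(sketch_set_t *const set, const uint64_t value)
{
    const size_t k = sketch_find(set, value);
    if (k >= SKETCH_CAPACITY)
    {
        return ENOENT;
    }
    set->status[k] = SLOT_DELETED; /* tombstone, NOT SLOT_UNUSED */
    return 0;
}
```

Removing a value twice returns `ENOENT` the second time, while all other values remain reachable through the tombstone.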
### hash_set_clear()
Discards *all* items from the hash set at once.
```C
errno_t hash_set_clear(
    hash_set_t *const instance
);
```
#### Parameters
* `instance`
A pointer to the hash set instance to be modified, as returned by the [hash_set_create()](#hash_set_create) function.
#### Return value
On success, this function returns *zero*. On error, the appropriate error code is returned. Possible error codes include:
* `EINVAL`
An invalid argument was given, e.g. `instance` was set to `NULL`.
* `EAGAIN`
The hash set was *not* cleared, because it was already empty.
* `EFAULT`
Something else went wrong. This usually indicates an internal error and is *not* supposed to happen.
### hash_set_contains()
Tests whether the hash set contains the given value. The operation fails if the set does *not* contain the given value.
```C
errno_t hash_set_contains(
    const hash_set_t *const instance,
    const value_t value
);
```
#### Parameters
* `instance`
A pointer to the hash set instance to be examined, as returned by the [hash_set_create()](#hash_set_create) function.
* `value`
The value (key) to be searched for in the hash set.
#### Return value
On success, this function returns *zero*. On error, the appropriate error code is returned. Possible error codes include:
* `EINVAL`
An invalid argument was given, e.g. `instance` was set to `NULL`.
* `ENOENT`
The hash set does *not* contain the specified value (key).
* `EFAULT`
Something else went wrong. This usually indicates an internal error and is *not* supposed to happen.
### hash_set_iterate()
Iterates through the values stored in the hash set. The elements are iterated in **no** particular order.
This function returns one value at a time. It should be called repeatedly until it returns `ENOENT`, which indicates that the end of the set has been reached.
***Warning:*** The result is undefined if the set is modified while the iteration is in progress!
```C
errno_t hash_set_iterate(
    const hash_set_t *const instance,
    uintptr_t *const cursor,
    value_t *const value
);
```
#### Parameters
* `instance`
A pointer to the hash set instance to be examined, as returned by the [hash_set_create()](#hash_set_create) function.
* `cursor`
A pointer to a variable of type `uintptr_t` where the current iterator state (position) is saved.
This variable **must** be initialized to the value `0U` by the calling application prior to the *first* invocation!
Each invocation will update the value of `*cursor`; the value **shall not** be altered by the application.
* `value`
A pointer to a variable of type `uint32_t` or `uint64_t` where the next value in the set is stored on success.
The content of the variable should be considered *undefined*, if the invocation has failed.
#### Return value
On success, this function returns *zero*. On error, the appropriate error code is returned. Possible error codes include:
* `EINVAL`
An invalid argument was given, e.g. `instance` was set to `NULL`.
* `ENOENT`
No more values. The end of the set has been encountered.
* `EFAULT`
Something else went wrong. This usually indicates an internal error and is *not* supposed to happen.
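The cursor protocol described above can be modeled with a slot index stored in `*cursor`. The value `0U` means "start at the first slot", and each call advances past the slot it has just returned. This is a sketch under that assumption only. LibHashSet's actual cursor encoding is *not* documented here, which is exactly why applications must treat `*cursor` as opaque. The slot arrays passed in are a hypothetical stand-in for the set's internal storage.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical slot states, NOT LibHashSet's internals. */
enum { SLOT_UNUSED = 0, SLOT_VALID, SLOT_DELETED };

/* Returns 0 and stores the next valid value, or ENOENT at the end.
   *cursor holds the index of the next slot to examine. */
static int sketch_iterate(const unsigned char *const status, const uint64_t *const values,
                          const size_t capacity, uintptr_t *const cursor, uint64_t *const value)
{
    if (!status || !values || !cursor || !value)
    {
        return EINVAL;
    }
    while (*cursor < capacity)
    {
        const size_t k = (size_t)(*cursor)++; /* advance past this slot */
        if (status[k] == SLOT_VALID)
        {
            *value = values[k];
            return 0;
        }
        /* unused and deleted slots are skipped */
    }
    return ENOENT;
}
```

This model also shows why modifying the set during an iteration is unsafe. If the table were re-allocated (e.g. grown) between two calls, the saved index would refer to a different slot layout.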
### hash_set_size()
Returns the current number of values in the hash set.
```C
size_t hash_set_size(
    const hash_set_t *const instance
);
```
#### Parameters
* `instance`
A pointer to the hash set instance to be examined, as returned by the [hash_set_create()](#hash_set_create) function.
#### Return value
This function returns the number of values in the hash set.
### hash_set_info()
Returns technical information about the hash set.
```C
errno_t hash_set_info(
    const hash_set_t *const instance,
    size_t *const capacity,
    size_t *const valid,
    size_t *const deleted,
    size_t *const limit
);
```
#### Parameters
* `instance`
A pointer to the hash set instance to be examined, as returned by the [hash_set_create()](#hash_set_create) function.
* `capacity`
A pointer to a variable of type `size_t` where the current total *capacity* of the hash set is stored.
This value will always be greater than or equal to the sum of the *valid* and *deleted* entries.
* `valid`
A pointer to a variable of type `size_t` where the current number of *valid* entries in the hash set is stored.
This value is equivalent to the return value of the [hash_set_size()](#hash_set_size) function.
* `deleted`
A pointer to a variable of type `size_t` where the current number of *deleted* entries in the hash set is stored.
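For technical reasons, entries are *not* removed from the set immediately, but are marked as "deleted" instead (in an open-addressing table, simply clearing the slot could break the probe sequences of other values).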
* `limit`
A pointer to a variable of type `size_t` where the current "grow" *limit* of the hash set is stored.
The hash set is grown automatically, as soon as the sum of the *valid* and *deleted* entries exceeds this limit.
#### Return value
On success, this function returns *zero*. On error, the appropriate error code is returned. Possible error codes include:
* `EINVAL`
An invalid argument was given, e.g. `instance` was set to `NULL`.
* `EFAULT`
Something else went wrong. This usually indicates an internal error and is *not* supposed to happen.
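The relationship between *capacity*, load factor and grow *limit* is not spelled out above. The sketch below assumes that the limit is the capacity multiplied by the load factor, rounded down. Under that assumption, it evaluates the documented grow condition, in which *deleted* entries count toward the limit just like *valid* ones. The helper names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* ASSUMPTION: limit = floor(capacity * load_factor). NOT taken from LibHashSet. */
static size_t sketch_grow_limit(const size_t capacity, const double load_factor)
{
    return (size_t)((double)capacity * load_factor);
}

/* Documented rule: grow as soon as valid + deleted exceeds the limit. */
static bool sketch_needs_grow(const size_t valid, const size_t deleted, const size_t limit)
{
    return (valid + deleted) > limit;
}
```

With the default capacity (8192) and load factor (0.8), this would give a limit of 6553. A set holding 6000 valid and 554 deleted entries would then have to grow, even though it contains fewer than 6553 values.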
### hash_set_dump()
Dumps the current status and content of all "slots" of the hash set.
```C
errno_t hash_set_dump(
    const hash_set_t *const instance,
    int (*callback)(const size_t index, const char status, const value_t value)
);
);
```
#### Parameters
* `instance`
A pointer to the hash set instance to be examined, as returned by the [hash_set_create()](#hash_set_create) function.
* `callback`
A pointer to the callback function that will be invoked once for every "slot" in the hash set.
```C
int callback(
    const size_t index,
    const char status,
    const value_t value
);
```
##### Parameters
* `index`
The index of the current "slot" within the hash set.
* `status`
Indicates the status of the current "slot":
- `'u'` &ndash; the slot is *unused*
- `'v'` &ndash; the slot is *valid*
- `'d'` &ndash; the slot is *deleted*
* `value`
The value that is stored at the current "slot" index.
##### Return value
If the function returns a *non-zero* value, the iteration continues; otherwise it is cancelled.
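The callback signature has no "user data" parameter, so any state that a callback accumulates must be kept outside of it, e.g. in a file-scope variable. The sketch below counts the slots per status. `count_slots()` and `slot_counts` are hypothetical names. The `typedef` of `value_t` to `uint64_t` is an assumption that stands in for the `…64` variant of the API.

```c
#include <stddef.h>
#include <stdint.h>

/* ASSUMPTION for this sketch: value_t corresponds to the "...64" variant. */
typedef uint64_t value_t;

/* File-scope state, because the callback receives no user-data pointer. */
static size_t slot_counts[3]; /* [0] unused, [1] valid, [2] deleted */

static int count_slots(const size_t index, const char status, const value_t value)
{
    (void)index;
    (void)value;
    switch (status)
    {
    case 'u': ++slot_counts[0]; break;
    case 'v': ++slot_counts[1]; break;
    case 'd': ++slot_counts[2]; break;
    default:  return 0; /* unexpected status: zero cancels the dump */
    }
    return 1; /* non-zero: continue with the next slot */
}
```

Following the `…64` suffix convention, it would be passed as `hash_set_dump64(hash_set, count_slots)`. Because it returns zero for an unexpected status, the dump would then stop with `ECANCELED`.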
#### Return value
On success, this function returns *zero*. On error, the appropriate error code is returned. Possible error codes include:
* `EINVAL`
An invalid argument was given, e.g. `instance` was set to `NULL`.
* `ECANCELED`
The operation was cancelled by the calling application.
* `EFAULT`
Something else went wrong. This usually indicates an internal error and is *not* supposed to happen.

Thread Safety
-------------

LibHashSet is ***thread-safe***, in the sense that all public functions operate *exclusively* on the given `hash_set_t` instance; there is **no** implicit shared "global" state. This means that **no** synchronization is required in multi-threaded applications, provided that each `hash_set_t` instance is created and accessed only by a *single* thread.

However, LibHashSet does ***nothing*** to synchronize access to a particular `hash_set_t` instance! Consequently, in situations where the *same* `hash_set_t` instance needs to be shared across *multiple* concurrent threads, the calling application is responsible for serializing all access to the "shared" instance, e.g. by using a [*mutex*](https://pubs.opengroup.org/onlinepubs/007908799/xsh/pthread_mutex_lock.html) lock!

License
=======

This work has been released under the **CC0 1.0 Universal** license.
For details, please refer to:
<https://creativecommons.org/publicdomain/zero/1.0/legalcode>