xalloc.c uses this symbol to provide additional debugging
information. This patch adds support in make for defining this symbol
when running:
$ make DEBUG=1
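As a rough illustration of how a build-time symbol can switch on extra debugging output in an allocation wrapper, here is a minimal sketch. The symbol name `XALLOC_DEBUG_SKETCH` and the function `xmalloc_sketch()` are assumptions made for this example; the actual names used in xalloc.c are not given here.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical allocation wrapper; not the actual xalloc.c code. */
static void *xmalloc_sketch(size_t size)
{
    void *p = malloc(size);

#ifdef XALLOC_DEBUG_SKETCH /* assumed symbol name, e.g. set with -D by make */
    /* Only compiled in when the symbol is defined at build time. */
    fprintf(stderr, "xmalloc_sketch: %zu bytes at %p\n", size, p);
#endif

    if (p == NULL) {
        fprintf(stderr, "out of memory\n");
        exit(EXIT_FAILURE);
    }
    return p;
}
```

Without the symbol the wrapper behaves the same but stays silent, so a normal build pays nothing for the extra output.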
Remove the struct hash_entry and define a new type for hash values.
Most likely we will only need one member in this struct, so instead
of just using unsigned int, let's name a new type for clarity.
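A named type for hash values might look like the sketch below. The name `hash_t` and the toy hash function are assumptions for illustration; the commit does not say what the new type is called.

```c
/* Hypothetical name for the new hash value type. */
typedef unsigned int hash_t;

/* Toy djb2-style string hash returning the named type. */
static hash_t hash_string(const char *s)
{
    hash_t h = 5381;

    while (*s != '\0')
        h = h * 33u + (unsigned char)*s++;
    return h;
}
```

Function signatures then say `hash_t` rather than a bare `unsigned int`, which makes it clearer which values are hashes.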
Not a huge performance loss anyway, and the directory may be deleted
between calls. However, the directory can still go missing as soon
as the mkdir() call has returned.
Doing it this way just means that if the directory is removed during
execution of the program, it will be created again, rather than the
program running without it for the rest of its lifetime.
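The idea of calling mkdir() every time, and treating "already exists" as success, can be sketched as follows. The helper name `ensure_dir()` and the 0755 mode are assumptions for this example, not taken from the actual code.

```c
#include <errno.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical helper: try to create `path` on every call.
 * Returns 0 if the directory was created or something already exists
 * at that path, -1 on any other error (errno is set by mkdir()). */
static int ensure_dir(const char *path)
{
    if (mkdir(path, 0755) == 0)
        return 0;   /* created just now */
    if (errno == EEXIST)
        return 0;   /* already there; assumed to be the directory */
    return -1;
}
```

Calling this right before each use keeps the window in which the directory can be missing small, but as noted above it cannot close it completely.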
using the pointer 'b->block' when it is possible that
reallocation has moved the memory to another location.
'b->block' may therefore be an invalid pointer in some
cases. Use 'ret' instead.
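A minimal sketch of the pattern described above, assuming a buffer struct with a `block` pointer and a length. The struct layout and the function `buf_append()` are hypothetical; only the names `b->block` and `ret` come from the message.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical buffer type; only the 'block' member is named in the text. */
struct buf {
    char  *block;
    size_t len;
};

/* Append n (> 0) bytes to b. Returns 0 on success, -1 if realloc() fails,
 * in which case b is left untouched and still valid. */
static int buf_append(struct buf *b, const void *data, size_t n)
{
    char *ret = realloc(b->block, b->len + n);

    if (ret == NULL)
        return -1;

    /* Write through 'ret': realloc() may have moved the memory, so the
     * old value of b->block can no longer be trusted. */
    memcpy(ret + b->len, data, n);

    b->block = ret;
    b->len  += n;
    return 0;
}
```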
Use the refactored code from hash.c. Also use chaining as the
collision strategy instead of open addressing, not only because the
new hash API makes open addressing hard to do, but also because
chaining is more space efficient: a collision with open addressing
occupies two slots in the hash table, while with chaining the
colliding items take up only one.
The worst-case complexity for search/insert/delete is still O(n) for
both techniques.
Considering that the best case for space overflow is 25%, it is
better to have a small table.
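To illustrate chaining in general terms, here is a small sketch in which every slot holds a linked list of the entries that hashed to it. All names (`struct node`, `table_put()`, `table_find()`, `NBUCKETS`) and the hash function are assumptions for this example and do not reflect the real hash.c API.

```c
#include <stdlib.h>
#include <string.h>

#define NBUCKETS 8   /* small fixed table, for illustration only */

struct node {
    char        *key;
    int          value;
    struct node *next;   /* next entry that hashed to the same slot */
};

struct table {
    struct node *slot[NBUCKETS];
};

static unsigned int hash_key(const char *s)
{
    unsigned int h = 5381;

    while (*s != '\0')
        h = h * 33u + (unsigned char)*s++;
    return h;
}

/* Walk the chain in the key's slot; NULL if the key is not present. */
static struct node *table_find(struct table *t, const char *key)
{
    struct node *n = t->slot[hash_key(key) % NBUCKETS];

    while (n != NULL && strcmp(n->key, key) != 0)
        n = n->next;
    return n;
}

/* Insert or update. Returns 0 on success, -1 on allocation failure. */
static int table_put(struct table *t, const char *key, int value)
{
    unsigned int i = hash_key(key) % NBUCKETS;
    struct node *n = table_find(t, key);

    if (n != NULL) {
        n->value = value;
        return 0;
    }
    n = malloc(sizeof *n);
    if (n == NULL)
        return -1;
    n->key = malloc(strlen(key) + 1);
    if (n->key == NULL) {
        free(n);
        return -1;
    }
    strcpy(n->key, key);
    n->value = value;
    n->next = t->slot[i];   /* colliding entries share the slot */
    t->slot[i] = n;
    return 0;
}
```

Colliding keys only lengthen one chain, so the table itself can stay small; the cost is one extra pointer per entry.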
flush() is redundant; it makes more sense to just write the file on close().
There is no reason to commit the current state of the cache to disk
at any other time than when closing the application.
with SHA1 as a CRC mechanism.
When writing file formats that use SHA1 as a CRC, it is handy to
have SHA1_Update() applied to every write(), so that a SHA1 hash
can be calculated for that data and used as a CRC check.
Therefore this interface is created to wrap the code used to do this.
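The wrapping idea can be sketched like this: every successful write() also feeds the same bytes to a digest. The names `struct hashed_fd`, `hashed_write()` and `sum_update()` are hypothetical, and a toy byte sum stands in for SHA1_Update() so that the sketch builds without OpenSSL; it is not the interface this commit adds.

```c
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical wrapper state: a file descriptor plus a running digest.
 * 'update' stands in for SHA1_Update() and 'ctx' for its context. */
struct hashed_fd {
    int    fd;
    void  *ctx;
    void (*update)(void *ctx, const void *data, size_t len);
};

/* Write all of buf, feeding exactly the bytes written to the digest.
 * Returns 0 on success, -1 on a write error or lack of progress. */
static int hashed_write(struct hashed_fd *h, const void *buf, size_t len)
{
    const unsigned char *p = buf;

    while (len > 0) {
        ssize_t n = write(h->fd, p, len);

        if (n <= 0)
            return -1;
        h->update(h->ctx, p, (size_t)n);
        p   += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Toy stand-in digest: sums the bytes into an unsigned long. */
static void sum_update(void *ctx, const void *data, size_t len)
{
    const unsigned char *p = data;
    unsigned long *sum = ctx;

    while (len-- > 0)
        *sum += *p++;
}
```

With a real SHA1 context, `update` would presumably be a thin adapter around SHA1_Update(), and the final digest would be appended to the file when it is closed.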
A new data structure is about to take dlhist's place. dlhist is currently
implemented as a mixture: it is meant to be a "process cache" that records
which RSS items have been processed (that is why the URL is used as a unique
identifier), but right now it only stores a URL if it has been
downloaded. A new data structure serving as the "download history"
shall be implemented, which will keep track of each title and where
it has been downloaded to. This will make it possible to download
an RSS title to a given location only once.
Splitting this data structure into two separate structures is trivial,
as a "process cache" will treat URLs as the unique identifier and
a "download history" will treat the title of an RSS item as the
unique identifier (and also track its destinations).
This commit does not change any functionality; I just rename
this to keep the "dlhist" prefix and source files clear for
when the real dlhist is implemented.
Now that table size can be calculated, let's store the number of entries
instead of size in the header so we can rely on that when reading
entries, instead of the actual size on disk. This is safer if data is
appended to the file outside of the application.
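A rough sketch of reading by entry count rather than by file size is shown below. The on-disk layout, the struct names and the functions `save_entries()` and `load_entries()` are assumptions for illustration; the real header format is not described here.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical on-disk layout: a header holding the entry count,
 * followed by that many fixed-size entries. */
struct file_header { uint32_t nentries; };
struct file_entry  { uint32_t hash; uint32_t value; };

/* Returns 0 on success, -1 on a short write. */
static int save_entries(FILE *f, const struct file_entry *e, uint32_t n)
{
    struct file_header h = { n };

    if (fwrite(&h, sizeof h, 1, f) != 1)
        return -1;
    if (n > 0 && fwrite(e, sizeof *e, n, f) != n)
        return -1;
    return 0;
}

/* Reads the counted entries into e (capacity max). Returns the number
 * read, or -1 on error. Bytes after the counted entries are ignored. */
static long load_entries(FILE *f, struct file_entry *e, uint32_t max)
{
    struct file_header h;

    if (fread(&h, sizeof h, 1, f) != 1 || h.nentries > max)
        return -1;
    if (h.nentries > 0 && fread(e, sizeof *e, h.nentries, f) != h.nentries)
        return -1;
    return (long)h.nentries;
}
```

Because the reader trusts the count in the header, trailing data appended by another program is never interpreted as entries. Writing raw structs like this is not portable across machines with different endianness or padding, which is acceptable for a sketch.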
Somehow I apparently forgot to do linear probing in he_insert, which
causes colliding entries read from file (and when resizing)
to be dropped on the floor.
Let's not drop things on the floor anymore. Certainly there is
another place in the table that will do fine, instead of just
giving up and throwing the entry on the floor.
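Linear probing on insert can be sketched as below: when the home slot is taken, step forward (wrapping around) until a free slot turns up. The table layout and the name `probe_insert()` are assumptions for this example and are not the actual he_insert code.

```c
#include <stddef.h>

#define TABLE_SIZE 8   /* illustrative fixed size */

struct slot {
    int          used;
    unsigned int hash;
    int          value;
};

/* Start at hash % TABLE_SIZE and walk forward, wrapping around, until a
 * free slot or a slot with the same hash is found. Returns the index
 * used, or -1 only when every slot is occupied by another hash. */
static int probe_insert(struct slot *table, unsigned int hash, int value)
{
    size_t start = hash % TABLE_SIZE;

    for (size_t i = 0; i < TABLE_SIZE; i++) {
        size_t idx = (start + i) % TABLE_SIZE;

        if (!table[idx].used || table[idx].hash == hash) {
            table[idx].used  = 1;
            table[idx].hash  = hash;
            table[idx].value = value;
            return (int)idx;
        }
    }
    return -1;
}
```

A colliding entry is only rejected when the table is completely full, which is the point at which a resize would be expected to kick in.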