Implementing multiprocessing.pool.ThreadPool from Python in Rust

In this post, we will implement multiprocessing.pool.ThreadPool from Python in Rust. It represents a thread-oriented version of multiprocessing.Pool, which offers a convenient means of parallelizing the execution of a function across multiple input values by distributing the input data across processes. We will use an existing thread-pool implementation and focus on adjusting its interface to match that of multiprocessing.pool.ThreadPool.
Read More

Ensuring That a Linux Program Is Running at Most Once by Using Abstract Sockets

It is often useful to have a way of ensuring that a program is running at most once (e.g. a system daemon or Cron job). Unfortunately, most commonly used solutions are not without problems. In this post, I show a simple, reliable, Linux-only solution that utilizes Unix domain sockets and the abstract socket namespace. The post includes a sample implementation in the Rust programming language.
Read More

Implementing DBSCAN from Distance Matrix in Rust

We will implement the DBSCAN clustering algorithm in Rust. As its input, the algorithm will take a distance matrix rather than a set of points or feature vectors. This will make the implemented algorithm useful in situations when the dataset is not formed by points or when features cannot be easily extracted. The complete source code, including comments and tests, is available on GitHub.
Read More

Computing Context Triggered Piecewise Hashes in Rust

This post introduces my Rust wrapper for ssdeep by Jesse Kornblum, which is a C library and program for computing context triggered piecewise hashes (CTPH). Also called fuzzy hashes, CTPH can match inputs that have homologies. Such inputs have sequences of identical bytes in the same order, although bytes in between these sequences may be different in both content and length. In contrast to standard hashing algorithms, CTPH can be used to identify files that are highly similar but not identical.
Read More