Published: 2021-11-02

Extending Fuzzy Message Detection to Groups

In this post I will delve into the current work I’ve been doing on extending fuzzy message detection to groups.

For a brief overview of my previous work on fuzzy message detection see Discreet Log #1: Anonymity, Bandwidth and Fuzzytags.

Some Considerations for Group Tagging

The obvious problem with groups fuzzy message tagging is that the signal isn’t hidden. As such sending multiple messages to a group will implicate the same members. Over time, this will likely expose the structure of the group to the detection server.

This problem also exists with single-party tagging. See the fuzzytags book for more details.

However, this risk must be balanced by the ability of parties to implicate whoever they want in their tags. I’ve talked about this previously in the fuzzytags book section on Entangling Strategies.

Being able to efficiently tag multiple parties using a single tag opens up the ability to not only allow parties to engage in deniable messaging, but also group messaging and sub group messaging.

Combined with additional metadata elimination schemes like mixnets (see niwl), I believe that a scheme based on fuzzy message detection could provide a very powerful, modern, privacy system.

The Problem with “Entangled” Tags

One of the first extensions I added to fuzzytags was the ability to find “entangled tags”. Entangling is a very expensive process of finding a random scalar z that leads to the generation of the same key for a group of tags.

Even with all the various optimizations I added to fuzzytags over the months, it still takes upwards of 70 seconds to generate an fully entangled tag for 2 parties. Generating a fully entangled tag for 3 parties can take upwards of 20 minutes, and beyond that falls into into the realm of practically impossible.

As such it would be very nice to have a different tag generation mechanism that lends itself to group tagging.

Extending Tags with Hash “Tweaks”

The main bottle of group tag generation is in finding a scalar that leads to the same key generation for a collection of tags. Because we only generate one z per tag, any failure means throwing away all previous work done on the tag.

If instead, we allow the hash to be “tweaked” for each bit of the key, we can avoid throwing away all that work.

There is probably a better term for this that is not “nonce”, without the cryptographic connotations.

This comes with three major considerations:

The length of the tag increases with the size of the tweak. If we permit a full byte for each tweak then the tag size increases by γ bytes.
We need to be careful when generating tweak bytes so that it doesn’t break the overall false positive rate of the scheme for non-group tags.
We need to bind the nonces to the other tag parameters to preserve CCS-security

Given the standard γ = 24 this results in a ~36% increase in the size of the tag. Given that tag size is still measured in bytes, this is a practical increase.

Tweak bytes can always be drawn from a uniform distribution to not bias the hash output for false positive matches. It is worth noting at this point that the nature of fuzzy message detection means that output is always biases for true-positive matches.

i.e. true positives always result in the detection server learning that a match as taken place.

Finally we can simply add the tweaks to the input of the other hash function G which is used to generate the scalar y needed to reconstruct w - any attempt to maul the hash tweaks will result in an incorrect value of w.

A Handwavey “Proof” that Hash Tweaks Preserve False Positive Rates

I would posit that changing the efficiency of deriving entangled tags, does not change the underlying security properties of fuzzy tags.

In the original fuzzy message detection scheme, a key is generated bit-by-bit where each key bit k_i is derived by hashing 2 public parameters (u = g^r and w = g^z) with the corresponding part of the tagging key T_i multiplied by the scalar secret random scalar r:

$$ k_i = H(u | w | {T_i}^r) \vphantom{+ \frac{1}{1}} $$

Given that the tagger has full control over all parameters (except the tagging key itself) and given that the output of the hash function is a single bit. The tagger already has the ability to generate any key that they desire - indeed the ability to generate entangled tags at all is dependent on this being the case.

Extending the scheme with hash tweaks permits the tagger to choose a byte t to append to the input of the hash:

$$ k_i = H(u | w | t | {T_i}^r) \vphantom{+ \frac{1}{1}} $$

Note that this does not change any of the other parameters, but it does give the tagger the ability to determine the output key more efficiently (as the key can now be tweaked bit-by-bit instead of the all-or-nothing approach of the original scheme).

When an attempt to derive the key using a non-target detection key results i.e. u^x_i acts as original tweak to the hash function. Given that x_i is drawn from ℤ_q, the potential input space for x_i is significantly larger than t.

We model H as a random oracle (practically H is chosen to to be first bit of the sha3 hash of the, domain separated, inputs) then we can expect the output key bit to be chosen uniformly random for arbitrary detection keys. The fact that we have carefully chosen t to result in a match for a specific set of parties is irrelevant.

As such, adding a tweak byte that the tagger has full control over, does not alter false positive rates for inc

Performance and Spam Considerations

Generating a tag that definitively matches 5 different tagging keys takes ~7ms on a consumer desktop. Not only do hash tweaked tags have a much lower generation time in general, unlike the exponential increase of entangled tags, with hash tweaks the generation time scales linearly with each additional tagging key.

This makes schemes associated with entangled tags like Deniable Entangling much more practical. Indeed, it is possible to entangle multiple parties efficiently.

Note that is been always possible to generate a tag to an arbitrary party, so the addition of tweaked hashes does not make the possibility of individually targeting pointless messages any more likely - it does however, allow the efficient targeting of multiple individuals with a single tag

This was, of course, always a possibility, even in the original scheme.

Prototype and Next Steps

I’ve pushed a prototype branch of fuzzytags using tweaked hashes, including new benchmarks.

This is a site where I dump essays, ideas, thoughts, math and anything else that doesn’t fit into another format, or isn’t yet ready for a longer paper. Beware: Ideas may be half thought through and/or full of errors. Hic sunt dracones.