Published: 2021-08-12

# A Closer Look at Fuzzy Threshold PSI (ftPSI-AD)

Apple recently released a detailed cryptographic paper describing a new private set intersection protocol which they named Fuzzy Threshold PSI with Associated Data, or ftPSI-AS for short.

In my last article I sketched out a probabilistic analysis of their proposed approach under the assumption that the ftPSI-AS was secure. I now want to take a closer look at the actual proposed protocol from a cryptographic perspective. This article will be mostly technical summary from my own notes, with some analysis at the end - you may want to skip to the analysis part.

## A note on Hash Functions

There are at least 3 different sets of hash functions used in this system:

- Mapping the image into the NeuralHash space - a perceptual hash used to detect matching images. Random collisions are very likely to happen.
- Mapping the NeuralHash space into the blinded hash space - a cuckoo hash intended to prevent the client from learning about the actual hashes being compared against. Random collisions are cryptographically unlikely to happen here.
- Mapping the PRF’d image identifier space into the DHF space - intended to allow the server to distinguish real hashes from synthetic matches, as above.

## A Protocol Summary

The server has a set of hash values *X*.

The client has an ordered list of triples (*Y*) containing a hash value, an
identifier and some associated data i.e. an image, a random identifier
and the hash of the image.

Different images (with different identifiers and/or associated data) can have the same NeuralHash. NeuralHash has been engineered such that similar images result in the same hash. It is also likely suspectable to false positives like all perceptual hashes.

again see Obfuscated Apples for more analysis on this part of the system.

The identifiers **are not secret**. The intent is for
the server to learn the entire list of identifiers.

The ultimate goal is to construct a system where the client streams
triples to the server over a period of time, the client, in addition to
some indistinguishable from random public data, learns only *t*, |*X*|, i.e. the threshold
of matches needed to trigger a report, and size of the servers database,
and the server only learns the set of random client identifiers
**and** the associated data of the intersection of triples
with hashes that overlap with entries in the database, and the number of
such intersecting matches are greater than *t*.

In addition, when the number of matches is below the threshold, it is desirable that the server have “some uncertainty” that this is never quantified says a lot about how much thought went into this element of the system as to the exact size of the intersection and its contents.

To do this the client is also given a small secret set of identifiers
*S* which are designated
**synthetic matches**. These relate to actual identifiers
on the client (i.e. in set *Y*).

The paper omits how values in *S* are actually chosen:

“We refer to the technical summary document [App21] for a further discussion of the purpose of synthetic matches and how the client selects the setS” - Apple PSI System

[App21] does not detail how the client
selects set *S*)

Then, when an image arrives on the device it is assigned a new triple
(`hash`

,`id`

,`ad`

)in
*Y* and the client generates a
voucher for the triple.

There are 2 different ways of constructing a voucher depending on
whether or not `id`

∈ *S*

If `id`

∋ *S*:

- The image data is encrypted with an account specific associated data key.
- The id of the image is permuted with a pseudorandom function, to
produce an x-coordinate of the secret shard, and an input element for a
detectable hash functionmore on the DHF
later.
*r*=*D**H**F*(`hkey`

,*x*′) where`hkey`

is the clients static key for the hash function. - A new threshold share is generated such that
*t*shares are needed to reconstruct the associated data key. - A random key is chosen and used to encrypt the output from the
detectable hash
*r*, encrypted image data and the secret share. This ciphertext is called (`rct`

) - The hash is used to lookup a value in the public distributed data
(using a set of cuckoo hash functions and cuckoo table containing public
curve elements that are either random or specially constructed if the
hash appears in
*X*), and the value there is used to generate a specific set of curve points (*Q*,*S*) from which a set of keys is derived and then encrypted with the random key. This happens twice with different hash functions and random values such that at least one of the calculated (*Q*,*S*) are related (*S*_{i}=*α*⋅*Q*_{i}) only if the hash has an entry in the cuckoo table, or randomly independent otherwise. - Finally, a voucher is constructed containing the random id of the
image and modified ciphertexts in a random order, and
`rct`

and sent to the server.

This is a simplification.

If `id`

∈ *S*:

- A random key is used to encrypt all zeros instead of the image data, this is never decrypted
- The id of the image is permuted with a pseudorandom function. Instead of using the derived x-coordinate in the shard, a random value is used instead. And, instead of hashing the derived value with DHF, a random element in the output space of DHF is used instead.
- A random key is generated and used to encrypt the dummy data (
`rct`

) - Choose a random (
*Q*,*S*) to satisfy the relation*S*_{1}=*α*⋅*Q*_{1}, (*Q*_{2},*S*_{2}) to be independently random, derive keys and encrypt with the random key. - Finally, a voucher is constructed containing the id of the image,
and these modified ciphertexts in a random order, and
`rct`

and sent to the server.

### On the Server

For each voucher that arrives at the server:

- Store the id,
- For each of the 2 ciphertexts in the voucher:
- derive
*S*_{i}=*α*⋅*Q*_{j} - derive the random key by decrypting the ciphertext with a key
derived from
*S*_{j} - attempt to decrypt
`rct`

with the derived random key. - If this fails then mark this attempt as false.

- derive
- If both attempts are marked as false then this is a non-match
- If either attempt is successful then the decrypted tuple is added to the set of known shares.
- (If both are successful then th client is being malicious, and the server is instructed to ignore it)

At this point some shares are “real”, and some shares are “synthetic”. The server does not ostensibly know which is which.

See Obfuscated Apples for a probabilistic analysis which disputes this claim.

If the size of this set exceeds the threshold then the server can use the detectable hash function to distinguish them.

## The Detectable Hash Function

Not much has been written about the new, proposed Detectable Hash Function, but it is the pin on which the privacy of this system ultimately rests (under the assumption that the rest of the crypto is sound).

A Detectable Hash Function (or (s-t)DHF) is defined a keyed hash
function that takes in a key and an element and outputs a value in some
distribution *R*^{t} where *t* is the threshold of the system. It
is designed to be used in systems that deliberately mix genuinely hashed
values with randomly selected elements from the hash space (*R*).

There exists a **detection algorithm** which is
deterministic and invoked as *D*(*v*) where v is a vector of
elements in *R*. The algorithm
outputs either a vector of true elements, or fails.

When at least *t* + 1 entries
in *v* are generated by a DHF
and at most *s* are random then
the algorithm should identify all generated elements i.e. it can
distinguish random elements from hashed elements.

The paper constructs a hash function by defining the keys to be a
sequence of polynomials of degree at most *t* − 1 arranged into an *s**ṫ* matrix.

$$ DHF(k, x_0) := (x_0, p_1(x_0),...p_s(x_0)) \in \mathbb{F}_{l}^{s+1} \vphantom{+ \frac{1}{1}}$$

Where, *l* is some large
64bit number. This hash is treated as a column vector of size *s* + 1.

For random elements the distribution (*D**H**F*(*k*,*x*_{1})...*D**H**F*((*k*,*x*_{t})))
is random in 𝔽_{l}^{t}

i.e. for every input *X* we
multiply it by some sequence of polynomials and derive a sequence of
elements 𝔽_{l}^{s + 1}.

The detection algorithm is given a vector elements of *R* which are then arranged and
expanded into a larger vector, by taking the first element, *x*_{o}, to
incrementing powers up *t* − 1,
followed by the rest of the values from the hash output.

After the expansion, every vector is arranged into a matrix *M* such that every row of *M* contains at least *t* + 1 evaluations of some polynomial
at most *t* − 1. e.g.

$$ \begin{bmatrix} 1 & 1 & \dots & 1 \\ x_0 & r & \dots & x_{n} \\ x_0^2 & r^2 & \dots & x_{n}^2 \\ \vdots & \vdots & \ddots & \vdots & \\ x_0^{t-1} & r^{t-1} & \dots & x_{n}^{t-1} \\ p_0(x_0) & r_1 & \dots & p_0(x_n) \\ \vdots & \vdots & \ddots & \vdots & \\ p_s(x_0) & r_s & \dots & p_s(x_n) \end{bmatrix} $$

The kernel of this matrix is calculated and, the row indexes of *M* that do not map to the nullspace
are the indexes of “real” matches i.e. the values of a vector *w* such that *M* ⋅ *w* = 0 can only be
non-zero at positions that correspond to DHF columns.

Or rather, there is a relation between real hashes (they all contain evaluations of the key polynomial) that the synthetic hashes do not have. This allows them to be distinguished.

Recall that for non-synthetic vouchers clients derives a hash of
`id`

(or rather some random
mapping of id into the input space of DHF).

Because of this, the detection algorithm allows the server to recover
the identifiers of the actual secret shards to use to reconstruct the
associated data key. (Recall that the decrypted `rct`

contains *r* = *D**H**F*(`hkey`

,*x*′))

After this point if the number of “real” matches is over the threshold then the server can decrypt all associated data.

## Cute Algorithm meets World

This is a very cute algorithm, exploiting some nice linear algebra to recover the indices of real matches, and the rest is standard cryptography to do the actual threshold encryption.

The paper does document a couple of **real world
considerations** which I will briefly revisit and then provide a
few of my own.

**Multiple Client Devices** in order to properly track
matches across devices every device needs to share the same set of
keys.

If a user stores **duplicate images** that have the same
hash but different random identifiers and the hash is matched in the
database, then the 2 images will count twice towards the threshold.
There is no mechanism in this protocol to prevent this, but the paper
says it will be dealt with outside the protocol(presumably by the photo app, or the operating system,
checking that duplicate images don’t get assigned different
identifiers…).

I have already documented a few issues with the overall probabilistic
model in Obfuscated Apples - the
analysis there doesn’t rely on any of the specific details in the
protocol and instead focusing on how *t* is chosen in the first place.

However, now that we dived deeper it is clear to see a number of other issues:

First, the public cuckoo table that is distributed to clients is refreshed periodically under the assumption that vouchers generated with the old set of public data can be combined with vouchers generated from the new set. This would likely happen when new images (or rather their resulting hashes) are added to the backend database.

There is a small probability that an image that matches against one cuckoo table does not match against another. That means that if a client ends up uploading the same image identifier across the updates, then it could have different outcomes.

From a general perspective, this doesn’t change the security argument
much, and the server is **supposed** to learn about
matching files.

**However**, this also presents additional metadata to
the system regarding the *use* of the files and **allows
the server to distinguish a synthetic match from a real match in the
case where one fails, and the other succeeds**. Recall: Identifiers which trigger synthetic matches
always succeed regardless of the set of images.

This is not the only problems with parameter updates, the system
requires a trusted third party to verify that the cuckoo table is
continually computed correctly. This trusted third party needs access to
both the original database of images, the hashing function *and*
the cuckoo table setup. Without this trusted third party, Apple could
fill that table to match arbitrary hashes without anyone being able to
verify.

Given that, a malicious server can learn whatever it wants about the client’s dataset bounded only by some external-to-the-protocol third party which both has to police what images are in the database and ensure that the server never distributes bad public data to clients. This is the real policy question of the system and one that I think has already covered extensively elsewhereSee: The Problems and Complications of Apple Monitoring for Child Sexual Abuse Material in iCloud Photos by Christopher Parsons..

Back to the technical, it is interesting that none of the analysis considers a malicious client violating the correctness of the protocol. The reason given is “because there are many ways in which a malicious client can refuse to participate.”

This is funny in and of itself because the sole intention of this system is to catch malicious people.

Attacks from malicious clients are an interesting academic consideration, the main one documented in system is the client generating too many synthetic matches which drastically slows down the detection algorithm.

However, I also think it is worth considering a DoS attack from the client attempting to generated “matches” that ultimately decrypt to nonsense. This can be done by submitting vouchers for randomly generated hashes (as shown in Obfuscated Apples it doesn’t take that many random photos to trigger false positives in most perceptual algorithms given a large enough database that is being checked against) - this attack could likely be conducted using the phone hardware itself, maybe even through malware. There do not appear to be any defenses to this, and even through it is a blind attack, the literature on adversarial exploitation of perceptual hashing algorithms is not on the side of Apple here.

After these deep dives I remain thoroughly unconvinced of the
technical soundness of this system, I can’t see how it can uphold its
privacy properties under normal operation, I think there are fundamental
questions about how Apple is choosing parameters (like *t* and *S*) that significantly change the
properties of the system, and I think there are malicious avenues for
exploitation of this system even beyond the policy discussions circling
in the media.