Bloom filters in Ethereum


bloom

Background

Bloom filters are compact data structures for probabilistic representation of a set in order to support membership queries (i.e. queries that ask: “Is element X in set Y?”).

An empty Bloom filter is a bit array of m bits, all set to 0. There must also be k different hash functions defined, each of which maps or hashes some set element to one of the m array positions, generating a uniform random distribution. Typically, k is a constant, much smaller than m, which is proportional to the number of elements to be added; the precise choice of k and the constant of proportionality of m are determined by the intended false positive rate of the filter.

To add an element, feed it to each of the k hash functions to get k array positions. Set the bits at all these positions to 1.

To query for an element (test whether it is in the set), feed it to each of the k hash functions to get k array positions. If any of the bits at these positions is 0, the element is definitely not in the set – if it were, then all the bits would have been set to 1 when it was inserted. If all are 1, then either the element is in the set, or the bits have by chance been set to 1 during the insertion of other elements, resulting in a false positive. In a simple Bloom filter, there is no way to distinguish between the two cases, but more advanced techniques can address this problem

Pros:

  • space efficient

Cons:

  • can’t store an associated object
  • deletions are not allowed
  • small false positive probability

Applications:

  • weak password dictionary
  • cache sharing
  • query filtering and routing
  • blockchain (logs)

Ethereum bloom filter function

Here’s excerpt from the yellow paper:

ethereum-bloom

If you didn’t understand anything, no worries, I will try to describe its implementation in the next section.

Elixir implementation

  @spec bloom(integer(), binary()) :: integer()
  def bloom(data, number \\ 0) do
    bits =
      data
      |> sha3_hash            \\ (1)
      |> bit_numbers          \\ (2)

    number |> add_bits(bits)  \\ (3)
  end

The bloom/2 method accepts 2 aguments:

  • data - binary. a new object to add to the bloom filter
  • number - integer. the bloom filter. default value is zero (for new filter)

Bloom filter generation can be divided into three parts:

  • calculate sha3 hash of an input data (1)
  • get three bit indices from this hash (2)
  • set these indices to one in the filter (3)
  @spec bit_numbers(binary()) :: [integer()]
  defp bit_numbers(hash) do
    {result, _} =
      1..3
      |> Enum.reduce({[], hash}, fn(_, acc) ->
        {bits, <<head::integer-size(16), tail::bitstring>>} = acc
        new_bit = head &&& 2047

        {[new_bit|bits], tail}
      end)

    result
  end

The bit_numbers/1 method gets indices from the low order 11-bits of the first three double-bytes of the SHA3 hash.

  @spec add_bits(integer(), [integer()]) :: integer()
  defp add_bits(bloom_number, bits) do
    bits
    |> Enum.reduce(bloom_number, fn(bit_number, bloom) ->
      bloom ||| (1 <<< bit_number)
    end)
  end

The last thing left to do is to define method to check if filter has an object:

  @spec contains?(integer(), binary()) :: boolean()
  def contains?(current_bloom, val)
      when is_integer(current_bloom) and
           is_binary(val) do
    temp_bloom = bloom(val)

    (temp_bloom &&& current_bloom) == temp_bloom
  end

Now we can use our Bloom filter:

iex(1)> filter = EthBloom.create("rock")
364236115780354413527177740824718248475191169123433179704674083584030171707548895141763338864017157468044931245187476551639917006481686358559826056819612140499456696964058447681377941080842953023782344796365486712958202642967905678562795813182653349819408730761190197772532753708011416878102770310284058867224260870159690230717337345574083724951534664321152788810106636281468311475567122070065682554961594780791945980493299780947753165983844578885074727735505366654109046841451940478976

iex(2)> new_filter = EthBloom.add(filter, "blues")
364236115780354413527177740824718248475191169123433179704674083584030171707548895141763338864017157468044931245187476551639917008583551526575664755314165081912131926893248480084782472542419712838325886207711387518315201817289948316059274529230320334129359156724892064138319603709966466517388006639868069257972765204989827007471817298541374196685568760261999794300328280707725224364790299899128819784076530509337252429855458993811873518002173094917629136344228828519818452981281942732800

iex(3)> EthBloom.contains?(new_filter, "punk")
false

iex(4)> EthBloom.contains?(new_filter, "rock")
true

iex(5)> EthBloom.contains?(new_filter, "blues")
true

Described code has a GitHub repository - https://github.com/ayrat555/eth_bloom

See also

  • https://ethereum.github.io/yellowpaper/paper.pdf
  • https://en.m.wikipedia.org/wiki/Bloom_filter
  • https://www.coursera.org/learn/algorithms-graphs-data-structures/lecture/riKfa/bloom-filters-the-basics