Tratamento de Colisões

Sabemos que uma colisão ocorre quando a função hash produz o mesmo valor para duas ou mais chaves distintas. Isto é: \[h(k_1) = h(k_2)\]

Colisões são praticamente inevitáveis, por isso devemos tratá-las. As principais abordagens são endereçamento aberto e encadeamento separado.

Endereçamento aberto: percorrer a tabela em busca de um local não ocupado.
Encadeamento separado: uso de listas ligadas para armazenar as colisões.

Se liga! Endereçamento aberto: procura um endereço aberto, um slot vazio. Encadeamento separado: usa listas encadeadas separadas, uma para cada slot.

Endereçamento aberto

A técnica de endereçamento aberto tenta resolver o problema de colisão buscando um outro lugar vazio para armazenar a chave que colidiu. As maneiras principais de sondar um espaço vazio são:

Sondagem linear;
Sondagem quadrática;
Duplo hash

Sondagem linear

A sondagem linear procura pela próxima posição vazia do vetor. A função de hash possui uma nova variável iterativa que permitirá a procura por esse lugar vazio: \[h(k,i) = (h'(k)+i)\bmod m\] onde $i=0,1,2,3,\ldots$

Sempre que ocorre uma colisão, esta variável ($i$) é incrementada em uma unidade.

Por exemplo, considere o vetor abaixo:

slot	0	1	2	3	4	5	6	7	8	9
chave	❌	21	42	63	❌	❌	❌	❌	❌	69

E a função de hash auxiliar $h'(k)=k\bmod 10$.

Vamos adicionar a chave 81:

$h(81,0) = (h'(81) + 0) \bmod 10 = (1+0) \bmod 10 = 1$, colisão com 21
$h(81,1) = (h'(81) + 1) \bmod 10 = (1+1) \bmod 10 = 2$, colisão com 42
$h(81,2) = (h'(81) + 2) \bmod 10 = (1+2) \bmod 10 = 3$, colisão com 63
$h(81,3) = (h'(81) + 3) \bmod 10 = (1+3) \bmod 10 = 4$

Como a posição 4 está vazia, a chave 81 será alocada lá.

slot	0	1	2	3	4	5	6	7	8	9
chave	❌	21	42	63	81	❌	❌	❌	❌	69

Sondagem quadrática

A sondagem quadrática procura pela próxima posição vazia usando uma função quadrática. \[h(k,i) = (h'(k)+c_1i+c_2i^2)\bmod m\] onde $c_1 e c_2$ são constantes escolhidas e $i=0,1,2,3,\ldots$

Considere o exemplo anterior com sondagem quadrática. Vamos escolher $c_1=0$ e $c_2=1$, ou seja, $h(k,i) = (h'(k)+i^2)\bmod m$.

$h(81,0) = (h'(81) + 0^2) \bmod 10 = (1+0) \bmod 10 = 1$, colisão com 21
$h(81,1) = (h'(81) + 1^2) \bmod 10 = (1+1) \bmod 10 = 2$, colisão com 42
$h(81,2) = (h'(81) + 2^2) \bmod 10 = (1+4) \bmod 10 = 5$

Como a posição 5 está vazia, a chave 81 será alocada lá.

slot	0	1	2	3	4	5	6	7	8	9
chave	❌	21	42	63	❌	81	❌	❌	❌	69

Duplo hash

Como o nome sugere, o tratamento de colisão usa duas funções hash ($h_1$ e $h_2$). \[h(k,i) = (h_1(k) + i\cdot h_2(k)) \bmod m\] onde $i=0,1,2,3,\ldots$

Por exemplo, suponha $h_1(k) = k \bmod 10$ e $h_2(k) = 5 − (k \bmod 5)$.

No exemplo anterior teremos:

$h(81,0) = (h_1(81) + 0 \cdot h_2(81)) \bmod 10 = (1+0) \bmod 10 = 1$, colisão com 21
$h(81,1) = (h_1(81) + 1 \cdot h_2(81)) \bmod 10 = (1 + 4) \bmod 10 = 5$

Ou seja,

slot	0	1	2	3	4	5	6	7	8	9
chave	❌	21	42	63	❌	81	❌	❌	❌	69

Abaixo, listamos mais um exemplo considerando as três abordagens apresentadas:

chaves: 15, 22, 7, 13, 25 e 14
tamanho: 7 slots

\[h'(k)=k \bmod 7\] \[h(k,i)=(h'(k)+i) \bmod 7\]

Chave 15 $\rightarrow h'(15)=15 \bmod 7=1$

$h(15,0)=(h'(15)+0) \bmod 7=(1+0) \bmod 7=1$

Chave 22 $\rightarrow h'(22)=22 \bmod 7=1$

$h(22,0)=(h'(22)+0) \bmod 7=(1+0) \bmod 7=1$, ocupado
$h(22,1)=(h'(22)+1) \bmod 7=(1+1) \bmod 7=2$

Chave 7 $\rightarrow h'(7)=7 \bmod 7=0$

$h(7,0)=(h'(7)+0) \bmod 7=(0+0) \bmod 7=0$

Chave 13 $\rightarrow h'(13)=13 \bmod 7=6$

$h(13,0)=(h'(13)+0) \bmod 7=(6+0) \bmod 7=6$

Chave 25 $\rightarrow h'(25)=25 \bmod 7=4$

$h(25,0)=(h'(25)+0) \bmod 7=(4+0) \bmod 7=4$

Chave 14 $\rightarrow h'(14)=14 \bmod 7=0$

$h(14,0)=(h'(14)+0) \bmod 7=(0+0) \bmod 7=0$, ocupado
$h(14,1)=(h'(14)+1) \bmod 7=(0+1) \bmod 7=1$, ocupado
$h(14,2)=(h'(14)+2) \bmod 7=(0+2) \bmod 7=2$, ocupado
$h(14,3)=(h'(14)+3) \bmod 7=(0+3) \bmod 7=3$

Resultado:

slot	0	1	2	3	4	5	6
chave	7	15	22	14	25	❌	13

\[h'(k)=k \bmod 7\] \[h(k,i)=(h'(k)+i+i^2) \bmod 7\]

Chave 15 $\rightarrow h'(15)=15 \bmod 7=1$

$h(15,0)=(h'(15)+0+0^2) \bmod 7=(1+0) \bmod 7=1$

Chave 22 $\rightarrow h'(22)=22 \bmod 7=1$

$h(22,0)=(h'(22)+0+0^2) \bmod 7=(1+0) \bmod 7=1$, ocupado
$h(22,1)=(h'(22)+1+1^2) \bmod 7=(1+2) \bmod 7=3$

Chave 7 $\rightarrow h'(7)=7 \bmod 7=0$

$h(7,0)=(h'(7)+0+0^2) \bmod 7=(0+0) \bmod 7=0$

Chave 13 $\rightarrow h'(13)=13 \bmod 7=6$

$h(13,0)=(h'(13)+0+0^2) \bmod 7=(6+0) \bmod 7=6$

Chave 25 $\rightarrow h'(25)=25 \bmod 7=4$

$h(25,0)=(h'(25)+0+0^2) \bmod 7=(4+0) \bmod 7=4$

Chave 14 $\rightarrow h'(14)=14 \bmod 7=0$

$h(14,0)=(h'(14)+0+0^2) \bmod 7=(0+0) \bmod 7=0$, ocupado
$h(14,1)=(h'(14)+1+1^2) \bmod 7=(0+2) \bmod 7=2$

Resultado:

slot	0	1	2	3	4	5	6
chave	7	15	14	22	25	❌	13

\[h_1(k)=k \bmod 7 \qquad h_2(k) = 1 + (k \bmod 6)\] \[h(k,i)=(h_1(k)+ih_2(k)) \bmod 7\]

Chave 15 $\rightarrow h_1(15)=15 \bmod 7=1$

$h(15,0)=(h_1(15)+0\cdot h_2(15)) \bmod 7=(1+0) \bmod 7=1$

Chave 22 $\rightarrow h_1(22)=22 \bmod 7=1$ e $h_2(22)=1+(22 \bmod 6)=5$

$h(22,0)=(h_1(22)+0\cdot h_2(22)) \bmod 7=(1+0) \bmod 7=1$, ocupado
$h(22,1)=(h_1(22)+1\cdot h_2(22)) \bmod 7=(1+5) \bmod 7=6$

Chave 7 $\rightarrow h_1(7)=7 \bmod 7=0$

$h(7,0)=(h_1(7)+0\cdot h_2(7)) \bmod 7=(0+0) \bmod 7=0$

Chave 13 $\rightarrow h_1(13)=13 \bmod 7=6$ e $h_2(13)=1+(13 \bmod 6)=2$

$h(13,0)=(h_1(13)+0\cdot h_2(13)) \bmod 7=(6+0) \bmod 7=6$, ocupado
$h(13,1)=(h_1(13)+1\cdot h_2(13)) \bmod 7=(6+2) \bmod 7=1$, ocupado
$h(13,2)=(h_1(13)+2\cdot h_2(13)) \bmod 7=(6+4) \bmod 7=3$

Chave 25 $\rightarrow h_1(25)=25 \bmod 7=4$

$h(25,0)=(h_1(25)+0\cdot h_2(25)) \bmod 7=(4+0) \bmod 7=4$

Chave 14 $\rightarrow h_1(14)=14 \bmod 7=0$ e $h_2(14)=1+(14 \bmod 6)=3$

$h(14,0)=(h_1(14)+0\cdot h_2(14)) \bmod 7=(0+0) \bmod 7=0$, ocupado
$h(14,1)=(h_1(14)+1\cdot h_2(14)) \bmod 7=(0+3) \bmod 7=3$, ocupado
$h(14,2)=(h_1(14)+2\cdot h_2(14)) \bmod 7=(0+6) \bmod 7=6$, ocupado
$h(14,3)=(h_1(14)+3\cdot h_2(14)) \bmod 7=(0+9) \bmod 7=2$

Resultado:

slot	0	1	2	3	4	5	6
chave	7	15	14	13	25	❌	22

Encadeamento separado

A técnica de encadeamento separado é bastante simples. Ao invés de tratar a colisão procurando um lugar vazio, iremos colocar as colisões em uma lista encadeada. Logo, não teremos um vetor de chaves, mas sim um vetor de ponteiros que apontam para listas encadeadas separadamente.

flowchart LR
 subgraph s1["🔑 Chaves"]
        A0@{ label: "'maçã'" }
        B0@{ label: "'uva'" }
        C0@{ label: "'laranja'" }
        D0@{ label: "'banana'" }
        E0@{ label: "'morango'" }
  end
 subgraph s2["⚙️ Função Hash"]
  end
 subgraph s3["📊 Tabela Hash"]
        A2["Índice 0: ❌"]
        B2["Índice 1: 🔗"]
        C2["Índice 2: 🔗"]
        D2["Índice 3: 🔗"]
        E2["Índice 4: 🔗"]
  end
 subgraph s4[" "]
        C1@{ label: "🎯 'uva'" }
        B1@{ label: "🎯 'laranja'" }
        A1@{ label: "🎯 'maçã'" }
        D1@{ label: "🎯 'banana'" }
        E1@{ label: "🎯 'morango'" }
        n1["NULL"]
        n2["NULL"]
        n3["NULL"]
        n4["NULL"]
  end
    A0 ==> s2
    B0 ==> s2
    C0 ==> s2
    D0 ==> s2
    E0 ==> s2
    s2 ==> B2 & C2 & D2 & E2
    B2 -.- C1
    C2 -.- B1
    D2 -.- A1
    E2 -.- D1
    A1 --> E1
    C1 --> n1
    B1 --> n2
    E1 --> n3
    D1 --> n4

    A0@{ shape: rect}
    B0@{ shape: rect}
    C0@{ shape: rect}
    D0@{ shape: rect}
    E0@{ shape: rect}
    B2@{ shape: rect}
    C2@{ shape: rect}
    D2@{ shape: rect}
    E2@{ shape: rect}
    C1@{ shape: subproc}
    B1@{ shape: subproc}
    A1@{ shape: subproc}
    D1@{ shape: subproc}
    E1@{ shape: subproc}
    n1@{ shape: subproc}
    n2@{ shape: subproc}
    n3@{ shape: subproc}
    n4@{ shape: subproc}
     A0:::box
     B0:::box
     C0:::box
     D0:::box
     E0:::box
     A2:::box
     B2:::box
     C2:::box
     D2:::box
     E2:::box
     n1:::NULL_NODE
     n2:::NULL_NODE
     n3:::NULL_NODE
     n4:::NULL_NODE
    classDef box fill:#fff,stroke:#000,color:#000,stroke-width:1px
    classDef NULL_NODE fill:#F5F5F5, stroke:#BDBDBD, color:#757575, stroke-dasharray: 5 5
    style s2 stroke:none,fill:#BBDEFB
    style s1 stroke:none,fill:#FFE0B2
    style s3 stroke:none,fill:#FFE0B2
    style s4 stroke:none,fill:transparent
    linkStyle 0 stroke:#D50000,fill:none
    linkStyle 1 stroke:#00C853,fill:none
    linkStyle 2 stroke:#2962FF,fill:none
    linkStyle 4 stroke:#D50000,fill:none
    linkStyle 5 stroke:#00C853,fill:none
    linkStyle 6 stroke:#2962FF,fill:none
    linkStyle 7 stroke:#D50000,fill:none

No exemplo anterior, quando houve a colisão da chave 81 com 21, a posição 1 iria apontar para uma lista ligada com dois componentes.

flowchart TD
 subgraph s1["Array"]
        A0["❌"]
        A1["prt_1"]
        A2["prt_2"]
        A3["prt_3"]
        A4["❌"]
        A5["❌"]
        A6["❌"]
        A7["❌"]
        A8["❌"]
        A9["prt_9"]
  end
    A1 -.-> B1["21"]
    B1 --> B2["81"]
    A2 -.-> B3["42"]
    A3 -.-> B4["63"]
    A9 -.-> B5["69"]
    B2 --> n1["Filled Circle"]
    B3 --> n2["Filled Circle"]
    B4 --> n3["Filled Circle"]
    B5 --> n4["Filled Circle"]

    n1@{ shape: f-circ}
    n2@{ shape: f-circ}
    n3@{ shape: f-circ}
    n4@{ shape: f-circ}
    style A1 fill:#fff
    style A2 fill:#fff
    style A3 fill:#fff
    style A9 fill:#fff
    style B1 fill:#fff
    style B2 fill:#fff
    style B3 fill:#fff
    style B4 fill:#fff
    style B5 fill:#fff

Figure 1: Ilustração de tratamento de colisão usando listas ligadas. Cada slot aponta para a cabeça de uma lista.
Quando uma colisão ocorre, o elemento é adicionado à lista correspondente.

Exercício

Use a abordagem de encadeamento separado para criar uma tabela hash com as chaves \[k = \{15, 22, 7, 13, 25, 14, 1, 8, 2, 9, 3, 10, 4, 11, 5, 6, 20, 21, 28\}.\] Considere a função de hash $h(k)=k\bmod m$, onde $m=7$ é o tamanho do vetor.

Solução (clique para ver)

Nesta unidade, você aprendeu

✅ a identificar quando uma colisão ocorre

✅ os principais metódos para tratar colisões

# Quando ocorre uma colisão em uma tabela hash? > Pense no resultado da função hash para chaves diferentes. 1. [ ] Quando duas chaves são idênticas. 1. [x] Quando duas chaves diferentes produzem o mesmo valor hash. 1. [ ] Quando o vetor está totalmente vazio. 1. [ ] Quando a função hash retorna sempre zero. # Quais são as principais técnicas para tratamento de colisões? > Existem duas abordagens gerais descritas no texto. - [x] Endereçamento aberto - [x] Encadeamento separado - [ ] Sondagem aleatória - [ ] Redimensionamento dinâmico # Complete a ordem correta das sondagens no endereçamento aberto! > Considere o modo como o algoritmo busca novas posições vazias. 1. Sondagem linear 2. Sondagem quadrática 3. Duplo hash # Como funciona a sondagem linear? > Veja a fórmula $h(k,i) = (h'(k) + i) \bmod m$. 1. [ ] Utiliza duas funções hash diferentes. 1. [x] Incrementa o índice em uma unidade até encontrar posição vazia. 1. [ ] Usa potências de $i$ para procurar posições. 1. [ ] Usa listas encadeadas para armazenar colisões. # Na sondagem quadrática, a função hash usa quais termos? > Observe a fórmula $h(k,i) = (h'(k) + c_1 i + c_2 i^2) \bmod m$. - [x] Um termo linear ($c_1 i$) - [x] Um termo quadrático ($c_2 i^2$) - [ ] Um termo cúbico ($c_3 i^3$) - [ ] Um termo logarítmico ($\log i$) # No duplo hash, o cálculo depende de duas funções hash. Quais delas são usadas? > Elas aparecem como $h_1(k)$ e $h_2(k)$. - [x] $h_1(k)$ - [x] $h_2(k)$ - [ ] $h_3(k)$ - [ ] Nenhuma, é uma sondagem linear disfarçada # Coloque em ordem os três métodos de endereçamento aberto apresentados! > Do mais simples ao mais sofisticado. 1. Sondagem linear 2. Sondagem quadrática 3. Duplo hash # Qual é a ideia principal do encadeamento separado? > Considere o uso de ponteiros e listas encadeadas. 1. [ ] Procurar um slot vazio no vetor. 1. [ ] Remover elementos duplicados automaticamente. 1. [x] Manter listas encadeadas separadas para cada posição da tabela. 1. [ ] Usar funções hash aleatórias para evitar colisões.