0%

How to Replace IN and NOT IN in Hive

In Hive, you cannot always use IN and NOT IN the same way you would in PostgreSQL or MySQL. Similar logic is usually implemented with joins instead.

Implementing IN

To replace an IN clause, you can use LEFT SEMI JOIN.

Original query:

1
2
3
4
5
SELECT a.key, a.value
FROM a
WHERE a.key IN
(SELECT b.key
FROM b);

Hive version:

1
2
3
SELECT a.key, a.val
FROM a
LEFT SEMI JOIN b ON (a.key = b.key)

Implementing NOT IN

Original query:

1
2
3
4
5
SELECT a.key, a.value
FROM a
WHERE a.key NOT IN
(SELECT b.key
FROM b);

In Hive, this can be rewritten with a LEFT OUTER JOIN plus a NULL check:

1
2
3
4
5
6
7
SELECT t1.key
FROM
(SELECT key FROM a) t1
LEFT OUTER JOIN
(SELECT key FROM b) t2
ON (t1.key = t2.key)
WHERE t2.key IS NULL
如果我的文字帮到了您,那么可不可以请我喝罐可乐?