Chapter Fifteen
Review¶
- 按照列来存放:适合分析型(catch hit,data compress, parallism)
- query->parser and translator->relational algebra expression->optimizer(statistics about data)-> evaluation->result
- 语意检查
- 关系代数表达式
- 利用逻辑转换规则:(先筛选再 join)
- 物理优化:用什么算法实现什么算子
- pipeline
- 如何计算查询 cost:[!note]  
- extra buffer space
- worst case estimation
File Scan¶
- linear search ()
Hash Join Algorithm¶
- 
Partition the relation s using hashing function h. When partitioning a relation, one block of memory is reserved as the output buffer for each partition. Partition r similarly. 
- 
For each i: 
- 
(A): Load si into memory and build an in-memory hash index on it using the join attribute. This hash index uses a different hash function than the earlier one h. 
- 
Read the tuples in ri from the disk one by one. For each tuple tr locate each matching tuple ts in si using the in-memory hash index. Output the concatenation of their attributes. 
Relation s is called the build input and r is called the probe input.
- 
The value n and the hash function h is chosen such that each si should fit in memory. 
- 
Typically n is chosen as \(n = \lceil \frac{b_s}{M} \rceil \mul f\) where f is a “fudge factor”, typically around 1.2 
- 
The probe relation partitions ri need not fit in memory 
- 
Recursive partitioning required if number of partitions n is greater than number of pages M of memory.