JOINS(表连接)
JOINs are essential operations in relational databases. They create a link between rows based on common values and allow the meaningful combination of these rows. Crate supports joins and due to its distributed nature allows you to work with large amounts of data.
表连接在关系型数据库中是很有用的操作。他们创建一个行和公共值之前的一个连接而且允许这些行有意义的组合。Crate支持连接而且由于Crate天然支持分布式,所以它可以支持大数据量的操作。
In this document we will present the following topics. First, an overview of the existing types of joins and algorithms provided. Then a description of how Crate implements them, and finally how Crate uses optimizations to work with huge datasets.
在这个文档中我们将阻止以下的主题。 首先,一个已经存在的连接类型和提供的算法。
Types of Joins(连接类型)
A join is a relational operation that merges two data sets based on certain properties. Figure 1 (Inspired by this article) shows which elements appear in which join.
一个连接的关系型操作可以基于确切的属性合并两个数据集。(图1)展示哪些元素将在连接中出现。
图1
左连接,右连接,内连接,外连接,而且L和R的集合的交集
Cross Join(交叉连接)
返回一个或多个关系的笛卡尔积。L和R关系的笛卡尔积的结果是有L关系中所元组和R关系中元组的所有可能的排列组合。
Inner Join(内连接)
An inner join is a join of two or more relations that returns only tuples that satisfy the join condition.
一个内连接是一个两个或者多个的关系的连接而且只返回满足连接条件的元组。
Equi Join(等值连接)
An equi join is a subset of an inner join and a comparison-based join, that uses equality comparisons in the join condition. The equi join of the relation L and R combines tuple lof relation L with a tuple r of the relation R if the join attributes of both tuples are identical.
一个等值连接是一个内部连接和一个比较连接的子集,在连接关系中使用等值连接。关系L和关系R的等值连接是把L和R通过有相等的属性值的元组来连接。
Outer Join(外连接)
An outer join returns a relation consisting of tuples that satisfy the join condition and dangling tuples from both or one of the relations, respectively to the outer join type.
一个外连接返回满足连接条件的行组成的关系,以及从两个或其中一个关系到外部链接类型的悬空行。
An outer join has following types:(一个外连接有以下类型)
Left outer join returns tuples of the relation L matching tuples of the relation R and dangling tuples of the relation R padded with null values.
左外连接返回关系L中匹配到关系R中的元组。用空值填充R关系中的空值元组。
Right outer join returns tuples of the relation R matching tuples of the relation L and dangling tuples from the relation L padded with null values.
右外连接返回关系R中匹配到L关系中的元组,而且将L中空的元组用空值填充。
Full outer join returns matching tuples of both relations and dangling tuples produced by left and right outer joins.
全外连接返回左外连接和右外连接以及填充的空值元组。
Joins in Crate(Crate中的连接)
Crate supports (a) CROSS JOIN, (b) INNER JOIN, (c) EQUI JOIN, (d) LEFT JOIN, (e) RIGHT JOIN and (f) FULL JOIN. To implement these, the nested loop join algorithm is implemented with a few optimizations.
Crate支持(a)交叉连接,(b)内连接,(c)等值连接,(d)左连接,(e)右连接和(f) 全连接。为了实现这些,使用一系列内部循环连接算法。
Nested Loop Join(内部循环连接)
The nested loop join is the simplest join algorithm. One of the relations is nominated as the inner relation and the other as the outer relation. Each tuple of the outer relation is compared with each tuple of the inner relation and if the join condition is satisfied, the tuples of the relation L and R are concatenated and added into the new relation:
嵌套循环连接是最简单的连接算法。其中一个关系被指定为内部关系,另一个被指定为外部关系。外关系的每个元组和每一个内部关系的每个元组进行比较,并且如果满足条件,则关系L和R的元组被连接并添加到新的关系中:
for each tuple l ∈ L do
for each tuple r ∈ R do
if l.a Θ r.b
put tuple(l, r) in Q
Listing 1. Nested loop join algorithm. 列表 1. 内部循环连接算法 Other Algorithms 其他算法。
Sort-Merge Join and Hash Join are currently not implemented. More information can be found here.
排序-合并连接和散列连接当前还没有实现。更多的信息请查看这里 Pimitive Nested Loop(原始嵌套循环)
For joins on some relations, the nested loop operation can be executed directly on the handler node. Specifically for queries involving a CROSS JOIN or joins on system tables/information_schema each shard sends the data to the handler node. Afterwards, this node runs the nested loop, applies limits, etc. and ultimately returns the results. Similarly, joins can be nested, so instead of collecting data from shards the rows can be the result of a previous join or table function.
对于一些关系连接,循环嵌套操作将被在每个处理节点上执行。特别为涉及到CROSS JOIN和连接到system tables/information_schema的查询,每个分片将发送数据到处理节点。
Distributed Nested Loop(分布式的内循环)
Relations are usually distributed to different nodes which require the nested loop to acquire the data before being able to join. After finding the locations of the required shards (which is done in the planning stage), the smaller data set (based on the row count) is broadcast amongst all the nodes holding the shards they are joined with. After that, each of the receiving nodes can start running a nested loop on the subset it has just received. Finally, the results are pushed to the original (planner) node to merge and return the results to the requesting client (see Figure 2).
关系通常分布到不同的节点,这些节点需要嵌套循环以在能够加入之前获取数据。 找到需要的分片的位置后,最小的数据集(基于行计数)在所有持有它需要连接的分片集群节点间广播。之后,每一个收到的节点可以在它们接收到的子集上运行循环嵌套。最终,结果将会推送到原始(执行计划)节点合并并将结果返回给请求的客户端。
If the rows in the join result from or in another (nested) join or use a table function , the data is broadcast from (or to) different nodes directly.
如果在连接中的行来自其他的内部链接或者使用一个表函数,这些数据将直接来自于广播或者广播出去。
图 2
Nodes that are holding the smaller shards broadcast the data to the processing nodes which then return the results to the requesting node.
注意持有的小的数据分片关闭数据到处理节点,这些节点将结果返回请求的节点。
Optimization(优化)
Crate implements joins using a nested loop - which means that the runtime complexity grows exponentially (O(n*m)). Specifically for Cross Joins this results in large amounts of data sent over the network and loaded into memory at the handler node. Crate reduces the volume of data transferred by employing Query Then Fetch: First, filtering and ordering are applied (if possible where the data is located) to obtain the affected document IDs. Next, as soon as the final data set is ready, Crate fetches the selected fields and returns the data to the client.
Crate使用嵌套循环实现连接-这意味着运行时间复杂度将指数级增长。在处理节点连接,大数据量的结果的交集发送到网络并且加载到处理节点的内存中。Crate 通过查询在获取来减少传输的数据量:首先,应用过滤和排序来使用获得文档的IDS。然后,一旦最终数据集已经准备好,Crate将取出选择好的字段的数据返回给客户端。
Pre-Ordering and Limits(在排序和限制数据条数之前)
Queries can be optimized if they contain (a) ORDER BY, (b) LIMIT, or (c) if INNER/EQUI JOIN. In any of these cases, the nested loop can be terminated earlier:
查询可以被优化如果他们包含 (a) ORDER BY, (b) LIMIT 或者 (c) INNER/EQUI JOIN。在这些情况下,嵌套循环可能被打断。
Ordering allows determining whether there are records left
Limit states the maximum number of rows that are returned
Consequently, the number of rows is significantly reduced allowing the operation to complete much faster.
- 排序取决于连接的左边是否有记录
- limit表示返回的最大行数
因此,行数越少,操作完成越快。
Push-down Query Optimization(查询优化)
Complex queries such as Listing 2 require the planner to decide when to filter, sort, and merge in order to efficiently execute the plan. In this case, the query would be split internally into subqueries before running the nested loop. As shown in Figure 3, first filtering (and ordering) is applied to relations L and R on their shards, then the result is directly broadcast to the nodes running the nested loop. Not only will this behavior reduce the number of rows to work with, it also distributes the workload among the nodes so that the (expensive) join operation can run faster.
复杂查询如列表2为了有效执行计划,需要计划者来决定什么时候过滤,排序,而且合并。在这个例子中,在循环嵌套执行前,查询将被内部分割成一些子查询。像在图3中所示,首先过滤(并且排序)将被应用在关系L和R的分片上,之后结果直接广播到执行嵌套循环的节点。这样操作不仅仅会减少工作的行,也可以在节点之间分散负载。因此连接操作将会执行得更快。
SELECT L.a, R.x
FROM L, R
WHERE L.id = R.id
AND L.b > 100
AND R.y < 10
ORDER BY L.a
Listing 2. An INNER JOIN on ids (effectively an EQUI JOIN) which can be optimized.
列表 2.一个在ids上的内部连接(有效的一个等值连接)可以被优化
Figure 3
Complex queries are broken down into subqueries that are run on their shards before joining.