DATA MANIPULATION(数据操作)

Inserting Data(插入数据)

使用SQL语句向Crate插入数据。

注意
列总是按照列名称的字母顺序排序。如果省略插入列，则VALUES子句中的必须和表中的列按顺序对应。

插入一行：

cr> insert into locations (id, date, description, kind, name, position)
... values (
...   '14',
...   '2013-09-12T21:43:59.000Z',
...   'Blagulon Kappa is the planet to which the police are native.',
...   'Planet',
...   'Blagulon Kappa',
...   7
... );
INSERT OK, 1 row affected (... sec)

当插入单行数据时如果有错误发生，将返回错误信息。

可以为INSERT语句定义多个values，这样一次(aka. bulk insert)就可以插入多行数据:

cr> insert into locations (id, date, description, kind, name, position) values
... (
...   '16',
...   '2013-09-14T21:43:59.000Z',
...   'Blagulon Kappa II is the planet to which the police are native.',
...   'Planet',
...   'Blagulon Kappa II',
...   19
... ),
... (
...   '17',
...   '2013-09-13T16:43:59.000Z',
...   'Brontitall is a planet with a warm, rich atmosphere and no mountains.',
...   'Planet',
...   'Brontitall',
...   10
... );
INSERT OK, 2 rows affected (... sec)

当一次插入多行时，如果某些行出错，将不会返回错误信息信息，但插入的行数将会根据失败的行数而减少。

当向包含生成列的表中插入数据时，生成列的值将被安全地忽略。它是在插入时生成的：

cr> insert into debit_card (owner, num_part1, num_part2) values
... ('Zaphod Beeblebrox', 1234, 5678);
INSERT OK, 1 row affected (... sec)

如果给定一个值，则会根据列生成子句对当前要插入的行进行校验:

cr> insert into debit_card (owner, num_part1, num_part2, check_sum) values
... ('Arthur Dent', 9876, 5432, 642935);
SQLActionException[SQLParseException: Given value 642935 for generated column does not match defined generated expression value 642936]

按查询插入数据(Inserting Data By Query)

可以使用query语句替换values语句来插入数据。源表和目的表中列的数据类型可以不同，只要数据类型可以转换。这样为以下的操作提供了可能，重建表数据，重命名一个字段，改变一个字段的数据类型或者将一个普通表转换成一个分区表。

改变一个字段的数据类型，在这种情况下，将字段position的数据类型从integer改为short:

cr> create table locations2 (
...     id string primary key,
...     name string,
...     date timestamp,
...     kind string,
...     position short,
...     description string
... ) clustered by (id) into 2 shards with (number_of_replicas = 0);
CREATE OK, 1 row affected (... sec)

cr> insert into locations2 (id, name, date, kind, postition, description)
... (
...     select id, name, date, kind, position, description
...     from locations
...     where position < 10
... );
INSERT OK, 14 rows affected (... sec)

在locations表之外创建一个新以year分区的分区表：

cr> create table locations_parted (
...     id string primary key,
...     name string,
...     year string primary key,
...     date timestamp,
...     kind string,
...     position integer
... ) clustered by (id) into 2 shards
... partitioned by (year) with (number_of_replicas = 0);
CREATE OK, 1 row affected (... sec)

cr> insert into locations_parted (id, name, year, date, kind, postition)
... (
...     select
...         id,
...         name,
...         date_format('%Y', date),
...         date,
...         kind,
...         position
...     from locations
... );
INSERT OK, 16 rows affected (... sec)

最后通过查询插入将数据插入到分区表:

cr> select table_name, partition_ident, values, number_of_shards, number_of_replicas
... from information_schema.table_partitions
... where table_name = 'locations_parted'
... order by partition_ident;
+------------------+-----------------+------------------+------------------+--------------------+
| table_name       | partition_ident | values           | number_of_shards | number_of_replicas |
+------------------+-----------------+------------------+------------------+--------------------+
| locations_parted | 042j2e9n74      | {"year": "1979"} |                2 |                  0 |
| locations_parted | 042j4c1h6c      | {"year": "2013"} |                2 |                  0 |
+------------------+-----------------+------------------+------------------+--------------------+
SELECT 2 rows in set (... sec)

注意
limit,offset和order by在内部插叙语句中不支持。

多关键字更新(On Duplicate Key Update)

ON DUPLICATE KEY UPDATE子句用在更新已存在的行，如果由于duplicate-key和现有文档PRIMARY KEY键冲突无法插入时将更新现有行:

cr> select
...     name,
...     visits,
...     extract(year from last_visit) as last_visit
... from uservisits order by name;
+----------+--------+------------+
| name     | visits | last_visit |
+----------+--------+------------+
| Ford     |      1 | 2013       |
| Trillian |      3 | 2013       |
+----------+--------+------------+
SELECT 2 rows in set (... sec)

cr> insert into uservisits (id, name, visits, last_visit) values
... (
...     0,
...     'Ford',
...     1,
...     '2015-09-12'
... ) on duplicate key update
...     visits = visits + 1,
...     last_visit = '2015-01-12';
INSERT OK, 1 row affected (... sec)

cr> select
...     name,
...     visits,
...     extract(year from last_visit) as last_visit
... from uservisits where id = 0;
+------+--------+------------+
| name | visits | last_visit |
+------+--------+------------+
| Ford |      2 | 2015       |
+------+--------+------------+
SELECT 1 row in set (... sec)

可以使用VALUES(column_ident)函数引用没有重复键冲突时插入的值。这个函数特别用于多行插入，可引用当前行的值:

cr> insert into uservisits (id, name, visits, last_visit) values
... (
...     0,
...     'Ford',
...     2,
...     '2016-01-13'
... ),
... (
...     1,
...     'Trillian',
...     5,
...     '2016-01-15'
... ) on duplicate key update
...     visits = visits + VALUES(visits),
...     last_visit = VALUES(last_visit);
INSERT OK, 2 rows affected (... sec)

cr> select
...     name,
...     visits,
...     extract(year from last_visit) as last_visit
... from uservisits order by name;
+----------+--------+------------+
| name     | visits | last_visit |
+----------+--------+------------+
| Ford     |      4 | 2016       |
| Trillian |      8 | 2016       |
+----------+--------+------------+
SELECT 2 rows in set (... sec)

也可以使用一个query语句代替values:

cr> create table uservisits2 (
...   id integer primary key,
...   name string,
...   visits integer,
...   last_visit timestamp
... ) clustered by (id) into 2 shards with (number_of_replicas = 0);
CREATE OK, 1 row affected (... sec)

cr> insert into uservisits2 (id, name, visits, last_visit)
... (
...     select id, name, visits, last_visit
...     from uservisits
... );
INSERT OK, 2 rows affected (... sec)

cr> insert into uservisits2 (id, name, visits, last_visit)
... (
...     select id, name, visits, last_visit
...     from uservisits
... )
... on duplicate key update
...     visits = visits + VALUES(visits),
...     last_visit = VALUES(last_visit);
INSERT OK, 2 rows affected (... sec)

cr> select
...     name,
...     visits,
...     extract(year from last_visit) as last_visit
... from uservisits order by name;
+----------+--------+------------+
| name     | visits | last_visit |
+----------+--------+------------+
| Ford     |      4 | 2016       |
| Trillian |      8 | 2016       |
+----------+--------+------------+
SELECT 2 rows in set (... sec)

更新数据(Updating Data)

要更新Crate中的文档，可使用UPDATE的SQL语句:

cr> update locations set description = 'Updated description'
... where name = 'Bartledan';
UPDATE OK, 1 row affected (... sec)
Updating nested objects is also supported:

cr> update locations set race['name'] = 'Human' where name = 'Bartledan';
UPDATE OK, 1 row affected (... sec)

也可以在一个表达式中引用一个列，例如像这样自加一个数字:

cr> update locations set position = position + 1 where position < 3;
UPDATE OK, 6 rows affected (... sec)

注意
如果同时更新同一个文档时发生一个版本冲突的异常VersionConflictException。Crate有一个重试的逻辑并且尝试自动解决冲突。

删除数据(Deleting Data)

使用DELETE语句删除Crate中的行:

cr> delete from locations where position > 3;
DELETE OK, ... rows affected (... sec)

导入/导出(Import/Export)

Importing Data

使用SQL语句 COPY FROM可以将数据导入到Crate中。目前只支持JSON数据格式,一行代表一条数据。 Example JSON data: 例如JSON数据: {"id": 1, "quote": "Don't panic"} {"id": 2, "quote": "Ford, you're turning into a penguin. Stop it."}

注意

COPY FROM将不会转换或校验你的数据。请确保它和你的schema相匹配。

生成列的值如果数据中没有将会计算产生，否则他们将会导入但不校验，所以请确保数据的正确性。

此外，数据中的列名称是区分大小写的(就像他们在sql语句中被引用一样)。为了将数据导入到分区表中，请参见COPY FROM

从文件URI导入(Import From File URI)

一个使用文件URI导入的例子:

cr> copy quotes from 'file:///tmp/import_data/quotes.json';
COPY OK, 3 rows affected (... sec)

如果所有文件都在一个文件夹应该使用通配符*:

cr> copy quotes from '/tmp/import_data/*' with (concurrency = 1, bulk_size = 4);
COPY OK, 3 rows affected (... sec)

通配符也可以只匹配确切的文件:

cr> copy quotes from '/tmp/import_data/qu*.json';
COPY OK, 3 rows affected (... sec)

详情参见COPY FROM。

(数据导出)Exporting Data

可以使用COPY TO语句导出数据。数据导出以分布式方式导出，这意味着每个节点将导出自己的数据。

副本数据将不会导出。因此导出表的每一行只存储一次。

此例子展示如何将给定表数据导出到以表名和分片id命名的gzip压缩文件中:

cr> refresh table quotes;
REFRESH OK...

cr> copy quotes to DIRECTORY '/tmp/' with (compression='gzip');
COPY OK, 3 rows affected ...

可以使用可选的where子句条件过滤一些行问不是导出整张表。如果只需要导出部分数据此方式非常有用:

cr> copy quotes where match(quote_ft, 'time') to DIRECTORY '/tmp/' with (compression='gzip');
COPY OK, 2 rows affected ...

有关详情请参见COPY TO。

5.2 数据操作(Data Manipulation)