DATA MANIPULATION(数据操作)
Inserting Data(插入数据)
使用SQL语句向Crate插入数据。
注意
列总是按照列名称的字母顺序排序。如果省略插入列,则VALUES子句中的必须和表中的列按顺序对应。
插入一行:
cr> insert into locations (id, date, description, kind, name, position)
... values (
... '14',
... '2013-09-12T21:43:59.000Z',
... 'Blagulon Kappa is the planet to which the police are native.',
... 'Planet',
... 'Blagulon Kappa',
... 7
... );
INSERT OK, 1 row affected (... sec)
当插入单行数据时如果有错误发生,将返回错误信息。
可以为INSERT语句定义多个values,这样一次(aka. bulk insert)就可以插入多行数据:
cr> insert into locations (id, date, description, kind, name, position) values
... (
... '16',
... '2013-09-14T21:43:59.000Z',
... 'Blagulon Kappa II is the planet to which the police are native.',
... 'Planet',
... 'Blagulon Kappa II',
... 19
... ),
... (
... '17',
... '2013-09-13T16:43:59.000Z',
... 'Brontitall is a planet with a warm, rich atmosphere and no mountains.',
... 'Planet',
... 'Brontitall',
... 10
... );
INSERT OK, 2 rows affected (... sec)
当一次插入多行时,如果某些行出错,将不会返回错误信息信息,但插入的行数将会根据失败的行数而减少。
当向包含生成列的表中插入数据时,生成列的值将被安全地忽略。它是在插入时生成的:
cr> insert into debit_card (owner, num_part1, num_part2) values
... ('Zaphod Beeblebrox', 1234, 5678);
INSERT OK, 1 row affected (... sec)
如果给定一个值,则会根据列生成子句对当前要插入的行进行校验:
cr> insert into debit_card (owner, num_part1, num_part2, check_sum) values
... ('Arthur Dent', 9876, 5432, 642935);
SQLActionException[SQLParseException: Given value 642935 for generated column does not match defined generated expression value 642936]
按查询插入数据(Inserting Data By Query)
可以使用query语句替换values语句来插入数据。源表和目的表中列的数据类型可以不同,只要数据类型可以转换。这样为以下的操作提供了可能,重建表数据,重命名一个字段,改变一个字段的数据类型或者将一个普通表转换成一个分区表。
改变一个字段的数据类型,在这种情况下,将字段position的数据类型从integer改为short:
cr> create table locations2 (
... id string primary key,
... name string,
... date timestamp,
... kind string,
... position short,
... description string
... ) clustered by (id) into 2 shards with (number_of_replicas = 0);
CREATE OK, 1 row affected (... sec)
cr> insert into locations2 (id, name, date, kind, postition, description)
... (
... select id, name, date, kind, position, description
... from locations
... where position < 10
... );
INSERT OK, 14 rows affected (... sec)
在locations表之外创建一个新以year分区的分区表:
cr> create table locations_parted (
... id string primary key,
... name string,
... year string primary key,
... date timestamp,
... kind string,
... position integer
... ) clustered by (id) into 2 shards
... partitioned by (year) with (number_of_replicas = 0);
CREATE OK, 1 row affected (... sec)
cr> insert into locations_parted (id, name, year, date, kind, postition)
... (
... select
... id,
... name,
... date_format('%Y', date),
... date,
... kind,
... position
... from locations
... );
INSERT OK, 16 rows affected (... sec)
最后通过查询插入将数据插入到分区表:
cr> select table_name, partition_ident, values, number_of_shards, number_of_replicas
... from information_schema.table_partitions
... where table_name = 'locations_parted'
... order by partition_ident;
+------------------+-----------------+------------------+------------------+--------------------+
| table_name | partition_ident | values | number_of_shards | number_of_replicas |
+------------------+-----------------+------------------+------------------+--------------------+
| locations_parted | 042j2e9n74 | {"year": "1979"} | 2 | 0 |
| locations_parted | 042j4c1h6c | {"year": "2013"} | 2 | 0 |
+------------------+-----------------+------------------+------------------+--------------------+
SELECT 2 rows in set (... sec)
注意
limit,offset和order by在内部插叙语句中不支持。
多关键字更新(On Duplicate Key Update)
ON DUPLICATE KEY UPDATE子句用在更新已存在的行,如果由于duplicate-key和现有文档PRIMARY KEY键冲突无法插入时将更新现有行:
cr> select
... name,
... visits,
... extract(year from last_visit) as last_visit
... from uservisits order by name;
+----------+--------+------------+
| name | visits | last_visit |
+----------+--------+------------+
| Ford | 1 | 2013 |
| Trillian | 3 | 2013 |
+----------+--------+------------+
SELECT 2 rows in set (... sec)
cr> insert into uservisits (id, name, visits, last_visit) values
... (
... 0,
... 'Ford',
... 1,
... '2015-09-12'
... ) on duplicate key update
... visits = visits + 1,
... last_visit = '2015-01-12';
INSERT OK, 1 row affected (... sec)
cr> select
... name,
... visits,
... extract(year from last_visit) as last_visit
... from uservisits where id = 0;
+------+--------+------------+
| name | visits | last_visit |
+------+--------+------------+
| Ford | 2 | 2015 |
+------+--------+------------+
SELECT 1 row in set (... sec)
可以使用VALUES(column_ident)函数引用没有重复键冲突时插入的值。这个函数特别用于多行插入,可引用当前行的值:
cr> insert into uservisits (id, name, visits, last_visit) values
... (
... 0,
... 'Ford',
... 2,
... '2016-01-13'
... ),
... (
... 1,
... 'Trillian',
... 5,
... '2016-01-15'
... ) on duplicate key update
... visits = visits + VALUES(visits),
... last_visit = VALUES(last_visit);
INSERT OK, 2 rows affected (... sec)
cr> select
... name,
... visits,
... extract(year from last_visit) as last_visit
... from uservisits order by name;
+----------+--------+------------+
| name | visits | last_visit |
+----------+--------+------------+
| Ford | 4 | 2016 |
| Trillian | 8 | 2016 |
+----------+--------+------------+
SELECT 2 rows in set (... sec)
也可以使用一个query语句代替values:
cr> create table uservisits2 (
... id integer primary key,
... name string,
... visits integer,
... last_visit timestamp
... ) clustered by (id) into 2 shards with (number_of_replicas = 0);
CREATE OK, 1 row affected (... sec)
cr> insert into uservisits2 (id, name, visits, last_visit)
... (
... select id, name, visits, last_visit
... from uservisits
... );
INSERT OK, 2 rows affected (... sec)
cr> insert into uservisits2 (id, name, visits, last_visit)
... (
... select id, name, visits, last_visit
... from uservisits
... )
... on duplicate key update
... visits = visits + VALUES(visits),
... last_visit = VALUES(last_visit);
INSERT OK, 2 rows affected (... sec)
cr> select
... name,
... visits,
... extract(year from last_visit) as last_visit
... from uservisits order by name;
+----------+--------+------------+
| name | visits | last_visit |
+----------+--------+------------+
| Ford | 4 | 2016 |
| Trillian | 8 | 2016 |
+----------+--------+------------+
SELECT 2 rows in set (... sec)
更新数据(Updating Data)
要更新Crate中的文档,可使用UPDATE的SQL语句:
cr> update locations set description = 'Updated description'
... where name = 'Bartledan';
UPDATE OK, 1 row affected (... sec)
Updating nested objects is also supported:
cr> update locations set race['name'] = 'Human' where name = 'Bartledan';
UPDATE OK, 1 row affected (... sec)
也可以在一个表达式中引用一个列,例如像这样自加一个数字:
cr> update locations set position = position + 1 where position < 3;
UPDATE OK, 6 rows affected (... sec)
注意
如果同时更新同一个文档时发生一个版本冲突的异常VersionConflictException。Crate有一个重试的逻辑并且尝试自动解决冲突。
删除数据(Deleting Data)
使用DELETE语句删除Crate中的行:
cr> delete from locations where position > 3;
DELETE OK, ... rows affected (... sec)
导入/导出(Import/Export)
Importing Data
使用SQL语句 COPY FROM可以将数据导入到Crate中。目前只支持JSON数据格式,一行代表一条数据。 Example JSON data: 例如JSON数据: {"id": 1, "quote": "Don't panic"} {"id": 2, "quote": "Ford, you're turning into a penguin. Stop it."}
注意
- COPY FROM将不会转换或校验你的数据。请确保它和你的schema相匹配。
- 生成列的值如果数据中没有将会计算产生,否则他们将会导入但不校验,所以请确保数据的正确性。
- 此外,数据中的列名称是区分大小写的(就像他们在sql语句中被引用一样)。 为了将数据导入到分区表中,请参见COPY FROM
从文件URI导入(Import From File URI)
一个使用文件URI导入的例子:
cr> copy quotes from 'file:///tmp/import_data/quotes.json';
COPY OK, 3 rows affected (... sec)
如果所有文件都在一个文件夹应该使用通配符*:
cr> copy quotes from '/tmp/import_data/*' with (concurrency = 1, bulk_size = 4);
COPY OK, 3 rows affected (... sec)
通配符也可以只匹配确切的文件:
cr> copy quotes from '/tmp/import_data/qu*.json';
COPY OK, 3 rows affected (... sec)
详情参见COPY FROM。
(数据导出)Exporting Data
可以使用COPY TO语句导出数据。数据导出以分布式方式导出,这意味着每个节点将导出自己的数据。
副本数据将不会导出。因此导出表的每一行只存储一次。
此例子展示如何将给定表数据导出到以表名和分片id命名的gzip压缩文件中:
cr> refresh table quotes;
REFRESH OK...
cr> copy quotes to DIRECTORY '/tmp/' with (compression='gzip');
COPY OK, 3 rows affected ...
可以使用可选的where子句条件过滤一些行问不是导出整张表。如果只需要导出部分数据此方式非常有用:
cr> copy quotes where match(quote_ft, 'time') to DIRECTORY '/tmp/' with (compression='gzip');
COPY OK, 2 rows affected ...
有关详情请参见COPY TO。