[MySQL] The LOAD DATA Statement in Detail (Part 2)

Author: 罗小波
Senior MySQL Database Engineer, 沃趣科技

1.2.6. The FIELDS (synonymous with COLUMNS) and LINES clauses

1.2.6.1. The FIELDS keyword and its subclauses

    admin@localhost : xiaoboluo 03:08:34> select * from test3 into outfile "/tmp/test3.txt" FIELDS TERMINATED BY ',';
    Query OK, 4 rows affected (0.00 sec)
    admin@localhost : xiaoboluo 03:08:37> system cat /tmp/test3.txt
    2,a string,100.20
    4,a string containing a \, comma,102.20
    6,a string containing a " quote,102.20
    8,a string containing a "\, quote and comma,102.20

    # Use " as the field enclosing character, without the OPTIONALLY keyword
    admin@localhost : xiaoboluo 03:33:33> system rm -f /tmp/test3.txt;
    admin@localhost : xiaoboluo 03:37:21> select * from test3 into outfile "/tmp/test3.txt" FIELDS ENCLOSED BY '"';
    Query OK, 5 rows affected (0.00 sec)
    admin@localhost : xiaoboluo 03:37:33> system cat /tmp/test3.txt
    "2" "a string" "100.20"
    "4" "a string containing a , comma" "102.20"
    "6" "a string containing a \" quote" "102.20"
    "8" "a string containing a \", quote and comma" "102.20"
    "10" "\\t" "102.20"
    # Use " as the field enclosing character, with the OPTIONALLY keyword; the id column is no longer enclosed
    admin@localhost : xiaoboluo 03:37:41> system rm -f /tmp/test3.txt;
    admin@localhost : xiaoboluo 03:40:53> select * from test3 into outfile "/tmp/test3.txt" FIELDS optionally ENCLOSED BY '"';
    Query OK, 5 rows affected (0.00 sec)
    admin@localhost : xiaoboluo 03:41:03> system cat /tmp/test3.txt
    2 "a string" "100.20"
    4 "a string containing a , comma" "102.20"
    6 "a string containing a \" quote" "102.20"
    8 "a string containing a \", quote and comma" "102.20"
    10 "\\t" "102.20"

    admin@localhost : xiaoboluo 03:42:41> system rm -f /tmp/test3.txt;
    admin@localhost : xiaoboluo 03:44:18> select * from test3 into outfile "/tmp/test3.txt" fields escaped by '.';
    Query OK, 5 rows affected (0.00 sec)
    # The escape character . that occurs in the data is itself escaped, while the data value \t is not
    admin@localhost : xiaoboluo 03:44:25> system cat /tmp/test3.txt
    2 a string 100..20
    4 a string containing a , comma 102..20
    6 a string containing a " quote 102..20
    8 a string containing a ", quote and comma 102..20
    10 \t 102..20
    admin@localhost : xiaoboluo 03:44:28> truncate test3; # empty the table
    Query OK, 0 rows affected (0.01 sec)
    admin@localhost : xiaoboluo 03:45:19> load data infile "/tmp/test3.txt" into table test3 fields escaped by '.'; # specify . as the escape character on import
    Query OK, 5 rows affected (0.00 sec)
    Records: 5 Deleted: 0 Skipped: 0 Warnings: 0
    admin@localhost : xiaoboluo 03:45:40> select * from test3; # verify the data: the import is correct
    +----+------------------------------------------+--------+
    | id | test                                     | test2  |
    +----+------------------------------------------+--------+
    |  2 | a string                                 | 100.20 |
    |  4 | a string containing a , comma            | 102.20 |
    |  6 | a string containing a " quote            | 102.20 |
    |  8 | a string containing a ", quote and comma | 102.20 |
    | 10 | \t                                       | 102.20 |
    +----+------------------------------------------+--------+
    5 rows in set (0.00 sec)
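
The export behaviour seen in the transcripts above can be approximated in a few lines of Python. This is only a minimal sketch based on the escaping rules documented for SELECT ... INTO OUTFILE, not MySQL's implementation; the function name `escape_field` and its parameters are hypothetical, and it escapes a single value without adding the enclosing characters.

```python
def escape_field(value, terminated_by="\t", enclosed_by="", escaped_by="\\",
                 lines_terminated_by="\n"):
    """Escape one field value roughly the way SELECT ... INTO OUTFILE does."""
    if not escaped_by:
        return value  # empty ESCAPED BY: nothing is escaped
    special = {escaped_by}
    if enclosed_by:
        special.add(enclosed_by)
    else:
        # Terminator characters are escaped only when no enclosing character is set.
        special.update(s[0] for s in (terminated_by, lines_terminated_by) if s)
    out = []
    for ch in value:
        if ch == "\0":
            out.append(escaped_by + "0")  # NUL is written as escape char + '0'
        elif ch in special:
            out.append(escaped_by + ch)
        else:
            out.append(ch)
    return "".join(out)


# Values taken from the transcripts above.
print(escape_field("a string containing a , comma", terminated_by=","))
print(escape_field("100.20", escaped_by="."))
print(escape_field('a string containing a " quote', enclosed_by='"'))
```

Under these assumptions, the comma is escaped only when no enclosing character is set, and a custom escape character such as `.` is doubled wherever it appears in the data, which matches the file contents shown above.
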
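1.2.6.2. The LINES keyword and its subclauses

If every line of the plain-text file you want to read shares a common prefix that you want to ignore, you can use LINES STARTING BY 'prefix_string' to skip the prefix and anything before it. If a line does not contain the prefix, the entire line is skipped. For example: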

    # Example statements
    admin@localhost : xiaoboluo 03:48:04> system rm -f /tmp/test3.txt;
    admin@localhost : xiaoboluo 03:54:54> select * from test3 into outfile "/tmp/test3.txt" LINES STARTING BY 'xxx';
    Query OK, 5 rows affected (0.00 sec)
    # Each line of data now starts with the line prefix string xxx
    admin@localhost : xiaoboluo 03:55:03> system cat /tmp/test3.txt
    xxx2 a string 100.20
    xxx4 a string containing a , comma 102.20
    xxx6 a string containing a " quote 102.20
    xxx8 a string containing a ", quote and comma 102.20
    xxx10 \\t 102.20
    # Now edit the file from the shell and add two lines
    # The plain-text data that will finally be loaded is as follows
    admin@localhost : xiaoboluo 03:55:50> system cat /tmp/test3.txt
    xxx2 a string 100.20
    xxx4 a string containing a , comma 102.20
    xxx6 a string containing a " quote 102.20
    xxx8 a string containing a ", quote and comma 102.20
    xxx10 \\t 102.20
    12 \\t 102.20
    dfadsfasxxx14 \\t 102.20
    admin@localhost : xiaoboluo 03:59:03> truncate test3; # empty the table
    Query OK, 0 rows affected (0.01 sec)
    admin@localhost : xiaoboluo 03:59:38> load data infile "/tmp/test3.txt" into table test3 LINES STARTING BY 'xxx'; # import the data with xxx as the line prefix
    Query OK, 6 rows affected (0.00 sec)
    Records: 6 Deleted: 0 Skipped: 0 Warnings: 0
    # Verify the table data: the line without the xxx prefix was ignored. For the last line,
    # which contains xxx, everything up to and including xxx was discarded, and the content
    # after xxx was parsed as row data and imported
    admin@localhost : xiaoboluo 03:59:44> select * from test3;
    +----+------------------------------------------+--------+
    | id | test                                     | test2  |
    +----+------------------------------------------+--------+
    |  2 | a string                                 | 100.20 |
    |  4 | a string containing a , comma            | 102.20 |
    |  6 | a string containing a " quote            | 102.20 |
    |  8 | a string containing a ", quote and comma | 102.20 |
    | 10 | \t                                       | 102.20 |
    | 14 | \t                                       | 102.20 |
    +----+------------------------------------------+--------+
    6 rows in set (0.00 sec)

The line terminator (newline) defaults to \n on Linux. It is set with the clause lines terminated by 'string', where string is the desired line terminator.

    # Export the data with \r\n as the line terminator
    admin@localhost : xiaoboluo 03:59:49> system rm -f /tmp/test3.txt;
    admin@localhost : xiaoboluo 04:02:22> select * from test3 into outfile "/tmp/test3.txt" lines terminated by '\r\n';
    Query OK, 6 rows affected (0.00 sec)
    # Some Linux commands interpret these special characters themselves, so Python is used here
    # to inspect the line terminators in the text file. In the result below, each list element
    # is one line of data, and the trailing \r\n of each element is that line's terminator
    >>> # newline='' keeps \r\n untranslated under Python 3
    >>> f = open('/tmp/test3.txt', 'r', newline='')
    >>> data = f.readlines()
    >>> data
    ['2\ta string\t100.20\r\n', '4\ta string containing a , comma\t102.20\r\n', '6\ta string containing a " quote\t102.20\r\n', '8\ta string containing a ", quote and comma\t102.20\r\n', '10\t\\\\t\t102.20\r\n', '14\t\\\\t\t102.20\r\n']
    >>>
    # Now re-import the data into the table; the result below shows that the imported data is correct
    admin@localhost : xiaoboluo 04:02:39> truncate test3;
    Query OK, 0 rows affected (0.01 sec)
    admin@localhost : xiaoboluo 04:04:55> load data infile "/tmp/test3.txt" into table test3 lines terminated by '\r\n';
    Query OK, 6 rows affected (0.00 sec)
    Records: 6 Deleted: 0 Skipped: 0 Warnings: 0
    admin@localhost : xiaoboluo 04:05:11> select * from test3;
    +----+------------------------------------------+--------+
    | id | test                                     | test2  |
    +----+------------------------------------------+--------+
    |  2 | a string                                 | 100.20 |
    |  4 | a string containing a , comma            | 102.20 |
    |  6 | a string containing a " quote            | 102.20 |
    |  8 | a string containing a ", quote and comma | 102.20 |
    | 10 | \t                                       | 102.20 |
    | 14 | \t                                       | 102.20 |
    +----+------------------------------------------+--------+
    6 rows in set (0.00 sec)
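
How LINES STARTING BY and LINES TERMINATED BY carve a file into rows can be modelled roughly as follows. This is a simplified sketch of the behaviour described above, not MySQL's parser; `split_rows` is a hypothetical helper, and it ignores escaping and enclosing characters altogether.

```python
def split_rows(text, starting_by="", terminated_by="\n"):
    """Return the payload of each line, applying STARTING BY / TERMINATED BY."""
    rows = []
    for line in text.split(terminated_by):
        if not line:
            continue  # skip empty pieces, such as the one after the final terminator
        if starting_by:
            pos = line.find(starting_by)
            if pos == -1:
                continue  # no prefix: the whole line is skipped
            # Drop the prefix and everything before it.
            line = line[pos + len(starting_by):]
        rows.append(line)
    return rows


# Three of the lines from the example file, written with \r\n terminators.
text = ("xxx2\ta string\t100.20\r\n"
        "12\t\\\\t\t102.20\r\n"
        "dfadsfasxxx14\t\\\\t\t102.20\r\n")
for row in split_rows(text, starting_by="xxx", terminated_by="\r\n"):
    print(repr(row))
```

When reading such a file with Python 3, opening it with `newline=''` keeps the `\r\n` terminators intact instead of translating them to `\n`.
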
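1.2.6.3. Notes on FIELDS and LINES

Two-character escape sequences (shown with the default escape character \):

    \0 ASCII NUL (X'00') character
    \b Backspace character
    \n Newline (linefeed) character
    \r Carriage return character
    \t Tab character
    \Z ASCII 26 (Control+Z)
    \N NULL value. If the escape character is empty, NULL is exported as the literal string NULL, and on import that string is loaded as ordinary data rather than as a NULL value
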
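
To illustrate how these sequences are interpreted on import, here is a small sketch in Python. It is an approximation under stated assumptions: `unescape_field` and `ESCAPES` are hypothetical names, NULL is represented by Python's `None`, and a field is treated as NULL only when it consists of exactly the escape character followed by `N`.

```python
# Two-character sequences from the list above, keyed by their second character.
ESCAPES = {"0": "\x00", "b": "\b", "n": "\n", "r": "\r", "t": "\t", "Z": "\x1a"}


def unescape_field(raw, escaped_by="\\"):
    """Decode one raw field value; returns None for the NULL marker."""
    if not escaped_by:
        return raw  # empty escape character: no sequence is interpreted
    if raw == escaped_by + "N":
        return None
    out = []
    i = 0
    while i < len(raw):
        ch = raw[i]
        if ch == escaped_by and i + 1 < len(raw):
            nxt = raw[i + 1]
            # Known sequences map to control characters; any other escaped
            # character is taken literally.
            out.append(ESCAPES.get(nxt, nxt))
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


print(repr(unescape_field("\\N")))                   # NULL marker
print(repr(unescape_field("NULL", escaped_by="")))   # plain string when escaping is off
print(repr(unescape_field("100..20", escaped_by=".")))
```
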
    # The field enclosing character is " and the data contains "; with the escape character
    # and line terminator left at their defaults, the import causes no problems
    admin@localhost : xiaoboluo 09:46:14> select * from test3;
    +----+------------------------------------------+--------+
    | id | test                                     | test2  |
    +----+------------------------------------------+--------+
    |  2 | a string                                 | 100.20 |
    |  4 | a string containing a , comma            | 102.20 |
    |  6 | a string containing a " quote            | 102.20 |
    |  8 | a string containing a ", quote and comma | 102.20 |
    +----+------------------------------------------+--------+
    4 rows in set (0.00 sec)
    admin@localhost : xiaoboluo 09:46:17> select * from test3 into outfile "/tmp/test3.txt" FIELDS OPTIONALLY enclosed BY '"';
    Query OK, 4 rows affected (0.00 sec)
    # Data characters identical to the field enclosing character are escaped
    admin@localhost : xiaoboluo 09:46:23> system cat /tmp/test3.txt;
    2 "a string" "100.20"
    4 "a string containing a , comma" "102.20"
    6 "a string containing a \" quote" "102.20"
    8 "a string containing a \", quote and comma" "102.20"
    admin@localhost : xiaoboluo 09:54:41> truncate test3;
    Query OK, 0 rows affected (0.01 sec)
    admin@localhost : xiaoboluo 09:58:01> load data infile '/tmp/test3.txt' into table test3 FIELDS OPTIONALLY enclosed BY '"';
    Query OK, 4 rows affected (0.00 sec)
    Records: 4 Deleted: 0 Skipped: 0 Warnings: 0
    admin@localhost : xiaoboluo 09:58:45> select * from test3;
    +----+------------------------------------------+--------+
    | id | test                                     | test2  |
    +----+------------------------------------------+--------+
    |  2 | a string                                 | 100.20 |
    |  4 | a string containing a , comma            | 102.20 |
    |  6 | a string containing a " quote            | 102.20 |
    |  8 | a string containing a ", quote and comma | 102.20 |
    +----+------------------------------------------+--------+
    4 rows in set (0.00 sec)
    # If the field enclosing character is " and the field terminator is , and the data contains
    # both of them, with the escape character and line terminator left at their defaults,
    # the import causes no problems
    admin@localhost : xiaoboluo 09:53:45> select * from test3 into outfile "/tmp/test3.txt" FIELDS OPTIONALLY enclosed BY '"' TERMINATED by ',';
    Query OK, 4 rows affected (0.00 sec)
    admin@localhost : xiaoboluo 09:54:29> system cat /tmp/test3.txt;
    2,"a string","100.20"
    4,"a string containing a , comma","102.20"
    6,"a string containing a \" quote","102.20"
    8,"a string containing a \", quote and comma","102.20"
    admin@localhost : xiaoboluo 09:54:41> truncate test3;
    Query OK, 0 rows affected (0.01 sec)
    admin@localhost : xiaoboluo 09:58:01> load data infile '/tmp/test3.txt' into table test3 FIELDS OPTIONALLY enclosed BY '"' TERMINATED by ',';
    Query OK, 4 rows affected (0.00 sec)
    Records: 4 Deleted: 0 Skipped: 0 Warnings: 0
    admin@localhost : xiaoboluo 09:58:45> select * from test3;
    +----+------------------------------------------+--------+
    | id | test                                     | test2  |
    +----+------------------------------------------+--------+
    |  2 | a string                                 | 100.20 |
    |  4 | a string containing a , comma            | 102.20 |
    |  6 | a string containing a " quote            | 102.20 |
    |  8 | a string containing a ", quote and comma | 102.20 |
    +----+------------------------------------------+--------+
    4 rows in set (0.00 sec)
    # However, with " as the field enclosing character, data that contains ", a comma as the
    # field terminator and the default line terminator, setting the escape character to an
    # empty string means the fourth row cannot be parsed correctly on import, and the load fails
    admin@localhost : xiaoboluo 09:58:01> load data infile '/tmp/test3.txt' into table test3 FIELDS OPTIONALLY enclosed BY '"' TERMINATED by ',';
    Query OK, 4 rows affected (0.00 sec)
    Records: 4 Deleted: 0 Skipped: 0 Warnings: 0
    admin@localhost : xiaoboluo 09:58:45> select * from test3;
    +----+------------------------------------------+--------+
    | id | test                                     | test2  |
    +----+------------------------------------------+--------+
    |  2 | a string                                 | 100.20 |
    |  4 | a string containing a , comma            | 102.20 |
    |  6 | a string containing a " quote            | 102.20 |
    |  8 | a string containing a ", quote and comma | 102.20 |
    +----+------------------------------------------+--------+
    4 rows in set (0.00 sec)
    admin@localhost : xiaoboluo 09:58:49> select * from test3 into outfile "/tmp/test3_test.txt" FIELDS OPTIONALLY enclosed BY '"' TERMINATED by ',' escaped by '';
    Query OK, 4 rows affected (0.00 sec)
    admin@localhost : xiaoboluo 10:00:42> system cat /tmp/test3_test.txt;
    2,"a string","100.20"
    4,"a string containing a , comma","102.20"
    6,"a string containing a " quote","102.20"
    8,"a string containing a ", quote and comma","102.20"
    # Note on the third line (id 6) and the ENCLOSED BY clause: an enclosed field starts at the
    # enclosing character that follows a FIELDS TERMINATED BY separator and runs until the next
    # separator whose preceding character is the enclosing character. If the character before
    # a separator is not the enclosing character, matching continues (as in the second line).
    # Everything in between is treated as a single column string, so the third line does not
    # cause a parse error on import.
    # The fourth line (id 8) cannot be imported because the end of the field cannot be
    # identified correctly. The load aborts with an error, and the rows that had already been
    # processed are rolled back (binlog_format=row).
    admin@localhost : xiaoboluo 10:00:49> truncate test3;
    Query OK, 0 rows affected (0.01 sec)
    admin@localhost : xiaoboluo 10:01:03> load data infile '/tmp/test3_test.txt' into table test3 FIELDS OPTIONALLY enclosed BY '"' TERMINATED by ',' escaped by '';
    ERROR 1262 (01000): Row 4 was truncated; it contained more data than there were input columns
    admin@localhost : xiaoboluo 10:01:33> select * from test3;
    Empty set (0.00 sec)
    # When the data contains the default escape character and the specified field terminator,
    # and the enclosing character and line terminator keep their defaults, the escape character
    # and field terminator that occur in the data are escaped. As long as the escape character
    # is not empty, this happens whatever values the field terminator and escape character have
    admin@localhost : xiaoboluo 03:08:45> insert into test3(test,test2) values('\\t','102.20');
    Query OK, 1 row affected (0.00 sec)
    admin@localhost : xiaoboluo 03:17:29> select * from test3;
    +----+------------------------------------------+--------+
    | id | test                                     | test2  |
    +----+------------------------------------------+--------+
    |  2 | a string                                 | 100.20 |
    |  4 | a string containing a , comma            | 102.20 |
    |  6 | a string containing a " quote            | 102.20 |
    |  8 | a string containing a ", quote and comma | 102.20 |
    | 10 | \t                                       | 102.20 |
    +----+------------------------------------------+--------+
    5 rows in set (0.00 sec)
    admin@localhost : xiaoboluo 03:17:32> system rm -f /tmp/test3.txt;
    admin@localhost : xiaoboluo 03:17:39> select * from test3 into outfile "/tmp/test3.txt" FIELDS TERMINATED BY ',';
    Query OK, 5 rows affected (0.01 sec)
    admin@localhost : xiaoboluo 03:17:42> system cat /tmp/test3.txt
    2,a string,100.20
    4,a string containing a \, comma,102.20
    6,a string containing a " quote,102.20
    8,a string containing a "\, quote and comma,102.20
    10,\\t,102.20

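
The failure on the fourth row when the escape character is empty can be reproduced with a small model of the field-matching rule described above: an enclosed field ends at the first enclosing character that is directly followed by the field terminator (or the end of the line). The sketch below is a simplification under that assumption; `split_enclosed` is a hypothetical helper, it expects a non-empty enclosing character, and it deliberately performs no escape handling, mirroring escaped by ''.

```python
def split_enclosed(line, terminated_by=",", enclosed_by='"'):
    """Split one line into fields with optional enclosing and NO escape character."""
    fields = []
    i, n = 0, len(line)
    while True:
        if line.startswith(enclosed_by, i):
            # Enclosed field: it ends at the first enclosing character that is
            # directly followed by the terminator or by the end of the line.
            start = i + len(enclosed_by)
            j = start
            while True:
                k = line.find(enclosed_by, j)
                if k == -1:  # no closing character: take the rest of the line
                    fields.append(line[start:])
                    return fields
                after = k + len(enclosed_by)
                if after == n or line.startswith(terminated_by, after):
                    fields.append(line[start:k])
                    i = after
                    break
                j = k + 1
        else:
            # Unenclosed field: it ends at the next terminator.
            k = line.find(terminated_by, i)
            if k == -1:
                fields.append(line[i:])
                return fields
            fields.append(line[i:k])
            i = k
        if i >= n:
            return fields
        i += len(terminated_by)  # step over the terminator


row3 = '6,"a string containing a " quote","102.20"'
row4 = '8,"a string containing a ", quote and comma","102.20"'
print(len(split_enclosed(row3)), split_enclosed(row3))
print(len(split_enclosed(row4)), split_enclosed(row4))
```

In this model the third row still yields three fields, while the fourth row yields four, one more than the table has columns, which is consistent with the "contained more data than there were input columns" error in the transcript.
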
    # Suppose you ran SELECT ... INTO OUTFILE with a comma as the column separator:
    SELECT * INTO OUTFILE 'data.txt'
     FIELDS TERMINATED BY ','
     FROM table2;
    # If you then try to read the file with \t as the column separator, it does not work,
    # because it tells LOAD DATA INFILE to look for tabs between fields; each input line is
    # likely to be parsed as a single field:
    LOAD DATA INFILE 'data.txt' INTO TABLE table2
     FIELDS TERMINATED BY '\t';
    # To read a file whose columns are separated by commas, the correct statement is:
    LOAD DATA INFILE 'data.txt' INTO TABLE table2
     FIELDS TERMINATED BY ',';

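
The effect of a mismatched field terminator is easy to see outside MySQL as well. The following lines are only an illustration with a made-up input line and a hypothetical helper `count_fields`; they show that splitting comma-separated data on a tab yields a single field.

```python
def count_fields(line, terminated_by):
    """Count the fields obtained by splitting a line on the given terminator."""
    return len(line.split(terminated_by))


line = "2,a string,100.20"  # a line as written with FIELDS TERMINATED BY ','

print(count_fields(line, "\t"))  # wrong terminator: the whole line is one field
print(count_fields(line, ","))   # matching terminator: one field per column
```
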
    # If LINES TERMINATED BY is set to an empty string and FIELDS TERMINATED BY is set to a
    # non-empty character (or left at the default \t), lines are also terminated by the field
    # terminator (visible as the last character of the file being the field terminator),
    # so the whole file looks like one single line of data
    admin@localhost : xiaoboluo 04:48:35> system rm -f /tmp/test3.txt;
    admin@localhost : xiaoboluo 04:53:59> select * from test3 into outfile "/tmp/test3.txt" FIELDS TERMINATED BY ',' lines terminated by '';
    Query OK, 6 rows affected (0.00 sec)
    # Inspect the file with Python. Because the line terminator is empty, all table data is
    # concatenated into one line, and the field terminator (comma) serves as the final line terminator
    >>> f = open('/tmp/test3.txt','r')
    >>> data = f.readlines()
    >>> data
    ['2,a string,100.20,4,a string containing a \\, comma,102.20,6,a string containing a " quote,102.20,8,a string containing a "\\, quote and comma,102.20,10,\\\\t,102.20,14,\\\\t,102.20,']
    >>>
    # Import the data into a table. A new table is created for this test, to guard against the
    # case where the table has been emptied and the text file then cannot be imported correctly
    admin@localhost : xiaoboluo 04:57:52> create table test4 like test3;
    Query OK, 0 rows affected (0.01 sec)
    admin@localhost : xiaoboluo 04:57:59> load data infile "/tmp/test3.txt" into table test4 FIELDS TERMINATED BY ',' lines terminated by '';
    Query OK, 6 rows affected (0.00 sec)
    Records: 6 Deleted: 0 Skipped: 0 Warnings: 0
    admin@localhost : xiaoboluo 04:58:26> select * from test4; # the query result shows the data was imported into test4 correctly
    +----+-----<
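
With an empty line terminator, the only way to recover rows from such a file is to split on unescaped field terminators and group the fields by column count. The sketch below models that idea under stated assumptions: `split_by_column_count` is a hypothetical helper, an escaped character is simply taken literally (special sequences such as \N are not handled), and the column count of 3 matches the test3 table used in this article.

```python
def split_by_column_count(text, columns, terminated_by=",", escaped_by="\\"):
    """Split on unescaped terminators, then group the fields into rows."""
    fields, current, i = [], [], 0
    while i < len(text):
        ch = text[i]
        if ch == escaped_by and i + 1 < len(text):
            current.append(text[i + 1])  # escaped character is taken literally
            i += 2
        elif ch == terminated_by:
            fields.append("".join(current))
            current = []
            i += 1
        else:
            current.append(ch)
            i += 1
    if current:
        fields.append("".join(current))
    return [fields[r:r + columns] for r in range(0, len(fields), columns)]


# The single line shown in the Python output above.
data = ('2,a string,100.20,4,a string containing a \\, comma,102.20,'
        '6,a string containing a " quote,102.20,'
        '8,a string containing a "\\, quote and comma,102.20,'
        '10,\\\\t,102.20,14,\\\\t,102.20,')
rows = split_by_column_count(data, 3)
for row in rows:
    print(row)
```
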