MySQL: with 8 million rows in the database, how can I randomly sample 10,000 of them faster?


What do you mean by "faster"? Faster than what? How are you doing it now?

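Database performance depends on many factors:

For the database to respond quickly, you first need a good server.

If the database runs on a remote server, you also need sufficient and stable network bandwidth.

Design the table structure sensibly and create indexes.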

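In your case, if the 8 million rows are in a single table, give the table an integer ID as its primary key (a primary key is indexed automatically). If the data is pulled from several tables and combined, the join keys between the tables should preferably be integers, and they should be indexed.

Then generate 10,000 random numbers, look those numbers up in the ID column, and fetch the matching rows.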

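Do the processing on the database side.

In your case, keep the routine that generates the 10,000 random numbers on the server as a stored procedure.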
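The approach described above (draw random IDs, then fetch them through the primary key) can be sketched in Python as follows. This is only an illustration under stated assumptions: the table name `items`, the column name `id`, and the helper `sample_rows` are hypothetical, and an in-memory SQLite database stands in for MySQL so that the snippet runs on its own. With a MySQL driver the placeholder would typically be `%s` rather than `?`. Because IDs may have gaps, the sketch keeps drawing until it has enough rows.

```python
import random
import sqlite3


def sample_rows(conn, table, k, batch_size=500, placeholder="?"):
    """Return up to k distinct random rows, looked up by integer primary key `id`.

    `table` must be a trusted identifier; it is interpolated into the SQL text.
    """
    cur = conn.cursor()
    # MIN/MAX on an indexed primary key are cheap.
    cur.execute(f"SELECT MIN(id), MAX(id) FROM {table}")
    lo, hi = cur.fetchone()
    if lo is None:  # empty table
        return []

    span = hi - lo + 1
    tried = set()  # candidate IDs already looked up (hit or miss)
    rows = []
    while len(rows) < k and len(tried) < span:
        need = min(batch_size, k - len(rows), span - len(tried))
        batch = set()
        while len(batch) < need:
            candidate = random.randint(lo, hi)
            if candidate not in tried:
                batch.add(candidate)
        marks = ",".join([placeholder] * len(batch))
        cur.execute(f"SELECT * FROM {table} WHERE id IN ({marks})", list(batch))
        rows.extend(cur.fetchall())  # IDs that fall into gaps return nothing
        tried |= batch
    return rows


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, memo TEXT)")
    # Only odd IDs exist, so half of the ID range consists of gaps.
    conn.executemany("INSERT INTO items VALUES (?, ?)",
                     [(i, f"row {i}") for i in range(1, 20001, 2)])
    sample = sample_rows(conn, "items", 1000)
    print(len(sample), "rows sampled")
```

Misses are discarded and redrawn, so every existing row has the same chance of being selected. If the ID range is very sparse, many lookups miss and more round trips are needed.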

MySQL: picking random rows

Monday, April 26, 2010, 09:48


-- Pick one random row from a table
SELECT *
FROM TMP_XF_TEST
WHERE ID >= (SELECT FLOOR(RAND() * (SELECT MAX(ID) FROM TMP_XF_TEST)))
ORDER BY ID LIMIT 1;

-- Pick one random row among those matching a condition
SELECT *
FROM TMP_XF_TEST
WHERE ID >= (SELECT FLOOR(RAND() *
             ((SELECT MAX(ID) FROM TMP_XF_TEST WHERE GID = 9) -
              (SELECT MIN(ID) FROM TMP_XF_TEST WHERE GID = 9))) +
             (SELECT MIN(ID) FROM TMP_XF_TEST WHERE GID = 9))
  AND GID = 9
ORDER BY ID LIMIT 1;
-- there is an index on gid

Or:

SELECT *
FROM TMP_XF_TEST AS t1 JOIN
     (SELECT ROUND(RAND() * ((SELECT MAX(id) FROM TMP_XF_TEST WHERE GID = 9) -
                             (SELECT MIN(id) FROM TMP_XF_TEST WHERE GID = 9))
                   + (SELECT MIN(id) FROM TMP_XF_TEST WHERE GID = 9)) AS id) AS t2
WHERE t1.id >= t2.id AND t1.GID = 9
ORDER BY t1.id LIMIT 1;
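One property of the `ID >= random value ... ORDER BY ID LIMIT 1` pattern is worth keeping in mind: when the ID sequence has gaps, the row right after a gap is picked more often, because every random value that lands inside the gap resolves to that row. The Python sketch below (the helper name `pick_probabilities` and the sample IDs are made up for illustration, and positive integer IDs are assumed) enumerates every value that `FLOOR(RAND() * MAX(ID))` can produce and counts which row each value selects.

```python
from bisect import bisect_left
from collections import Counter
from fractions import Fraction


def pick_probabilities(ids):
    """Chance of each id under: WHERE id >= FLOOR(RAND() * MAX(id)) ORDER BY id LIMIT 1.

    Assumes a non-empty collection of positive integer ids.
    """
    ids = sorted(ids)
    max_id = ids[-1]
    hits = Counter()
    # FLOOR(RAND() * max_id) yields each integer in 0 .. max_id - 1 equally often.
    for r in range(max_id):
        first = ids[bisect_left(ids, r)]  # smallest id >= r
        hits[first] += 1
    return {i: Fraction(hits[i], max_id) for i in ids}


if __name__ == "__main__":
    for row_id, p in pick_probabilities([1, 2, 3, 10]).items():
        print(row_id, p)
```

For the IDs 1, 2, 3, 10, the row with ID 10 is selected for six of the ten possible random values. If a uniform sample matters, drawing random IDs and discarding the misses avoids this skew.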

#########

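Do not use the slow form shown below. The INSERT first doubles the test table by copying it into itself. In the slow query, the subquery both calls RAND() and reads from the table, so MySQL cannot cache its result (EXPLAIN reports UNCACHEABLE SUBQUERY) and re-evaluates it for every row of the outer query.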

mysql> insert into tmp_xf_test(user_nick,gid,item_id,gmt_create,gmt_modified,memo)
    -> select user_nick,gid,item_id,gmt_create,gmt_modified,memo from tmp_xf_test;
Query OK, 165888 rows affected (9.65 sec)
Records: 165888  Duplicates: 0  Warnings: 0

mysql> SELECT *
    -> FROM `tmp_xf_test`
    -> WHERE id >= (SELECT FLOOR( MAX(id) * RAND()) FROM `tmp_xf_test` )
    -> ORDER BY id LIMIT 1;
+-----+-----------+-----+---------+---------------------+---------------------+--------------------+
| id  | user_nick | gid | item_id | gmt_create          | gmt_modified        | memo               |
+-----+-----------+-----+---------+---------------------+---------------------+--------------------+
| 467 | 玄风      |   9 |     123 | 2010-04-26 14:56:39 | 2010-04-26 14:56:39 | 玄风测试使用的数据 |
+-----+-----------+-----+---------+---------------------+---------------------+--------------------+
1 row in set (51.12 sec)

mysql> explain SELECT *
    -> FROM `tmp_xf_test`
    -> WHERE id >= (SELECT FLOOR( MAX(id) * RAND()) FROM `tmp_xf_test` )
    -> ORDER BY id LIMIT 1\G
*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: tmp_xf_test
         type: index
possible_keys: NULL
          key: PRIMARY
      key_len: 8
          ref: NULL
         rows: 1
        Extra: Using where
*************************** 2. row ***************************
           id: 2
  select_type: UNCACHEABLE SUBQUERY
        table: tmp_xf_test
         type: index
possible_keys: NULL
          key: idx_tmp_xf_test_gid
      key_len: 4
          ref: NULL
         rows: 331954
        Extra: Using index
2 rows in set (0.01 sec)

---

mysql> SELECT * FROM `tmp_xf_test` t1 join
    -> (SELECT FLOOR( MAX(id) * RAND()) as id FROM `tmp_xf_test` ) as t2
    -> where t1.id >= t2.id
    -> ORDER BY t1.id LIMIT 1;
+-------+-----------+-----+---------+---------------------+---------------------+--------------------+-------+
| id    | user_nick | gid | item_id | gmt_create          | gmt_modified        | memo               | id    |
+-------+-----------+-----+---------+---------------------+---------------------+--------------------+-------+
| 40311 | 玄风      |   9 |     123 | 2010-04-28 15:47:19 | 2010-04-28 15:47:19 | 玄风测试使用的数据 | 40311 |
+-------+-----------+-----+---------+---------------------+---------------------+--------------------+-------+
1 row in set (0.14 sec)

##############

mysql> SELECT * FROM `tmp_xf_test`
    -> WHERE id >= (SELECT floor(RAND() * (SELECT MAX(id) FROM `tmp_xf_test`)))
    -> ORDER BY id LIMIT 1;
+------+-----------+-----+---------+---------------------+---------------------+--------------------+
| id   | user_nick | gid | item_id | gmt_create          | gmt_modified        | memo               |
+------+-----------+-----+---------+---------------------+---------------------+--------------------+
| 1352 | 玄风      |   9 |     123 | 2010-04-28 15:47:19 | 2010-04-28 15:47:19 | 玄风测试使用的数据 |
+------+-----------+-----+---------+---------------------+---------------------+--------------------+
1 row in set (0.00 sec)

mysql> explain SELECT * FROM `tmp_xf_test`
    -> WHERE id >= (SELECT floor(RAND() * (SELECT MAX(id) FROM `tmp_xf_test`)))
    -> ORDER BY id LIMIT 1\G
*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: tmp_xf_test
         type: index
possible_keys: NULL
          key: PRIMARY
      key_len: 8
          ref: NULL
         rows: 1
        Extra: Using where
*************************** 2. row ***************************
           id: 3
  select_type: SUBQUERY
        table: NULL
         type: NULL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: NULL
        Extra: Select tables optimized away
2 rows in set, 1 warning (0.00 sec)

The corresponding slow form of the conditional query is:

SELECT *
FROM TMP_XF_TEST
WHERE ID >= (SELECT FLOOR(RAND() * (MAX(ID) - MIN(ID))) + MIN(ID) MID
             FROM TMP_XF_TEST
             WHERE GID = 9)
  AND GID = 9 LIMIT 1


Sharing is welcome; when reposting, please credit the source: 内存溢出

Original article: http://outofmemory.cn/zaji/7272624.html
