通过将经度和纬度转换为笛卡尔坐标,即使将点分散在整个地球上,以下解决方案也可以使用。它执行
一种KDE(内核密度估计),但在第一步中,
仅在数据点评估内核总和。应该选择内核以
适合该问题。在下面的代码中,这是我可以开玩笑/自以为是
的Trossian格式,即d≤h为2-d²/h²,d> h为h²/d²(其中d是
欧几里得距离,h是“带宽” $global_kernel_radius) ,
也可以是高斯(e-d²/2h²),Epanechnikov内核(d <h则为1-d²/h²,
否则为0)或其他内核。可选的第二遍优化搜索
在这两种情况下 ,都可以通过在本地网格上求和一个独立的内核,或者通过计算质心来本地化
$local_grid_radius。
本质上,每个点将其周围的所有点(包括
自身)相加,如果它们更接近则对其进行加权(通过钟形曲线),并通过可选的权重数组对其进行加权$w_arr。获胜者是总金额最大的点。找到获胜者后
,可以通过在获胜者周围局部重复相同的过程(使用另一个钟形曲线)来找到我们要寻找的“家” ,或者可以将其估计为所有点的“ 质心”在距获胜者的给定半径内,该半径可以为零。
必须通过选择适当的内核,选择如何在本地优化搜索并调整参数来使算法适应该问题。对于示例数据集,用于第一遍的Trossian内核和Epanechnikov内核第二遍中,与所有3半径设定为30毫升和1的网格步MI可以是起点一个良好的,但只有在两个子芝加哥集群应被视为一个大型集群。否则,必须选择较小的半径。
function find_home($lat_arr, $lng_arr, $global_kernel_radius, $local_kernel_radius, $local_grid_radius, // 0 for no 2nd pass $local_grid_step, // 0 for centroid $units='mi', $w_arr=null){ // for lat,lng <-> x,y,z see http://en.wikipedia.org/wiki/Geodetic_datum // for K and h see http://en.wikipedia.org/wiki/Kernel_density_estimation switch (strtolower($units)) { case 'nm' : case 'nmi': $m_divisor = 1852; break;case 'mi': $m_divisor = 1609.344; break;case 'km': $m_divisor = 1000; break;case 'm': $m_divisor = 1; break;default: return false; } $a = 6378137 / $m_divisor; // Earth semi-major axis (WGS84) $e2 = 6.69437999014E-3; // First eccentricity squared (WGS84) $lat_lng_count = count($lat_arr); if ( !$w_arr) { $w_arr = array_fill(0, $lat_lng_count, 1.0); } $x_arr = array(); $y_arr = array(); $z_arr = array(); $rad = M_PI / 180; $one_e2 = 1 - $e2; for ($i = 0; $i < $lat_lng_count; $i++) { $lat = $lat_arr[$i]; $lng = $lng_arr[$i]; $sin_lat = sin($lat * $rad); $sin_lng = sin($lng * $rad); $cos_lat = cos($lat * $rad); $cos_lng = cos($lng * $rad); // height = 0 (!) $N = $a / sqrt(1 - $e2 * $sin_lat * $sin_lat); $x_arr[$i] = $N * $cos_lat * $cos_lng; $y_arr[$i] = $N * $cos_lat * $sin_lng; $z_arr[$i] = $N * $one_e2 * $sin_lat; } $h = $global_kernel_radius; $h2 = $h * $h; $max_K_sum = -1; $max_K_sum_idx = -1; for ($i = 0; $i < $lat_lng_count; $i++) { $xi = $x_arr[$i]; $yi = $y_arr[$i]; $zi = $z_arr[$i]; $K_sum = 0; for ($j = 0; $j < $lat_lng_count; $j++) { $dx = $xi - $x_arr[$j]; $dy = $yi - $y_arr[$j]; $dz = $zi - $z_arr[$j]; $d2 = $dx * $dx + $dy * $dy + $dz * $dz; $K_sum += $w_arr[$j] * ($d2 <= $h2 ? (2 - $d2 / $h2) : $h2 / $d2); // Trossian ;-) // $K_sum += $w_arr[$j] * exp(-0.5 * $d2 / $h2); // Gaussian } if ($max_K_sum < $K_sum) { $max_K_sum = $K_sum; $max_K_sum_i = $i; } } $winner_x = $x_arr [$max_K_sum_i]; $winner_y = $y_arr [$max_K_sum_i]; $winner_z = $z_arr [$max_K_sum_i]; $winner_lat = $lat_arr[$max_K_sum_i]; $winner_lng = $lng_arr[$max_K_sum_i]; $sin_winner_lat = sin($winner_lat * $rad); $cos_winner_lat = cos($winner_lat * $rad); $sin_winner_lng = sin($winner_lng * $rad); $cos_winner_lng = cos($winner_lng * $rad); $east_x = -$local_grid_step * $sin_winner_lng; $east_y = $local_grid_step * $cos_winner_lng; $east_z = 0; $north_x = -$local_grid_step * $sin_winner_lat * $cos_winner_lng; $north_y = -$local_grid_step * $sin_winner_lat * $sin_winner_lng; $north_z = $local_grid_step * $cos_winner_lat; if ($local_grid_radius > 0 && $local_grid_step > 0) { $r = intval($local_grid_radius / $local_grid_step); $r2 = $r * $r; $h = $local_kernel_radius; $h2 = $h * $h; $max_L_sum = -1; $max_L_sum_idx = -1; for ($i = -$r; $i <= $r; $i++) { $winner_east_x = $winner_x + $i * $east_x; $winner_east_y = $winner_y + $i * $east_y; $winner_east_z = $winner_z + $i * $east_z; $j_max = intval(sqrt($r2 - $i * $i)); for ($j = -$j_max; $j <= $j_max; $j++) { $x = $winner_east_x + $j * $north_x; $y = $winner_east_y + $j * $north_y; $z = $winner_east_z + $j * $north_z; $L_sum = 0; for ($k = 0; $k < $lat_lng_count; $k++) { $dx = $x - $x_arr[$k]; $dy = $y - $y_arr[$k]; $dz = $z - $z_arr[$k]; $d2 = $dx * $dx + $dy * $dy + $dz * $dz; if ($d2 < $h2) { $L_sum += $w_arr[$k] * ($h2 - $d2); // Epanechnikov } } if ($max_L_sum < $L_sum) { $max_L_sum = $L_sum; $max_L_sum_i = $i; $max_L_sum_j = $j; } } } $x = $winner_x + $max_L_sum_i * $east_x + $max_L_sum_j * $north_x; $y = $winner_y + $max_L_sum_i * $east_y + $max_L_sum_j * $north_y; $z = $winner_z + $max_L_sum_i * $east_z + $max_L_sum_j * $north_z; } else if ($local_grid_radius > 0) { $r = $local_grid_radius; $r2 = $r * $r; $wx_sum = 0; $wy_sum = 0; $wz_sum = 0; $w_sum = 0; for ($k = 0; $k < $lat_lng_count; $k++) { $xk = $x_arr[$k]; $yk = $y_arr[$k]; $zk = $z_arr[$k]; $dx = $winner_x - $xk; $dy = $winner_y - $yk; $dz = $winner_z - $zk; $d2 = $dx * $dx + $dy * $dy + $dz * $dz; if ($d2 <= $r2) { $wk = $w_arr[$k]; $wx_sum += $wk * $xk; $wy_sum += $wk * $yk; $wz_sum += $wk * $zk; $w_sum += $wk; } } $x = $wx_sum / $w_sum; $y = $wy_sum / $w_sum; $z = $wz_sum / $w_sum; $max_L_sum_i = false; $max_L_sum_j = false; } else { return array($winner_lat, $winner_lng, $max_K_sum_i, false, false); } $deg = 180 / M_PI; $a2 = $a * $a; $e4 = $e2 * $e2; $p = sqrt($x * $x + $y * $y); $zeta = (1 - $e2) * $z * $z / $a2; $rho = ($p * $p / $a2 + $zeta - $e4) / 6; $rho3 = $rho * $rho * $rho; $s = $e4 * $zeta * $p * $p / (4 * $a2); $t = pow($s + $rho3 + sqrt($s * ($s + 2 * $rho3)), 1 / 3); $u = $rho + $t + $rho * $rho / $t; $v = sqrt($u * $u + $e4 * $zeta); $w = $e2 * ($u + $v - $zeta) / (2 * $v); $k = 1 + $e2 * (sqrt($u + $v + $w * $w) + $w) / ($u + $v); $lat = atan($k * $z / $p) * $deg; $lng = atan2($y, $x) * $deg; return array($lat, $lng, $max_K_sum_i, $max_L_sum_i, $max_L_sum_j);}
距离是欧几里得而不是大圆这一事实对于手头的任务应该忽略不计。计算大圆距离会很麻烦,并且只会使非常远的点的权重大大降低-但是这些点的权重已经非常低。原则上,不同的内核可以达到相同的效果。
具有一定截止距离的完整核(例如Epanechnikov内核)根本没有这个问题(实际上)。
WGS84基准面的lat,lng和x,y,z之间的转换比真实的需要更准确(尽管不能保证数值稳定性)作为参考。如果要考虑高度,或者需要更快的反向转换.
欢迎分享,转载请注明来源:内存溢出
评论列表(0条)