UnsafeRow in Spark


UnsafeRow

UnsafeRow is an InternalRow that is backed by raw memory instead of Java objects.
An UnsafeRow has three parts: [null bit set] [values] [variable length portion]

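  • 8-byte (64-bit) alignment: the memory layout is less compact, but aligned access improves memory-access performance.
  • Little-endian storage (the native byte order on common platforms such as x86): a narrower type stored in a wider slot, such as an int in a 64-bit word, needs no extra encoding.
  • Every column, regardless of its type, occupies a 64-bit slot; variable-length content is appended after the fixed-length slots.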
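
The layout rules above can be sketched in a few lines of Java. This is a minimal illustration, not Spark's code: the class `UnsafeRowLayoutSketch` and all of its method names are hypothetical, and the sketch assumes that the null bit set uses one bit per field padded to whole 8-byte words and that each variable-length payload is padded to a multiple of 8 bytes.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Illustrative sketch only; class and method names are hypothetical, not Spark's API.
class UnsafeRowLayoutSketch {
    // Null bit set: one bit per field, rounded up to whole 8-byte words (assumption).
    static int bitSetWidthInBytes(int numFields) {
        return ((numFields + 63) / 64) * 8;
    }

    // Every field owns one 8-byte slot placed right after the null bit set.
    static int fieldOffset(int numFields, int ordinal) {
        return bitSetWidthInBytes(numFields) + 8 * ordinal;
    }

    // Variable-length payloads are padded to the next multiple of 8 bytes (assumption).
    static int roundUpToWord(int numBytes) {
        return (numBytes + 7) & ~7;
    }

    // Total row size for a field count plus the byte sizes of its variable-length values.
    static int rowSizeInBytes(int numFields, int... varLengthSizes) {
        int size = bitSetWidthInBytes(numFields) + 8 * numFields;
        for (int n : varLengthSizes) {
            size += roundUpToWord(n);
        }
        return size;
    }

    // Writes an int into a zeroed little-endian 8-byte slot, then reads the whole slot as a long.
    static long intSlotReadAsLong(int value) {
        ByteBuffer slot = ByteBuffer.allocate(8).order(ByteOrder.LITTLE_ENDIAN);
        slot.putInt(0, value);   // occupies the 4 lowest-addressed bytes
        return slot.getLong(0);  // upper 4 bytes are still zero
    }

    public static void main(String[] args) {
        // Row with 3 fields, one of them a 5-byte string: 8 + 3 * 8 + 8 bytes.
        System.out.println(rowSizeInBytes(3, 5));
        // Byte offset of the third field's slot.
        System.out.println(fieldOffset(3, 2));
        // A non-negative int read back through the full 64-bit slot.
        System.out.println(intSlotReadAsLong(42));
    }
}
```

For non-negative values, the last method shows why little-endian storage needs no extra encoding: the int sits in the lowest-addressed bytes of the slot, so reading the slot as a 64-bit value returns the same number.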
Array in UnsafeRow
/**
 * An Unsafe implementation of Array which is backed by raw memory instead of Java objects.
 *
 * Each array has four parts:
 *   [numElements][null bits][values or offset&length][variable length portion]
 *
 * The `numElements` is 8 bytes storing the number of elements of this array.
 *
 * In the `null bits` region, we store 1 bit per element, indicating whether that element is null.
 * Its total size is ceil(numElements / 8) bytes, and it is aligned to 8-byte boundaries.
 *
 * In the `values or offset&length` region, we store the content of elements. For fields that hold
 * fixed-length primitive types, such as long, double, or int, we store the value directly
 * in the field. The whole fixed-length portion (even for byte) is aligned to 8-byte boundaries.
 * For fields with non-primitive or variable-length values, we store a relative offset
 * (w.r.t. the base address of the array) that points to the beginning of the variable-length field
 * and length (they are combined into a long). For variable length portion, each is aligned
 * to 8-byte boundaries.
 *
 * Instances of `UnsafeArrayData` act as pointers to row data stored in this format.
 */
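
The sizes described in the comment above can be worked out with a small sketch. The class `UnsafeArrayLayoutSketch` and its methods are hypothetical names used only for illustration. Two points are assumptions rather than statements from the comment: that fixed-length elements are packed at their own width with only the region as a whole padded to 8 bytes, and that the combined long keeps the offset in its upper 32 bits and the length in its lower 32 bits.

```java
// Illustrative sketch only; class and method names are hypothetical, not Spark's API.
class UnsafeArrayLayoutSketch {
    // Rounds a byte count up to the next multiple of 8.
    static int roundUpToWord(int numBytes) {
        return (numBytes + 7) & ~7;
    }

    // [numElements] takes 8 bytes; [null bits] takes ceil(n / 8) bytes padded to 8-byte words.
    static int headerSizeInBytes(int numElements) {
        int nullBitsBytes = (numElements + 7) / 8;
        return 8 + roundUpToWord(nullBitsBytes);
    }

    // Assumption: elements are packed at their own width; only the whole region is padded.
    static int fixedRegionSizeInBytes(int numElements, int elementSize) {
        return roundUpToWord(numElements * elementSize);
    }

    // Assumption: offset goes in the upper 32 bits, length in the lower 32 bits.
    static long packOffsetAndLength(int offset, int length) {
        return ((long) offset << 32) | (length & 0xFFFFFFFFL);
    }

    static int unpackOffset(long offsetAndLength) {
        return (int) (offsetAndLength >>> 32);
    }

    static int unpackLength(long offsetAndLength) {
        return (int) offsetAndLength;
    }

    public static void main(String[] args) {
        // Array of 10 ints: 8-byte count + 8 bytes of null bits, then 40 bytes of values.
        System.out.println(headerSizeInBytes(10));
        System.out.println(fixedRegionSizeInBytes(10, 4));
        // A 5-byte variable-length element located 56 bytes from the array's base address.
        long packed = packOffsetAndLength(56, 5);
        System.out.println(unpackOffset(packed) + " " + unpackLength(packed));
    }
}
```

Because the offset is relative to the array's base address, a packed value stays valid when the whole array is copied to another memory location.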

Feel free to share; when reposting, please credit the source: 内存溢出

Original URL: http://outofmemory.cn/langs/719988.html
