Spark DataFrame TimestampType-如何从字段获取年,月,日值?

Spark DataFrame TimestampType-如何从字段获取年,月,日值?,第1张

Spark DataFrame TimestampType-如何从字段获取年,月,日值?

从Spark 1.5开始,您可以使用许多日期处理功能:

  • pyspark.sql.functions.year
  • pyspark.sql.functions.month
  • pyspark.sql.functions.dayofmonth
  • pyspark.sql.functions.dayofweek()
  • pyspark.sql.functions.dayofyear
  • pyspark.sql.functions.weekofyear()

    import datetime
    from pyspark.sql.functions import year, month, dayofmonth

    elevDF = sc.parallelize([
    (datetime.datetime(1984, 1, 1, 0, 0), 1, 638.55),
    (datetime.datetime(1984, 1, 1, 0, 0), 2, 638.55),
    (datetime.datetime(1984, 1, 1, 0, 0), 3, 638.55),
    (datetime.datetime(1984, 1, 1, 0, 0), 4, 638.55),
    (datetime.datetime(1984, 1, 1, 0, 0), 5, 638.55)
    ]).toDF([“date”, “hour”, “value”])

    elevDF.select(
    year(“date”).alias(‘year’),
    month(“date”).alias(‘month’),
    dayofmonth(“date”).alias(‘day’)
    ).show()

    +----+-----+—+|year|month|day|+----+-----+—+|1984| 1| 1||1984| 1| 1||1984| 1| 1||1984| 1| 1||1984| 1| 1|+----+-----+—+

您可以将simple

map
与其他任何RDD一起使用:

elevDF = sqlContext.createDataframe(sc.parallelize([        Row(date=datetime.datetime(1984, 1, 1, 0, 0), hour=1, value=638.55),        Row(date=datetime.datetime(1984, 1, 1, 0, 0), hour=2, value=638.55),        Row(date=datetime.datetime(1984, 1, 1, 0, 0), hour=3, value=638.55),        Row(date=datetime.datetime(1984, 1, 1, 0, 0), hour=4, value=638.55),        Row(date=datetime.datetime(1984, 1, 1, 0, 0), hour=5, value=638.55)]))(elevDF .map(lambda (date, hour, value): (date.year, date.month, date.day)) .collect())

结果是:

[(1984, 1, 1), (1984, 1, 1), (1984, 1, 1), (1984, 1, 1), (1984, 1, 1)]

顺便说一句:

datetime.datetime
无论如何都存储一个小时,所以分开保存似乎浪费了内存。



欢迎分享,转载请注明来源:内存溢出

原文地址: http://outofmemory.cn/zaji/5652655.html

(0)
打赏 微信扫一扫 微信扫一扫 支付宝扫一扫 支付宝扫一扫
上一篇 2022-12-16
下一篇 2022-12-16

发表评论

登录后才能评论

评论列表(0条)

保存