Friday, September 4, 2015

急,跪求答案 (moving avg using spark dataframe window functions)

http://www.mitbbs.com/article_t/DataSciences/17933.html

发信人: ww62 (熊孩子), 信区: DataSciences
标  题: 急,跪求答案 (moving avg using spark dataframe window functions)
发信站: BBS 未名空间站 (Fri Sep  4 19:02:44 2015, 美东)

请教大牛们,如何用window functions来算出 3day moving avg。那个error msg看不
懂呢,为啥要hive context。
多谢了~


例子如下,

from pyspark.sql import Window
from pyspark.sql import SQLContext
import pyspark.sql.functions as func

Table T:

Date       Num
07/01       2
07/02       3
07/03       2
07/04       2
07/05       5
07/06       6
07/07       7

sqlCtx = SQLContext(sc)

T.registerTempTable(“T”)

w = Window.partitionBy(T.Date).orderBy(T.Date).rangeBetween(-2,0)

a = (func.avg(T["Num"]).over(w))

T.select(T["Date"],T["Num"],a.alias("moving_avg"))

Error Msg:
Could not resolve window function 'avg'. Note that, using window functions
currently requires a HiveContext;

No comments:

Post a Comment