Spark DataFrame - PySpark - WARN BisectingKMeans: The input RDD is not directly cached
I'm running bisecting k-means as follows:

    from pyspark.ml.clustering import BisectingKMeans

    bkm_test = BisectingKMeans().setK(5).setSeed(1)
    rdf.cache()
    assembled.cache()
    model_test = bkm_test.fit(assembled)
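I cached both DataFrames, but I keep getting the warning; caching makes no difference. I found a similar question about KMeans. The BisectingKMeans warning and the Executor warning are shown below. Is this happening inside the algorithm, so that I can't fix it from my side?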
17/08/14 21:53:17 WARN BisectingKMeans: The input RDD 306 is not directly cached, which may hurt performance if its parent RDDs are also not cached.
17/08/14 21:53:17 WARN Executor: 1 block locks were not released by TID = 132: [rdd_302_0]