I am using PySpark 3.4.1, Java 8, Hadoop 3.4.0, Scala 2.12.17 and Python 3.11.4. This is my code in VS Code:

    def calculating_click(df):
        click_data = df.filter(df.custom_track == "click")
        click_data = click_data.na.fill({'bid': 0})
        click_data = click_data.na.fill({'job_id': 0})
        click_data = click_data.na.fill({'publisher_id': 0})
        click_data = click_data.na.fill({'group_id': 0})
        click_data = click_data.na.fill({'campaign_id': 0})
        click_data.registerTempTable('clicks')  # name temporary table 'clicks'
        click_output = spark.sql("""
            select job_id, date(ts) as date, hour(ts) as hour,
                   publisher_id, campaign_id, group_id,
                   avg(bid) as bid_set, count(*) as clicks, sum(bid) as spend_hour
            from clicks
            group by job_id, date(ts), hour(ts), publisher_id, campaign_id, group_id
        """)

I got this error:

    Py4JError: An error occurred while calling o28.sql. Trace:
    py4j.Py4JException: Method sql([class java.lang.String, class [Ljava.lang.Object;]) does not exist
        at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:321)
        at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:329)
        at py4j.Gateway.invoke(Gateway.java:274)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
        at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
        at java.lang.Thread.run(Thread.java:750)



Could anyone help me fix this? I'm trying to use PySpark but I get an error every time. Which versions of Spark, Hadoop and Java should I use?
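
A possibly useful diagnostic: a Py4J "Method sql(...) does not exist" error usually means the Python side is calling a JVM method signature that the running Spark JARs do not provide. That tends to happen when the pip-installed `pyspark` package and the Spark installation under `SPARK_HOME` are different releases. This is an assumption about the setup above, not something the traceback proves. A minimal sketch for checking it follows; the helper names `major_minor`, `installed_pyspark_version` and `versions_compatible` are hypothetical and not part of PySpark.

```python
import os
import re
from importlib import metadata
from typing import Optional, Tuple


def major_minor(version: str) -> Tuple[int, int]:
    """Return (major, minor) from a version string such as '3.4.1'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def installed_pyspark_version() -> Optional[str]:
    """Version of the pip-installed pyspark package, or None if it is absent."""
    try:
        return metadata.version("pyspark")
    except metadata.PackageNotFoundError:
        return None


def versions_compatible(python_side: str, jvm_side: str) -> bool:
    """Assumed rule of thumb: both sides should share major.minor."""
    return major_minor(python_side) == major_minor(jvm_side)


if __name__ == "__main__":
    # Python-side package version and the Spark installation it may be driving.
    print("pyspark (Python package):", installed_pyspark_version())
    print("SPARK_HOME:", os.environ.get("SPARK_HOME"))
    # With a live session, spark.version reports the JVM-side Spark version:
    #   print(versions_compatible(pyspark.__version__, spark.version))
```

If the two versions differ, installing the `pyspark` release that matches the Spark installation, or unsetting `SPARK_HOME` so that the JARs bundled with the pip package are used, would be the first things to try.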
Source: https://stackoverflow.com/questions/774 ... -ljava-lan