I have the datasets below in Spark, and I need to join them on a condition.
+---+------+-------+--------+------------------------+----+--------+
|ID |PState|MState |dt |TS |proc|daterank|
+---+------+-------+--------+------------------------+----+--------+
|1 |Iowa |Kansas |20240212|null |2 |1 |
|1 |Iowa |CA |20250212|null |4 |2 |
|2 |Maine |Chicago|20240212|2024-01-01T00:00:00.000Z|2 |1 |
|2 |NJ |NY |20250212|null |4 |2 |
|3 |CA |MS |20240212|null |2 |1 |
|3 |NJ |NY |20240212|null |4 |1 |
|3 |NJ |WV |20240212|null |9 |1 |
+---+------+-------+--------+------------------------+----+--------+
+---+------+------+--------+----+-----+---------+
|bID|PState|MState|bdt |bTS |bproc|bdaterank|
+---+------+------+--------+----+-----+---------+
|1 |Iowa |CA |20250212|null|4 |2 |
|2 |NJ |NY |20250212|null|4 |2 |
|3 |CA |MS |20240212|null|2 |1 |
|3 |NJ |NY |20240212|null|4 |1 |
|3 |NJ |WV |20240212|null|9 |1 |
+---+------+------+--------+----+-----+---------+
Expected output:
+---+------+-------+--------+------------------------+----+--------+
|ID |PState|MState |dt |TS |proc|daterank|
+---+------+-------+--------+------------------------+----+--------+
|1 |Iowa |CA |20240212|null |2 |1 |
|1 |Iowa |CA |20250212|null |4 |2 |
|2 |Maine |Chicago|20240212|2024-01-01T00:00:00.000Z|2 |1 |
|2 |Maine |Chicago|20250212|null |4 |2 |
|3 |CA |MS |20240212|null |2 |1 |
|3 |NJ |NY |20240212|null |4 |1 |
|3 |NJ |WV |20240212|null |9 |1 |
+---+------+-------+--------+------------------------+----+--------+
Here is what I have tried:
filteredDfDate
    .withColumn("maxPoints",
        max("maxPoints").over(Window.partitionBy("bdaterank", "bID", "maxPoints")))
    .withColumn("enrichedFinalPState",
        when(col("bdaterank").notEqual(col("maxPoints")).and(col("enrichedPState").isNull()),
            max(col("enrichedPState")))
        .otherwise(lit("null")))
    //.where(col("daterank").equalTo(col("maxPoints"))).withColumnRenamed("PState","enrichedPState")
    .withColumn("enrichedFinalMState",
        when(col("bdaterank").notEqual(col("maxPoints")).and(col("enrichedMState").isNull()),
            max(col("enrichedMState")))
        .otherwise(lit("null")))
    .show(false);
Can someone please help me?
Source: https://stackoverflow.com/questions/781 ... rk-dataset