pyspark.sql.functions.max_by

pyspark.sql.functions.max_by(col: ColumnOrName, ord: ColumnOrName) → pyspark.sql.column.Column

Returns the value of col taken from the row in which ord is at its maximum.

New in version 3.3.0.

Changed in version 3.4.0: Supports Spark Connect.

Parameters

col : Column or str
    target column to compute on.
ord : Column or str
    column to be maximized.

Returns

Column: the value of col associated with the maximum value of ord.

Examples

>>> from pyspark.sql.functions import max_by
>>> df = spark.createDataFrame([
...     ("Java", 2012, 20000), ("dotNET", 2012, 5000),
...     ("dotNET", 2013, 48000), ("Java", 2013, 30000)],
...     schema=("course", "year", "earnings"))
>>> df.groupby("course").agg(max_by("year", "earnings")).show()
+------+----------------------+
|course|max_by(year, earnings)|
+------+----------------------+
|  Java|                  2013|
|dotNET|                  2013|
+------+----------------------+
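To make the semantics of the example above concrete, the following is a minimal plain-Python sketch of what max_by computes per group. It does not use Spark: the helper name max_by_per_group and its parameters are hypothetical and exist only for illustration, and the sketch makes no attempt to reproduce how Spark treats nulls or ties in ord.

```python
from collections import defaultdict


def max_by_per_group(rows, group_key, col, ord_key):
    """Illustrative only (not part of PySpark): for each group, return the
    `col` value of the row whose `ord_key` value is largest."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row)
    # max(..., key=...) selects the row with the largest ord_key value;
    # we then read the target column from that row.
    return {
        name: max(members, key=lambda r: r[ord_key])[col]
        for name, members in groups.items()
    }


# Same data as the DataFrame in the example above.
rows = [
    {"course": "Java", "year": 2012, "earnings": 20000},
    {"course": "dotNET", "year": 2012, "earnings": 5000},
    {"course": "dotNET", "year": 2013, "earnings": 48000},
    {"course": "Java", "year": 2013, "earnings": 30000},
]

result = max_by_per_group(rows, "course", "year", "earnings")
print(result)  # {'Java': 2013, 'dotNET': 2013}
```

For each course, the row with the highest earnings is selected and its year is returned, which matches the 2013 values in the table above.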
