Class KeyValueGroupedDataset<K,V>
- All Implemented Interfaces:
Serializable
Dataset has been logically grouped by a user specified grouping key. Users should not
construct a KeyValueGroupedDataset directly, but should instead call groupByKey on
an existing Dataset.
- Since:
- 2.0.0
- See Also:
-
Method Summary
Modifier and TypeMethodDescriptionagg(TypedColumn<V, U1> col1) Computes the given aggregation, returning aDatasetof tuples for each unique key and the result of computing this aggregation over all elements in the group.agg(TypedColumn<V, U1> col1, TypedColumn<V, U2> col2) Computes the given aggregations, returning aDatasetof tuples for each unique key and the result of computing these aggregations over all elements in the group.agg(TypedColumn<V, U1> col1, TypedColumn<V, U2> col2, TypedColumn<V, U3> col3) Computes the given aggregations, returning aDatasetof tuples for each unique key and the result of computing these aggregations over all elements in the group.agg(TypedColumn<V, U1> col1, TypedColumn<V, U2> col2, TypedColumn<V, U3> col3, TypedColumn<V, U4> col4) Computes the given aggregations, returning aDatasetof tuples for each unique key and the result of computing these aggregations over all elements in the group.agg(TypedColumn<V, U1> col1, TypedColumn<V, U2> col2, TypedColumn<V, U3> col3, TypedColumn<V, U4> col4, TypedColumn<V, U5> col5) Computes the given aggregations, returning aDatasetof tuples for each unique key and the result of computing these aggregations over all elements in the group.agg(TypedColumn<V, U1> col1, TypedColumn<V, U2> col2, TypedColumn<V, U3> col3, TypedColumn<V, U4> col4, TypedColumn<V, U5> col5, TypedColumn<V, U6> col6) Computes the given aggregations, returning aDatasetof tuples for each unique key and the result of computing these aggregations over all elements in the group.agg(TypedColumn<V, U1> col1, TypedColumn<V, U2> col2, TypedColumn<V, U3> col3, TypedColumn<V, U4> col4, TypedColumn<V, U5> col5, TypedColumn<V, U6> col6, TypedColumn<V, U7> col7) Computes the given aggregations, returning aDatasetof tuples for each unique key and the result of computing these aggregations over all elements in the group.agg(TypedColumn<V, U1> col1, TypedColumn<V, U2> col2, TypedColumn<V, U3> col3, TypedColumn<V, U4> col4, TypedColumn<V, U5> col5, TypedColumn<V, U6> col6, TypedColumn<V, U7> col7, TypedColumn<V, U8> col8) Computes the given aggregations, returning aDatasetof tuples for each unique key and the result of computing these aggregations over all elements in the group.<U,R> Dataset<R> cogroup(KeyValueGroupedDataset<K, U> other, CoGroupFunction<K, V, U, R> f, Encoder<R> encoder) <U,R> Dataset<R> cogroup(KeyValueGroupedDataset<K, U> other, scala.Function3<K, scala.collection.Iterator<V>, scala.collection.Iterator<U>, scala.collection.IterableOnce<R>> f, Encoder<R> evidence$30) <U,R> Dataset<R> cogroupSorted(KeyValueGroupedDataset<K, U> other, Column[] thisSortExprs, Column[] otherSortExprs, CoGroupFunction<K, V, U, R> f, Encoder<R> encoder) <U,R> Dataset<R> cogroupSorted(KeyValueGroupedDataset<K, U> other, scala.collection.immutable.Seq<Column> thisSortExprs, scala.collection.immutable.Seq<Column> otherSortExprs, scala.Function3<K, scala.collection.Iterator<V>, scala.collection.Iterator<U>, scala.collection.IterableOnce<R>> f, Encoder<R> evidence$21) count()Returns aDatasetthat contains a tuple with each key and the number of items present for that key.<U> Dataset<U>flatMapGroups(FlatMapGroupsFunction<K, V, U> f, Encoder<U> encoder) (Java-specific) Applies the given function to each group of data.<U> Dataset<U>flatMapGroups(scala.Function2<K, scala.collection.Iterator<V>, scala.collection.IterableOnce<U>> f, Encoder<U> evidence$22) (Scala-specific) Applies the given function to each group of data.<S,U> Dataset<U> flatMapGroupsWithState(FlatMapGroupsWithStateFunction<K, V, S, U> func, OutputMode outputMode, Encoder<S> stateEncoder, Encoder<U> outputEncoder, GroupStateTimeout timeoutConf) (Java-specific) Applies the given function to each group of data, while maintaining a user-defined per-group state.<S,U> Dataset<U> flatMapGroupsWithState(FlatMapGroupsWithStateFunction<K, V, S, U> func, OutputMode outputMode, Encoder<S> stateEncoder, Encoder<U> outputEncoder, GroupStateTimeout timeoutConf, KeyValueGroupedDataset<K, S> initialState) <S,U> Dataset<U> flatMapGroupsWithState(OutputMode outputMode, GroupStateTimeout timeoutConf, KeyValueGroupedDataset<K, S> initialState, scala.Function3<K, scala.collection.Iterator<V>, GroupState<S>, scala.collection.Iterator<U>> func, Encoder<S> evidence$12, Encoder<U> evidence$13) <S,U> Dataset<U> flatMapGroupsWithState(OutputMode outputMode, GroupStateTimeout timeoutConf, scala.Function3<K, scala.collection.Iterator<V>, GroupState<S>, scala.collection.Iterator<U>> func, Encoder<S> evidence$10, Encoder<U> evidence$11) (Scala-specific) Applies the given function to each group of data, while maintaining a user-defined per-group state.<U> Dataset<U>flatMapSortedGroups(Column[] SortExprs, FlatMapGroupsFunction<K, V, U> f, Encoder<U> encoder) (Java-specific) Applies the given function to each group of data.<U> Dataset<U>flatMapSortedGroups(scala.collection.immutable.Seq<Column> sortExprs, scala.Function2<K, scala.collection.Iterator<V>, scala.collection.IterableOnce<U>> f, Encoder<U> evidence$3) (Scala-specific) Applies the given function to each group of data.<L> KeyValueGroupedDataset<L,V> Returns a newKeyValueGroupedDatasetwhere the type of the key has been mapped to the specified type.keys()Returns aDatasetthat contains each unique key.<U> Dataset<U>mapGroups(MapGroupsFunction<K, V, U> f, Encoder<U> encoder) (Java-specific) Applies the given function to each group of data.<U> Dataset<U>(Scala-specific) Applies the given function to each group of data.<S,U> Dataset<U> mapGroupsWithState(MapGroupsWithStateFunction<K, V, S, U> func, Encoder<S> stateEncoder, Encoder<U> outputEncoder) (Java-specific) Applies the given function to each group of data, while maintaining a user-defined per-group state.<S,U> Dataset<U> mapGroupsWithState(MapGroupsWithStateFunction<K, V, S, U> func, Encoder<S> stateEncoder, Encoder<U> outputEncoder, GroupStateTimeout timeoutConf) (Java-specific) Applies the given function to each group of data, while maintaining a user-defined per-group state.<S,U> Dataset<U> mapGroupsWithState(MapGroupsWithStateFunction<K, V, S, U> func, Encoder<S> stateEncoder, Encoder<U> outputEncoder, GroupStateTimeout timeoutConf, KeyValueGroupedDataset<K, S> initialState) <S,U> Dataset<U> mapGroupsWithState(GroupStateTimeout timeoutConf, KeyValueGroupedDataset<K, S> initialState, scala.Function3<K, scala.collection.Iterator<V>, GroupState<S>, U> func, Encoder<S> evidence$8, Encoder<U> evidence$9) <S,U> Dataset<U> mapGroupsWithState(GroupStateTimeout timeoutConf, scala.Function3<K, scala.collection.Iterator<V>, GroupState<S>, U> func, Encoder<S> evidence$6, Encoder<U> evidence$7) (Scala-specific) Applies the given function to each group of data, while maintaining a user-defined per-group state.<S,U> Dataset<U> mapGroupsWithState(scala.Function3<K, scala.collection.Iterator<V>, GroupState<S>, U> func, Encoder<S> evidence$4, Encoder<U> evidence$5) (Scala-specific) Applies the given function to each group of data, while maintaining a user-defined per-group state.<W> KeyValueGroupedDataset<K,W> mapValues(MapFunction<V, W> func, Encoder<W> encoder) Returns a newKeyValueGroupedDatasetwhere the given functionfunchas been applied to the data.<W> KeyValueGroupedDataset<K,W> Returns a newKeyValueGroupedDatasetwhere the given functionfunchas been applied to the data.org.apache.spark.sql.execution.QueryExecution(Java-specific) Reduces the elements of each group of data using the specified binary function.reduceGroups(scala.Function2<V, V, V> f) (Scala-specific) Reduces the elements of each group of data using the specified binary function.toString()Methods inherited from class org.apache.spark.sql.api.KeyValueGroupedDataset
cogroup, cogroup, cogroupSorted, cogroupSorted, flatMapGroupsWithState, flatMapGroupsWithState, mapGroupsWithState, mapGroupsWithState
-
Method Details
-
agg
Description copied from class:KeyValueGroupedDatasetComputes the given aggregation, returning aDatasetof tuples for each unique key and the result of computing this aggregation over all elements in the group.- Overrides:
aggin classKeyValueGroupedDataset<K,V, Dataset> - Parameters:
col1- (undocumented)- Returns:
- (undocumented)
- Inheritdoc:
-
agg
Description copied from class:KeyValueGroupedDatasetComputes the given aggregations, returning aDatasetof tuples for each unique key and the result of computing these aggregations over all elements in the group.- Overrides:
aggin classKeyValueGroupedDataset<K,V, Dataset> - Parameters:
col1- (undocumented)col2- (undocumented)- Returns:
- (undocumented)
- Inheritdoc:
-
agg
public <U1,U2, Dataset<scala.Tuple4<K,U3> U1, aggU2, U3>> (TypedColumn<V, U1> col1, TypedColumn<V, U2> col2, TypedColumn<V, U3> col3) Description copied from class:KeyValueGroupedDatasetComputes the given aggregations, returning aDatasetof tuples for each unique key and the result of computing these aggregations over all elements in the group.- Overrides:
aggin classKeyValueGroupedDataset<K,V, Dataset> - Parameters:
col1- (undocumented)col2- (undocumented)col3- (undocumented)- Returns:
- (undocumented)
- Inheritdoc:
-
agg
public <U1,U2, Dataset<scala.Tuple5<K,U3, U4> U1, aggU2, U3, U4>> (TypedColumn<V, U1> col1, TypedColumn<V, U2> col2, TypedColumn<V, U3> col3, TypedColumn<V, U4> col4) Description copied from class:KeyValueGroupedDatasetComputes the given aggregations, returning aDatasetof tuples for each unique key and the result of computing these aggregations over all elements in the group.- Overrides:
aggin classKeyValueGroupedDataset<K,V, Dataset> - Parameters:
col1- (undocumented)col2- (undocumented)col3- (undocumented)col4- (undocumented)- Returns:
- (undocumented)
- Inheritdoc:
-
agg
public <U1,U2, Dataset<scala.Tuple6<K,U3, U4, U5> U1, aggU2, U3, U4, U5>> (TypedColumn<V, U1> col1, TypedColumn<V, U2> col2, TypedColumn<V, U3> col3, TypedColumn<V, U4> col4, TypedColumn<V, U5> col5) Description copied from class:KeyValueGroupedDatasetComputes the given aggregations, returning aDatasetof tuples for each unique key and the result of computing these aggregations over all elements in the group.- Overrides:
aggin classKeyValueGroupedDataset<K,V, Dataset> - Parameters:
col1- (undocumented)col2- (undocumented)col3- (undocumented)col4- (undocumented)col5- (undocumented)- Returns:
- (undocumented)
- Inheritdoc:
-
agg
public <U1,U2, Dataset<scala.Tuple7<K,U3, U4, U5, U6> U1, aggU2, U3, U4, U5, U6>> (TypedColumn<V, U1> col1, TypedColumn<V, U2> col2, TypedColumn<V, U3> col3, TypedColumn<V, U4> col4, TypedColumn<V, U5> col5, TypedColumn<V, U6> col6) Description copied from class:KeyValueGroupedDatasetComputes the given aggregations, returning aDatasetof tuples for each unique key and the result of computing these aggregations over all elements in the group.- Overrides:
aggin classKeyValueGroupedDataset<K,V, Dataset> - Parameters:
col1- (undocumented)col2- (undocumented)col3- (undocumented)col4- (undocumented)col5- (undocumented)col6- (undocumented)- Returns:
- (undocumented)
- Inheritdoc:
-
agg
public <U1,U2, Dataset<scala.Tuple8<K,U3, U4, U5, U6, U7> U1, aggU2, U3, U4, U5, U6, U7>> (TypedColumn<V, U1> col1, TypedColumn<V, U2> col2, TypedColumn<V, U3> col3, TypedColumn<V, U4> col4, TypedColumn<V, U5> col5, TypedColumn<V, U6> col6, TypedColumn<V, U7> col7) Description copied from class:KeyValueGroupedDatasetComputes the given aggregations, returning aDatasetof tuples for each unique key and the result of computing these aggregations over all elements in the group.- Overrides:
aggin classKeyValueGroupedDataset<K,V, Dataset> - Parameters:
col1- (undocumented)col2- (undocumented)col3- (undocumented)col4- (undocumented)col5- (undocumented)col6- (undocumented)col7- (undocumented)- Returns:
- (undocumented)
- Inheritdoc:
-
agg
public <U1,U2, Dataset<scala.Tuple9<K,U3, U4, U5, U6, U7, U8> U1, aggU2, U3, U4, U5, U6, U7, U8>> (TypedColumn<V, U1> col1, TypedColumn<V, U2> col2, TypedColumn<V, U3> col3, TypedColumn<V, U4> col4, TypedColumn<V, U5> col5, TypedColumn<V, U6> col6, TypedColumn<V, U7> col7, TypedColumn<V, U8> col8) Description copied from class:KeyValueGroupedDatasetComputes the given aggregations, returning aDatasetof tuples for each unique key and the result of computing these aggregations over all elements in the group.- Overrides:
aggin classKeyValueGroupedDataset<K,V, Dataset> - Parameters:
col1- (undocumented)col2- (undocumented)col3- (undocumented)col4- (undocumented)col5- (undocumented)col6- (undocumented)col7- (undocumented)col8- (undocumented)- Returns:
- (undocumented)
- Inheritdoc:
-
cogroup
public <U,R> Dataset<R> cogroup(KeyValueGroupedDataset<K, U> other, scala.Function3<K, scala.collection.Iterator<V>, scala.collection.Iterator<U>, scala.collection.IterableOnce<R>> f, Encoder<R> evidence$30) - Inheritdoc:
-
cogroup
public <U,R> Dataset<R> cogroup(KeyValueGroupedDataset<K, U> other, CoGroupFunction<K, V, U, R> f, Encoder<R> encoder) - Inheritdoc:
-
cogroupSorted
public <U,R> Dataset<R> cogroupSorted(KeyValueGroupedDataset<K, U> other, scala.collection.immutable.Seq<Column> thisSortExprs, scala.collection.immutable.Seq<Column> otherSortExprs, scala.Function3<K, scala.collection.Iterator<V>, scala.collection.Iterator<U>, scala.collection.IterableOnce<R>> f, Encoder<R> evidence$21) - Inheritdoc:
-
cogroupSorted
public <U,R> Dataset<R> cogroupSorted(KeyValueGroupedDataset<K, U> other, Column[] thisSortExprs, Column[] otherSortExprs, CoGroupFunction<K, V, U, R> f, Encoder<R> encoder) - Inheritdoc:
-
count
Description copied from class:KeyValueGroupedDatasetReturns aDatasetthat contains a tuple with each key and the number of items present for that key.- Overrides:
countin classKeyValueGroupedDataset<K,V, Dataset> - Returns:
- (undocumented)
- Inheritdoc:
-
flatMapGroups
public <U> Dataset<U> flatMapGroups(scala.Function2<K, scala.collection.Iterator<V>, scala.collection.IterableOnce<U>> f, Encoder<U> evidence$22) Description copied from class:KeyValueGroupedDataset(Scala-specific) Applies the given function to each group of data. For each unique group, the function will be passed the group key and an iterator that contains all of the elements in the group. The function can return an iterator containing elements of an arbitrary type which will be returned as a newDataset.This function does not support partial aggregation, and as a result requires shuffling all the data in the
Dataset. If an application intends to perform an aggregation over each key, it is best to use the reduce function or anorg.apache.spark.sql.expressions#Aggregator.Internally, the implementation will spill to disk if any given group is too large to fit into memory. However, users must take care to avoid materializing the whole iterator for a group (for example, by calling
toList) unless they are sure that this is possible given the memory constraints of their cluster.- Overrides:
flatMapGroupsin classKeyValueGroupedDataset<K,V, Dataset> - Parameters:
f- (undocumented)evidence$22- (undocumented)- Returns:
- (undocumented)
- Inheritdoc:
-
flatMapGroups
Description copied from class:KeyValueGroupedDataset(Java-specific) Applies the given function to each group of data. For each unique group, the function will be passed the group key and an iterator that contains all of the elements in the group. The function can return an iterator containing elements of an arbitrary type which will be returned as a newDataset.This function does not support partial aggregation, and as a result requires shuffling all the data in the
Dataset. If an application intends to perform an aggregation over each key, it is best to use the reduce function or anorg.apache.spark.sql.expressions#Aggregator.Internally, the implementation will spill to disk if any given group is too large to fit into memory. However, users must take care to avoid materializing the whole iterator for a group (for example, by calling
toList) unless they are sure that this is possible given the memory constraints of their cluster.- Overrides:
flatMapGroupsin classKeyValueGroupedDataset<K,V, Dataset> - Parameters:
f- (undocumented)encoder- (undocumented)- Returns:
- (undocumented)
- Inheritdoc:
-
flatMapGroupsWithState
public <S,U> Dataset<U> flatMapGroupsWithState(OutputMode outputMode, GroupStateTimeout timeoutConf, scala.Function3<K, scala.collection.Iterator<V>, GroupState<S>, scala.collection.Iterator<U>> func, Encoder<S> evidence$10, Encoder<U> evidence$11) Description copied from class:KeyValueGroupedDataset(Scala-specific) Applies the given function to each group of data, while maintaining a user-defined per-group state. The result Dataset will represent the objects returned by the function. For a static batch Dataset, the function will be invoked once per group. For a streaming Dataset, the function will be invoked for each group repeatedly in every trigger, and updates to each group's state will be saved across invocations. SeeGroupStatefor more details.- Specified by:
flatMapGroupsWithStatein classKeyValueGroupedDataset<K,V, Dataset> - Parameters:
outputMode- The output mode of the function.timeoutConf- Timeout configuration for groups that do not receive data for a while.See
Encoderfor more details on what types are encodable to Spark SQL.func- Function to be called on every group.evidence$10- (undocumented)evidence$11- (undocumented)- Returns:
- (undocumented)
- Inheritdoc:
-
flatMapGroupsWithState
public <S,U> Dataset<U> flatMapGroupsWithState(OutputMode outputMode, GroupStateTimeout timeoutConf, KeyValueGroupedDataset<K, S> initialState, scala.Function3<K, scala.collection.Iterator<V>, GroupState<S>, scala.collection.Iterator<U>> func, Encoder<S> evidence$12, Encoder<U> evidence$13) - Inheritdoc:
-
flatMapGroupsWithState
public <S,U> Dataset<U> flatMapGroupsWithState(FlatMapGroupsWithStateFunction<K, V, S, U> func, OutputMode outputMode, Encoder<S> stateEncoder, Encoder<U> outputEncoder, GroupStateTimeout timeoutConf) Description copied from class:KeyValueGroupedDataset(Java-specific) Applies the given function to each group of data, while maintaining a user-defined per-group state. The result Dataset will represent the objects returned by the function. For a static batch Dataset, the function will be invoked once per group. For a streaming Dataset, the function will be invoked for each group repeatedly in every trigger, and updates to each group's state will be saved across invocations. SeeGroupStatefor more details.- Overrides:
flatMapGroupsWithStatein classKeyValueGroupedDataset<K,V, Dataset> - Parameters:
func- Function to be called on every group.outputMode- The output mode of the function.stateEncoder- Encoder for the state type.outputEncoder- Encoder for the output type.timeoutConf- Timeout configuration for groups that do not receive data for a while.See
Encoderfor more details on what types are encodable to Spark SQL.- Returns:
- (undocumented)
- Inheritdoc:
-
flatMapGroupsWithState
public <S,U> Dataset<U> flatMapGroupsWithState(FlatMapGroupsWithStateFunction<K, V, S, U> func, OutputMode outputMode, Encoder<S> stateEncoder, Encoder<U> outputEncoder, GroupStateTimeout timeoutConf, KeyValueGroupedDataset<K, S> initialState) - Inheritdoc:
-
flatMapSortedGroups
public <U> Dataset<U> flatMapSortedGroups(scala.collection.immutable.Seq<Column> sortExprs, scala.Function2<K, scala.collection.Iterator<V>, scala.collection.IterableOnce<U>> f, Encoder<U> evidence$3) Description copied from class:KeyValueGroupedDataset(Scala-specific) Applies the given function to each group of data. For each unique group, the function will be passed the group key and a sorted iterator that contains all of the elements in the group. The function can return an iterator containing elements of an arbitrary type which will be returned as a newDataset.This function does not support partial aggregation, and as a result requires shuffling all the data in the
Dataset. If an application intends to perform an aggregation over each key, it is best to use the reduce function or anorg.apache.spark.sql.expressions#Aggregator.Internally, the implementation will spill to disk if any given group is too large to fit into memory. However, users must take care to avoid materializing the whole iterator for a group (for example, by calling
toList) unless they are sure that this is possible given the memory constraints of their cluster.This is equivalent to
KeyValueGroupedDataset.flatMapGroups(scala.Function2<K, scala.collection.Iterator<V>, scala.collection.IterableOnce<U>>, org.apache.spark.sql.Encoder<U>), except for the iterator to be sorted according to the given sort expressions. That sorting does not add computational complexity.- Specified by:
flatMapSortedGroupsin classKeyValueGroupedDataset<K,V, Dataset> - Parameters:
sortExprs- (undocumented)f- (undocumented)evidence$3- (undocumented)- Returns:
- (undocumented)
- See Also:
-
org.apache.spark.sql.api.KeyValueGroupedDataset#flatMapGroups
- Inheritdoc:
-
flatMapSortedGroups
public <U> Dataset<U> flatMapSortedGroups(Column[] SortExprs, FlatMapGroupsFunction<K, V, U> f, Encoder<U> encoder) Description copied from class:KeyValueGroupedDataset(Java-specific) Applies the given function to each group of data. For each unique group, the function will be passed the group key and a sorted iterator that contains all of the elements in the group. The function can return an iterator containing elements of an arbitrary type which will be returned as a newDataset.This function does not support partial aggregation, and as a result requires shuffling all the data in the
Dataset. If an application intends to perform an aggregation over each key, it is best to use the reduce function or anorg.apache.spark.sql.expressions#Aggregator.Internally, the implementation will spill to disk if any given group is too large to fit into memory. However, users must take care to avoid materializing the whole iterator for a group (for example, by calling
toList) unless they are sure that this is possible given the memory constraints of their cluster.This is equivalent to
KeyValueGroupedDataset.flatMapGroups(scala.Function2<K, scala.collection.Iterator<V>, scala.collection.IterableOnce<U>>, org.apache.spark.sql.Encoder<U>), except for the iterator to be sorted according to the given sort expressions. That sorting does not add computational complexity.- Overrides:
flatMapSortedGroupsin classKeyValueGroupedDataset<K,V, Dataset> - Parameters:
SortExprs- (undocumented)f- (undocumented)encoder- (undocumented)- Returns:
- (undocumented)
- See Also:
-
org.apache.spark.sql.api.KeyValueGroupedDataset#flatMapGroups
- Inheritdoc:
-
keyAs
Description copied from class:KeyValueGroupedDatasetReturns a newKeyValueGroupedDatasetwhere the type of the key has been mapped to the specified type. The mapping of key columns to the type follows the same rules asasonDataset.- Specified by:
keyAsin classKeyValueGroupedDataset<K,V, Dataset> - Parameters:
evidence$1- (undocumented)- Returns:
- (undocumented)
- Inheritdoc:
-
keys
Description copied from class:KeyValueGroupedDatasetReturns aDatasetthat contains each unique key. This is equivalent to doing mapping over the Dataset to extract the keys and then running a distinct operation on those.- Specified by:
keysin classKeyValueGroupedDataset<K,V, Dataset> - Returns:
- (undocumented)
- Inheritdoc:
-
mapGroups
public <U> Dataset<U> mapGroups(scala.Function2<K, scala.collection.Iterator<V>, U> f, Encoder<U> evidence$23) Description copied from class:KeyValueGroupedDataset(Scala-specific) Applies the given function to each group of data. For each unique group, the function will be passed the group key and an iterator that contains all of the elements in the group. The function can return an element of arbitrary type which will be returned as a newDataset.This function does not support partial aggregation, and as a result requires shuffling all the data in the
Dataset. If an application intends to perform an aggregation over each key, it is best to use the reduce function or anorg.apache.spark.sql.expressions#Aggregator.Internally, the implementation will spill to disk if any given group is too large to fit into memory. However, users must take care to avoid materializing the whole iterator for a group (for example, by calling
toList) unless they are sure that this is possible given the memory constraints of their cluster.- Overrides:
mapGroupsin classKeyValueGroupedDataset<K,V, Dataset> - Parameters:
f- (undocumented)evidence$23- (undocumented)- Returns:
- (undocumented)
- Inheritdoc:
-
mapGroups
Description copied from class:KeyValueGroupedDataset(Java-specific) Applies the given function to each group of data. For each unique group, the function will be passed the group key and an iterator that contains all of the elements in the group. The function can return an element of arbitrary type which will be returned as a newDataset.This function does not support partial aggregation, and as a result requires shuffling all the data in the
Dataset. If an application intends to perform an aggregation over each key, it is best to use the reduce function or anorg.apache.spark.sql.expressions#Aggregator.Internally, the implementation will spill to disk if any given group is too large to fit into memory. However, users must take care to avoid materializing the whole iterator for a group (for example, by calling
toList) unless they are sure that this is possible given the memory constraints of their cluster.- Overrides:
mapGroupsin classKeyValueGroupedDataset<K,V, Dataset> - Parameters:
f- (undocumented)encoder- (undocumented)- Returns:
- (undocumented)
- Inheritdoc:
-
mapGroupsWithState
public <S,U> Dataset<U> mapGroupsWithState(scala.Function3<K, scala.collection.Iterator<V>, GroupState<S>, U> func, Encoder<S> evidence$4, Encoder<U> evidence$5) Description copied from class:KeyValueGroupedDataset(Scala-specific) Applies the given function to each group of data, while maintaining a user-defined per-group state. The result Dataset will represent the objects returned by the function. For a static batch Dataset, the function will be invoked once per group. For a streaming Dataset, the function will be invoked for each group repeatedly in every trigger, and updates to each group's state will be saved across invocations. SeeGroupStatefor more details.- Specified by:
mapGroupsWithStatein classKeyValueGroupedDataset<K,V, Dataset> - Parameters:
func- Function to be called on every group.See
Encoderfor more details on what types are encodable to Spark SQL.evidence$4- (undocumented)evidence$5- (undocumented)- Returns:
- (undocumented)
- Inheritdoc:
-
mapGroupsWithState
public <S,U> Dataset<U> mapGroupsWithState(GroupStateTimeout timeoutConf, scala.Function3<K, scala.collection.Iterator<V>, GroupState<S>, U> func, Encoder<S> evidence$6, Encoder<U> evidence$7) Description copied from class:KeyValueGroupedDataset(Scala-specific) Applies the given function to each group of data, while maintaining a user-defined per-group state. The result Dataset will represent the objects returned by the function. For a static batch Dataset, the function will be invoked once per group. For a streaming Dataset, the function will be invoked for each group repeatedly in every trigger, and updates to each group's state will be saved across invocations. SeeGroupStatefor more details.- Specified by:
mapGroupsWithStatein classKeyValueGroupedDataset<K,V, Dataset> - Parameters:
timeoutConf- Timeout configuration for groups that do not receive data for a while.See
Encoderfor more details on what types are encodable to Spark SQL.func- Function to be called on every group.evidence$6- (undocumented)evidence$7- (undocumented)- Returns:
- (undocumented)
- Inheritdoc:
-
mapGroupsWithState
public <S,U> Dataset<U> mapGroupsWithState(GroupStateTimeout timeoutConf, KeyValueGroupedDataset<K, S> initialState, scala.Function3<K, scala.collection.Iterator<V>, GroupState<S>, U> func, Encoder<S> evidence$8, Encoder<U> evidence$9) - Inheritdoc:
-
mapGroupsWithState
public <S,U> Dataset<U> mapGroupsWithState(MapGroupsWithStateFunction<K, V, S, U> func, Encoder<S> stateEncoder, Encoder<U> outputEncoder) Description copied from class:KeyValueGroupedDataset(Java-specific) Applies the given function to each group of data, while maintaining a user-defined per-group state. The result Dataset will represent the objects returned by the function. For a static batch Dataset, the function will be invoked once per group. For a streaming Dataset, the function will be invoked for each group repeatedly in every trigger, and updates to each group's state will be saved across invocations. SeeGroupStatefor more details.- Overrides:
mapGroupsWithStatein classKeyValueGroupedDataset<K,V, Dataset> - Parameters:
func- Function to be called on every group.stateEncoder- Encoder for the state type.outputEncoder- Encoder for the output type.See
Encoderfor more details on what types are encodable to Spark SQL.- Returns:
- (undocumented)
- Inheritdoc:
-
mapGroupsWithState
public <S,U> Dataset<U> mapGroupsWithState(MapGroupsWithStateFunction<K, V, S, U> func, Encoder<S> stateEncoder, Encoder<U> outputEncoder, GroupStateTimeout timeoutConf) Description copied from class:KeyValueGroupedDataset(Java-specific) Applies the given function to each group of data, while maintaining a user-defined per-group state. The result Dataset will represent the objects returned by the function. For a static batch Dataset, the function will be invoked once per group. For a streaming Dataset, the function will be invoked for each group repeatedly in every trigger, and updates to each group's state will be saved across invocations. SeeGroupStatefor more details.- Overrides:
mapGroupsWithStatein classKeyValueGroupedDataset<K,V, Dataset> - Parameters:
func- Function to be called on every group.stateEncoder- Encoder for the state type.outputEncoder- Encoder for the output type.timeoutConf- Timeout configuration for groups that do not receive data for a while.See
Encoderfor more details on what types are encodable to Spark SQL.- Returns:
- (undocumented)
- Inheritdoc:
-
mapGroupsWithState
public <S,U> Dataset<U> mapGroupsWithState(MapGroupsWithStateFunction<K, V, S, U> func, Encoder<S> stateEncoder, Encoder<U> outputEncoder, GroupStateTimeout timeoutConf, KeyValueGroupedDataset<K, S> initialState) - Inheritdoc:
-
mapValues
Description copied from class:KeyValueGroupedDatasetReturns a newKeyValueGroupedDatasetwhere the given functionfunchas been applied to the data. The grouping key is unchanged by this.// Create values grouped by key from a Dataset[(K, V)] ds.groupByKey(_._1).mapValues(_._2) // Scala- Specified by:
mapValuesin classKeyValueGroupedDataset<K,V, Dataset> - Parameters:
func- (undocumented)evidence$2- (undocumented)- Returns:
- (undocumented)
- Inheritdoc:
-
mapValues
Description copied from class:KeyValueGroupedDatasetReturns a newKeyValueGroupedDatasetwhere the given functionfunchas been applied to the data. The grouping key is unchanged by this.// Create Integer values grouped by String key from a Dataset<Tuple2<String, Integer>> Dataset<Tuple2<String, Integer>> ds = ...; KeyValueGroupedDataset<String, Integer> grouped = ds.groupByKey(t -> t._1, Encoders.STRING()).mapValues(t -> t._2, Encoders.INT());- Overrides:
mapValuesin classKeyValueGroupedDataset<K,V, Dataset> - Parameters:
func- (undocumented)encoder- (undocumented)- Returns:
- (undocumented)
- Inheritdoc:
-
queryExecution
public org.apache.spark.sql.execution.QueryExecution queryExecution() -
reduceGroups
Description copied from class:KeyValueGroupedDataset(Scala-specific) Reduces the elements of each group of data using the specified binary function. The given function must be commutative and associative or the result may be non-deterministic.- Specified by:
reduceGroupsin classKeyValueGroupedDataset<K,V, Dataset> - Parameters:
f- (undocumented)- Returns:
- (undocumented)
- Inheritdoc:
-
reduceGroups
Description copied from class:KeyValueGroupedDataset(Java-specific) Reduces the elements of each group of data using the specified binary function. The given function must be commutative and associative or the result may be non-deterministic.- Overrides:
reduceGroupsin classKeyValueGroupedDataset<K,V, Dataset> - Parameters:
f- (undocumented)- Returns:
- (undocumented)
- Inheritdoc:
-
toString
-