org.apache.spark.sql.api.KeyValueGroupedDataset<K,V,Dataset>

org.apache.spark.sql.KeyValueGroupedDataset<K,V>

All Implemented Interfaces:: Serializable

public class KeyValueGroupedDataset<K,V> extends KeyValueGroupedDataset<K,V,Dataset>

A Dataset has been logically grouped by a user specified grouping key. Users should not construct a KeyValueGroupedDataset directly, but should instead call groupByKey on an existing Dataset.

Since:

2.0.0

See Also:

Serialized Form

Method Summary

Modifier and Type

Method

Description

<U1> Dataset<scala.Tuple2<K,U1>>

agg(TypedColumn<V,U1> col1)

Computes the given aggregation, returning a Dataset of tuples for each unique key and the result of computing this aggregation over all elements in the group.

<U1, U2> Dataset<scala.Tuple3<K,U1,U2>>

agg(TypedColumn<V,U1> col1, TypedColumn<V,U2> col2)

Computes the given aggregations, returning a Dataset of tuples for each unique key and the result of computing these aggregations over all elements in the group.

<U1, U2, U3> Dataset<scala.Tuple4<K,U1,U2,U3>>

agg(TypedColumn<V,U1> col1, TypedColumn<V,U2> col2, TypedColumn<V,U3> col3)

Computes the given aggregations, returning a Dataset of tuples for each unique key and the result of computing these aggregations over all elements in the group.

<U1, U2, U3, U4> Dataset<scala.Tuple5<K,U1,U2,U3,U4>>

agg(TypedColumn<V,U1> col1, TypedColumn<V,U2> col2, TypedColumn<V,U3> col3, TypedColumn<V,U4> col4)

Computes the given aggregations, returning a Dataset of tuples for each unique key and the result of computing these aggregations over all elements in the group.

<U1, U2, U3, U4, U5> Dataset<scala.Tuple6<K,U1,U2,U3,U4,U5>>

agg(TypedColumn<V,U1> col1, TypedColumn<V,U2> col2, TypedColumn<V,U3> col3, TypedColumn<V,U4> col4, TypedColumn<V,U5> col5)

Computes the given aggregations, returning a Dataset of tuples for each unique key and the result of computing these aggregations over all elements in the group.

<U1, U2, U3, U4, U5, U6> Dataset<scala.Tuple7<K,U1,U2,U3,U4,U5,U6>>

agg(TypedColumn<V,U1> col1, TypedColumn<V,U2> col2, TypedColumn<V,U3> col3, TypedColumn<V,U4> col4, TypedColumn<V,U5> col5, TypedColumn<V,U6> col6)

Computes the given aggregations, returning a Dataset of tuples for each unique key and the result of computing these aggregations over all elements in the group.

<U1, U2, U3, U4, U5, U6, U7> Dataset<scala.Tuple8<K,U1,U2,U3,U4,U5,U6,U7>>

agg(TypedColumn<V,U1> col1, TypedColumn<V,U2> col2, TypedColumn<V,U3> col3, TypedColumn<V,U4> col4, TypedColumn<V,U5> col5, TypedColumn<V,U6> col6, TypedColumn<V,U7> col7)

Computes the given aggregations, returning a Dataset of tuples for each unique key and the result of computing these aggregations over all elements in the group.

<U1, U2, U3, U4, U5, U6, U7, U8> Dataset<scala.Tuple9<K,U1,U2,U3,U4,U5,U6,U7,U8>>

agg(TypedColumn<V,U1> col1, TypedColumn<V,U2> col2, TypedColumn<V,U3> col3, TypedColumn<V,U4> col4, TypedColumn<V,U5> col5, TypedColumn<V,U6> col6, TypedColumn<V,U7> col7, TypedColumn<V,U8> col8)

Computes the given aggregations, returning a Dataset of tuples for each unique key and the result of computing these aggregations over all elements in the group.

<U, R> Dataset<R>

cogroup(KeyValueGroupedDataset<K,U> other, CoGroupFunction<K,V,U,R> f, Encoder<R> encoder)

<U, R> Dataset<R>

cogroup(KeyValueGroupedDataset<K,U> other, scala.Function3<K,scala.collection.Iterator<V>,scala.collection.Iterator,scala.collection.IterableOnce<R>> f, Encoder<R> evidence$30)

<U, R> Dataset<R>

cogroupSorted(KeyValueGroupedDataset<K,U> other, Column[] thisSortExprs, Column[] otherSortExprs, CoGroupFunction<K,V,U,R> f, Encoder<R> encoder)

<U, R> Dataset<R>

cogroupSorted(KeyValueGroupedDataset<K,U> other, scala.collection.immutable.Seq<Column> thisSortExprs, scala.collection.immutable.Seq<Column> otherSortExprs, scala.Function3<K,scala.collection.Iterator<V>,scala.collection.Iterator,scala.collection.IterableOnce<R>> f, Encoder<R> evidence$21)

Dataset<scala.Tuple2<K,Object>>

count()

Returns a Dataset that contains a tuple with each key and the number of items present for that key.

 Dataset

flatMapGroups(FlatMapGroupsFunction<K,V,U> f, Encoder encoder)

(Java-specific) Applies the given function to each group of data.

 Dataset

flatMapGroups(scala.Function2<K,scala.collection.Iterator<V>,scala.collection.IterableOnce> f, Encoder evidence$22)

(Scala-specific) Applies the given function to each group of data.

<S, U> Dataset

flatMapGroupsWithState(FlatMapGroupsWithStateFunction<K,V,S,U> func, OutputMode outputMode, Encoder<S> stateEncoder, Encoder outputEncoder, GroupStateTimeout timeoutConf)

(Java-specific) Applies the given function to each group of data, while maintaining a user-defined per-group state.

<S, U> Dataset

flatMapGroupsWithState(FlatMapGroupsWithStateFunction<K,V,S,U> func, OutputMode outputMode, Encoder<S> stateEncoder, Encoder outputEncoder, GroupStateTimeout timeoutConf, KeyValueGroupedDataset<K,S> initialState)

<S, U> Dataset

flatMapGroupsWithState(OutputMode outputMode, GroupStateTimeout timeoutConf, KeyValueGroupedDataset<K,S> initialState, scala.Function3<K,scala.collection.Iterator<V>,GroupState<S>,scala.collection.Iterator> func, Encoder<S> evidence$12, Encoder evidence$13)

<S, U> Dataset

flatMapGroupsWithState(OutputMode outputMode, GroupStateTimeout timeoutConf, scala.Function3<K,scala.collection.Iterator<V>,GroupState<S>,scala.collection.Iterator> func, Encoder<S> evidence$10, Encoder evidence$11)

(Scala-specific) Applies the given function to each group of data, while maintaining a user-defined per-group state.

 Dataset

flatMapSortedGroups(Column[] SortExprs, FlatMapGroupsFunction<K,V,U> f, Encoder encoder)

(Java-specific) Applies the given function to each group of data.

 Dataset

flatMapSortedGroups(scala.collection.immutable.Seq<Column> sortExprs, scala.Function2<K,scala.collection.Iterator<V>,scala.collection.IterableOnce> f, Encoder evidence$3)

(Scala-specific) Applies the given function to each group of data.

<L> KeyValueGroupedDataset<L,V>

keyAs(Encoder<L> evidence$1)

Returns a new KeyValueGroupedDataset where the type of the key has been mapped to the specified type.

Dataset<K>

keys()

Returns a Dataset that contains each unique key.

 Dataset

mapGroups(MapGroupsFunction<K,V,U> f, Encoder encoder)

(Java-specific) Applies the given function to each group of data.

 Dataset

mapGroups(scala.Function2<K,scala.collection.Iterator<V>,U> f, Encoder evidence$23)

(Scala-specific) Applies the given function to each group of data.

<S, U> Dataset

mapGroupsWithState(MapGroupsWithStateFunction<K,V,S,U> func, Encoder<S> stateEncoder, Encoder outputEncoder)

(Java-specific) Applies the given function to each group of data, while maintaining a user-defined per-group state.

<S, U> Dataset

mapGroupsWithState(MapGroupsWithStateFunction<K,V,S,U> func, Encoder<S> stateEncoder, Encoder outputEncoder, GroupStateTimeout timeoutConf)

(Java-specific) Applies the given function to each group of data, while maintaining a user-defined per-group state.

<S, U> Dataset

mapGroupsWithState(MapGroupsWithStateFunction<K,V,S,U> func, Encoder<S> stateEncoder, Encoder outputEncoder, GroupStateTimeout timeoutConf, KeyValueGroupedDataset<K,S> initialState)

<S, U> Dataset

mapGroupsWithState(GroupStateTimeout timeoutConf, KeyValueGroupedDataset<K,S> initialState, scala.Function3<K,scala.collection.Iterator<V>,GroupState<S>,U> func, Encoder<S> evidence$8, Encoder evidence$9)

<S, U> Dataset

mapGroupsWithState(GroupStateTimeout timeoutConf, scala.Function3<K,scala.collection.Iterator<V>,GroupState<S>,U> func, Encoder<S> evidence$6, Encoder evidence$7)

(Scala-specific) Applies the given function to each group of data, while maintaining a user-defined per-group state.

<S, U> Dataset

mapGroupsWithState(scala.Function3<K,scala.collection.Iterator<V>,GroupState<S>,U> func, Encoder<S> evidence$4, Encoder evidence$5)

(Scala-specific) Applies the given function to each group of data, while maintaining a user-defined per-group state.

<W> KeyValueGroupedDataset<K,W>

mapValues(MapFunction<V,W> func, Encoder<W> encoder)

Returns a new KeyValueGroupedDataset where the given function func has been applied to the data.

<W> KeyValueGroupedDataset<K,W>

mapValues(scala.Function1<V,W> func, Encoder<W> evidence$2)

Returns a new KeyValueGroupedDataset where the given function func has been applied to the data.

org.apache.spark.sql.execution.QueryExecution

queryExecution()

Dataset<scala.Tuple2<K,V>>

reduceGroups(ReduceFunction<V> f)

(Java-specific) Reduces the elements of each group of data using the specified binary function.

Dataset<scala.Tuple2<K,V>>

reduceGroups(scala.Function2<V,V,V> f)

(Scala-specific) Reduces the elements of each group of data using the specified binary function.

String

toString()

Methods inherited from class org.apache.spark.sql.api.KeyValueGroupedDataset
cogroup, cogroup, cogroupSorted, cogroupSorted, flatMapGroupsWithState, flatMapGroupsWithState, mapGroupsWithState, mapGroupsWithState

Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, wait, wait, wait

Method Details
- agg
 
 public <U1> Dataset<scala.Tuple2<K,U1>> agg(TypedColumn<V,U1> col1)
 
 Description copied from class: KeyValueGroupedDataset
 
 Computes the given aggregation, returning a Dataset of tuples for each unique key and the result of computing this aggregation over all elements in the group.
 
 Overrides:
 
 agg in class KeyValueGroupedDataset<K,V,Dataset>
 
 Parameters:
 
 col1 - (undocumented)
 
 Returns:
 
 (undocumented)
 
 Inheritdoc:
- agg
 
 public <U1, U2> Dataset<scala.Tuple3<K,U1,U2>> agg(TypedColumn<V,U1> col1, TypedColumn<V,U2> col2)
 
 Description copied from class: KeyValueGroupedDataset
 
 Computes the given aggregations, returning a Dataset of tuples for each unique key and the result of computing these aggregations over all elements in the group.
 
 Overrides:
 
 agg in class KeyValueGroupedDataset<K,V,Dataset>
 
 Parameters:
 
 col1 - (undocumented)
 
 col2 - (undocumented)
 
 Returns:
 
 (undocumented)
 
 Inheritdoc:
- agg
 
 public <U1, U2, U3> Dataset<scala.Tuple4<K,U1,U2,U3>> agg(TypedColumn<V,U1> col1, TypedColumn<V,U2> col2, TypedColumn<V,U3> col3)
 
 Description copied from class: KeyValueGroupedDataset
 
 Computes the given aggregations, returning a Dataset of tuples for each unique key and the result of computing these aggregations over all elements in the group.
 
 Overrides:
 
 agg in class KeyValueGroupedDataset<K,V,Dataset>
 
 Parameters:
 
 col1 - (undocumented)
 
 col2 - (undocumented)
 
 col3 - (undocumented)
 
 Returns:
 
 (undocumented)
 
 Inheritdoc:
- agg
 
 public <U1, U2, U3, U4> Dataset<scala.Tuple5<K,U1,U2,U3,U4>> agg(TypedColumn<V,U1> col1, TypedColumn<V,U2> col2, TypedColumn<V,U3> col3, TypedColumn<V,U4> col4)
 
 Description copied from class: KeyValueGroupedDataset
 
 Computes the given aggregations, returning a Dataset of tuples for each unique key and the result of computing these aggregations over all elements in the group.
 
 Overrides:
 
 agg in class KeyValueGroupedDataset<K,V,Dataset>
 
 Parameters:
 
 col1 - (undocumented)
 
 col2 - (undocumented)
 
 col3 - (undocumented)
 
 col4 - (undocumented)
 
 Returns:
 
 (undocumented)
 
 Inheritdoc:
- agg
 
 public <U1, U2, U3, U4, U5> Dataset<scala.Tuple6<K,U1,U2,U3,U4,U5>> agg(TypedColumn<V,U1> col1, TypedColumn<V,U2> col2, TypedColumn<V,U3> col3, TypedColumn<V,U4> col4, TypedColumn<V,U5> col5)
 
 Description copied from class: KeyValueGroupedDataset
 
 Computes the given aggregations, returning a Dataset of tuples for each unique key and the result of computing these aggregations over all elements in the group.
 
 Overrides:
 
 agg in class KeyValueGroupedDataset<K,V,Dataset>
 
 Parameters:
 
 col1 - (undocumented)
 
 col2 - (undocumented)
 
 col3 - (undocumented)
 
 col4 - (undocumented)
 
 col5 - (undocumented)
 
 Returns:
 
 (undocumented)
 
 Inheritdoc:
- agg
 
 public <U1, U2, U3, U4, U5, U6> Dataset<scala.Tuple7<K,U1,U2,U3,U4,U5,U6>> agg(TypedColumn<V,U1> col1, TypedColumn<V,U2> col2, TypedColumn<V,U3> col3, TypedColumn<V,U4> col4, TypedColumn<V,U5> col5, TypedColumn<V,U6> col6)
 
 Description copied from class: KeyValueGroupedDataset
 
 Computes the given aggregations, returning a Dataset of tuples for each unique key and the result of computing these aggregations over all elements in the group.
 
 Overrides:
 
 agg in class KeyValueGroupedDataset<K,V,Dataset>
 
 Parameters:
 
 col1 - (undocumented)
 
 col2 - (undocumented)
 
 col3 - (undocumented)
 
 col4 - (undocumented)
 
 col5 - (undocumented)
 
 col6 - (undocumented)
 
 Returns:
 
 (undocumented)
 
 Inheritdoc:
- agg
 
 public <U1, U2, U3, U4, U5, U6, U7> Dataset<scala.Tuple8<K,U1,U2,U3,U4,U5,U6,U7>> agg(TypedColumn<V,U1> col1, TypedColumn<V,U2> col2, TypedColumn<V,U3> col3, TypedColumn<V,U4> col4, TypedColumn<V,U5> col5, TypedColumn<V,U6> col6, TypedColumn<V,U7> col7)
 
 Description copied from class: KeyValueGroupedDataset
 
 Computes the given aggregations, returning a Dataset of tuples for each unique key and the result of computing these aggregations over all elements in the group.
 
 Overrides:
 
 agg in class KeyValueGroupedDataset<K,V,Dataset>
 
 Parameters:
 
 col1 - (undocumented)
 
 col2 - (undocumented)
 
 col3 - (undocumented)
 
 col4 - (undocumented)
 
 col5 - (undocumented)
 
 col6 - (undocumented)
 
 col7 - (undocumented)
 
 Returns:
 
 (undocumented)
 
 Inheritdoc:
- agg
 
 public <U1, U2, U3, U4, U5, U6, U7, U8> Dataset<scala.Tuple9<K,U1,U2,U3,U4,U5,U6,U7,U8>> agg(TypedColumn<V,U1> col1, TypedColumn<V,U2> col2, TypedColumn<V,U3> col3, TypedColumn<V,U4> col4, TypedColumn<V,U5> col5, TypedColumn<V,U6> col6, TypedColumn<V,U7> col7, TypedColumn<V,U8> col8)
 
 Description copied from class: KeyValueGroupedDataset
 
 Computes the given aggregations, returning a Dataset of tuples for each unique key and the result of computing these aggregations over all elements in the group.
 
 Overrides:
 
 agg in class KeyValueGroupedDataset<K,V,Dataset>
 
 Parameters:
 
 col1 - (undocumented)
 
 col2 - (undocumented)
 
 col3 - (undocumented)
 
 col4 - (undocumented)
 
 col5 - (undocumented)
 
 col6 - (undocumented)
 
 col7 - (undocumented)
 
 col8 - (undocumented)
 
 Returns:
 
 (undocumented)
 
 Inheritdoc:
- cogroup
 
 public <U, R> Dataset<R> cogroup(KeyValueGroupedDataset<K,U> other, scala.Function3<K,scala.collection.Iterator<V>,scala.collection.Iterator,scala.collection.IterableOnce<R>> f, Encoder<R> evidence$30)
 
 Inheritdoc:
- cogroup
 
 public <U, R> Dataset<R> cogroup(KeyValueGroupedDataset<K,U> other, CoGroupFunction<K,V,U,R> f, Encoder<R> encoder)
 
 Inheritdoc:
- cogroupSorted
 
 public <U, R> Dataset<R> cogroupSorted(KeyValueGroupedDataset<K,U> other, scala.collection.immutable.Seq<Column> thisSortExprs, scala.collection.immutable.Seq<Column> otherSortExprs, scala.Function3<K,scala.collection.Iterator<V>,scala.collection.Iterator,scala.collection.IterableOnce<R>> f, Encoder<R> evidence$21)
 
 Inheritdoc:
- cogroupSorted
 
 public <U, R> Dataset<R> cogroupSorted(KeyValueGroupedDataset<K,U> other, Column[] thisSortExprs, Column[] otherSortExprs, CoGroupFunction<K,V,U,R> f, Encoder<R> encoder)
 
 Inheritdoc:
- count
 
 public Dataset<scala.Tuple2<K,Object>> count()
 
 Description copied from class: KeyValueGroupedDataset
 
 Returns a Dataset that contains a tuple with each key and the number of items present for that key.
 
 Overrides:
 
 count in class KeyValueGroupedDataset<K,V,Dataset>
 
 Returns:
 
 (undocumented)
 
 Inheritdoc:
- flatMapGroups
 
 public Dataset flatMapGroups(scala.Function2<K,scala.collection.Iterator<V>,scala.collection.IterableOnce> f, Encoder evidence$22)
 
 Description copied from class: KeyValueGroupedDataset
 
 (Scala-specific) Applies the given function to each group of data. For each unique group, the function will be passed the group key and an iterator that contains all of the elements in the group. The function can return an iterator containing elements of an arbitrary type which will be returned as a new Dataset.
 This function does not support partial aggregation, and as a result requires shuffling all the data in the Dataset. If an application intends to perform an aggregation over each key, it is best to use the reduce function or an org.apache.spark.sql.expressions#Aggregator.
 Internally, the implementation will spill to disk if any given group is too large to fit into memory. However, users must take care to avoid materializing the whole iterator for a group (for example, by calling toList) unless they are sure that this is possible given the memory constraints of their cluster.
 
 Overrides:
 
 flatMapGroups in class KeyValueGroupedDataset<K,V,Dataset>
 
 Parameters:
 
 f - (undocumented)
 
 evidence$22 - (undocumented)
 
 Returns:
 
 (undocumented)
 
 Inheritdoc:
- flatMapGroups
 
 public Dataset flatMapGroups(FlatMapGroupsFunction<K,V,U> f, Encoder encoder)
 
 Description copied from class: KeyValueGroupedDataset
 
 (Java-specific) Applies the given function to each group of data. For each unique group, the function will be passed the group key and an iterator that contains all of the elements in the group. The function can return an iterator containing elements of an arbitrary type which will be returned as a new Dataset.
 This function does not support partial aggregation, and as a result requires shuffling all the data in the Dataset. If an application intends to perform an aggregation over each key, it is best to use the reduce function or an org.apache.spark.sql.expressions#Aggregator.
 Internally, the implementation will spill to disk if any given group is too large to fit into memory. However, users must take care to avoid materializing the whole iterator for a group (for example, by calling toList) unless they are sure that this is possible given the memory constraints of their cluster.
 
 Overrides:
 
 flatMapGroups in class KeyValueGroupedDataset<K,V,Dataset>
 
 Parameters:
 
 f - (undocumented)
 
 encoder - (undocumented)
 
 Returns:
 
 (undocumented)
 
 Inheritdoc:
- flatMapGroupsWithState
 
 public <S, U> Dataset flatMapGroupsWithState(OutputMode outputMode, GroupStateTimeout timeoutConf, scala.Function3<K,scala.collection.Iterator<V>,GroupState<S>,scala.collection.Iterator> func, Encoder<S> evidence$10, Encoder evidence$11)
 
 Description copied from class: KeyValueGroupedDataset
 
 (Scala-specific) Applies the given function to each group of data, while maintaining a user-defined per-group state. The result Dataset will represent the objects returned by the function. For a static batch Dataset, the function will be invoked once per group. For a streaming Dataset, the function will be invoked for each group repeatedly in every trigger, and updates to each group's state will be saved across invocations. See GroupState for more details.
 
 Specified by:
 
 flatMapGroupsWithState in class KeyValueGroupedDataset<K,V,Dataset>
 
 Parameters:
 
 outputMode - The output mode of the function.
 
 timeoutConf - Timeout configuration for groups that do not receive data for a while.
 See Encoder for more details on what types are encodable to Spark SQL.
 
 func - Function to be called on every group.
 
 evidence$10 - (undocumented)
 
 evidence$11 - (undocumented)
 
 Returns:
 
 (undocumented)
 
 Inheritdoc:
- flatMapGroupsWithState
 
 public <S, U> Dataset flatMapGroupsWithState(OutputMode outputMode, GroupStateTimeout timeoutConf, KeyValueGroupedDataset<K,S> initialState, scala.Function3<K,scala.collection.Iterator<V>,GroupState<S>,scala.collection.Iterator> func, Encoder<S> evidence$12, Encoder evidence$13)
 
 Inheritdoc:
- flatMapGroupsWithState
 
 public <S, U> Dataset flatMapGroupsWithState(FlatMapGroupsWithStateFunction<K,V,S,U> func, OutputMode outputMode, Encoder<S> stateEncoder, Encoder outputEncoder, GroupStateTimeout timeoutConf)
 
 Description copied from class: KeyValueGroupedDataset
 
 (Java-specific) Applies the given function to each group of data, while maintaining a user-defined per-group state. The result Dataset will represent the objects returned by the function. For a static batch Dataset, the function will be invoked once per group. For a streaming Dataset, the function will be invoked for each group repeatedly in every trigger, and updates to each group's state will be saved across invocations. See GroupState for more details.
 
 Overrides:
 
 flatMapGroupsWithState in class KeyValueGroupedDataset<K,V,Dataset>
 
 Parameters:
 
 func - Function to be called on every group.
 
 outputMode - The output mode of the function.
 
 stateEncoder - Encoder for the state type.
 
 outputEncoder - Encoder for the output type.
 
 timeoutConf - Timeout configuration for groups that do not receive data for a while.
 See Encoder for more details on what types are encodable to Spark SQL.
 
 Returns:
 
 (undocumented)
 
 Inheritdoc:
- flatMapGroupsWithState
 
 public <S, U> Dataset flatMapGroupsWithState(FlatMapGroupsWithStateFunction<K,V,S,U> func, OutputMode outputMode, Encoder<S> stateEncoder, Encoder outputEncoder, GroupStateTimeout timeoutConf, KeyValueGroupedDataset<K,S> initialState)
 
 Inheritdoc:
- flatMapSortedGroups
 
 public Dataset flatMapSortedGroups(scala.collection.immutable.Seq<Column> sortExprs, scala.Function2<K,scala.collection.Iterator<V>,scala.collection.IterableOnce> f, Encoder evidence$3)
 
 Description copied from class: KeyValueGroupedDataset
 
 (Scala-specific) Applies the given function to each group of data. For each unique group, the function will be passed the group key and a sorted iterator that contains all of the elements in the group. The function can return an iterator containing elements of an arbitrary type which will be returned as a new Dataset.
 This function does not support partial aggregation, and as a result requires shuffling all the data in the Dataset. If an application intends to perform an aggregation over each key, it is best to use the reduce function or an org.apache.spark.sql.expressions#Aggregator.
 Internally, the implementation will spill to disk if any given group is too large to fit into memory. However, users must take care to avoid materializing the whole iterator for a group (for example, by calling toList) unless they are sure that this is possible given the memory constraints of their cluster.
 This is equivalent to KeyValueGroupedDataset.flatMapGroups(scala.Function2<K, scala.collection.Iterator<V>, scala.collection.IterableOnce>, org.apache.spark.sql.Encoder), except for the iterator to be sorted according to the given sort expressions. That sorting does not add computational complexity.
 Specified by:
 
 flatMapSortedGroups in class KeyValueGroupedDataset<K,V,Dataset>
 
 Parameters:
 
 sortExprs - (undocumented)
 
 f - (undocumented)
 
 evidence$3 - (undocumented)
 
 Returns:
 
 (undocumented)
 
 See Also:
 
 org.apache.spark.sql.api.KeyValueGroupedDataset#flatMapGroups
 
 Inheritdoc:
- flatMapSortedGroups
 
 public Dataset flatMapSortedGroups(Column[] SortExprs, FlatMapGroupsFunction<K,V,U> f, Encoder encoder)
 
 Description copied from class: KeyValueGroupedDataset
 
 (Java-specific) Applies the given function to each group of data. For each unique group, the function will be passed the group key and a sorted iterator that contains all of the elements in the group. The function can return an iterator containing elements of an arbitrary type which will be returned as a new Dataset.
 This function does not support partial aggregation, and as a result requires shuffling all the data in the Dataset. If an application intends to perform an aggregation over each key, it is best to use the reduce function or an org.apache.spark.sql.expressions#Aggregator.
 Internally, the implementation will spill to disk if any given group is too large to fit into memory. However, users must take care to avoid materializing the whole iterator for a group (for example, by calling toList) unless they are sure that this is possible given the memory constraints of their cluster.
 This is equivalent to KeyValueGroupedDataset.flatMapGroups(scala.Function2<K, scala.collection.Iterator<V>, scala.collection.IterableOnce>, org.apache.spark.sql.Encoder), except for the iterator to be sorted according to the given sort expressions. That sorting does not add computational complexity.
 Overrides:
 
 flatMapSortedGroups in class KeyValueGroupedDataset<K,V,Dataset>
 
 Parameters:
 
 SortExprs - (undocumented)
 
 f - (undocumented)
 
 encoder - (undocumented)
 
 Returns:
 
 (undocumented)
 
 See Also:
 
 org.apache.spark.sql.api.KeyValueGroupedDataset#flatMapGroups
 
 Inheritdoc:
- keyAs
 
 public <L> KeyValueGroupedDataset<L,V> keyAs(Encoder<L> evidence$1)
 
 Description copied from class: KeyValueGroupedDataset
 
 Returns a new KeyValueGroupedDataset where the type of the key has been mapped to the specified type. The mapping of key columns to the type follows the same rules as as on Dataset.
 
 Specified by:
 
 keyAs in class KeyValueGroupedDataset<K,V,Dataset>
 
 Parameters:
 
 evidence$1 - (undocumented)
 
 Returns:
 
 (undocumented)
 
 Inheritdoc:
- keys
 
 public Dataset<K> keys()
 
 Description copied from class: KeyValueGroupedDataset
 
 Returns a Dataset that contains each unique key. This is equivalent to doing mapping over the Dataset to extract the keys and then running a distinct operation on those.
 
 Specified by:
 
 keys in class KeyValueGroupedDataset<K,V,Dataset>
 
 Returns:
 
 (undocumented)
 
 Inheritdoc:
- mapGroups
 
 public Dataset mapGroups(scala.Function2<K,scala.collection.Iterator<V>,U> f, Encoder evidence$23)
 
 Description copied from class: KeyValueGroupedDataset
 
 (Scala-specific) Applies the given function to each group of data. For each unique group, the function will be passed the group key and an iterator that contains all of the elements in the group. The function can return an element of arbitrary type which will be returned as a new Dataset.
 This function does not support partial aggregation, and as a result requires shuffling all the data in the Dataset. If an application intends to perform an aggregation over each key, it is best to use the reduce function or an org.apache.spark.sql.expressions#Aggregator.
 Internally, the implementation will spill to disk if any given group is too large to fit into memory. However, users must take care to avoid materializing the whole iterator for a group (for example, by calling toList) unless they are sure that this is possible given the memory constraints of their cluster.
 
 Overrides:
 
 mapGroups in class KeyValueGroupedDataset<K,V,Dataset>
 
 Parameters:
 
 f - (undocumented)
 
 evidence$23 - (undocumented)
 
 Returns:
 
 (undocumented)
 
 Inheritdoc:
- mapGroups
 
 public Dataset mapGroups(MapGroupsFunction<K,V,U> f, Encoder encoder)
 
 Description copied from class: KeyValueGroupedDataset
 
 (Java-specific) Applies the given function to each group of data. For each unique group, the function will be passed the group key and an iterator that contains all of the elements in the group. The function can return an element of arbitrary type which will be returned as a new Dataset.
 This function does not support partial aggregation, and as a result requires shuffling all the data in the Dataset. If an application intends to perform an aggregation over each key, it is best to use the reduce function or an org.apache.spark.sql.expressions#Aggregator.
 Internally, the implementation will spill to disk if any given group is too large to fit into memory. However, users must take care to avoid materializing the whole iterator for a group (for example, by calling toList) unless they are sure that this is possible given the memory constraints of their cluster.
 
 Overrides:
 
 mapGroups in class KeyValueGroupedDataset<K,V,Dataset>
 
 Parameters:
 
 f - (undocumented)
 
 encoder - (undocumented)
 
 Returns:
 
 (undocumented)
 
 Inheritdoc:
- mapGroupsWithState
 
 public <S, U> Dataset mapGroupsWithState(scala.Function3<K,scala.collection.Iterator<V>,GroupState<S>,U> func, Encoder<S> evidence$4, Encoder evidence$5)
 
 Description copied from class: KeyValueGroupedDataset
 
 (Scala-specific) Applies the given function to each group of data, while maintaining a user-defined per-group state. The result Dataset will represent the objects returned by the function. For a static batch Dataset, the function will be invoked once per group. For a streaming Dataset, the function will be invoked for each group repeatedly in every trigger, and updates to each group's state will be saved across invocations. See GroupState for more details.
 
 Specified by:
 
 mapGroupsWithState in class KeyValueGroupedDataset<K,V,Dataset>
 
 Parameters:
 
 func - Function to be called on every group.
 See Encoder for more details on what types are encodable to Spark SQL.
 
 evidence$4 - (undocumented)
 
 evidence$5 - (undocumented)
 
 Returns:
 
 (undocumented)
 
 Inheritdoc:
- mapGroupsWithState
 
 public <S, U> Dataset mapGroupsWithState(GroupStateTimeout timeoutConf, scala.Function3<K,scala.collection.Iterator<V>,GroupState<S>,U> func, Encoder<S> evidence$6, Encoder evidence$7)
 
 Description copied from class: KeyValueGroupedDataset
 
 (Scala-specific) Applies the given function to each group of data, while maintaining a user-defined per-group state. The result Dataset will represent the objects returned by the function. For a static batch Dataset, the function will be invoked once per group. For a streaming Dataset, the function will be invoked for each group repeatedly in every trigger, and updates to each group's state will be saved across invocations. See GroupState for more details.
 
 Specified by:
 
 mapGroupsWithState in class KeyValueGroupedDataset<K,V,Dataset>
 
 Parameters:
 
 timeoutConf - Timeout configuration for groups that do not receive data for a while.
 See Encoder for more details on what types are encodable to Spark SQL.
 
 func - Function to be called on every group.
 
 evidence$6 - (undocumented)
 
 evidence$7 - (undocumented)
 
 Returns:
 
 (undocumented)
 
 Inheritdoc:
- mapGroupsWithState
 
 public <S, U> Dataset mapGroupsWithState(GroupStateTimeout timeoutConf, KeyValueGroupedDataset<K,S> initialState, scala.Function3<K,scala.collection.Iterator<V>,GroupState<S>,U> func, Encoder<S> evidence$8, Encoder evidence$9)
 
 Inheritdoc:
- mapGroupsWithState
 
 public <S, U> Dataset mapGroupsWithState(MapGroupsWithStateFunction<K,V,S,U> func, Encoder<S> stateEncoder, Encoder outputEncoder)
 
 Description copied from class: KeyValueGroupedDataset
 
 (Java-specific) Applies the given function to each group of data, while maintaining a user-defined per-group state. The result Dataset will represent the objects returned by the function. For a static batch Dataset, the function will be invoked once per group. For a streaming Dataset, the function will be invoked for each group repeatedly in every trigger, and updates to each group's state will be saved across invocations. See GroupState for more details.
 
 Overrides:
 
 mapGroupsWithState in class KeyValueGroupedDataset<K,V,Dataset>
 
 Parameters:
 
 func - Function to be called on every group.
 
 stateEncoder - Encoder for the state type.
 
 outputEncoder - Encoder for the output type.
 See Encoder for more details on what types are encodable to Spark SQL.
 
 Returns:
 
 (undocumented)
 
 Inheritdoc:
- mapGroupsWithState
 
 public <S, U> Dataset mapGroupsWithState(MapGroupsWithStateFunction<K,V,S,U> func, Encoder<S> stateEncoder, Encoder outputEncoder, GroupStateTimeout timeoutConf)
 
 Description copied from class: KeyValueGroupedDataset
 
 (Java-specific) Applies the given function to each group of data, while maintaining a user-defined per-group state. The result Dataset will represent the objects returned by the function. For a static batch Dataset, the function will be invoked once per group. For a streaming Dataset, the function will be invoked for each group repeatedly in every trigger, and updates to each group's state will be saved across invocations. See GroupState for more details.
 
 Overrides:
 
 mapGroupsWithState in class KeyValueGroupedDataset<K,V,Dataset>
 
 Parameters:
 
 func - Function to be called on every group.
 
 stateEncoder - Encoder for the state type.
 
 outputEncoder - Encoder for the output type.
 
 timeoutConf - Timeout configuration for groups that do not receive data for a while.
 See Encoder for more details on what types are encodable to Spark SQL.
 
 Returns:
 
 (undocumented)
 
 Inheritdoc:
- mapGroupsWithState
 
 public <S, U> Dataset mapGroupsWithState(MapGroupsWithStateFunction<K,V,S,U> func, Encoder<S> stateEncoder, Encoder outputEncoder, GroupStateTimeout timeoutConf, KeyValueGroupedDataset<K,S> initialState)
 
 Inheritdoc:
- mapValues
 
 public <W> KeyValueGroupedDataset<K,W> mapValues(scala.Function1<V,W> func, Encoder<W> evidence$2)
 
 Description copied from class: KeyValueGroupedDataset
 Returns a new KeyValueGroupedDataset where the given function func has been applied to the data. The grouping key is unchanged by this.
 
 // Create values grouped by key from a Dataset[(K, V)] ds.groupByKey(_._1).mapValues(_._2) // Scala
 Specified by:
 
 mapValues in class KeyValueGroupedDataset<K,V,Dataset>
 
 Parameters:
 
 func - (undocumented)
 
 evidence$2 - (undocumented)
 
 Returns:
 
 (undocumented)
 
 Inheritdoc:
- mapValues
 
 public <W> KeyValueGroupedDataset<K,W> mapValues(MapFunction<V,W> func, Encoder<W> encoder)
 
 Description copied from class: KeyValueGroupedDataset
 Returns a new KeyValueGroupedDataset where the given function func has been applied to the data. The grouping key is unchanged by this.
 
 // Create Integer values grouped by String key from a Dataset<Tuple2<String, Integer>> Dataset<Tuple2<String, Integer>> ds = ...; KeyValueGroupedDataset<String, Integer> grouped = ds.groupByKey(t -> t._1, Encoders.STRING()).mapValues(t -> t._2, Encoders.INT());
 Overrides:
 
 mapValues in class KeyValueGroupedDataset<K,V,Dataset>
 
 Parameters:
 
 func - (undocumented)
 
 encoder - (undocumented)
 
 Returns:
 
 (undocumented)
 
 Inheritdoc:
- queryExecution
 
 public org.apache.spark.sql.execution.QueryExecution queryExecution()
- reduceGroups
 
 public Dataset<scala.Tuple2<K,V>> reduceGroups(scala.Function2<V,V,V> f)
 
 Description copied from class: KeyValueGroupedDataset
 
 (Scala-specific) Reduces the elements of each group of data using the specified binary function. The given function must be commutative and associative or the result may be non-deterministic.
 
 Specified by:
 
 reduceGroups in class KeyValueGroupedDataset<K,V,Dataset>
 
 Parameters:
 
 f - (undocumented)
 
 Returns:
 
 (undocumented)
 
 Inheritdoc:
- reduceGroups
 
 public Dataset<scala.Tuple2<K,V>> reduceGroups(ReduceFunction<V> f)
 
 Description copied from class: KeyValueGroupedDataset
 
 (Java-specific) Reduces the elements of each group of data using the specified binary function. The given function must be commutative and associative or the result may be non-deterministic.
 
 Overrides:
 
 reduceGroups in class KeyValueGroupedDataset<K,V,Dataset>
 
 Parameters:
 
 f - (undocumented)
 
 Returns:
 
 (undocumented)
 
 Inheritdoc:
- toString
 
 public String toString()
 
 Overrides:
 
 toString in class Object

Class KeyValueGroupedDataset<K,V>

Method Summary

Methods inherited from class org.apache.spark.sql.api.KeyValueGroupedDataset

Methods inherited from class java.lang.Object

Method Details

agg

agg

agg

agg

agg

agg

agg

agg

cogroup

cogroup

cogroupSorted

cogroupSorted

count

flatMapGroups

flatMapGroups

flatMapGroupsWithState

flatMapGroupsWithState

flatMapGroupsWithState

flatMapGroupsWithState

flatMapSortedGroups

flatMapSortedGroups

keyAs

keys

mapGroups

mapGroups

mapGroupsWithState

mapGroupsWithState

mapGroupsWithState

mapGroupsWithState

mapGroupsWithState

mapGroupsWithState

mapValues

mapValues

queryExecution

reduceGroups

reduceGroups

toString