pyspark.sql.plot.core.PySparkPlotAccessor.box
- PySparkPlotAccessor.box(column=None, **kwargs)
Make a box plot of the DataFrame columns.
Make a box-and-whisker plot from DataFrame columns, optionally grouped by some other columns. A box plot graphically depicts groups of numerical data through their quartiles. The box extends from the first quartile (Q1) to the third quartile (Q3) of the data, with a line at the median (Q2). The whiskers extend from the edges of the box to show the range of the data. By default, they extend no more than 1.5 * IQR (IQR = Q3 - Q1) from the edges of the box, ending at the farthest data point within that interval. Points beyond the whiskers are outliers and are plotted as separate dots.
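The quantities described above can be worked through by hand on a small sample. The following is a minimal sketch in plain Python, not the PySpark implementation: the helper name box_stats is hypothetical, it computes exact quartiles with linear interpolation (PySpark computes approximate statistics, and other quantile conventions give slightly different cut points), and it uses the english_score values from the Examples section.

```python
from statistics import quantiles


def box_stats(values, whisker=1.5):
    """Return the numbers a box plot draws for one column.

    Hypothetical helper for illustration only; exact, not approximate.
    """
    data = sorted(values)
    # Quartiles by linear interpolation between neighbouring data points.
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    # Fences: the farthest a whisker may reach from the box edges.
    lower_fence = q1 - whisker * iqr
    upper_fence = q3 + whisker * iqr
    # Whiskers end at the most extreme data points still inside the fences.
    inside = [v for v in data if lower_fence <= v <= upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outliers": [v for v in data if v < lower_fence or v > upper_fence],
    }


english_score = [55, 60, 65, 70, 75, 15, 90, 150]
stats = box_stats(english_score)
print(stats)
```

Under this quartile convention the box spans 58.75 to 78.75, the whiskers end at 55 and 90, and the values 15 and 150 fall outside the 1.5 * IQR fences, so they would be drawn as separate dots.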
- Parameters
- column: str or list of str, optional
Column name or list of names to be used for creating the box plot. If None (default), all numeric columns will be used. If no numeric columns exist, behavior may depend on the plot backend.
- **kwargs
Extra keyword arguments. precision: a float that pyspark uses to compute approximate statistics for building the box plot. The default value is 0.01. Use smaller values to get more precise statistics.
- Returns
plotly.graph_objs.Figure
Examples
>>> from pyspark.sql import SparkSession
>>> spark = SparkSession.builder.getOrCreate()
>>> data = [
...     ("A", 50, 55),
...     ("B", 55, 60),
...     ("C", 60, 65),
...     ("D", 65, 70),
...     ("E", 70, 75),
...     ("F", 10, 15),
...     ("G", 85, 90),
...     ("H", 5, 150),
... ]
>>> columns = ["student", "math_score", "english_score"]
>>> df = spark.createDataFrame(data, columns)
>>> df.plot.box()