Explanation:
spark.createDataFrame([("summer", 4.5), ("winter", 7.5)], ["season", "wind_speed_ms"])
Correct. This command uses the SparkSession's createDataFrame method to create a new DataFrame. Notice how rows, columns, and column names are passed in here: the rows are specified
as a Python list, where every entry in the list is a new row. Each row is a Python tuple (for example ("summer", 4.5)), and every element of the tuple is one column value.
The column names are specified as the second argument to createDataFrame(). The documentation (link below) states that "when schema is a list of column names, the type of each column will be
inferred from data" (the first argument). Since the values 4.5 and 7.5 are both float literals, Spark correctly infers the double type for column wind_speed_ms. Likewise, since all values in column
"season" are strings, Spark infers the string type for that column.
Find out more about SparkSession.createDataFrame() via the link below.
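As a minimal sketch (assuming an active SparkSession bound to the name spark, as in the question), you can verify the inferred schema with printSchema():

df = spark.createDataFrame([("summer", 4.5), ("winter", 7.5)], ["season", "wind_speed_ms"])
df.printSchema()
# root
#  |-- season: string (nullable = true)
#  |-- wind_speed_ms: double (nullable = true)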
spark.newDataFrame([("summer", 4.5), ("winter", 7.5)], ["season", "wind_speed_ms"])
No, the SparkSession does not have a newDataFrame method.
from pyspark.sql import types as T
spark.createDataFrame((("summer", 4.5), ("winter", 7.5)), T.StructType([T.StructField("season", T.CharType()), T.StructField("season", T.DoubleType())]))
No. pyspark.sql.types does not have a CharType type. See link below for available data types in Spark.
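For illustration, a working variant of this option (a sketch only; StringType replaces the nonexistent CharType, and the schema is declared explicitly instead of being inferred):

from pyspark.sql import types as T
# Explicit schema: each StructField names a column and sets its data type.
schema = T.StructType([
    T.StructField("season", T.StringType()),
    T.StructField("wind_speed_ms", T.DoubleType()),
])
spark.createDataFrame([("summer", 4.5), ("winter", 7.5)], schema)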
spark.createDataFrame({"season": ["winter","summer"], "wind_speed_ms": [4.5, 7.5]})
No, this is not valid Spark syntax. If this option looked correct to you, you may have some experience with Python's pandas package, in which this would be valid syntax. To create
a Spark DataFrame from a pandas DataFrame, you can simply use spark.createDataFrame(pandasDf), where pandasDf is the pandas DataFrame (see the sketch below).
Find out more about Spark syntax options using the examples in the documentation for SparkSession.createDataFrame linked below.
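A hedged sketch of that pandas-to-Spark conversion (assuming pandas is installed alongside PySpark; the variable names are illustrative):

import pandas as pd
# Build a pandas DataFrame using the dict syntax from the option above.
pandasDf = pd.DataFrame({"season": ["winter", "summer"], "wind_speed_ms": [4.5, 7.5]})
# Convert it to a Spark DataFrame; Spark derives the schema from the pandas dtypes.
df = spark.createDataFrame(pandasDf)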
spark.DataFrame({"season": ["winter","summer"], "wind_speed_ms": [4.5, 7.5]})
No, the SparkSession (indicated by spark in the code above) does not have a DataFrame method.
More info: pyspark.sql.SparkSession.createDataFrame — PySpark 3.1.1 documentation and Data Types - Spark 3.1.2 Documentation
Static notebook | Dynamic notebook: See test 1, QUESTION NO: 41 (Databricks import instructions)