

Save operations can optionally take a SaveMode that specifies how to handle existing data if present. It is important to realize that these save modes do not utilize any locking and are not atomic. Additionally, when performing an Overwrite, the data will be deleted before writing out the new data.

- SaveMode.ErrorIfExists (default): when saving a DataFrame to a data source, if data already exists, an error is expected to be thrown.
- SaveMode.Append: when saving a DataFrame to a data source, if data/table already exists, the contents of the DataFrame are expected to be appended to the existing data.
- SaveMode.Overwrite: when saving a DataFrame to a data source, if data/table already exists, the existing data is expected to be overwritten by the contents of the DataFrame.
- SaveMode.Ignore: when saving a DataFrame to a data source, if data already exists, the save operation is expected not to save the contents of the DataFrame and not to change the existing data. This is similar to CREATE TABLE IF NOT EXISTS in SQL.

DataFrames can also be saved as persistent tables into the Hive metastore using the saveAsTable command. Notice that an existing Hive deployment is not necessary to use this feature: Spark will create a default local Hive metastore (using Derby) for you. Unlike the createOrReplaceTempView command, saveAsTable will materialize the contents of the DataFrame and create a pointer to the data in the Hive metastore. Persistent tables will still exist even after your Spark program has restarted, as long as you maintain your connection to the same metastore. A DataFrame for a persistent table can be created by calling the table method on a SparkSession with the name of the table.

For file-based data sources, you can specify a custom table path via the path option, e.g. df.write.option("path", "/some/path").saveAsTable("t"). When the table is dropped, the custom table path will not be removed and the table data is still there. If no custom table path is specified, Spark will write data to a default table path under the warehouse directory, and when the table is dropped, the default table path will be removed too.

Starting from Spark 2.1, persistent datasource tables have per-partition metadata stored in the Hive metastore. This brings several benefits:

- Since the metastore can return only the partitions necessary for a query, discovering all the partitions on the first query to the table is no longer needed.
- Hive DDLs such as ALTER TABLE ... PARTITION ... SET LOCATION are now available for tables created with the Datasource API.

Note that partition information is not gathered by default when creating external datasource tables (those with a path option). To sync the partition information in the metastore, you can invoke MSCK REPAIR TABLE.
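As a minimal sketch of the save modes described above (the app name, local master, and `/tmp/people_parquet` output path are illustrative, not from the original docs), the mode is selected with `DataFrameWriter.mode`:

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

object SaveModeDemo extends App {
  val spark = SparkSession.builder()
    .appName("save-mode-demo")
    .master("local[*]") // local mode, for illustration only
    .getOrCreate()
  import spark.implicits._

  val df = Seq(("Alice", 34), ("Bob", 45)).toDF("name", "age")

  // Overwrite: any existing data at the path is deleted before writing.
  df.write.mode(SaveMode.Overwrite).parquet("/tmp/people_parquet")

  // Append: the same two rows are added again, doubling the row count.
  df.write.mode(SaveMode.Append).parquet("/tmp/people_parquet")

  // Ignore: data already exists at the path, so this write is a no-op.
  df.write.mode(SaveMode.Ignore).parquet("/tmp/people_parquet")

  val total = spark.read.parquet("/tmp/people_parquet").count()
  println(total) // 2 (overwrite) + 2 (append) + 0 (ignore)

  spark.stop()
}
```

With the default `SaveMode.ErrorIfExists`, the third write would instead fail, since the path is already populated.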
Find full example code at "examples/src/main/r/RSparkSQLExample.R" in the Spark repo.
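The persistent-table behavior above can be sketched as follows; the table name `t` and the path `/tmp/spark-tables/t` are hypothetical, and `enableHiveSupport` assumes a Spark build with Hive classes on the classpath (as in the prebuilt distributions):

```scala
import org.apache.spark.sql.SparkSession

object PersistentTableDemo extends App {
  val spark = SparkSession.builder()
    .appName("persistent-table-demo")
    .master("local[*]")
    .enableHiveSupport() // a local Derby metastore is created if no Hive deployment exists
    .getOrCreate()
  import spark.implicits._

  val df = Seq((1, "a"), (2, "b")).toDF("id", "value")

  // A custom path makes this an external table: the data outlives DROP TABLE.
  df.write.option("path", "/tmp/spark-tables/t").saveAsTable("t")

  // Read the table back through the metastore by name.
  val roundTrip = spark.table("t")
  roundTrip.show()

  spark.sql("DROP TABLE t")
  // The files under /tmp/spark-tables/t are still present after the drop;
  // with no path option, the default warehouse path would have been removed too.

  spark.stop()
}
```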

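The partition-sync note can likewise be sketched. The table name `sales`, its path, and the sample data are hypothetical; the point is that an external datasource table does not gather partition information automatically, and `MSCK REPAIR TABLE` registers partition directories with the metastore:

```scala
import org.apache.spark.sql.SparkSession

object PartitionRepairDemo extends App {
  val spark = SparkSession.builder()
    .appName("partition-repair-demo")
    .master("local[*]")
    .enableHiveSupport()
    .getOrCreate()
  import spark.implicits._

  val sales = Seq((1, "2024-01-01"), (2, "2024-01-02")).toDF("id", "day")

  // External, partitioned datasource table (note the path option):
  // partition information is NOT gathered by default for such tables.
  sales.write
    .partitionBy("day")
    .option("path", "/tmp/spark-tables/sales")
    .saveAsTable("sales")

  // If another job wrote new day=... directories directly under the path,
  // the metastore would not know about them until the table is repaired:
  spark.sql("MSCK REPAIR TABLE sales")

  // With per-partition metadata in the metastore, this query can be
  // answered from only the matching partition.
  val oneDay = spark.sql("SELECT * FROM sales WHERE day = '2024-01-01'")
  oneDay.show()

  spark.stop()
}
```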