pyspark.sql.DataFrameWriter.insertInto

DataFrameWriter.insertInto(tableName, overwrite=None)

Inserts the content of the DataFrame into the specified table.

The schema of the DataFrame must be the same as the schema of the table.

New in version 1.4.0.

Changed in version 3.4.0: Supports Spark Connect.

Parameters

tableName : str

Name of the table to insert into.

overwrite : bool, optional

If true, overwrites existing data. Disabled by default.

Notes

Unlike DataFrameWriter.saveAsTable(), DataFrameWriter.insertInto() ignores the column names and just uses position-based resolution.
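
To make position-based resolution concrete, the following is a minimal plain-Python model of the behavior, not Spark code. The function insert_by_position and the dict-based "table" are hypothetical names introduced only for illustration. The model also performs no type checking, which a real table may.

```python
# Plain-Python model of position-based insertion; this is NOT Spark's implementation.
# `insert_by_position` and the dict-based "table" are hypothetical illustrations.

def insert_by_position(table, columns, rows, overwrite=False):
    """Append (or replace) rows, matching incoming columns to table columns by position.

    `columns` holds the incoming column names and is accepted only to show
    that the names play no part in where the values end up.
    """
    width = len(table["columns"])
    for row in rows:
        if len(row) != width:
            raise ValueError("incoming rows must have as many columns as the table")
    if overwrite:
        # Model of overwrite=True: existing rows are discarded first.
        table["rows"] = []
    table["rows"].extend(tuple(row) for row in rows)
    return table


table = {"columns": ["age", "name"], "rows": [(100, "Hyukjin Kwon")]}

# Incoming names differ ("col1", "col2"), yet the values land in "age" and "name".
insert_by_position(table, ["col1", "col2"], [(140, "Haejoon Lee")])
after_append = list(table["rows"])

# A swapped column order is not corrected: the "name" values land in "age".
insert_by_position(table, ["name", "age"], [("Haejoon Lee", 140)])
after_swapped = list(table["rows"])

# overwrite=True replaces the existing rows instead of appending to them.
insert_by_position(table, ["age", "name"], [(120, "Hyukjin Kwon")], overwrite=True)
after_overwrite = list(table["rows"])
```

The practical consequence is that the DataFrame's columns should be put in the table's column order (for example with select) before calling insertInto(), since matching names alone does not guarantee that values reach the intended columns.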

Examples

>>> _ = spark.sql("DROP TABLE IF EXISTS tblA")
>>> df = spark.createDataFrame([
...     (100, "Hyukjin Kwon"), (120, "Hyukjin Kwon"), (140, "Haejoon Lee")],
...     schema=["age", "name"]
... )
>>> df.write.saveAsTable("tblA")

Insert the same data into the ‘tblA’ table using different column names. Because columns are resolved by position, the names are ignored and the rows are appended.

>>> df.selectExpr("age AS col1", "name AS col2").write.insertInto("tblA")
>>> spark.read.table("tblA").sort("age").show()
+---+------------+
|age|        name|
+---+------------+
|100|Hyukjin Kwon|
|100|Hyukjin Kwon|
|120|Hyukjin Kwon|
|120|Hyukjin Kwon|
|140| Haejoon Lee|
|140| Haejoon Lee|
+---+------------+
>>> _ = spark.sql("DROP TABLE tblA")