Code:
from pyspark.sql.types import StructField, StringType, TimestampType

# Build the base schema from the schema file stored in the bucket
custom_schema = create_StructType_schema(access_key, secret_access_key, schema_bucket_name, schema_folder, schema_file_name)

metadata_fp = {"comment": "Path of the file from metadata (_metadata.file_path)"}
metadata_lmd = {"comment": "Ingestion timestamp for the current record"}

# Append fields with metadata to the schema
custom_schema.add(StructField("file_path", StringType(), True, metadata_fp))
custom_schema.add(StructField("last_modified_date", TimestampType(), True, metadata_lmd))
Databricks:
Code:
from pyspark.sql import functions as F

# Auto Loader stream: apply the custom schema, keep the source file path, and stamp ingestion time
dfRaw = (
    spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", file_format)
        .option("recursiveFileLookup", "true")
        .option("cloudFiles.allowOverwrites", True)
        .option("delimiter", file_delimiter)
        .option("multiline", "true")
        .option("header", file_header)
        .schema(custom_schema)
        .load(location)
        .select("*", "_metadata.file_path")
        .withColumn("last_modified_date", F.current_timestamp())
)
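As a usage sketch, the resulting stream can then be written out, for example to a Delta table; the target_path and checkpoint_path names below are placeholders and not part of the original question.

Code:
# Hypothetical sink: write the Auto Loader stream to a Delta table
(
    dfRaw.writeStream
        .format("delta")
        .option("checkpointLocation", checkpoint_path)  # placeholder path
        .outputMode("append")
        .start(target_path)  # placeholder path
)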
More details here: https://stackoverflow.com/questions/796 ... e-path-and