nrelCsvExt 1.0.1 - Import and export functions for CSV data

nrelCsvExt

Registered StackHub users may elect to receive email notifications whenever a new package version is released.

There are 0 watchers.

v1.0.1

Overview

The NREL CSV extension provides convenience functions for the import and export of CSV data. The extension is not designed to import and export Haystack-compliant CSV data, but rather to provide a flexible interface for exchanging arbitrary, general purpose CSV data with other software.

The NREL CSV extension requires SkySpark 3.1 or later and nrelUtilityExt 2.0 or later. Most functions are overridable. See the nrelCsvExt Github repository for important release notes and a change log. Please report issues here.

Importing Records

csvImportRecs and related functions provide flexible import and parsing of records from CSV data.

Column Specification

csvImportRecs accepts a column specification dictionary, or spec, that governs the interpretation of CSV data columns. In the spec, key names match to column names in the CSV file (after applying toTagName) and key values are strings specifying a column data type. The supported data types are:

auto: guesses data type automatically using parseAuto
bool: parsed as a Boolean
coord: parsed as a coord
date: parsed as a date
datetime: parsed as a dateTime
dict: parsed as a dict (must follow valid Zinc syntax)
float: parsed as a floating point number
ignore: ignored (skipped)
integer: parsed as an integer
list: parsed as a list (must follow valid Zinc syntax)
marker: any non-empty value is interpreted as setting a marker flag
number: parsed as a number
ref: parsed as a ref; handles an optional trailing dis
string: parsed as a string
tags: parsed as set of tags provided as name-value pairs (must follow valid Zinc syntax for each tag)
time: parsed as a time
uri: parsed as a uri

Any other value in spec will cause an error when the corresponding column is parsed. Any column without an entry in spec is parsed as a string. The "tags" and "coord" data types use an alternate seperator; see the sep option below.

Options

Options for csvImportRecs are passed as name-value pairs within the opts argument. The following options are supported:

delimiter: Column delimiter character; passed through to ioStreamCsv
noHeader: Passed through to ioStreamCsv
sep: Tag and value separator used to interpret "tags" and "coord" data types (must not be the same as delimiter; default = " ")
template: Optional dict of tags to apply to all imported records
checked: Boolean; if true then throw an error on parsing failures; if false return null for CSV data that fails to parse (default = true)
commit: Boolean; if true commit records to the Folio as well as returning them (default = false)
log: Boolean; if true create a log entry for each committed record (default = false)
warn: Boolean; if true log warnings; if false suppress them (default = true)
datePattern: Pattern for parsing dates (for default, see parseDate)
dateTimePattern: Pattern for parsing dateTimes (for default, see parseDateTime)
timePattern: Pattern for parsing times (for default, see parseTime)
tz: Time zone for parsing dateTimes (default = now().tz)

The checked, commit, log, and warn' options may be also be passed as markers, with x equivalent to x:true and -x equivalent to x:false. For date, time, and dateTime parsing patterns see sys::Time.toLocale.

Entity-Specific Import

You may import Haystack weatherStation, site, space, equip, and point and entities using the tailored functions:

Weather Stations: csvImportWeatherStations
Sites: csvImportSites
Spaces: csvImportSpaces
Equips: csvImportEquips
Points: csvImportPoints

These functions use the same column specification format, data types, and options as csvImportRecs. However, they provide two additional features:

The ability to map some standard relationships (siteRef, equipRef, etc.)
Entity-specific default column specifications

Mapping Refs

Relationships for records are assigned using the following (in priority order):

As provided in the template option
By parsing an appropriate column named *Ref of data type ref, if present
By parsing an appropriate column named *Name of data type string and using the value to map the record to an existing entity by name/description

If multiple sources exist, the higher-priority source is used with a warning (suppressable by setting the warn option to false).

Mapping refs by name requires matching each imported record to an existing target entity either by description (as returned by dis(); default) or by the value of a user-specified tag on the target entity (e.g. navName). To override the default behavior, use the option:

matchRefsBy: Str; tag name to use when matching and assigning refs by name

As an example, consider assigning an equipRef tag to a set of imported points based on CSV data:

If an equipRef column is available, parse it as a ref
Else if the user specifies the option matchRefsBy:"navName" and an equipName column is available, use its value to match an equip record by the value of its navName tag
Else if an equipName column is available, use its value to match an equip record by its description, per dis()

The matchRefsBy option applies to any name-based record matching. Different behavior for different types of refs (e.g. sites vs. spaces) is not supported. Instead, use post-processing or the template option to assign the correct refs.

Nesting

Entities that can be nested (such as sites, spaces, and equips) also support mapping of parent records of the same type using the *Name pattern. However, for this to work properly, the parent records must already exist in the database. Making this work in practice requires multiple import passes.

Weather Station Column Specification

Default column specification for csvImportWeatherStations:

Tag                | Type    | Description
------------------ | ------- | ------------------------------------------------
dis                | string  | Weather station display name
weatherStationTags | tags    | Arbitrary set of tags to apply
geoAddr            | string  | Full address
geoStreet          | string  | Street address
geoCity            | string  | City
geoCounty          | string  | Subdivision of state/province
geoState           | string  | State or province
geoCountry         | string  | Country code
geoPostalCode      | string  | Postal code (ZIP code, in USA)
geoCoord           | coord   | Latitude/longitude in decimal format
geoElevation       | number  | Elevation above sea level
tz                 | string  | String designation for weather station time zone

Imported weather stations always receive the weatherStation tag, plus any user-specified template. If a tz column is not provided, imported weather stations also receive a tz tag from the tz import option (default = now().tz).

Site Column Specification

Default column specification for csvImportSites:

Tag                | Type    | Description
------------------ | ------- | ------------------------------------------------------------------------------------
dis                | string  | Site display name
siteTags           | tags    | Arbitrary set of tags to apply
area               | number  | Area of site
geoAddr            | string  | Full address
geoStreet          | string  | Street address
geoCity            | string  | City
geoCounty          | string  | Subdivision of state/province
geoState           | string  | State or province
geoCountry         | string  | Country code
geoPostalCode      | string  | Postal code (ZIP code, in USA)
geoCoord           | coord   | Latitude/longitude in decimal format
geoElevation       | number  | Elevation above sea level
primaryFunction    | string  | The primary function of the building
siteName           | string  | Parent site name; used to construct 'siteRef' via named-based matching
siteRef            | ref     | Site ref; overrides 'siteName'
tz                 | string  | String designation for site time zone
weatherStationName | string  | Weather station name; used to construct 'weatherStationRef' via named-based matching 
weatherStationRef  | ref     | Weather station ref; overrides 'weatherStationName'
yearBuilt          | number  | Year in which the building was constructed

Imported sites always receive the site tag, plus any user-specified template. If a tz column is not provided, imported weather stations also receive a tz tag from the tz import option (default = now().tz).

Space Column Specification

Default column specification for csvImportSpaces:

Tag          | Type    | Description
------------ | ------- | --------------------------------------------------------------------------
dis          | string  | Space display name
navName      | string  | Space navigation name (alternative to 'dis')
spaceTags    | tags    | Arbitrary set of tags to apply
area         | number  | Floor area of space
floor        | marker  | Sets the 'floor' marker
floorNum     | integer | Floor number per European convention
room         | marker  | Sets the 'room' marker
siteName     | string  | Associated site name; used to construct 'siteRef' via named-based matching
siteRef      | ref     | Site ref; overrides 'siteName'
spaceName    | string  | Parent space name; used to construct 'spaceRef' via named-based matching
spaceRef     | ref     | Space ref; overrides 'spaceName'
zone         | marker  | Sets the 'zone' marker

Imported spaces always receive the space tag, plus any user-specified template. In addition, any imported space records that lack the dis and disMacro tags but include a navName tag receive a default display macro:

disMacro:"$siteRef $navName"'

A custom disMacro may be specified using the template option.

Equip Column Specification

Default column specification for csvImportEquips:

Tag          | Type    | Description
------------ | ------- | ----------------------------------------------------------------------------
dis          | string  | Equipment display name
navName      | string  | Equipment navigation name (alternative to 'dis')
equipTags    | tags    | Arbitrary set of tags to apply
equipName    | string  | Parent equip name; used to construct 'equipRef' via named-based matching
equipRef     | ref     | Equip ref; overrides 'equipName'
siteName     | string  | Associated site name; used to construct 'siteRef' via named-based matching
siteRef      | ref     | Site ref; overrides 'siteName'
spaceName    | string  | Associated space name; used to construct 'spaceRef' via named-based matching
spaceRef     | ref     | Space ref; overrides 'spaceName'

Imported equipment always receive the equip tag, plus any user-specified template. In addition, any imported equip records that lack the dis and disMacro tags but include a navName tag receive a default display macro:

disMacro:"$siteRef $navName"'

A custom disMacro may be specified using the template option.

Point Column Specification

Default column specification for csvImportPoints:

Tag                | Type    | Description
------------------ | ------- | ------------------------------------------------------------------------------------
dis                | string  | Point display name
navName            | string  | Point navigation name (alternative to 'dis')
pointTags          | tags    | Arbitrary set of tags to apply
equipName          | string  | Associated equip name; used to construct 'equipRef' via named-based matching
equipRef           | ref     | Equip ref; overrides 'equipName'
kind               | string  | Kind of point (e.g. "Number" or "Bool"); must match SkySpark kind
siteName           | string  | Associated site name; used to construct 'siteRef' via named-based matching
siteRef            | ref     | Site ref; overrides 'siteName'
spaceName          | string  | Associated space name; used to construct 'spaceRef' via named-based matching
spaceRef           | ref     | Space ref; overrides 'spaceName'
tz                 | string  | String designation for point time zone
unit               | string  | Unit for numeric point (e.g. "kW")
weatherStationName | string  | Weather station name; used to construct 'weatherStationRef' via named-based matching 
weatherStationRef  | ref     | Weather station ref; overrides 'weatherStationName'

Imported points always receive the point tag, plus any user-specified template. In addition, any imported equip records that lack the dis and disMacro tags but include a navName tag receive a default display macro:

Regular points: disMacro:"$equipRef $navName"
Weather points: disMacro:"$weatherStationRef $navName"

A custom disMacro may be specified using the template option.

If tz is not provided, imported points are also assigned a time zone based on one of the following (in priority order):

The associated site
The associated weather station
The tz import option (default = now().tz).

For minimum Haystack compliance, pointTags should include one of sensor, cmd, or sp (or these tags should be imported via other columns).

Sequencing Imports

For best matching of refs, use the following import sequence:

Weather stations
Sites
Spaces
Equipment
Points

If any entities of the same type are nested (e.g. rooms within floors), split them into separate files and import the parents prior to the children.

Importing History

csvImportHistory imports time series data from a CSV source and writes it to point histories.

Overview

csvImportHistory imports CSV data from a single handle (String or Uri), processes it, and writes it to one or more points. To receive CSV data, each point record must:

Be of kind Number, Bool, or Str
Define the his tag
Define the csvColumn tag

Import control and customization is provided by tags attached to each target point record and a set of function options passed via the opts argument.

Control Tags

The control tags for csvImportHistory are:

csvColumn: (Required) A string or integer specifying the CSV data column that contains the target point's history
- String: Specifies CSV column by name
- Integer: Specifies CSV column by position (1st column has position 0)
csvUnit: String specifying the units of numeric CSV data
csvCallback: Optional callback function to be applied to the CSV data; follows the same conventions as OnWrite callbacks
csvConvert: Optional point value conversion as described in Point Conversion to be applied to the CSV data
csvRollupInterval: Number that defines a rollup interval for the CSV data
csvRollupFunc: String that defines a rollup function for the CSV data

Only the csvColumn tag is required. All other tags are optional, but provide a flexible framework for post-processing the imported CSV data prior to writing history to the target points.

Control Options

Control options for csvImportHistory are passed as name-value pairs within the opts argument. The following options are supported:

checked: Boolean; if true then throw an error on parsing failures; if false insert NA for CSV data that fails to parse (default = true)
delimiter: passed through to ioReadCsv
noHeader: passed through to ioReadCsv
warn: Boolean; if true log warnings; if false suppress them (default = true)
overwrite: Boolean; allows existing history to be overwritten, ignoring hisEnd (default = false)
preview: enables preview mode and specifies the number of rows to preview for each point
span: time span for which to write history; anything accepted by toSpan
tsColumn: specifies the timestamp column(s) (default = "ts"; see parsing timestamps)
tsPattern: pattern(s) for parsing CSV timestamps (see parsing timestamps)
tz: time zone for CSV timestamps (default = now().tz)

The checked and warn options may be also be passed as markers, with x equivalent to x:true and -x equivalent to x:false. If the preview option is provided as a marker, it defaults to preview:10.

Workflow

The key to properly specifying these tags and options to achieve a desired import workflow is understanding the order of operations that occur during data import:

Raw data are read from handle.
The raw data are filtered based on the provided points:
- The tsColumn option specifies the timestamp column(s).
- Each point's csvColumn tag specifies the value column.
If the timestamp is composite (made of multiple columns), timestamps are consolidated into a single column (see timestamp pattern).
Timestamps are parsed according to the tsPattern option (see parsing timestamps).
Values are parsed according to the point's kind tag (see parsing values).
- For Boolean points, numeric CSV data are converted to Boolean.
- For numeric points, units are assigned according to the point's csvUnit tag (overrides any embedded units).
- If csvUnit is missing and the parsed data has no embedded units, the point's unit tag (if any) will be used instead.
The data are sorted ascending by timestamp.
If the point has a csvCallback tag, the specified callback function is applied to the imported data (see callbacks).
If the point has a csvConvert tag, the specified point conversion is applied to the imported data using pointConvert.
If the point has a csvRollupInterval tag, the data are rolled up to the specified interval (see rollups).
Numeric values are converted from their original unit to the point's SkySpark unit.
Timestamps are converted to the point's time zone.
If the span option is specified, data rows with timestamps outside of span are dropped.
Data rows with timestamps that overlap with the point's existing history (timestamps prior to the point's hisEnd tag) are dropped (unless the overwrite option is set).
The data are written to SkySpark history (which triggers the OnWrite action if a hisOnWrite tag is defined).

Parsing Timestamps

The tsColumn, tsPattern, and tz options for csvImportHistory provide flexible configuration of timestamp parsing.

Timestamp Column

tsColumn may be any of the following:

String: A single timestamp column by name
Integer: A single timestamp column by index
List of Strings: Multiple timestamp columns by name
List of Integers: Multiple timestamp columns by index

Timestamps spread across multiple columns (such as date in one column and time in another column) are combined in the order specified. Column indices are zero-based, i.e. the first column is index 0.

Timestamp Pattern

tsPattern may be a single string or a list of strings, each providing a valid dateTime patterns per sys::Time.toLocale.

If no pattern is provided, parsing falls back to the parseDateTime default:
```
"YYYY-MM-DD'T'hh:mm:SS.FFFFFFFFFz zzzz"
```
If tsPattern is a list, patterns are tried in order until either one is successful or they have all been exhausted
If the timestamp directly encodes timezone information, then tz is silently ignored
Multiple columns: timestamp columns are concatenated in order by " " (space) prior to parsing; tsPattern must reflect the expected pattern of the concatenated columns

Parsing Values

Values are pased according to kind: Number, Bool, or Str. If necessary, numeric data are converted to Boolean with any nonzero value representing true. Supports the same keywords for true, false, null, NA, NaN, and infinity as parseAuto.

NA values are returned in the following circumstances:

NA is explicitly encoded in the CSV data
On parsing error and checked is false

For kind:"Str", the string is returned as-is (NA, null, etc. keywords are not parsed). To detect and handle NA values (or any other special cases), use the csvCallback option or an hisOnWrite function.

Callbacks

The csvCallback tag defines a callback function to be applied to the imported history data after initial import. The syntax follows the same conventions as OnWrite functions. The tag value must reference a valid top-level function. Partial application is supported. Lambda functions are not supported.

For security reasons, the callback function executes within an ephemeral task with the permissions of the synthetic task user. If the task extension is not loaded, attempting to use csvCallback will throw an error.

Rollups

The csvRollupInterval tag enables automatic rollup of imported history before writing data. Rollup behavior is as follows:

If the csvRollupFunc tag is present, hisRollup is called using the specified rollup function.
If the csvRollupFunc tag is missing, hisRollupAuto is used instead.

csvRollupFunc must be the name of a valid top-level function. Like callbacks, history rollups execute within an ephemeral task for security reasons and require the task extension to be loaded.

Overwriting History

By default, csvImportHistory() drops values prior to each point's hisEnd before writing data. This prevents overwriting existing data. To intentionally overwrite existing history, use the following options:

Set the span option to the time span you wish to overwrite
Set the overwrite option to true

If span is not provided, then the overwrite option is ignored, with a logged warning. When using overwrite, it is recommended to also set warn to false to avoid filling the log with history out-of-order warnings.

Previewing Data

The preview option enables preview mode. In preview mode, csvImportHistory returns a combined history grid previewing the data import but does not write the imported history to the database. The number of rows per point (data column) is limited to the numeric value of preview; see options. (The total number of rows may be greater than preview if not all rows of the original CSV file include values for every point.)

Error Handling

csvImportHistory does its best to recover from import errors: any point that encounters an import error is skipped and execution continues to the next point. Error logging is as follows:

If checked = true: Logged as errors, with an error thrown on function exit
If checked = false and warn = true: Logged as warnings, with a clean function exit
If checked = false and warn = false: Not logged; clean function exit

General errors are always logged and thrown, regardless of the checked and warn options.

Exporting History

csvExportHistory exports point history to a CSV file, optionally with some convenience processing. The function arguments are:

points: Point(s) for which to export history (may be any value supported by toRecList)
span: Timespan for which to export history (may be any value supported by toSpan)
handle: Uri of output file
opts: Dict of control options

Control Options

Control options for csvExportHistory are passed as name-value pairs within the opts argument. The following options are supported:

clean: Boolean; if true then numeric data will be range-cleaned prior to export (see range cleaning) (default = false)
defVal: Number or NA; governs range cleaning behavior (see range cleaning)
delimiter: passed through to ioWriteCsv
headerUnits: Boolean; if true then units will be appended to data column headers (default = true)
interpolate: Boolean; if true then the data will be interpolated (default = false)
interval: Number; optional rollup interval (default = none)
noHeader: passed through to ioWriteCsv
preview: enables preview mode and specifies the number of rows to preview
removeUnits: Boolean; if true then units will be removed from numerical data prior to export (default = true)
tsPattern: Str; Optional timestamp format pattern

The clean, headerUnits, interpolate, and removeUnits options may be also be passed as markers, with x equivalent to x:true and -x equivalent to x:false. If the preview option is provided as a marker, it defaults to preview:10.

Workflow

The export workflow is as follows:

Read data
Roll up using hisRollupAuto (if interval is provided)
Interpolate gaps (if interpolate is true)
Apply hisClip
Remove units from numeric data (if removeUnits is true)
Add units to column headers (if headerUnits is true)
Format timestamps
Write CSV output to handle using ioWriteCsv

If preview is set, then step 8 instead returns a preview of the output that would have been written.

Previewing Export

The preview option enables preview mode. In preview mode, csvExportHistory returns a grid previewing the data export but does not write to file. The total number of rows is limited to the numeric value of preview; see options.

Convenience Processing

Range Cleaning

If the clean option is true, values outside the range [minVal, maxVal], as defined by the point's minVal and maxVal tags, are removed from the data set. (Missing minVal and maxVal tags imply permissible minimum and maximum values of negative and positive infinity, respectively.)

If the defVal option is also provided, values removed are replaced with defVal; otherwise, they are replaced with Null. Optionally, defVal may be defined as a tag on each point; defVal point tags override the global option on a per-point basis.

Range cleaning is performed prior to rollup and/or interpolation.

Rollup

If the interval option is provided, data are rolled up at the specified interval using hisRollupAuto. (For maximum compatibility with other software, Boolean points are instead rolled up using durTrue, then normalized by the interval to obtain a fractional value between 0 and 1.) Rollup occurs prior to interpolation.