sfxDataQualityExt

Look for missing or strange values in your data.

v1.0.8

Overview

Welcome to SkySpark's Data Quality Extension! As of v1.0.8, this package contains 8 data quality functions (7 rule-ready and 1 job-ready).

sfxDQFindNulls(point, dateRange, duration)

Find Missing Values

  • This rule runs at the point level.
  • Rolls up the data at its expected interval, then looks for empty periods.
  • sfxDQTSGap() does the same thing, but is more advanced.
  • Side Effects: None.
  • Input: A point, a date/date range, and an expected interval.
  • Output: hisDurGrid
  • Example: sfxDQFindNulls(read(temp and discharge), pastMonth, 20min)
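
The rollup-then-look-for-empty-periods idea can be sketched outside SkySpark. This is a minimal Python illustration, not the package's Axon implementation; the function name `find_empty_periods` and the sample data are hypothetical.

```python
from datetime import datetime, timedelta

def find_empty_periods(timestamps, start, end, interval):
    """Return the start of every interval-sized bucket in [start, end) with no samples."""
    n_buckets = (end - start) // interval  # whole buckets only
    filled = set()
    for ts in timestamps:
        if start <= ts < end:
            filled.add((ts - start) // interval)  # index of the bucket holding ts
    return [start + i * interval for i in range(n_buckets) if i not in filled]

start = datetime(2018, 4, 1, 0, 0)
end = datetime(2018, 4, 1, 2, 0)
# Readings expected every 20 minutes; the 00:40 and 01:00 readings are missing.
samples = [start + timedelta(minutes=m) for m in (0, 20, 80, 100)]
empty = find_empty_periods(samples, start, end, timedelta(minutes=20))
```

Here `empty` holds the start times of the two 20-minute buckets that received no reading.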

sfxDQFindOddities(point, dateRange)

Find Oddities

  • This rule runs at the point level.
  • Finds nulls, NAs, or numbers less than or equal to 0.
  • Side Effects: None.
  • Input: A point and a date/date range.
  • Output: hisDurGrid
  • Example: sfxDQFindOddities(read(temp and discharge), pastMonth)
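
As a rough illustration of the check described above, the following Python sketch flags nulls, NAs, and non-positive numbers in a list of readings. It assumes a null is modeled as `None` and an NA as a float NaN; the name `is_oddity` is hypothetical and not part of the package.

```python
import math

def is_oddity(value):
    """True for a null (None), an NA (modeled here as NaN), or a number <= 0."""
    if value is None:
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    return value <= 0

readings = [72.5, None, 0.0, float("nan"), -3.0, 68.1]
# Positions of the readings that would be reported as oddities.
odd_indexes = [i for i, v in enumerate(readings) if is_oddity(v)]
```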

sfxDQFindOutliers(point, dateRange, stdDevs)

Find Outliers Based on Standard Deviation

  • This rule runs at the point level.
  • This rule has been modified to rely on the hisDQAverage tag that comes from the DQJob.
  • stdDevs is the number of standard deviations this rule allows before a spark is detected. For example, with a value of 3, a history record more than 3 standard deviations below or above the average causes a spark (you can change this value).
  • Side Effects: None.
  • Input: A point, a date/date range, and, optionally, the number of standard deviations to allow.
  • Output: hisDurGrid
  • Example: sfxDQFindOutliers(read(temp and discharge), pastMonth, 4)
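
The standard-deviation test can be sketched as follows. This Python example assumes the average and standard deviation are already stored for the point (the source names a hisDQAverage tag; where the standard deviation comes from is an assumption here), and `find_outliers` is a hypothetical name.

```python
def find_outliers(values, average, std_dev, std_devs=3):
    """Return the values lying more than std_devs standard deviations from the average."""
    limit = std_devs * std_dev
    return [v for v in values if abs(v - average) > limit]

# Hypothetical stored statistics for one point.
average, std_dev = 70.0, 2.0
history = [69.0, 71.5, 77.0, 63.5, 70.2, 76.0]

strict = find_outliers(history, average, std_dev, 3)  # allows 70 +/- 6
loose = find_outliers(history, average, std_dev, 4)   # allows 70 +/- 8
```

Raising the number of allowed standard deviations widens the accepted band, so fewer records are flagged.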

sfxDQJob()

Data Quality Job

  • Adds descriptive tags to numeric points.
  • Run this as a weekly job.
  • reInitialize tells SkySpark how often to update the tags.
  • Descriptive tags can be modified by the user on the fly or with a tuning function.
  • Monthly running yearly total normalizes for many things.
  • Dependencies: Calculus and Signal Analysis Pods.
  • Side Effects: Adds several statistical tags to all numeric points that can be used for complex rules.
  • Input: An optional reInitialize interval and an optional range for how far back to go (defaults are 1mo and pastYear).
  • Output: None.
  • Example: sfxDQJob()
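
To give a feel for the kind of statistics such a job might store, here is a small Python sketch. The keys min, max, medTSInt, maxNegROC, and maxPosROC mirror names used in the examples on this page; the "average" key, the function `describe_history`, and the per-minute rate of change are assumptions for illustration, not the job's actual tag set or logic.

```python
from statistics import mean, median

def describe_history(samples):
    """samples: list of (minutes, value) pairs sorted by time. Returns summary statistics."""
    values = [v for _, v in samples]
    pairs = list(zip(samples, samples[1:]))
    steps = [t2 - t1 for (t1, _), (t2, _) in pairs]               # time between readings
    rates = [(v2 - v1) / (t2 - t1) for (t1, v1), (t2, v2) in pairs]  # change per minute
    return {
        "min": min(values),
        "max": max(values),
        "average": mean(values),
        "medTSInt": median(steps),
        "maxNegROC": min(rates),  # most negative rate (assumes at least one decrease)
        "maxPosROC": max(rates),  # most positive rate
    }

samples = [(0, 50.0), (15, 80.0), (30, 100.0), (45, 70.0)]
tags = describe_history(samples)
```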

sfxDQOutofRange(point, dateRange, buffer)

Find Outliers

  • This rule runs at the point level.
  • Uses tags from the data quality job to look for outliers.
  • Side Effects: None.
  • Input: A point, a date/date range, and an optional buffer.
  • The buffer determines what percentage of the pastYear's absolute min or max is acceptable.
  • For instance, if the pastYear tags are {min: 50, max: 100} and the buffer is 0.8, the rule runs against {minThresh: 62.5, maxThresh: 80} (50/0.8 = 62.5 and 100*0.8 = 80).
  • Output: hisDurGrid
  • Example: sfxDQOutofRange(read(temp and discharge), pastMonth, 0.7)
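
The buffer arithmetic from the example above can be reproduced in a few lines of Python. This is only a sketch of the stated formula (min divided by the buffer, max multiplied by it); the function names are hypothetical.

```python
def range_thresholds(old_min, old_max, buffer):
    """Apply the buffer as in the example: min is divided by it, max is multiplied by it."""
    return old_min / buffer, old_max * buffer

def out_of_range(values, old_min, old_max, buffer):
    """Return the values falling below minThresh or above maxThresh."""
    min_thresh, max_thresh = range_thresholds(old_min, old_max, buffer)
    return [v for v in values if v < min_thresh or v > max_thresh]

# Same numbers as the example: {min: 50, max: 100, buffer: 0.8}.
min_thresh, max_thresh = range_thresholds(50, 100, 0.8)
flagged = out_of_range([55.0, 70.0, 85.0], 50, 100, 0.8)
```

Note that with a buffer below 1 the accepted band (62.5 to 80 here) is narrower than the observed min and max, so a reading such as 55 is flagged even though it lies inside the past year's range.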

sfxDQRateofChange(point, dateRange, buffer)

Find Rate of Change too High.

  • This rule runs at the point level.
  • This is the more advanced counterpart of sfxDQRateofChangeThresh().
  • Uses tags from the data quality job to look for rate-of-change outliers.
  • Dependencies: Calculus and Signal Analysis Pods.
  • Side Effects: None.
  • Input: A point, a date/date range, and an optional buffer.
  • The buffer determines what percentage of the pastYear's absolute max negative or max positive rate of change is acceptable.
  • For instance, if the pastYear tags are {maxNegROC: -10, maxPosROC: 10} and the buffer is 0.8, the rule runs against {maxNegROCThresh: -8, maxPosROCThresh: 8} (-10*0.8 = -8 and 10*0.8 = 8).
  • Output: hisDurGrid
  • Example: sfxDQRateofChange(read(temp and discharge), pastMonth, 0.7)
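
A short Python sketch of the same buffer arithmetic applied to rates of change follows. It assumes the rate is the change in value per minute between consecutive readings, which the source does not specify; `roc_thresholds` and `flag_rates` are hypothetical names.

```python
def roc_thresholds(max_neg_roc, max_pos_roc, buffer):
    """Scale both stored rate-of-change extremes by the buffer."""
    return max_neg_roc * buffer, max_pos_roc * buffer

def flag_rates(samples, neg_thresh, pos_thresh):
    """samples: (minutes, value) pairs sorted by time.
    Returns (minutes, rate) for each step whose rate falls outside the thresholds."""
    flagged = []
    for (t1, v1), (t2, v2) in zip(samples, samples[1:]):
        rate = (v2 - v1) / (t2 - t1)  # assumed unit: change per minute
        if rate < neg_thresh or rate > pos_thresh:
            flagged.append((t2, rate))
    return flagged

# Same numbers as the example: {maxNegROC: -10, maxPosROC: 10, buffer: 0.8}.
neg_thresh, pos_thresh = roc_thresholds(-10, 10, 0.8)
samples = [(0, 20.0), (15, 50.0), (30, 200.0), (45, 65.0)]
flagged = flag_rates(samples, neg_thresh, pos_thresh)
```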

sfxDQRateofChangeThresh(point, dateRange, threshold)

Find Rate of Change too High.

  • This rule runs at the point level.
  • Uses tags from the data quality job to look for rate-of-change outliers.
  • Dependencies: Calculus and Signal Analysis Pods.
  • Side Effects: None.
  • Input: A point, a date/date range, and a threshold.
  • Output: hisDurGrid
  • Example: sfxDQRateofChangeThresh(read(temp and discharge), pastMonth, 10)

sfxDQTSGap(point, dateRange, buffer)

Find Gaps in Data.

  • This rule runs at the point level.
  • Uses tags from the data quality job to look for missing data/intervals.
  • Side Effects: None.
  • Input: A point, a date/date range, and an optional buffer.
  • The buffer determines what percentage of the pastYear's median ts interval is acceptable.
  • For instance, if the pastYear tag is {medTSInt: 15min} and the buffer is 1.9, the rule runs against {medTSIntThresh: 28.5min} (15min*1.9 = 28.5min).
  • Output: hisDurGrid
  • Example: sfxDQTSGap(read(temp and discharge), pastMonth, 1.8)
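
The gap check can be sketched in Python as a comparison of each spacing between consecutive timestamps against the median interval times the buffer. This is an illustration of the stated arithmetic only; `gap_threshold` and `find_gaps` are hypothetical names.

```python
from datetime import datetime, timedelta

def gap_threshold(median_interval, buffer):
    """Largest acceptable spacing between consecutive readings."""
    return median_interval * buffer

def find_gaps(timestamps, median_interval, buffer):
    """Return (previous, current) timestamp pairs spaced further apart than the threshold."""
    threshold = gap_threshold(median_interval, buffer)
    return [(a, b) for a, b in zip(timestamps, timestamps[1:]) if b - a > threshold]

# Same numbers as the example: {medTSInt: 15min, buffer: 1.9}.
threshold = gap_threshold(timedelta(minutes=15), 1.9)

start = datetime(2018, 4, 1, 0, 0)
# Readings every 15 minutes, with the 00:45 and 01:00 readings missing.
stamps = [start + timedelta(minutes=m) for m in (0, 15, 30, 75, 90)]
gaps = find_gaps(stamps, timedelta(minutes=15), 1.9)
```

Only the 45-minute jump between the 00:30 and 01:15 readings exceeds the 28.5-minute threshold.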

Please contact [email protected] with any questions.

Published by SkyFoundry

Package details

  • Price: Free
  • Availability: Direct download; also available via SkyArc Install Manager
  • Version: 1.0.8
  • License: n/a
  • Build date: 20th Apr 2018 15:59:50 UTC
  • Depends on: (none listed)
  • File name: sfxDataQualityExt.pod
  • File size: 9.11 kB
  • MD5: 8cfb797c555a2e6fd568f8623e676cac
  • SHA1: e5c8c916a8196436ec6ce0331698946d477e6a3c
  • Tags: Fantom Pod