
stanford-solar-forecasting-dataset's Introduction

SKIPP'D — a SKy Images and Photovoltaic Power Generation Dataset for Short-term Solar Forecasting

Note: This README file is for demonstration purposes. For details of the dataset, please refer to our dataset paper. All datasets are licensed under a Creative Commons Attribution 4.0 International License. All code files are licensed under the MIT license (see LICENSE).


Dataset Paper | Benchmark Dataset | Raw Dataset


Large-scale integration of photovoltaics (PV) into electricity grids is challenged by the intermittent nature of solar power. Sky image-based solar forecasting has been recognized as a promising approach to predicting the short-term fluctuations.

Here, we present SKIPP'D, a SKy Images and Photovoltaic Power Generation Dataset for short-term solar forecasting, collected and compiled by the Environmental Assessment and Optimization (EAO) Group at Stanford University. We hope that this dataset will facilitate research on image-based solar forecasting using deep learning and contribute to a standardized benchmark for evaluating and comparing different solar forecasting models. We also encourage users to explore other related areas with this dataset, such as sky image segmentation, cloud type classification and cloud movement forecasting.

Any questions regarding the dataset can be directed to Yuhao Nie ([email protected]).

Updates Log

2024

2024.01.21   The code for SkyGPT (Generative AI for future sky image synthesis and probabilistic solar forecasting) is open sourced and available on GitHub.
2024.01.08   The code for the cloud detection algorithm in the sky-condition-specific sub-model paper is now open sourced and can be accessed in this GitHub Repo.

2023

2023.06.20   SkyGPT paper on stochastic sky video prediction for probabilistic solar forecasting is available on arXiv.
2023.03.21   SKIPP'D dataset paper is accepted by Solar Energy.

2022

2022.11.27   Survey paper on open-source ground-based sky image datasets is available on arXiv.
2022.11.03   Transfer learning paper based on SKIPP'D and two other datasets is available on arXiv.
2022.07.05   Dataset paper is available on arXiv.
2022.07.01   SKIPP'D v1.0 is released, including the 2017-2019 benchmark and raw data collected on the Stanford campus.

Code Base and Dependencies

All code is written in Python 3.6.1. The deep learning models are implemented in TensorFlow 2.4.1 and trained on a GPU cluster with NVIDIA Tesla V100 32 GB or A100 40 GB cards. TensorFlow 2.4.1 is compatible with CUDA 11.2.0 and cuDNN 8.1.1.33. All dependencies are listed in requirements.txt.

File | Description
data_processing/data_preprocess_snapshot_only.ipynb | Jupyter Notebook used to capture images from the video stream at a designated frequency.
data_processing/data_preprocess_pv.ipynb | Jupyter Notebook used to process the raw PV power generation history.
data_processing/data_nowcast.ipynb | Jupyter Notebook used to down-sample the image frames, filter out the invalid frames, match images with the concurrent PV data, and partition the model development and testing sets.
data_processing/data_forecast.ipynb | Jupyter Notebook used to generate valid samples for the forecast task.
models/SUNSET_nowcast.ipynb | Jupyter Notebook used to create the SUNSET nowcast model, which correlates PV output with contemporaneous images of the sky, including model training, validation and testing.
models/SUNSET_forecast.ipynb | Jupyter Notebook used to create the SUNSET forecast model, which predicts 15-min-ahead minutely-averaged PV output, including model training, validation and testing.
models/Relative_op_func.py | Helper functions for calculating the theoretical PV power output under clear sky conditions and the clear sky index.
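
The clear sky index referred to above is conventionally the ratio of measured PV output to the theoretical clear-sky output. The following is a minimal sketch of that ratio only, assuming the clear-sky output has already been computed elsewhere; the function name `clear_sky_index` and the `min_clear_sky_kw` guard are hypothetical and are not taken from Relative_op_func.py.

```python
import numpy as np


def clear_sky_index(pv_kw, clear_sky_kw, min_clear_sky_kw=0.05):
    """Ratio of measured PV output to theoretical clear-sky output.

    min_clear_sky_kw is a hypothetical guard: where the clear-sky output is
    at or below it (for example around sunrise or sunset), the ratio is
    numerically unstable, so NaN is returned instead.
    """
    pv = np.asarray(pv_kw, dtype=float)
    clear = np.asarray(clear_sky_kw, dtype=float)
    index = np.full(pv.shape, np.nan)
    valid = clear > min_clear_sky_kw
    index[valid] = pv[valid] / clear[valid]
    return index
```

A value close to 1 indicates output near the clear-sky expectation, and lower values indicate attenuation, for example by clouds.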

Dataset Description

The dataset contains the following two levels of data, which distinguish it from most existing open-source solar forecasting datasets and make it especially suitable for deep-learning-based solar forecasting research:

  1. Benchmark dataset: 3 years of processed sky images (64×64) and concurrent PV power generation data at 1-min intervals that are ready to use for deep learning model development;

  2. Raw dataset: Overlapping high-resolution sky video footage (2048×2048) recorded at 20 frames per second, sky image frames (2048×2048) and historical PV power generation data logged at 1-min frequency that suit various research purposes.

In addition, we provide the code base for data processing and baseline model implementation so that researchers can quickly reproduce our previous work and accelerate solar forecasting research.

The benchmark data is available at https://purl.stanford.edu/dj417rh1007 and the raw data is deposited separately for each year given its large size. The 2017 raw data is available at https://purl.stanford.edu/sm043zf7254 and the links to the 2018 and 2019 data can be found under "Related items" on the same web page. The data files are summarized below.

File | Type | Description
2017_2019_images_pv_processed.hdf5 | Benchmark data | A directory-like file structure with two groups, "trainval" and "test", which store the model development set and the test set, respectively. Each group contains two datasets, "images_log" and "pv_log", which store the processed images and PV generation data from all three years (2017-2019) as NumPy arrays.
times_trainval.npy | Benchmark data | NumPy array of time stamps corresponding to the development set in the .hdf5 file.
times_test.npy | Benchmark data | NumPy array of time stamps corresponding to the test set in the .hdf5 file.
{Year}_{Month}_videos.tar | Raw data | Tar archives with daytime 2048×2048 sky videos (.mp4) recorded at 20 frames per second for each month from 2017/03 to 2019/12.
{Year}_{Month}_images_raw.tar | Raw data | Tar archives with daytime 2048×2048 sky images (.jpg) captured at 1-min intervals for each month from 2017/03 to 2019/12 (around 7 GB per month).
{Year}_pv_raw.csv | Raw data | One-min PV generation data for the years 2017, 2018 and 2019.
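
As an illustration of the layout described for the benchmark file, the sketch below reads one split into memory. The group and dataset names ("trainval", "test", "images_log", "pv_log") come from the description above; the function names are hypothetical, and the h5py package is an assumed third-party dependency that is imported only when a real file is opened.

```python
import numpy as np


def read_split(store, split):
    """Return (images, pv) for split 'trainval' or 'test'.

    `store` can be an open h5py.File or any nested mapping with the same
    group/dataset names, which keeps the function easy to try on toy data.
    """
    group = store[split]
    images = np.asarray(group["images_log"][...])  # [...] reads the full dataset
    pv = np.asarray(group["pv_log"][...])
    if images.shape[0] != pv.shape[0]:
        raise ValueError("images_log and pv_log have different sample counts")
    return images, pv


def read_split_from_hdf5(path, split):
    """Open the benchmark file read-only and load one split into memory."""
    import h5py  # assumed to be installed; not needed for the toy example

    with h5py.File(path, "r") as f:
        return read_split(f, split)
```

A call such as `read_split_from_hdf5("2017_2019_images_pv_processed.hdf5", "test")` would load the whole test split at once; for the larger development split it may be preferable to slice the HDF5 datasets in batches instead.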

Dataset Sources

Our research group started data collection in March 2017 on the Stanford University campus, located in the center of the San Francisco Peninsula in California. According to the Köppen climate classification system, Stanford has a warm-summer Mediterranean climate, abbreviated Csb (C = temperate climate, s = dry summer, b = warm summer) on climate maps. In terms of cloud coverage, Stanford is characterized by long summers with mostly clear skies and short winters with partly cloudy skies.

Two major categories of data are collected and logged: sky images and PV power generation. Data are recorded according to their internal clocks synchronized with the local time zone, which is Pacific Standard Time (PST), to ensure consistency. Over the past five years, our lab has collected over 3 terabytes of data. In this release, we open-source the data from March 2017 to December 2019 (see Footnote 1). We provide two levels of data to suit the different needs of researchers: (1) a processed dataset consisting of pairs of 1-min down-sampled sky images (64x64) and PV power generation, which is intended for quickly reproducing our previous work and accelerating the development and benchmarking of deep-learning-based solar forecasting models; and (2) a raw dataset consisting of high-resolution sky images (2048x2048) and PV power generation data, as well as the source sky video footage, which is intended for customizing data extraction and exploring other areas related to solar forecasting, such as cloud segmentation and cloud movement forecasting.

In a future release, we will open source the data from 2020 and beyond of the Stanford dataset and include two additional data sources: sky images and PV power generation data from a solar farm in Oregon collected by our research group and sky images from cameras set up by NREL which correspond to solar irradiance data collected by them. The update information will be released in this GitHub repository.

Sky Images

Video recordings of the daytime sky (6:00 AM ~ 8:00 PM PST) are shot with a 6-megapixel 360-degree fish-eye camera (Hikvision DS-2CD6362F-IV; see Footnote 2), which is located on top of the Green Earth Sciences Building at Stanford University and oriented towards 14° south by west. Camera aperture, white balance and dynamic range are held constant. Videos are captured at a resolution of 2048 × 2048 pixels at 20 frames per second (fps), and images (.jpg) are extracted from the video at 1-min sampling frequency. Figure 1 gives examples of sky images in different weather conditions, and shows the camera and PV panels used in this study.

Users can extract higher-frequency image samples and down-size the samples to a lower resolution based on their needs. For reference, our previous research [1] shows that 1-min frequency and 64 x 64 resolution are acceptable for PV output forecasting while retaining reasonable training time.
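
To make the sampling arithmetic concrete: at 20 fps, one snapshot per minute corresponds to every 1200th frame, and reducing 2048 × 2048 to 64 × 64 shrinks each side by a factor of 32. The sketch below illustrates both steps under stated assumptions. The function names are hypothetical, and block averaging is just one simple way to down-size an image; it is not necessarily the resizing method used in the reference notebooks.

```python
import numpy as np


def snapshot_frame_indices(total_frames, fps=20, interval_s=60):
    """Indices of the frames kept when sampling one frame every interval_s seconds."""
    step = int(round(fps * interval_s))
    if step < 1:
        raise ValueError("interval_s is shorter than one frame")
    return list(range(0, total_frames, step))


def block_mean_downsize(image, out_size=64):
    """Down-size a square-tiled image by averaging non-overlapping pixel blocks.

    Works for grayscale (H, W) and colour (H, W, C) arrays whose height and
    width are multiples of out_size, e.g. 2048 -> 64 uses 32 x 32 blocks.
    """
    img = np.asarray(image, dtype=float)
    height, width = img.shape[:2]
    if height % out_size or width % out_size:
        raise ValueError("image size must be a multiple of out_size")
    fh, fw = height // out_size, width // out_size
    channels = img.shape[2:]  # () for grayscale, (C,) for colour
    blocks = img.reshape(out_size, fh, out_size, fw, *channels)
    return blocks.mean(axis=(1, 3))
```

For example, `snapshot_frame_indices(total_frames, interval_s=10)` would select one frame every 10 seconds (every 200th frame) instead of one per minute.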

Figure 1: Photos of sky images and research equipment. (A. Sky image captured on a clear day at 12:18:20 pm, January 25, 2019. B. Sky image of a cloudy day captured at 12:32:10 pm May 27, 2019. C. Fish-eye camera used for sky imaging. D. Studied PV panels. E. Locations of the camera and studied solar panels)

PV power generation

The PV power generation data are collected from solar panel arrays ∼125 m away from the camera, on top of the Jen-Hsun Huang Engineering Center at Stanford University. The poly-crystalline panels are rated at 30.1 kW-DC, with elevation and azimuth angles of 22.5° and 195°, respectively. The raw PV output power data are logged at 1-min frequency and represent the average power output within that minute (see Footnote 3).

Data Processing

For flexibility of research, we open source high-resolution, high-frequency raw data, and users of this dataset can process the data based on their own needs. We provide some reference code for data processing in the directory \data_processing, which includes the following steps:

  1. Snapshotting the video footage at a designated frequency (data_preprocess_snapshot_only.ipynb)
  2. Processing the raw PV output history (data_preprocess_PV.ipynb)
    • Interpolation of PV data to every 10 seconds (in preparation for matching with images with irregular time stamps, e.g., 08:20:40)
    • Filtering out the invalid PV data (missing record>1 hr or PV data<0)
  3. Processing images and matching images with the concurrent PV data (data_preprocess_nowcast.ipynb)
    • Down-sizing the image frames
    • Filtering out repeating images caused by the occasional abnormal behavior of OpenCV video capture function
  4. Generating valid samples for the forecast task (which will be described in the second use case below) and partitioning training, validation and testing sets (data_preprocess_forecast.ipynb)
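
Step 2 above can be sketched as follows. This is an illustration under stated assumptions rather than the notebook's implementation: time stamps are represented as seconds (the raw data uses date-time stamps), linear interpolation is assumed, and "invalid" is interpreted as dropping negative readings and blanking any interpolated point that falls inside a gap longer than one hour. The function name `interpolate_pv` is hypothetical.

```python
import numpy as np


def interpolate_pv(times_s, pv_kw, step_s=10, max_gap_s=3600):
    """Linearly interpolate PV records onto a regular grid of step_s seconds.

    times_s must be sorted in increasing order. Negative readings are
    discarded first; grid points lying strictly inside a gap longer than
    max_gap_s between two remaining records are set to NaN.
    """
    t = np.asarray(times_s, dtype=float)
    p = np.asarray(pv_kw, dtype=float)
    keep = p >= 0  # drop physically invalid negative readings
    t, p = t[keep], p[keep]
    if t.size == 0:
        raise ValueError("no valid PV records")

    n_points = int((t[-1] - t[0]) // step_s) + 1
    grid = t[0] + step_s * np.arange(n_points)
    values = np.interp(grid, t, p)

    # For each grid point, find the records just before and just after it.
    left = np.searchsorted(t, grid, side="right") - 1
    right = np.minimum(left + 1, t.size - 1)
    gap = t[right] - t[left]
    inside_gap = (gap > max_gap_s) & (grid > t[left])
    values[inside_gap] = np.nan
    return grid, values
```

With minute-level records, the 10-second grid allows an image captured at an irregular time such as 08:20:40 to be paired with a PV value at exactly that time stamp.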

Users can either use the reference code provided here or customize their own data processing pipeline. For more details, please refer to the data processing section of this dissertation [6].
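
For readers building a custom pipeline, the image-side steps (dropping repeated frames and pairing each image with its concurrent PV value) might look like the sketch below. Both function names are hypothetical, and a repeated frame is assumed to be pixel-identical to the frame before it, which may be stricter than the check used in the reference notebooks.

```python
import math

import numpy as np


def drop_repeated_frames(frames):
    """Return indices of frames that differ from the previously kept frame."""
    kept = []
    previous = None
    for i, frame in enumerate(frames):
        current = np.asarray(frame)
        if previous is not None and np.array_equal(current, previous):
            continue  # exact repeat of the last kept frame
        kept.append(i)
        previous = current
    return kept


def match_images_to_pv(image_times, pv_by_time):
    """Pair each image index with the PV value recorded at the same time stamp.

    pv_by_time maps time stamps to PV values; images with no PV record, or
    with a NaN record, are skipped.
    """
    pairs = []
    for idx, ts in enumerate(image_times):
        value = pv_by_time.get(ts)
        if value is None or math.isnan(value):
            continue
        pairs.append((idx, value))
    return pairs
```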

Benchmark dataset

The benchmark dataset contains the model development set and test set obtained from the data processing Step 3 described in the above section. The samples of the benchmark dataset are organized as aligned pairs of sky images and PV power generation. Figure 2 shows the distribution of the PV power generation data for the development set and test set and the PV power generation profiles of the 20 days in the test set.


Figure 2: The PV power generation data distribution of the benchmark dataset: A. development set PV data distribution; B. test set PV data distribution; and C. the PV power generation profiles of the 10 sunny days and 10 cloudy days used in the test set: upper panel shows for the sunny days, and the lower panel is for the cloudy days.
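
To evaluate a model separately on particular test days, the samples can be selected by calendar date. The sketch below assumes the time stamps are Python datetime objects; the function name `indices_for_days` is hypothetical. The two dates used in the usage note are the clear and cloudy days named in the Figure 1 caption, used here only as illustrative inputs.

```python
from datetime import date, datetime


def indices_for_days(times, days):
    """Return the indices of the samples whose time stamp falls on one of `days`."""
    wanted = set(days)
    return [i for i, ts in enumerate(times) if ts.date() in wanted]
```

For instance, `indices_for_days(times, [date(2019, 1, 25)])` selects the samples from January 25, 2019, and `indices_for_days(times, [date(2019, 5, 27)])` selects those from May 27, 2019; the resulting indices can then be used to slice the image and PV arrays of the same split.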

Demonstration of Use Cases

Here, we demonstrate a few use cases of the dataset based on our previously published work. Our group has developed a specialized convolutional neural network model named SUNSET (Stanford University Neural Network for Solar Electricity Trend) for PV output forecasting. Two prediction tasks were investigated with SUNSET: (1) PV power generation nowcast [2], i.e., given a sky image, predicting the contemporaneous PV output; and (2) PV power generation forecast [1], i.e., given sky images and PV output for the past 15 minutes at 1-minute resolution, predicting PV output 15 minutes into the future. The details of these two models can be found in the corresponding papers. Note that the results shown below are taken from our previous publications for demonstration purposes; for results based on the benchmark dataset, please refer to our dataset paper. As described in the paper, we implemented these two deep learning models using TensorFlow 2.x, which is an update from the TensorFlow 1.x code base used in our previous publications. The new code base can be found in the directory \models. The old code base can be found in our SUNSET Model GitHub repository.

Solar Power Nowcast

We explore convolutional neural networks (CNNs) to correlate PV output with contemporaneous images of the sky (a “now-cast”). We demonstrate that sky images are useful for inferring PV panel output and that a CNN is a suitable structure for this application. Some of the results are shown in Figure 3 and Figure 4; refer to [2] for the detailed work.


Figure 3: Sample results for solar nowcasting on sunny days


Figure 4: Sample results for solar nowcasting on cloudy days

Short-term Solar Power Forecast

We extend the “now-cast” work and propose a specialized convolutional neural network (CNN), “SUNSET”, to predict 15-min-ahead minutely-averaged PV output. The model is characterized by its use of hybrid input, temporal history and strong regularization. Some of the results are shown in Figure 5 and Figure 6; refer to [1] for the detailed work.
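
The forecast task needs samples in which both the input history and the target exist. The sketch below shows one way to build such samples from minute-level time stamps, assuming they are expressed as integer minutes and that the history spans the current minute plus the 15 preceding ones. The function name `forecast_samples` and the exact window convention are assumptions, not the notebook's definition.

```python
def forecast_samples(minutes, history=15, horizon=15):
    """Build (input_indices, target_index) pairs for the forecast task.

    minutes: integer minute time stamps of the available image/PV records.
    A sample anchored at minute m is valid only if every minute from
    m - history to m is present and the target minute m + horizon exists.
    """
    index_of = {m: i for i, m in enumerate(minutes)}
    samples = []
    for m in minutes:
        past = [index_of.get(m - k) for k in range(history, -1, -1)]
        target = index_of.get(m + horizon)
        if target is None or any(i is None for i in past):
            continue  # incomplete history or missing target
        samples.append((past, target))
    return samples
```

A missing record therefore invalidates every sample whose history window or target would need it, which is why interruptions in the data reduce the number of usable forecast samples.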


Figure 5: Sample results for solar forecasting on sunny days


Figure 6: Sample results for solar forecasting on cloudy days

Sun Tracking and Clouds Detection

We utilize a camera projection model to correlate the sun position in a sky image with the solar azimuth and zenith angles in the real world, and we develop a modified NRBR threshold with a background subtraction method to determine whether a pixel in the sky image is a cloud pixel. Figure 7 demonstrates the sun tracking and cloud detection algorithms we developed. Refer to [3] for more details. The code is open sourced and can be accessed in this GitHub Repo.
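
For orientation only, the sketch below shows plain thresholding on a normalized red-blue ratio, assuming the common definition NRBR = (B - R) / (B + R) and RGB channel order. It omits the modification and the background subtraction described in [3], the threshold value is left to the caller, and the function names are hypothetical.

```python
import numpy as np


def nrbr(image_rgb):
    """Normalized red-blue ratio (B - R) / (B + R) per pixel, assuming RGB order.

    Pixels where R + B is zero are assigned a ratio of 0.
    """
    img = np.asarray(image_rgb, dtype=float)
    red, blue = img[..., 0], img[..., 2]
    total = red + blue
    safe_total = np.where(total > 0, total, 1.0)  # avoid division by zero
    return np.where(total > 0, (blue - red) / safe_total, 0.0)


def cloud_mask(image_rgb, threshold):
    """Flag pixels as cloud when their NRBR falls below `threshold`.

    Clear sky is strongly blue (high NRBR), while clouds are closer to
    white or gray (NRBR near zero).
    """
    return nrbr(image_rgb) < threshold
```

Images loaded with OpenCV are in BGR order and would need their channels reversed before being passed to these functions.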

Figure 7: Sample results for sun tracking and clouds detection (red spots in the 1st row of images represent the sun location, and green shades in the 2nd row of images represent the identified cloud pixels)

Summary of Relevant Publications

So far, we have published the following 8 papers (including journal/conference articles and pre-prints) based on the dataset, and more research is under way.

  1. Solar Nowcasting [2]

  2. Short-term Solar Forecasting [1, 9]

  3. Data Fusion [4]

  4. Sky-condition-specific Sub-models for Solar Forecasting [3]

  5. Resampling and Data Augmentation for Solar Forecasting with an Imbalanced Sky Image Dataset [5]

  6. Transfer Learning for Solar Forecasting Based on Multi-location Data [7]

  7. Survey of Open-source Ground-based Sky Image Datasets [8]

  8. Generative AI for future sky image synthesis and probabilistic solar forecasting [10]

Access Instruction

SKIPP'D can be accessed without hassle. The benchmark data is available at https://purl.stanford.edu/dj417rh1007 and the raw data is deposited separately for each year given its large size. The 2017 raw data is available at https://purl.stanford.edu/sm043zf7254 and the links to the 2018 and 2019 data can be found under "Related items" on the same web page.

Citation

If you find SKIPP'D useful to your research, please cite:

Nie, Y., Li, X., Scott, A., Sun, Y., Venugopal, V., & Brandt, A. (2023). SKIPP’D: A SKy Images and Photovoltaic Power Generation Dataset for short-term solar forecasting. Solar Energy, 255, 171-179.

or

@article{nie2023skipp,
  title={SKIPP’D: A SKy Images and Photovoltaic Power Generation Dataset for short-term solar forecasting},
  author={Nie, Yuhao and Li, Xiatong and Scott, Andea and Sun, Yuchi and Venugopal, Vignesh and Brandt, Adam},
  journal={Solar Energy},
  volume={255},
  pages={171--179},
  year={2023},
  publisher={Elsevier}
}

Collaboration on the Dataset

Our ultimate goal is to build a centralized, large-scale dataset of sky images and PV output/irradiance measurements for solar forecasting, just like ImageNet for computer vision research. This large-scale dataset is expected to include data streams from all over the world and cover a wide range of climate conditions, which calls for a joint effort from the community. If you would like to collaborate on building such a dataset, please reach out directly to the PI Adam Brandt ([email protected]).

Some of our ongoing efforts include: (1) continuing the data collection on the Stanford campus; (2) adding a new data stream from Oregon (Stanford North) with the same camera setup; and (3) web-scraping 1-min high-resolution sky images from the NREL Solar Radiation Research Laboratory, which are open-sourced but not archived (see Footnote 4) (Stanford East).

Acknowledgements

The authors thank Stanford Utility for permission to access the PV power generation history and Jacques de Chalendar from Stanford University, who helped us access the data. The authors would also like to acknowledge the Stanford Research Computing Center for providing the computational resources for conducting the experiments in this study. The authors are also grateful to Amy Hodge from the Science and Engineering Resource Group at Stanford Libraries for facilitating the deposit of the datasets.

References

[1] Sun, Y., Venugopal, V., Brandt, A.R., 2019. Short-term solar power forecast with deep learning: Exploring optimal input and output configuration. Sol. Energy 188, 730–741.

[2] Sun, Y., Szűcs, G., Brandt, A.R., 2018. Solar PV output prediction from video streams using convolutional neural networks. Energy Environ. Sci. 11, 1811–1818.

[3] Nie, Y., Sun, Y., Chen, Y., Orsini, R., Brandt, A., 2020. PV power output prediction from sky images using convolutional neural network: The comparison of sky-condition-specific sub-models and an end-to-end model. J. Renew. Sustain. Energy 12, 046101.

[4] Venugopal, V., Sun, Y., Brandt, A.R., 2019. Short-term solar PV forecasting using computer vision: The search for optimal CNN architectures for incorporating sky images and PV generation history. J. Renew. Sustain. Energy 11, 066102.

[5] Nie, Y., Zamzam, A.S., Brandt, A., 2021. Resampling and data augmentation for short-term PV output prediction based on an imbalanced sky images dataset using convolutional neural networks. Sol. Energy 224, 341–354.

[6] Sun, Y., 2019. Short-term Solar Forecast Using Convolutional Neural Networks with Sky Images. Stanford University.

[7] Nie, Y., et al., 2022. Sky-image-based solar forecasting using deep learning with multi-location data: training models locally, globally or via transfer learning? arXiv preprint arXiv:2211.02108.

[8] Nie, Y., et al., 2022. Open-Source Ground-based Sky Image Datasets for Very Short-term Solar Forecasting, Cloud Analysis and Modeling: A Comprehensive Survey. arXiv preprint arXiv:2211.14709.

[9] Sun, Y., Venugopal, V., Brandt, A.R., 2018. Convolutional neural network for short-term solar panel output prediction. 2018 IEEE 7th World Conference on Photovoltaic Energy Conversion (WCPEC) (A Joint Conference of 45th IEEE PVSC, 28th PVSEC & 34th EU PVSEC). IEEE.

[10] Nie, Y., Zelikman, E., Scott, A., Paletta, Q., Brandt, A., 2023. SkyGPT: Probabilistic Short-term Solar Forecasting Using Synthetic Sky Videos from Physics-constrained VideoGPT. arXiv preprint arXiv:2306.11682.

Footnotes

  1. The dataset suffers from some interruptions due to water intrusion, wiring and/or electrical failure of the camera, as well as daylight-saving adjustment failures in 2017 and 2018. Operation is back to normal for 2019 and beyond.

  2. The camera model Hikvision DS-2CD6362F-IV is discontinued and has been replaced by a new model, Hikvision DS-2CD6365GOE-IVS. We replaced the old model with the new model on April 29, 2022 due to aging.

  3. Note that this differs from the instantaneous raw PV data used in our previously published works [1], [2], [3] and [4], so users do not need to take a rolling average during data processing to obtain the minutely averaged data.

  4. NREL archives and open-sources sky images to the public only at 10-min frequency.

stanford-solar-forecasting-dataset's People

Contributors

5summer, ascott-20, yuhao-nie


stanford-solar-forecasting-dataset's Issues

About Research Equipment.

Hello, I would like to ask whether you used the Hikvision DS-2CD6362F-IV2, a 6-megapixel camera, in your paper. In my country, it seems that only 5-megapixel cameras, such as the Hikvision DS-zXA3956F, are available. Can I use the data captured by such a camera with your code? Can it withstand outdoor environments?

The dataset for SUNSET-forecast

Thanks for your great work. However, I have some trouble when I try to reproduce the SUNSET-forecast model. I can't find the file 'forecast_dataset.h5py' for this model. I also cannot find the files 'all_times_highfreq.npy', 'all_images_highfreq.npy' and 'pv_output_valid.pkl'. How can I get these files?

Cloudiness Information

Hi @yuhao-nie and @ascott-20, in Table 5.1 of your paper you evaluate your models separately on cloudy and sunny days. However, this information is not natively included in the downloadable dataset. There is information about cloudy and sunny days in the preprocessing Jupyter notebooks. However, when I do the following for the forecast task:

import datetime

import numpy as np

# Test days labelled as sunny and cloudy, as (year, month, day) tuples
sunny_day = [(2017,9,15),(2017,10,6),(2017,10,22),(2018,2,16),(2018,6,12),(2018,6,23),(2019,1,25),(2019,6,23),(2019,7,14),(2019,10,14)]
cloudy_day = [(2017,6,24),(2017,9,20),(2017,10,11),(2018,1,25),(2018,3,9),(2018,10,4),(2019,5,27),(2019,6,28),(2019,8,10),(2019,10,19)]

sunny_datetime = [datetime.datetime(day[0],day[1],day[2]) for day in sunny_day]
cloudy_datetime = [datetime.datetime(day[0],day[1],day[2]) for day in cloudy_day]

# Load the forecast test time stamps (stored as pickled datetime objects)
arr = np.load("times_test_forecast.npy", allow_pickle=True)
date_arr = [val.date() for val in arr]
sunny_arr = [val.date() for val in sunny_datetime]
cloudy_arr = [val.date() for val in cloudy_datetime]

# Dates shared between the test set and each labelled group
print(set(date_arr).intersection(set(sunny_arr)))
print(set(date_arr).intersection(set(cloudy_arr)))

The intersection of the test forecasting dates and the sunny dates is empty, suggesting there are no sunny test dates, only cloudy ones. However, you report values for those in your paper.

For the nowcasting task, the above snippet shows that the numbers of cloudy and sunny examples are about equal, which is to be expected, I guess. Could you help me figure out what I am missing?

Edit:
The times_test_forecast.npy file is generated from running my script in #3
