Hi, I found your repo very helpful. I just have some minor suggestions.
First, the rainfall data layout you start with is unconventional, so I propose a simple pandas-based function that takes a (date, value) long-format file and converts it to your matrix format.
This assumes the rainfall data has a granularity of one hour or finer.
import numpy as np
import pandas as pd
import polars as pl

def long_to_matrix_format(csv_path):
    """
    Convert a long-format CSV with 'date' and 'value' columns
    to a matrix format with 'Year', 'Day', and hourly columns.

    Parameters:
    - csv_path: str, path to the long-format CSV

    Returns:
    - float64 ndarray in matrix format
    """
    # Read the CSV (polars parses it quickly, then hand off to pandas)
    df = pl.read_csv(csv_path, try_parse_dates=True).to_pandas()
    # Make sure the 'date' column is a pandas datetime
    df["date"] = pd.to_datetime(df["date"])
    # Set the date column as the index and aggregate to hourly totals
    df.set_index("date", inplace=True)
    df = df.resample("60min").sum()
    df["date"] = df.index.to_series()
    # Extract year, day of year, and hour
    df["Year"] = df["date"].dt.year
    df["Day"] = df["date"].dt.dayofyear
    # +1 so hours run from 1 to 24 instead of 0 to 23
    df["Hour"] = df["date"].dt.hour + 1
    # Pivot the DataFrame to the desired matrix format
    matrix_df = df.pivot_table(
        index=["Year", "Day"], columns="Hour", values="value", aggfunc="sum"
    ).reset_index()
    # Rearrange and rename columns to the desired order
    matrix_df = matrix_df[["Year", "Day"] + list(range(1, 25))]
    matrix_df.columns = ["Year", "Day"] + [f"Hour {i}" for i in range(1, 25)]
    # Fill NaN values with 0
    matrix_df.fillna(0, inplace=True)
    # Re-index years to start from 1
    matrix_df["Year"] = matrix_df["Year"] - matrix_df["Year"].min() + 1
    # The contiguous float64 copy is important for the numba code
    return matrix_df.to_numpy().astype(float).copy()
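To make the target layout concrete without any file I/O, here is a minimal NumPy sketch (the `values` array is hypothetical sample data, assuming exactly 24 hourly values per day) that reshapes a flat hourly series into the same (Year, Day, Hour 1..24) matrix the function produces:

```python
import numpy as np

# Hypothetical data: 3 days of hourly rainfall (72 values) in long-format order
values = np.arange(72, dtype=float)

# Reshape to one row per day, one column per hour (Hour 1 .. Hour 24)
matrix = values.reshape(-1, 24)

# Prepend Year and Day columns, as long_to_matrix_format does
year = np.ones((matrix.shape[0], 1))                      # single year, re-indexed to 1
day = np.arange(1, matrix.shape[0] + 1).reshape(-1, 1)    # day of year
matrix = np.hstack([year, day, matrix])

print(matrix.shape)  # (3, 26): Year, Day, Hour 1..24
```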
#-----------------------------------------
Then, the for loop you use to build the data1 DataFrame is very slow, because it calls pandas vectorized functions element by element on an (i, j) basis. My suggestion is small: write the loop in plain Python over NumPy arrays and compile it with numba.
import numba
import numpy as np
from numba.pycc import CC

cc = CC('idf_module')  # optional ahead-of-time compilation module

@numba.njit('float64[:,::1](float64[:,::1])')
@cc.export('nb_get_idf_rolling_sum', 'float64[:,::1](float64[:,::1])')
def get_idf_rolling_sum(df_array):
    # One output row per year (column 0 holds the re-indexed year)
    rows = np.unique(df_array[:, 0]).size
    data = np.zeros((rows, 24))
    for i in range(rows):
        # Filter by year, drop the 'Year' and 'Day' columns, transpose to (hour, day)
        df1_array = df_array[df_array[:, 0] == i + 1][:, 2:].T
        # Loop through window sizes 1 to 24 and find the max rolling sum for each
        for j in range(24):
            max_rolling_sum = -np.inf
            # Loop through each column (day)
            for col in range(df1_array.shape[1]):
                # Loop through each possible window in the current column
                for row in range(df1_array.shape[0] - j):
                    rolling_sum = np.sum(df1_array[row : row + j + 1, col])
                    # Update the max rolling sum if this one is greater
                    if rolling_sum > max_rolling_sum:
                        max_rolling_sum = rolling_sum
            # Store the max rolling sum for this year and window size
            data[i, j] = max_rolling_sum
    return data
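As a quick sanity check of the rolling-sum logic without numba installed, a cumulative-sum trick computes the same max rolling sum for one window size; this is only a sketch on a hypothetical one-day series, not a replacement for the compiled function:

```python
import numpy as np

def max_rolling_sum_1d(x, w):
    # Max sum over all contiguous windows of length w, via cumulative sums:
    # the sum of x[row:row+w] equals c[row+w] - c[row]
    c = np.concatenate(([0.0], np.cumsum(x)))
    return np.max(c[w:] - c[:-w])

# Hypothetical hourly series for a single day
x = np.array([0.0, 2.5, 2.5, 1.0, 0.0, 4.0])

print(max_rolling_sum_1d(x, 1))  # 4.0 (largest single hour)
print(max_rolling_sum_1d(x, 3))  # 6.0 (2.5 + 2.5 + 1.0)
```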
The numba function replaces this part of your code:
---> for i in range(99):
df1 = (((df.where(df['Year']==i+1)).dropna()).drop(['Year','Day'],axis=1)).T
for j in range(24):
data[i][j] = max((df1.rolling(j+1).sum()).max())
#-------------------
Here is a timing test on 10 million rows of a long-format CSV, i.e.:
date,value
1981-01-01 01:00:00, 2.5
1981-01-01 02:00:00, 2.5
1981-01-01 03:00:00, 2.5
1981-01-01 04:00:00, 2.5
.
.
.
nth 10 million
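If you want to reproduce a timing test like this, a synthetic long-format CSV can be generated with the standard library alone (the file name, start date, and constant value below are just placeholders):

```python
import csv
from datetime import datetime, timedelta

def write_synthetic_rainfall(path, n_rows, start="1981-01-01 01:00:00", value=2.5):
    # Write n_rows of hourly 'date,value' pairs starting at `start`
    t = datetime.strptime(start, "%Y-%m-%d %H:%M:%S")
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["date", "value"])
        for i in range(n_rows):
            row_time = t + timedelta(hours=i)
            writer.writerow([row_time.strftime("%Y-%m-%d %H:%M:%S"), value])

# One year of hourly data; scale n_rows up for a real benchmark
write_synthetic_rainfall("rainfall_long.csv", 24 * 365)
```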
# convert long format to matrix format
t0 = 0.5562264919281006
# process rolling sum
t1 = 0.024176836013793945
These two functions use polars, numpy, numba, and pandas.
It would be nice if you turned this into a Python package, maybe adding a couple of features; there are not many Python packages for IDF curve construction.
Have a nice day,
Marcelo.