
Comments (5)

lgbouma commented on August 24, 2024

PS: thank you for writing this very useful code!


hippke commented on August 24, 2024

Hi Luke, sorry you're having trouble.
I don't know what's causing this, but I suspect that the signal is overshadowed by outliers.
When I clean the data like this:

from astropy.stats import sigma_clip
flux = sigma_clip(flux, sigma_upper=3, sigma_lower=3)

Then I get a strong signal with a period of 7.20195 d.
The phase-fold looks like a transit:
[Figure 1: phase-folded light curve showing a transit-like dip]

This doesn't solve the question, but may be a work-around. Questions to better understand the issue:

  • Can you try more aggressive filtering of your data?
  • For how many light curves does this come up?


lgbouma commented on August 24, 2024

Hi Michael,

Thank you for this response! Responding to your questions:

Can you try more aggressive filtering of your data?
Sure, the filtering can be arbitrarily aggressive... up to the usual point of not wanting to overfit transits!

For how many light curves does this come up?
I've tried running TLS on 4 Kepler light curves so far, and I got this behavior for 2 of them -- it worked for Kepler-1627 and Kepler-1643, and failed for KOI-7368 and KOI-7913. The latter two have the weakest transit signals.

Possible solution

I was intrigued though that sigma clipping helped mitigate the issue. The power spectrum I got after sigma clipping as you suggested looked like the following (I've even added axis labels... where "power" means SDE!)

[Figure: tls_mwe_sigclip_run1_dy_default — SDE vs. period after sigma clipping, with the default dy]

so indeed, the (injected) 7.202 day signal was recovered. As a quick check, I went ahead and tried masking the 7.202 day signal, to see whether the (real) 6.843 day signal would also be recoverable. I again found, though, that TLS yielded a fixed chi^2 for all trial periods -- it failed in the same way. This was surprising, since in the BLS spectra from my initial message the 6.843 day signal was the dominant one!

I thought through this a bit more, and eventually formulated the idea that the issue must be tied to the definition of chi^2, and the comparison being done in core.lowest_residuals_in_this_duration between the transit model and a straight line
(permalink). chi^2 only has so many components, so I thought: what if the issue is with the definition of the uncertainties? TLS by default assumes dy = np.std(y). Perhaps this isn't the correct definition for these processed Kepler light curves (the subtraction of stellar rotation signals can yield large residuals which drive up the standard deviation).
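To see why an inflated dy can wash out a detection, here is a toy sketch (this is not TLS internals; the transit depth, outlier amplitudes, and noise level are invented for illustration). The chi^2 improvement of a box model over a flat line scales as 1/dy^2, so a dy inflated by outliers shrinks the contrast between trial periods:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
y = np.ones(n) + rng.normal(0, 0.001, n)   # flat light curve, 1 mmag noise
y[303:313] -= 0.005                        # a shallow box "transit"
y[::50] += rng.uniform(0.02, 0.05, 20)     # sparse outliers that inflate std(y)

# Box-transit model matching the injected signal
model = np.ones(n)
model[303:313] -= 0.005

def delta_chi2(dy):
    """chi^2 improvement of the transit model over a flat line."""
    chi2_line = np.sum(((y - 1.0) / dy) ** 2)
    chi2_model = np.sum(((y - model) / dy) ** 2)
    return chi2_line - chi2_model

# dy = std(y) is inflated by the outliers -> small improvement;
# dy near the true per-point scatter -> large improvement
print(delta_chi2(np.std(y)))
print(delta_chi2(0.001))
```

The improvement with a realistic per-point dy is over an order of magnitude larger than with the outlier-inflated std, which is consistent with the flat chi^2 spectra described above.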

This turned out to be a good idea, I think. If I set the uncertainties as say the mean point-to-point RMS across the light curve, I get an SDE spectrum that makes more sense -- a) there's an actual noise floor, and b) both the 7.202 and 6.843 day signals are there. The same procedure actually then successfully finds both signals:

[Figure: tls_mwe_sigclip_run1_dy_p2p — SDE vs. period with dy set to the point-to-point RMS; both signals visible above a noise floor]

I think this means this might be "user error" more than a bug in TLS. However, I wonder whether assuming dy = np.std(y) is really the most robust default (validate.py, L40). I'll leave this issue open so that you can consider whether to change this assumption in a future patch, but feel free to consider this particular case closed. Thank you for your help!

-Luke

PS: here's the updated MWE that finds both the real planet in the KOI-7368 light curve from above, as well as the injected one. The key fix was passing an explicit set of dy values.

import pandas as pd, numpy as np, matplotlib.pyplot as plt
from transitleastsquares import transitleastsquares
import multiprocessing as mp
from astropy.stats import sigma_clip
n_threads = mp.cpu_count()

def p2p_rms(flux):
    # Robust scatter estimate: 68th-percentile width of the point-to-point
    # flux differences (insensitive to outliers and slow trends).
    dflux = np.diff(flux)
    med_dflux = np.nanmedian(dflux)
    up_p2p = (
        np.nanpercentile(dflux - med_dflux, 84)
        - np.nanpercentile(dflux - med_dflux, 50)
    )
    lo_p2p = (
        np.nanpercentile(dflux - med_dflux, 50)
        - np.nanpercentile(dflux - med_dflux, 16)
    )
    p2p = np.mean([up_p2p, lo_p2p])
    return p2p

csvpath = 'KOI-7368_npoints50000.csv'
df = pd.read_csv(csvpath)
time, flux = np.array(df.time), np.array(df.flux)

# NOTE: not actually necessary for TLS to converge, but a good idea
# regardless because it lowers the noise floor in the periodogram
flux = sigma_clip(flux, sigma_upper=3, sigma_lower=3)
stime = time[~flux.mask]
sflux = flux[~flux.mask]

dy = np.ones_like(stime)*p2p_rms(sflux)
model = transitleastsquares(stime, sflux, dy=dy, verbose=True)
results = model.power(
    use_threads=n_threads, show_progress_bar=True, R_star=0.88,
    R_star_min=0.85, R_star_max=0.91, M_star=0.88, M_star_min=0.85,
    M_star_max=0.91, period_min=5, period_max=10, n_transits_min=10,
    transit_template='default', transit_depth_min=10e-6, oversampling_factor=5
)

plt.close("all")
plt.plot(results['periods'], results['power'], lw=0.5)
plt.xlabel('period [days]')
plt.ylabel('power')
plt.savefig('tls_mwe_sigclip_run1.png', dpi=400)

from transitleastsquares import transit_mask, cleaned_array

intransit = transit_mask(stime, results.period, 2*results.duration, results.T0)

y_second_run = sflux[~intransit]
t_second_run = stime[~intransit]
t_second_run, y_second_run = cleaned_array(t_second_run, y_second_run)

dy = np.ones_like(t_second_run)*p2p_rms(y_second_run)
model2 = transitleastsquares(t_second_run, y_second_run, dy=dy, verbose=True)
results2 = model2.power(
    use_threads=n_threads, show_progress_bar=True, R_star=0.88,
    R_star_min=0.85, R_star_max=0.91, M_star=0.88, M_star_min=0.85,
    M_star_max=0.91, period_min=5, period_max=10, n_transits_min=10,
    transit_template='default', transit_depth_min=10e-6, oversampling_factor=5
)

print(results2['period'])

plt.close("all")
plt.plot(results2['periods'], results2['power'], lw=0.5)
plt.xlabel('period [days]')
plt.ylabel('power')
plt.savefig('tls_mwe_sigclip_run2.png', dpi=400)


hippke commented on August 24, 2024

Hey, that's a very cool find. I totally agree that the issue is non-Gaussian errors. We could add a feature to TLS to calculate the uncertainty in a more sophisticated way, and move away from the standard deviation when the errors are non-Gaussian. What is the method p2p_rms in your script? What would you recommend for determining non-Gaussian uncertainty?


lgbouma commented on August 24, 2024

I can imagine two alternatives (though I'm sure there are more!).

The simpler option is to assume that the non-Gaussianity is mostly driven by a few outliers. If so, the standard deviation will be inflated by those outliers, but a statistic like the 84th-minus-50th percentile width (or its lower counterpart) will not be.

Another way to produce non-Gaussianity is if the user hasn't even attempted to remove a coherent signal that is present in the light curve (e.g., few-day stellar rotation, or scattered light for TESS at the edges of orbits). The point-to-point RMS statistic I quoted above is robust to these kinds of trends in the data too, because it works on the point-to-point difference time-series, which removes such trends.

Here's the output of an example code comparing these statistics against the standard deviation:

Seed 42
Gaussian errors (σ=0.05):
P2P RMS: 0.0704, STDEV: 0.0489, 68 percentile: 0.0481
Gaussian errors + 5% outliers:
P2P RMS: 0.0806, STDEV: 0.2176, 68 percentile: 0.0520
Gaussian errors + 5% outliers + line:
P2P RMS: 0.0806, STDEV: 0.2810, 68 percentile: 0.2156

Code that generates this is as follows.

import numpy as np

def sixtyeight(flux):
    """
    Calculate the 68th percentile of the distribution of the flux.
    """
    med_flux = np.nanmedian(flux)

    up_p2p = (
        np.nanpercentile( flux-med_flux, 84 )
        -
        np.nanpercentile( flux-med_flux, 50 )
    )
    lo_p2p = (
        np.nanpercentile( flux-med_flux, 50 )
        -
        np.nanpercentile( flux-med_flux, 16 )
    )

    p2p = np.mean([up_p2p, lo_p2p])

    return p2p

def p2p_rms(flux):
    """
    Calculate the 68th percentile of the distribution of the residuals from
    the median of the point-to-point differences δF_i = F_{i+1} - F_{i},
    where i is an index over time.
    """
    dflux = np.diff(flux)
    med_dflux = np.nanmedian(dflux)

    up_p2p = (
        np.nanpercentile( dflux-med_dflux, 84 )
        -
        np.nanpercentile( dflux-med_dflux, 50 )
    )
    lo_p2p = (
        np.nanpercentile( dflux-med_dflux, 50 )
        -
        np.nanpercentile( dflux-med_dflux, 16 )
    )

    p2p = np.mean([up_p2p, lo_p2p])

    return p2p

for seed in [42, 3141]:

    print(42*'=')
    print(f"Seed {seed}")

    np.random.seed(seed)
    n_points = 1000
    scale = 0.05

    err = np.random.normal(loc=0, scale=scale, size=n_points)
    y = np.ones(n_points) + err

    print(f"Gaussian errors (σ={scale}):")
    print(f"P2P RMS: {p2p_rms(y):.4f}, STDEV: {np.std(y):.4f}, 68 percentile: {sixtyeight(y):.4f}")

    outlier_ind = np.unique(np.random.randint(0, n_points, size=50))
    outlier_val = np.random.uniform(low=-0.1, high=0.1, size=len(outlier_ind))

    y[outlier_ind] = outlier_val

    print("Gaussian errors + 5% outliers:")
    print(f"P2P RMS: {p2p_rms(y):.4f}, STDEV: {np.std(y):.4f}, 68 percentile: {sixtyeight(y):.4f}")

    line = np.linspace(-0.3,0.3,n_points)
    y += line

    print("Gaussian errors + 5% outliers + line:")
    print(f"P2P RMS: {p2p_rms(y):.4f}, STDEV: {np.std(y):.4f}, 68 percentile: {sixtyeight(y):.4f}")

