
scottransom commented on August 16, 2024

Yes, this is expected behavior. The short reason is that because you are dedispersing, we are missing some of the data needed to output the full duration of the observation.

Since you are dedispersing to a positive DM, you are assuming that the low-frequency data arrive later than the high-frequency data. That's fine at the beginning of the observation, where we can simply grab later samples at lower frequencies to add to the initial sample at the highest frequency. But it doesn't work at the end of the observation, where we have no low-frequency data recorded! Since we don't know what to add in order to properly dedisperse, we simply ignore the parts of the original file where we don't have all of the frequencies we need.

This problem gets worse and worse (meaning you lose a bigger and bigger fraction of your input data) when 1) the observation duration is relatively short, and 2) the total amount of dispersion delay (in time) between the top and bottom of the observing band is long.

In your case, for a DM of 300, the differential delay across the band (1200-1800 MHz) is 0.48 seconds, so you will always lose at least that much data.
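For reference, that 0.48 s figure follows from the usual cold-plasma dispersion law; here is a minimal sketch (the constant 4.148808e3 is the standard dispersion constant in s MHz^2 pc^-1 cm^3):

```python
K_DM = 4.148808e3  # dispersion constant, s * MHz^2 * cm^3 / pc

def diff_delay(dm, f_lo_mhz, f_hi_mhz):
    """Delay (s) of the lowest channel relative to the highest."""
    return K_DM * dm * (1.0 / f_lo_mhz**2 - 1.0 / f_hi_mhz**2)

print(round(diff_delay(300.0, 1200.0, 1800.0), 3))  # -> 0.48
```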

However, there is another issue: prepsubband (and prepdata) work on blocks of (by default) 2400 samples for filterbank data (see the Spectra/Subint number in the output). So you will often get an integer number of blocks of that duration out.
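Combining those two effects, the usable output length is roughly the input length minus the dispersion delay, rounded down to whole blocks. This is only an illustrative sketch (the 1000-sample delay below is a made-up figure, not one computed from this observation's sample time), but it shows the rounding idea:

```python
def usable_samples(n_input, n_delay_samples, block=2400):
    """Samples surviving dedispersion, rounded down to whole blocks.

    Illustrative only: the real bookkeeping inside prepsubband is
    more involved, but the block-rounding effect is the same idea.
    """
    return ((n_input - n_delay_samples) // block) * block

# e.g. 29868 input samples minus a hypothetical ~1000 samples of delay:
print(usable_samples(29868, 1000))  # -> 28800 (12 full blocks of 2400)
```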

And finally, both of those commands automatically choose a highly factorable number of output data points so that FFTs on the resulting data will be efficient.
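A quick way to see what "highly factorable" means: FFTs are fastest on lengths whose prime factors are all small. The sketch below just tests that property (the choice of primes here is an assumption; PRESTO's actual length-selection logic may use a different rule):

```python
def is_smooth(n, primes=(2, 3, 5, 7)):
    """True if n has no prime factors beyond `primes` (FFT-friendly)."""
    for p in primes:
        while n % p == 0:
            n //= p
    return n == 1

# The "good number of samples" from the thread factors nicely:
print(is_smooth(30240))  # True: 30240 = 2**5 * 3**3 * 5 * 7
print(is_smooth(22))     # False: contains the factor 11
```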

In your initial case you are seeing all of these effects. You can mitigate some of this (but you can't get back data you don't have because of dispersion) by specifying the number of output data points with the -numout option. If you set -numout equal to the number of input data points, try looking at the end of the resulting .dat file with the exploredat command: you will see how the noise in the data changes as less and less data is available to do the full and proper dedispersion.

Hope this helps!

from presto.

Newtonlml commented on August 16, 2024

Thank you very much for your response.

I understood your explanation and it makes sense. I tried your suggestion of looking into the .dat file, but I don't see anything weird.

Is there a way to predict how much data I will lose? Like you said, from the DM and frequency range I can obtain the minimum amount of data I will lose; in the example you mentioned it was ~0.5 seconds, but I am losing around a minute of data. I assume this is due to the integer number of blocks that you mentioned. When executing prepsubband it says that for my original number of samples the "good number of samples" to work with is 30240, but the data written only gets to 24000. Is there a way to know how many blocks I will lose?


scottransom commented on August 16, 2024

Have you tried specifying -numout? And where does it say "30240"?


Newtonlml commented on August 16, 2024

Yes, I have tried setting -numout equal to the original number of samples.
I changed to a more recent PRESTO version, so now the outputs look like this:
prepsubband -nobary -lodm 300 -dmstep 1 -numdms 1 -downsamp 1 -nsub 2048 -runavg -numout 29868 -o prep_output 2022-09-10_11_15_25.404305.fil
[screenshot: prepsubband output with -numout]
And this is without -numout (where the 30240 comes from):
[screenshot: prepsubband output without -numout]
As you can see, the number of data points written is 24000 in both cases, with the rest padded. Here is a screenshot of exploredat when -numout is specified, but it looks the same when it is not:
[screenshot: exploredat view of the .dat file]


scottransom commented on August 16, 2024

One other thing for you to try: can you use prepdata to make a single time series rather than prepsubband? The latter is quite a bit more complicated (since it allows you to output multiple time series at once), so there might be a bit of extra overhead there.


Newtonlml commented on August 16, 2024

I tried prepdata and it doesn't remove as many samples as prepsubband. Whether or not I specify -numout, it writes 28800 samples instead of the 24000 from prepsubband.
[screenshot: prepdata output]

This amount of data lost makes more sense given the number of samples per data block and the minimum amount of data I should lose in the process. So it seems that prepsubband may be removing more samples in order to optimize the output of many time series? In that case prepdata seems to be the best command for my use case.
Thank you very much for your help :)

