ftools's People

Contributors

reifjulian, sergiocorreia

ftools's Issues

join does not clear sortedby macro

With the following code

set seed 42
clear
set obs 20
gen x = _n
gen r = runiform()
drop if r > 0.5
sort r
drop r
tempfile x
save "`x'"

clear
set obs 20
gen y = _n
drop if runiform() > 0.5
sort y
join, by(y=x) from("`x'")
desc
disp "`:sortedby'"
l

shows the data is sorted by y. However, this is only the case for master/matched observations, which appear first and sorted. Unmatched using observations appear last and unsorted. Ideally sortedby would get cleared after join is run if the results will not be sorted.

  • which join gives *! version 2.36.1 13feb2019
  • which ftools gives *! version 2.42.0 28dec2020

join: assert that by() vars have same general type (str vs num)

For example:

key variable id1 is str5 in master but byte in using data
key variable id3 is str12 in master but int in using data
    Each key variable -- the variables on which observations are matched -- must be of the same generic type
    in the master and using datasets.  Same generic type means both numeric or both string.

Error when merging m:1 on string variable

set obs 100
gen a = string(_n)
tempfile temp
save `temp'
fmerge m:1 a using `temp'

returns

assert_is_id():  3498  <a> do not uniquely identify obs. in the using data
                  join():     -  function returned error
                 <istmt>:     -  function returned error

fegen group behaviour with missing grouping values

Firstly, thanks for this excellent set of programs.

. which ftools
/home/anon/ado/plus/f/ftools.ado
*! version 2.9.2 06apr2017

Suppose we have the data set:

hhid	pid
1	1
2	1
2	1
2	
3	1
3	2
4	
4	

Using egen group will return a missing group value if the grouping values have a missing value. Using fegen group does not distinguish between missing and non-missing values. So, running,

fegen group1 = group(hhid pid)
egen group2 = group(hhid pid)

produces the data set:

hhid	pid	group1	group2
1	1	1	1
2	1	2	2
2	1	2	2
2		3	
3	1	4	3
3	2	5	4
4		6	
4		6	

which may cause problems if the user's program expects the same behaviour as the Stata command.
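For comparison, the built-in egen can reproduce the numbering reported for fegen through its missing option, which treats missing values as a group of their own. A minimal sketch using the data from this report; group3 is a name introduced only for this illustration:

```stata
clear
input hhid pid
1 1
2 1
2 1
2 .
3 1
3 2
4 .
4 .
end

* Default egen behaviour: rows with a missing pid get a missing group id
egen group2 = group(hhid pid)

* With the missing option, missing values form groups of their own,
* which matches the group1 column reported for fegen
egen group3 = group(hhid pid), missing

list, sepby(hhid)
```

So code that relies on the egen default would need to blank out the group id wherever a grouping variable is missing.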

Best, Andrew

Dict size exceeds Mata limits?

Hi Sergio,
I'm running into a dict size exceeds Mata limits error running reghdfe on a pretty large dataset, and found this error built into the ftools .ado file.

According to the Stata documentation for the release of Stata 16, though, "Mata matrices remain limited only by memory," so I was wondering if there's a remaining reason for the hard-coded limit in place here?

Thanks!

Command join is unrecognised

Hi,

First of all, thanks a lot for both ftools and reghdfe! These are superb tools...

I have an issue using Stata 14 and using fmerge: I get an error r(199) stating "command join is unrecognised".
I have ftools also running on a Stata 13 version and there I have no issues.
All the best,
Glenn

Add check in panelsum() when factors are sorted

This would speed up F.sort() calls when factors are sorted in the dataset; particularly useful if we run this method a lot (e.g. reghdfe)

First, create .is_sorted

Then, intercept this loop and replace (not tested):

p[index[level] = index[level] + 1] = obs

with

p[idx = index[level] = index[level] + 1] = obs
if (is_sorted) {
    if (idx < last_idx) is_sorted = 0 // set is_sorted = 1 before the loop
    last_idx = idx // initially set last_idx = 0
}

Also benchmark it to see if the slowdown is high (in which case we make the sort check optional and unroll the loop)

Finally, sort() and _sort() should add a line like if (is_sorted) return(data)
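The check itself can be illustrated outside the counting loop. The Mata sketch below is only an illustration of the idea, not ftools code: is_sorted_sketch() is a hypothetical helper that scans a vector of levels once and reports whether it is already in nondecreasing order.

```stata
capture mata: mata drop is_sorted_sketch()

mata:
// Hypothetical helper (not part of ftools): returns 1 if the vector of
// factor levels is already in nondecreasing order, 0 otherwise
real scalar is_sorted_sketch(real colvector levels)
{
    real scalar i
    for (i = 2; i <= rows(levels); i++) {
        if (levels[i] < levels[i - 1]) return(0)
    }
    return(1)
}

is_sorted_sketch((1 \ 1 \ 2 \ 3))   // already sorted
is_sorted_sketch((1 \ 3 \ 2))       // not sorted
end
```

A separate pass like this costs one extra scan of the data, which is why folding the comparison into the existing loop, as proposed above, may be preferable if the benchmark allows it.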

fmerge fails if there is a space in the path

fmerge id using "/some path with spaces/somefile" will fail due to insufficient quotes. We should quote everything that touches the -filename- local, including instances of -use-.
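A minimal sketch of the quoting pattern, using compound double quotes around a local that holds a file name with spaces. The file name is hypothetical and is written to the working directory only for the demonstration:

```stata
* Hypothetical file name containing spaces, saved in the working directory
local filename "some file with spaces.dta"

sysuse auto, clear

* Compound double quotes keep the whole name together as one token
save `"`filename'"', replace
use `"`filename'"', clear

* Remove the demonstration file
erase `"`filename'"'
```

Compound quotes also stay safe if the local itself already contains plain double quotes.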

fcollapse with any missing weights returns all missing

Consider

clear
set obs 10
gen x = _n
gen y = 1
replace y = . if mod(_n, 3) == 0
gen g = mod(_n, 5)

Missing weights should be dropped, but instead all the results are missing

. which fcollapse
/home/mauricio/ado/plus/f/fcollapse.ado
*! version 2.24.1 15jan2018

.     preserve

.         fcollapse x [fw = y], by(g)

.         l

     +-------+
     | g   x |
     |-------|
  1. | 0   . |
  2. | 1   . |
  3. | 2   . |
  4. | 3   . |
  5. | 4   . |
     +-------+

.     restore

.     collapse x [fw = y], by(g)

.     l

     +---------+
     | g     x |
     |---------|
  1. | 0   7.5 |
  2. | 1     1 |
  3. | 2   4.5 |
  4. | 3     8 |
  5. | 4     4 |
     +---------+

fmerge error

I just received the following error when using fmerge. I suspect I hit some sort of memory limit.

fmerge m:1 perm using "`permutations'", assert(match) nogen
                  join():  3900  unable to allocate real <tmp>[403200000,9]
                 <istmt>:     -  function returned error
r(3900);                                                                                                     

Here's version number:

which ftools             
/home/wtownsend/ado/plus/f/ftools.ado
SMP Mon Jun 25 08:07:07 PDT 
*! version 2.24.3 24jan2018 

Here's the dofile, for context
https://pastebin.com/NixsYupi
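A rough size check is consistent with the memory suspicion. Assuming each element of a Mata real matrix takes 8 bytes, the temporary matrix named in the error message would need about 27 GiB:

```stata
* Size of a 403,200,000 x 9 real matrix at 8 bytes per element
local bytes = 403200000 * 9 * 8
display %5.1f `bytes' / 1024^3 " GiB"
```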

join: problems with spaces in filepath

Hi!
Join will throw an error if the filepath for your tempfiles contains spaces. This can be fixed by enclosing "from", "into" and "filename" on lines 24 and 65 of join.ado in additional quotes.

Thanks so much for creating this great set of tools - it already saved me hours of time!

fcollapse fails with string by variables (Stata 14/MP)

. sysuse auto, clear
(1978 Automobile Data)

. version 14

. fcollapse price, by(make)
      st_varvaluelabel():   181  Stata returned error
    Factor::store_keys():     -  function returned error
            f_collapse():     -  function returned error
                 <istmt>:     -  function returned error
r(181);

improve 'ftools compile'

  • ftools xyz is treated as ftools compile instead of raising an error
  • similarly, ftools check stays silent, but ftools check, v (nonexistent option) ends up as ftools compile
  • if the ado/plus/l folder does not exist, we should try to create it before saving to it
  • the abcreg source code has a more robust variant of the ftools.ado code
  • expose ftools check in the help file

join: do not copy certain chars

When copying labels/notes from -using- to -master-, avoid copying _dta[...], in particular _TStvar _TSpanel _TSdelta _TSitrvl tis iis (or maybe no _dta at all?)

Otherwise, running -join- messes up -xtset-
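To see where those characteristics come from, the sketch below builds a small panel with made-up data and lists what -xtset- stores under _dta[]. It uses only built-in commands and does not call -join-:

```stata
clear
set obs 6
gen id = ceil(_n / 3)
gen t  = mod(_n - 1, 3) + 1
xtset id t

* xtset records its settings as dataset characteristics
char list _dta[]
display "panel variable: `: char _dta[_TSpanel]'"
display "time variable:  `: char _dta[_TStvar]'"
```

If the same characteristics are copied over from a -using- file with different settings, the master's -xtset- declaration would be overwritten.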

fcollapse incorrectly parses negative weights

clear
set obs 1
gen x = 1
gen y = -1
preserve
    fcollapse (p30) p30 = x (p50) p50 = x (p70) p70 = x [fw = y]
restore, preserve
    fcollapse (p30) p30 = x (p50) p50 = x (p70) p70 = x [pw = y]
restore, preserve
    fcollapse (p30) p30 = x (p50) p50 = x (p70) p70 = x [aw = y]
restore

With collapse, all those instances throw errors. Further,

set seed 42
clear
set obs 10
gen x = _n
gen y = int(10 * rnormal())
l
preserve
    fcollapse (p10) p10 = x (p50) p50 = x (p90) p90 = x [iw = y]
    l
restore, preserve
    collapse (p10) p10 = x (p50) p50 = x (p90) p90 = x [iw = y]
    l
restore

Produces

     +-----------------+
     | p10   p50   p90 |
     |-----------------|
  1. |  10     1     1 |
     +-----------------+

vs

     +-----------------+
     | p10   p50   p90 |
     |-----------------|
  1. |   1     5    10 |
     +-----------------+

I think you are running into the same issue I did, and are forgetting to normalize the weights. For percentiles, collapse multiplies the weights by number non-missing / sum weights before computing them. This gives the right answer:

 qui sum y, meanonly
 gen ynorm = y / `r(sum)'
 fcollapse (p10) p10 = x (p50) p50 = x (p90) p90 = x [iw = ynorm]
 l

parallel_map crashes on computers with slow temp folder

parallel_map runs into frequent supposed "syntax" errors that crash the program. Changing the sleep time in line 348 from 50 to e.g. 500 solves this problem. This is important for legacy servers which, while having slow drives for the temporary files folder, still have many cores that are useful to speed up simulations.

I propose adding a [sleep(integer 50)] option in the syntax, with default 50 as is now. Should I do a pull request?
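A minimal sketch of how such an option could be parsed. The program name sleep_option_sketch is hypothetical and this is not the parallel_map code; it only shows -syntax- supplying the default of 50 when the option is omitted:

```stata
capture program drop sleep_option_sketch
program define sleep_option_sketch, rclass
    * sleep() is optional; 50 ms is used when it is not specified
    syntax [, sleep(integer 50)]
    sleep `sleep'
    return scalar sleep = `sleep'
end

sleep_option_sketch
display r(sleep)

sleep_option_sketch, sleep(500)
display r(sleep)
```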

Stata 17, MP4.
Windows Server 2019.

data type following fcollapse

. which ftools
/home/asheph/ado/plus/f/ftools.ado
*! version 2.24.0 11jan2018

Stata's own collapse command will promote the storage type of variables. So, if I have

set obs 1000
gen byte x = 1
collapse (sum) x

then the sum x will be promoted to double with a value of 1000 returned.

With fcollapse the storage type of the returned variable x is not promoted from byte, in which case a missing value is returned.
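The built-in behaviour can be checked with the snippet below, which uses only official commands. The variable b is introduced only to illustrate that 1000 does not fit in a byte, whose largest storable value is 100:

```stata
clear
set obs 1000
gen byte x = 1
collapse (sum) x

* collapse stores the sum as a double, so 1000 is representable
confirm double variable x
assert x == 1000

* The same value forced into a byte becomes missing
gen byte b = 1000
assert missing(b)
describe
```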

fmerge error

Hi, I am using Stata 13.1 MP and get the following error when running fmerge:

  is_integers_only():  3253  pk[325815,1] found where real required
              join():     -  function returned error
             <istmt>:     -  function returned error

error when running fsort twice

Thanks for this excellent set of programs, but when I run the command fsort twice I get an error like this:
. sysuse auto,clear
(1978 Automobile Data)

. fsort turn

. fsort turn
st_store(): 3200 conformability error
: - function returned error
r(3200);
Why does this happen?

Possible issue with fsort when clearing sort variable

In testing hashsort, I found that fsort sometimes did not give me an identical data set compared to sort, stable. I cannot replicate this from a blank session very easily, so here is the data that gives the issue:

local addr https://raw.githubusercontent.com/mcaceresb/stata-gtools
local path develop/src/github-issues/

use `addr'/`path'/fsort_share.dta
sort int1, stable
tempfile cmp
save `cmp'

use `addr'/`path'/fsort_share.dta
fsort int1
cf * using `cmp'

The result is

. cf * using `cmp'
           rsort:  1 mismatch
r(9);

I believe the issue is with Andrew Maurer's trick to clear : sortedby. I got around this by setting obs to =_N + 1, manipulating the last observation, and dropping it. This way the original data is never altered.
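A rough sketch of that workaround on the auto data, assuming that changing the sort key in the scratch observation is what makes Stata drop the sort flag. The second display is therefore expected, not guaranteed, to come back empty:

```stata
sysuse auto, clear
sort price
display "sorted by: `: sortedby'"

* Append one scratch observation, change the sort key there, then drop it;
* the original observations are never modified
local last = _N + 1
set obs `last'
replace price = 0 in `last'
drop in `last'

display "sorted by: `: sortedby'"
```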

fmerge: 1:1 merge, error 3498, <id1 id2> do not uniquely identify obs. in the master data

Hello,

A few times I have accidentally performed a 1:1 merge using fmerge when I should have performed a m:1 merge. I then get error 3498. However, rather than failing to merge (as would happen, I think, with the standard merge command) fmerge seems to perform a correct m:1 merge anyway. Am I correct that this happens? I wonder if this is a feature or a bug?

Thanks for your help!

Support many to many join

Would it be possible to add many to many functionality in the join command that could mimic the joinby command?

error with join

The command -join- appears to fail with the error message 3598 when the master dataset is a tempfile. See reproducible example below. I am running Stata 14.0 on mac OS X 10.13.6.

sysuse auto , clear
generate id = _n
preserve
keep id make price mpg
tempfile tmp
save "`tmp'"
restore
keep id foreign
join , into("`tmp'") by(id)

fcollapse issue with double identifier

Hi Sergio,

First - thanks very much for writing the gtools and ftools packages. Incredibly useful and a great public service!!!

I have found that with fcollapse, the results of the collapse are likely to be incorrect if the identifier is a very long double (in my example, census block identifiers). Another reminder that one should always use strings for identifiers, but I wanted to flag this issue. This issue does not show up using collapse or gcollapse.
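As a possible workaround along those lines, a long numeric identifier can be converted to a string before collapsing. The sketch below uses made-up 15-digit values and hypothetical variable names; it does not reproduce the fcollapse problem itself:

```stata
clear
set obs 3

* Hypothetical 15-digit identifiers stored as doubles
gen double block_id = 360610001001000 + _n

* A fixed format avoids scientific notation in the resulting string
gen str15 block_str = string(block_id, "%15.0f")

list
assert real(block_str) == block_id
```

The string version can then be passed to by() in place of the double.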

Thanks -- Nate

fmerge / join changes using-keys>100 to missing

Hi Sergio,

I found a rather weird bug in the fmerge/join command:

Let's say I have a numeric ID ranging from 1 to X where X>100. In the Master data, the ID is always <100 but in the Using data the ID can be > 100. The fmerge/join works, however, it will change all IDs > 100 to missing in the final data. This behavior does happen in all join into/from combinations.

I attached an example code and data.

Best,

Chris

bugexample.zip

(ftools header: *! version 2.37.0 16aug2019,
join header: *! version 2.36.1 13feb2019,
fmerge header: *! version 2.10.0 3apr2017,
stata version 15 mp,
mac osx
)
