
regressiontables.jl's Introduction


RegressionTables.jl

This package provides publication-quality regression tables for use with FixedEffectModels.jl, GLM.jl, GLFixedEffectModels.jl and MixedModels.jl, as well as any package that implements the RegressionModel abstraction.

In its objective it is similar to (and heavily inspired by) the Stata command esttab and the R package stargazer.


Installation

To install the package, type the following at the Julia REPL (typing ] switches the REPL into package mode):

] add RegressionTables
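The same installation can also be run non-interactively, for example from a setup script, through the standard Pkg API. This is a minimal sketch; nothing in it is specific to RegressionTables beyond the package name:

```julia
# Non-interactive equivalent of typing `] add RegressionTables` in the REPL
using Pkg
Pkg.add("RegressionTables")

# Loading the package confirms that the installation succeeded
using RegressionTables
```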

A brief demonstration

using RegressionTables, DataFrames, FixedEffectModels, RDatasets, GLM

df = dataset("datasets", "iris")

rr1 = reg(df, @formula(SepalLength ~ SepalWidth + fe(Species)))
rr2 = reg(df, @formula(SepalLength ~ SepalWidth + PetalLength + fe(Species)))
rr3 = reg(df, @formula(SepalLength ~ SepalWidth * PetalLength + PetalWidth + fe(Species)))
rr4 = reg(df, @formula(SepalWidth ~ SepalLength + PetalLength + PetalWidth + fe(Species)))
rr5 = glm(@formula(SepalWidth < 2.9 ~ PetalLength + PetalWidth + Species), df, Binomial())

regtable(
    rr1,rr2,rr3,rr4,rr5;
    render = AsciiTable(),
    labels = Dict(
        "versicolor" => "Versicolor",
        "virginica" => "Virginica",
        "PetalLength" => "Petal Length",
    ),
    regression_statistics = [
        Nobs => "Obs.",
        R2,
        R2Within,
        PseudoR2 => "Pseudo-R2",
    ],
    extralines = [
        ["Main Coefficient", "SepalWidth", "SepalWidth", "Petal Length", "Petal Length", "Intercept"],
        DataRow(["Coef Diff", 0.372 => 2:3, 1.235 => 4:5, ""], align="lccr")
    ],
    order = [r"Int", r" & ", r": "]
)

yields

----------------------------------------------------------------------------------------------------
                                          SepalLength                 SepalWidth    SepalWidth < 2.9
                            --------------------------------------   ------------   ----------------
                                   (1)          (2)            (3)            (4)                (5)
----------------------------------------------------------------------------------------------------
(Intercept)                                                                                   -1.917
                                                                                             (1.242)
SepalWidth & Petal Length                                   -0.070
                                                           (0.041)
Species: Versicolor                                                                        10.441***
                                                                                             (1.957)
Species: Virginica                                                                         13.230***
                                                                                             (2.636)
SepalWidth                    0.804***     0.432***       0.719***
                               (0.106)      (0.081)        (0.155)
Petal Length                               0.776***       1.047***        -0.188*             -0.773
                                            (0.064)        (0.143)        (0.083)            (0.554)
PetalWidth                                                  -0.259       0.626***           -3.782**
                                                           (0.154)        (0.123)            (1.256)
SepalLength                                                              0.378***
                                                                          (0.066)
----------------------------------------------------------------------------------------------------
Species Fixed Effects              Yes          Yes            Yes            Yes
----------------------------------------------------------------------------------------------------
Estimator                          OLS          OLS            OLS            OLS           Binomial
----------------------------------------------------------------------------------------------------
Obs.                               150          150            150            150                150
R2                               0.726        0.863          0.870          0.635
Within-R2                        0.281        0.642          0.659          0.391
Pseudo-R2                        0.527        0.811          0.831          0.862              0.347
Main Coefficient            SepalWidth   SepalWidth   Petal Length   Petal Length          Intercept
Coef Diff                            0.372                      1.235
----------------------------------------------------------------------------------------------------

LaTeX output can be generated by using

regtable(rr1,rr2,rr3,rr4; render = LatexTable())

which yields

\begin{tabular}{lrrrr}
\toprule
                                & \multicolumn{3}{c}{SepalLength} & \multicolumn{1}{c}{SepalWidth} \\ 
\cmidrule(lr){2-4} \cmidrule(lr){5-5} 
                                &      (1) &      (2) &       (3) &                            (4) \\ 
\midrule
SepalWidth                      & 0.804*** & 0.432*** &  0.719*** &                                \\ 
                                &  (0.106) &  (0.081) &   (0.155) &                                \\ 
PetalLength                     &          & 0.776*** &  1.047*** &                        -0.188* \\ 
                                &          &  (0.064) &   (0.143) &                        (0.083) \\ 
PetalWidth                      &          &          &    -0.259 &                       0.626*** \\ 
                                &          &          &   (0.154) &                        (0.123) \\ 
SepalWidth $\times$ PetalLength &          &          &    -0.070 &                                \\ 
                                &          &          &   (0.041) &                                \\ 
SepalLength                     &          &          &           &                       0.378*** \\ 
                                &          &          &           &                        (0.066) \\ 
\midrule
SpeciesDummy Fixed Effects      &      Yes &      Yes &       Yes &                            Yes \\ 
\midrule
$N$                             &      150 &      150 &       150 &                            150 \\ 
$R^2$                           &    0.726 &    0.863 &     0.870 &                          0.635 \\ 
Within-$R^2$                    &    0.281 &    0.642 &     0.659 &                          0.391 \\ 
\bottomrule
\end{tabular}

Similarly, HTML tables can be created with render = HtmlTable().
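A minimal sketch of HTML output, assuming the iris data and FixedEffectModels.jl setup from the demonstration above. Capturing the rendered text with sprint(show, ...) is just one way to embed it elsewhere, not a dedicated export function of the package:

```julia
using RegressionTables, DataFrames, FixedEffectModels, RDatasets

df = dataset("datasets", "iris")
rr1 = reg(df, @formula(SepalLength ~ SepalWidth + fe(Species)))
rr2 = reg(df, @formula(SepalLength ~ SepalWidth + PetalLength + fe(Species)))

# Same call as for ASCII or LaTeX output; only the render type changes
tab = regtable(rr1, rr2; render = HtmlTable())

# Capture the rendered HTML as a String, e.g. to embed it in a web page
html = sprint(show, tab)
```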

Send the output to a text file by passing the destination path as the file keyword argument:

regtable(rr1,rr2,rr3,rr4; render = LatexTable(), file="myoutputfile.tex")

Then use \input in your LaTeX document to include that file. Be sure to load the booktabs package:

\documentclass{article}
\usepackage{booktabs}

\begin{document}

\begin{table}
\label{tab:mytable}
\input{myoutputfile}
\end{table}

\end{document}
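A sketch of the full round trip. It writes to a temporary directory instead of the working directory (the path is only illustrative) and reads the file back to show what LaTeX will \input:

```julia
using RegressionTables, DataFrames, FixedEffectModels, RDatasets

df = dataset("datasets", "iris")
rr1 = reg(df, @formula(SepalLength ~ SepalWidth + fe(Species)))
rr2 = reg(df, @formula(SepalLength ~ SepalWidth + PetalLength + fe(Species)))

# In practice this would be the folder the LaTeX document \input's from
outfile = joinpath(mktempdir(), "myoutputfile.tex")
regtable(rr1, rr2; render = LatexTable(), file = outfile)

# The file holds only the tabular environment, which is why it is
# wrapped in a table environment in the document above
tex = read(outfile, String)
```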

regtable() can also print TableRegressionModels from GLM.jl (and output from other packages that produce TableRegressionModels):

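# df is the iris data loaded in the demonstration above;
# categorical() comes from CategoricalArrays.jl
using GLM, CategoricalArrays

dobson = DataFrame(Counts = [18.,17,15,20,10,20,25,13,12],
    Outcome = categorical(repeat(["A", "B", "C"], outer = 3)),
    Treatment = categorical(repeat(["a","b", "c"], inner = 3)))
# rr1 and lm1 are the same model, hence identical columns (1) and (2) below
rr1 = fit(LinearModel, @formula(SepalLength ~ SepalWidth), df)
lm1 = fit(LinearModel, @formula(SepalLength ~ SepalWidth), df)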
gm1 = fit(GeneralizedLinearModel, @formula(Counts ~ 1 + Outcome + Treatment), dobson,
                  Poisson())

regtable(rr1,lm1,gm1)

yields

---------------------------------------------
                   SepalLength        Counts 
               -------------------   --------
                    (1)        (2)        (3)
---------------------------------------------
(Intercept)    6.526***   6.526***   3.045***
                (0.479)    (0.479)    (0.171)
SepalWidth       -0.223     -0.223           
                (0.155)    (0.155)           
Outcome: B                             -0.454
                                      (0.202)
Outcome: C                             -0.293
                                      (0.193)
Treatment: b                            0.000
                                      (0.200)
Treatment: c                           -0.000
                                      (0.200)
---------------------------------------------
Estimator           OLS        OLS    Poisson
---------------------------------------------
N                   150        150          9
R2                0.014      0.014           
Pseudo R2         0.006      0.006      0.104
---------------------------------------------

Printing of other StatsBase.RegressionModels (e.g., from MixedModels.jl and GLFixedEffectModels.jl) generally works but is less well tested; please file an issue if you encounter problems printing them.
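For example, a linear mixed model from MixedModels.jl can be passed directly to regtable. This sketch assumes MixedModels.jl's bundled sleepstudy dataset and its standard fit(MixedModel, ...) interface; given the caveat above, inspect the output before relying on it:

```julia
using RegressionTables, MixedModels

# sleepstudy ships with MixedModels.jl (columns subj, days, reaction)
sleepstudy = MixedModels.dataset(:sleepstudy)

# Random intercept and random slope on days for each subject
mm = fit(MixedModel, @formula(reaction ~ 1 + days + (1 + days | subj)), sleepstudy)

tab = regtable(mm)
s = sprint(show, tab)
```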

Function Reference

Arguments

  • rr... are the regression results that should be printed (e.g., FixedEffectModels from FixedEffectModels.jl). Only required argument.
  • keep is a Vector of regressor names (Strings), integers, ranges or regex that should be shown, in that order. Defaults to an empty vector, in which case all regressors will be shown.
  • drop is a Vector of regressor names (Strings), integers, ranges or regex that should not be shown. Defaults to an empty vector, in which case no regressors will be dropped.
  • order is a Vector of regressor names (Strings), integers, ranges or regex that should be shown in that order. Defaults to an empty vector, in which case the order of regressors will be unchanged. Other regressors are still shown (assuming drop is empty).
  • fixedeffects is a Vector of FE names (Strings), integers, ranges or regex that should be shown, in that order. Defaults to an empty vector, in which case all FE's will be shown.
  • align is a Symbol from the set [:l,:c,:r] indicating the alignment of results columns (default :r right-aligned). Currently works only with ASCII and LaTeX output.
  • header_align is a Symbol from the set [:l,:c,:r] indicating the alignment of the header row (default :c centered). Currently works only with ASCII and LaTeX output.
  • labels is a Dict that contains displayed labels for variables (Strings) and other text in the table. If no label for a variable is found, it defaults to the variable name. See documentation for special values.
  • estimformat is a String that describes the format of the estimate.
  • digits is an Int that describes the precision to be shown in the estimate. Defaults to nothing, which means the default (3) is used (default can be changed by setting RegressionTables.default_digits(render::AbstractRenderType, x) = 3).
  • statisticformat is a String that describes the format of the number below the estimate (se/t).
  • digits_stats is an Int that describes the precision to be shown in the statistics. Defaults to nothing, which means the default (3) is used (default can be changed by setting RegressionTables.default_digits(render::AbstractRenderType, x) = 3).
  • below_statistic is a type that describes a statistic that should be shown below each point estimate. Recognized values are nothing, StdError, TStat, and ConfInt. nothing suppresses the line. Defaults to StdError.
  • regression_statistics is a Vector of types that describe statistics to be shown at the bottom of the table. Built-in types are Nobs, R2, PseudoR2, R2CoxSnell, R2Nagelkerke, R2Deviance, AdjR2, AdjPseudoR2, AdjR2Deviance, DOF, LogLikelihood, AIC, AICC, BIC, FStat, FStatPValue, FStatIV, FStatIVPValue, R2Within. Defaults vary based on regression inputs (simple linear model is [Nobs, R2]).
  • extralines is a Vector or a Vector{<:AbstractVector} that will be added to the end of the table. A single vector will be its own row, a vector of vectors will each be a row. Defaults to nothing.
  • number_regressions is a Bool that governs whether regressions should be numbered. Defaults to true.
  • groups is a Vector, Vector{<:AbstractVector} or Matrix of labels used to group regressions. This can be useful if results are shown for different data sets or sample restrictions.
  • print_fe_section is a Bool that governs whether a section on fixed effects should be shown. Defaults to true.
  • print_estimator_section is a Bool that governs whether to print a section on which estimator (OLS/IV/Binomial/Poisson...) is used. Defaults to true if more than one value is displayed.
  • standardize_coef is a Bool that governs whether the table should show standardized coefficients. Note that this only works with TableRegressionModels, and that only coefficient estimates and the below_statistic are standardized (i.e., R^2 and other statistics still pertain to the non-standardized regression).
  • render::AbstractRenderType is an AbstractRenderType that governs how the table should be rendered. Standard supported types are ASCII (via AsciiTable()), LaTeX (via LatexTable()) and HTML (via HtmlTable()). Defaults to AsciiTable().
  • file is a String that governs whether the table should be saved to a file. Defaults to nothing.
  • transform_labels is a Dict or one of the Symbols :ampersand, :underscore, :underscore2space, :latex
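A sketch combining several of these arguments, assuming the iris models from the demonstration (the group labels "Baseline" and "Full" are illustrative):

```julia
using RegressionTables, DataFrames, FixedEffectModels, RDatasets

df = dataset("datasets", "iris")
rr1 = reg(df, @formula(SepalLength ~ SepalWidth + fe(Species)))
rr2 = reg(df, @formula(SepalLength ~ SepalWidth + PetalLength + fe(Species)))
rr3 = reg(df, @formula(SepalLength ~ SepalWidth + PetalLength + PetalWidth + fe(Species)))

tab = regtable(
    rr1, rr2, rr3;
    # Exact name plus regex: SepalWidth, then everything containing "Petal"
    keep = ["SepalWidth", r"Petal"],
    # t-statistics instead of standard errors below each estimate
    below_statistic = TStat,
    # Two decimals for the coefficient estimates
    digits = 2,
    # A Pair relabels a statistic; plain types keep their default label
    regression_statistics = [Nobs => "Obs.", R2, R2Within],
    # One group label per regression
    groups = ["Baseline", "Baseline", "Full"],
)
s = sprint(show, tab)
```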

Details

A typical use is to pass a number of FixedEffectModels to the function, along with how they should be rendered (via the render argument):

regtable(regressionResult1, regressionResult2; render = AsciiTable())

Pass a string to the file argument to create or overwrite a file. For example, using LaTeX output,

regtable(regressionResult1, regressionResult2; render = LatexTable(), file="myoutfile.tex")

Main Changes for v0.6

Version 0.6 was a major rewrite of the backend with the goal of increasing the flexibility and decreasing the dependencies on other packages (regression packages are now extensions). While most code written with v0.5 should continue to run, there might be a few differences and some deprecation warnings. Below is a brief overview of the changes:

New Features

  • The extralines argument accepts vectors containing pairs, where a pair defines a multicolumn value (["Label", "two columns" => 2:3, 1.5 => 4:5]); it also accepts a DataRow object that allows for more control.
  • The new keep, drop, and order arguments accept exact names, regex to search within names, integers to select specific values, and ranges (1:4) to select groups, and these can be mixed ([1:2, :end, r"Width"]).
  • labels now applies to individual parts of an interaction or categorical coefficient name (hopefully reducing the number of labels required)
  • The interaction symbol now depends on the table type, so in LaTeX, interactions are displayed with $\times$
    • Using a LaTeX table also automatically escapes parts of coefficient names (if no other labels are provided)
  • A confidence interval is now an option for a below statistic (below_statistic=ConfInt)
  • Several defaults are different to try and provide more relevant information (see the Changes to Defaults section)
  • Fixed effect values now have a suffix (defaults to " Fixed Effects") so that labeling can be simpler. Disable by setting print_fe_suffix=false
  • It is now possible to print the coefficient value and the below statistic on the same line (stat_below=false)
  • It is possible to define custom regression statistics that are calculated based on the regressions provided
  • It is possible to change the order of the major blocks in a regression table
  • Using RegressionTables for descriptive statistics is now easier: describe a DataFrame (df_described=describe(df)) and pass the result to a RegressionTable (tab = RegressionTable(names(df_described), Matrix(df_described))). There are also options to render the table as a LatexTable or HtmlTable. Write the table to a file using write(file_name, tab)
  • It is possible to overwrite almost any setting. For example, to make t-statistics the default in all tables, run RegressionTables.default_below_statistic(render::AbstractRenderType)=TStat
  • Option to show clustering (print_clusters=true).
    • The size of the clusters can be shown instead by running Base.repr(render::AbstractRenderType, x::RegressionTables.ClusterValue; args...) = repr(render, value(x); args...)
  • Several new regression statistics are now available, the full list is: [Nobs, R2, PseudoR2, R2CoxSnell, R2Nagelkerke, R2Deviance, AdjR2, AdjPseudoR2, AdjR2Deviance, DOF, LogLikelihood, AIC, AICC, BIC, FStat, FStatPValue, FStatIV, FStatIVPValue, R2Within]
  • Use LatexTableStar to create a table that spans the entire text width
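A sketch of three of the new features above (same-line statistics, LatexTableStar, and descriptive statistics), assuming the iris data; the columns and statistics passed to describe are only an illustrative selection, and the descriptive-table call follows the recipe given in the list:

```julia
using RegressionTables, DataFrames, FixedEffectModels, RDatasets

df = dataset("datasets", "iris")
rr1 = reg(df, @formula(SepalLength ~ SepalWidth + fe(Species)))
rr2 = reg(df, @formula(SepalLength ~ SepalWidth + PetalLength + fe(Species)))

# Confidence intervals on the same line as the coefficient,
# a compact layout useful e.g. for slides
compact = regtable(rr1, rr2; below_statistic = ConfInt, stat_below = false)

# LaTeX table spanning the full text width
wide = regtable(rr1, rr2; render = LatexTableStar())

# Descriptive statistics: describe() returns a DataFrame with one row per column
df_described = describe(df[:, [:SepalLength, :SepalWidth]], :mean, :min, :max)
desc = RegressionTable(names(df_described), Matrix(df_described))

# Save the descriptive table to a (temporary, illustrative) file
descfile = joinpath(mktempdir(), "descriptives.txt")
write(descfile, desc)
```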

Changes to Defaults

There are some changes to the defaults from version 0.5, and two additional settings:

  • Interactions in coefficients now vary based on the type of table. In LaTeX, this now defaults to $\times$ and in HTML to &times;. These can be changed by running:
    • RegressionTables.interaction_combine(render::AbstractRenderType) = " & "
    • RegressionTables.interaction_combine(render::AbstractLatex) = " & "
    • RegressionTables.interaction_combine(render::AbstractHtml) = " & "
  • The print_estimator_section default was true; now it is true only if more than one type of regression is provided (i.e., "IV" and "OLS" will display the estimator, all "OLS" will not). Set to the old default by running:
    • RegressionTables.default_print_estimator(x::AbstractRenderType, rrs) = true
  • The number_regressions default was true; now it is true only if more than one regression is provided. Set to the old default by running:
    • RegressionTables.default_number_regressions(x::AbstractRenderType, rrs) = true
  • The regression_statistics default was [Nobs, R2]; now it varies based on the provided regressions. For example, a fixed effect regression will default to [Nobs, R2, R2Within] and a Probit regression will default to [Nobs, PseudoR2] (and if multiple types are provided, these will be combined). Set to the old default by running:
    • RegressionTables.default_regression_statistics(x::AbstractRenderType, rrs::Tuple) = [Nobs, R2]
  • Labels for the type of the regression are more varied for non-linear cases: instead of "NL", the table will display "Poisson", "Probit", etc. These can be changed by running:
    • RegressionTables.label_distribution(x::AbstractRenderType, d::Probit) = "NL"
  • print_fe_suffix is a new setting where " Fixed Effects" is added after the fixed effect name. Turn this off for all tables by running:
    • RegressionTables.default_print_fe_suffix(x::AbstractRenderType) = false
  • print_control_indicator is a new setting where a line is added if any coefficients are omitted. Turn this off for all tables by running:
    • RegressionTables.default_print_control_indicator(x::AbstractRenderType) = false
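Since the defaults are ordinary functions, they can be redefined once per session. A sketch restoring two of the v0.5 defaults listed above, assuming the iris data; note that the new methods affect every table created afterwards in the same session:

```julia
using RegressionTables, DataFrames, FixedEffectModels, RDatasets

# Drop the " Fixed Effects" suffix and go back to [Nobs, R2] for all tables
RegressionTables.default_print_fe_suffix(x::RegressionTables.AbstractRenderType) = false
RegressionTables.default_regression_statistics(x::RegressionTables.AbstractRenderType, rrs::Tuple) = [Nobs, R2]

df = dataset("datasets", "iris")
rr1 = reg(df, @formula(SepalLength ~ SepalWidth + fe(Species)))

# No per-call keywords needed: the overridden defaults apply automatically
tab = regtable(rr1)
s = sprint(show, tab)
```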

Changes to Labeling

Labels for most display elements around the table are no longer handled by the labels dictionary but by functions. The goal is to allow a "set and forget" mentality, where changing the label once permanently changes it for all tables. For example, instead of:

labels=Dict(
  "__LABEL_ESTIMATOR__" => "Estimator",
  "__LABEL_FE_YES__" => "Yes",
  "__LABEL_FE_NO__" => "",
  "__LABEL_ESTIMATOR_OLS" => "OLS",
  "__LABEL_ESTIMATOR_IV" => "IV",
  "__LABEL_ESTIMATOR_NL" => "NL"
)

Run

RegressionTables.label(render::AbstractRenderType, ::Type{RegressionType}) = "Estimator"
RegressionTables.fe_value(render::AbstractRenderType, v) = v ? "Yes" : ""
RegressionTables.label_ols(render::AbstractRenderType) = "OLS"
RegressionTables.label_iv(render::AbstractRenderType) = "IV"
# non-linear models now display the distribution instead of "NL"
RegressionTables.label_distribution(render::AbstractRenderType, d::Probit) = "Probit"

See the documentation for more examples. For regression statistics, it is possible to pass a pair (e.g., [Nobs => "Obs.", R2 => "R Squared"]) to relabel those.
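A sketch of the function-based labels together with the Pair syntax for statistics, assuming the iris data. The label texts "Least Squares" and "Included" are illustrative, and print_estimator_section = true is passed because both models are OLS, so the estimator section would otherwise be hidden:

```julia
using RegressionTables, DataFrames, FixedEffectModels, RDatasets

# "Set and forget": these methods change the labels in every table
RegressionTables.label_ols(render::RegressionTables.AbstractRenderType) = "Least Squares"
RegressionTables.fe_value(render::RegressionTables.AbstractRenderType, v) = v ? "Included" : ""

df = dataset("datasets", "iris")
rr1 = reg(df, @formula(SepalLength ~ SepalWidth + fe(Species)))
rr2 = reg(df, @formula(SepalLength ~ SepalWidth + PetalLength))

tab = regtable(
    rr1, rr2;
    print_estimator_section = true,
    # Per-table relabeling of the statistics via Pairs
    regression_statistics = [Nobs => "Obs.", R2 => "R Squared"],
)
s = sprint(show, tab)
```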

Labels for coefficient names work as before, but interaction and categorical terms might show some differences. Each part of an interaction or categorical term can now be labeled independently (so labels=Dict("coef1" => "Coef 1", "coef2" => "Coef 2") would relabel coef1 & coef2 to Coef 1 & Coef 2). This might change existing tables if the labels dictionary contains an interaction label but not both pieces independently: the result would then depend on the order in which the dictionary is applied (so labels=Dict("coef1" => "Coef 1", "coef1 & coef2" => "Coef 1 & Coef 2") might turn the interaction into either Coef 1 & Coef 2 or Coef 1 & coef2).

custom_statistics replaced by extralines

The custom_statistics argument took a NamedTuple of vectors; this is now simplified in the extralines argument to a Vector, where the first element is displayed in the leftmost column. extralines also accepts a Pair of val => cols (e.g., 0.153 => 2:3), where the second value creates a multicolumn display. See the examples in the documentation under "Extralines".
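A sketch of extralines with a plain row and a multicolumn row, assuming two iris regressions. The row contents are illustrative (0.372 is reused from the demonstration above), not statistics computed from the models:

```julia
using RegressionTables, DataFrames, FixedEffectModels, RDatasets

df = dataset("datasets", "iris")
rr1 = reg(df, @formula(SepalLength ~ SepalWidth + fe(Species)))
rr2 = reg(df, @formula(SepalLength ~ SepalWidth + PetalLength + fe(Species)))

tab = regtable(
    rr1, rr2;
    extralines = [
        # Label in the leftmost column, then one value per regression
        ["Petal controls", "No", "Yes"],
        # val => cols: one value spanning table columns 2:3 (both regressions)
        ["Coef Diff", 0.372 => 2:3],
    ],
)
s = sprint(show, tab)
```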

For statistics that can use the values in the regression model (e.g., the mean of Y), it is possible to create those under an AbstractRegressionStatistic. See the documentation for an example.

print_result and out_buffer arguments are gone

print_result is no longer necessary since regtable now returns a table object (which can be edited) that displays well in notebooks such as Pluto or Jupyter. Similarly, instead of out_buffer, use tab=regtable(...); print(io, tab).
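A minimal sketch of the out_buffer replacement, assuming the iris data; any IO (a file handle, stdout, an IOBuffer) works with print:

```julia
using RegressionTables, DataFrames, FixedEffectModels, RDatasets

df = dataset("datasets", "iris")
rr1 = reg(df, @formula(SepalLength ~ SepalWidth + fe(Species)))

# regtable returns the table instead of printing it directly
tab = regtable(rr1; render = AsciiTable())

# Print into an in-memory buffer and collect the text
io = IOBuffer()
print(io, tab)
txt = String(take!(io))
```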

Other Deprecation Warnings that should not change results

  • renderSettings is deprecated; use render and file
  • regressors is deprecated; use keep, drop, and order

regressiontables.jl's People

Contributors

felixholub, floswald, github-actions[bot], grahamstark, greimel, jmboehm, juliatagbot, junder873, ken-b, ki-chi, matthieugomez, scottpjones


regressiontables.jl's Issues

Print `estimate` and statistic in one line

Is it possible to print what is now the below_statistic in the same line as the estimate? It looks like the decorator only has access to the p-value which I would rather not report. However, especially on slides it is useful to have a "shorter" table. Stargazer has the single.row option that does that (see p. 5 of Vignette).

Update for new FixedEffectModels

The 0.9.2 version of FixedEffectModels breaks this example:

using DataFrames, RDatasets, FixedEffectModels
df = dataset("plm", "Cigar")
regtable(reg(df, @formula(Sales ~ NDI + fe(State) + fe(Year)), Vcov.cluster(:State), weights = :Pop))

ERROR: type FixedEffectModel has no field feformula
Stacktrace:
 [1] getproperty(::Any, ::Symbol) at .\sysimg.jl:18
 [2] #regtable#22(::Array{String,1}, ::Array{String,1}, ::Dict{String,String}, ::String, ::typeof(RegressionTables.default_ascii_estim_decoration), ::String, ::Symbol, ::getfield(RegressionTables, Symbol("##25#37")), ::Array{Symbol,1}, ::Bool, ::getfield(RegressionTables, Symbol("##26#38")), ::Bool, ::Bool, ::Bool, ::Base.GenericIOBuffer{Array{UInt8,1}}, ::typeof(identity), ::RenderSettings, ::typeof(regtable), ::FixedEffectModel) at C:\Users\Max\.julia\packages\RegressionTables\rDg5b\src\regtable.jl:220
 [3] regtable(::FixedEffectModel) at C:\Users\Max\.julia\packages\RegressionTables\rDg5b\src\regtable.jl:93
 [4] top-level scope at none:0

Escape headers

If the column headers contain underscores or ampersands, they are rendered in latexOutput() without any escaping, which is invalid LaTeX syntax. They can be escaped with a backslash.

Issue with FixedEffectModels

using DataFrames, RDatasets, FixedEffectModels, RegressionTables
df = dataset("plm", "Cigar")
regtable(reg(df, @formula(Sales ~ Pop + NDI&fe(State) + fe(Year)), Vcov.cluster(:State), weights = :Pop)) 

yields:

ERROR: MethodError: no method matching name(::Term)
Closest candidates are:
  name(::InteractionTerm) at C:\Users\Max\.julia\packages\RegressionTables\dIzyV\src\util\util.jl:14
  name(::FunctionTerm) at C:\Users\Max\.julia\packages\RegressionTables\dIzyV\src\util\util.jl:21
Stacktrace:
 [1] name(::InteractionTerm{Tuple{Term,FunctionTerm{typeof(fe),var"#26#28",(:State,)}}}) at C:\Users\Max\.julia\packages\RegressionTables\dIzyV\src\util\util.jl:14
 [2] #regtable#22(::Array{String,1}, ::Array{String,1}, ::Dict{String,String}, ::String, ::typeof(RegressionTables.default_ascii_estim_decoration), ::String, ::Symbol, ::RegressionTables.var"#25#37", ::Array{Symbol,1}, ::Bool, ::RegressionTables.var"#26#38", ::Bool, ::Bool, ::Bool, ::Base.GenericIOBuffer{Array{UInt8,1}}, ::typeof(identity), ::RenderSettings, ::typeof(regtable), ::FixedEffectModel) at C:\Users\Max\.julia\packages\RegressionTables\dIzyV\src\regtable.jl:218
 [3] regtable(::FixedEffectModel) at C:\Users\Max\.julia\packages\RegressionTables\dIzyV\src\regtable.jl:93
 [4] top-level scope at REPL[9]:1

The desired table would be the one that hides all the NDI coefficients and only shows Pop.

RegressionTables in Pluto.jl notebooks

Apologies if this has been discussed before, it seems very basic but I couldn't find anything in the closed issues:

Would it be possible to return the output from a call to regtable? Currently it's just written to console or file, but when e.g. using the package in a Pluto notebook where the REPL is separate from the notebook it would be nice to have the ability to get the ASCII/HTML output and work with it in the notebook.

Improvements for grouping

In #61, I implemented grouping of regressions.

However, I am not yet satisfied with the following behavior


------------------------------------------------------------------------
                         grp1               looooooooooooooooogong grp2 
               ------------------------   ------------------------------
               SepalLength   SepalWidth   SepalLength      SepalWidth   
               -----------   ----------   -----------   ----------------
                       (1)          (2)           (3)                (4)
------------------------------------------------------------------------
(Intercept)       6.526***                                              
                   (0.479)                                              
SepalWidth          -0.223                   0.432***                   
                   (0.155)                    (0.081)                   
SepalLength                      -0.313                         0.378***
                                (0.239)                          (0.066)
PetalLength                     1.048**      0.776***            -0.188*
                                (0.362)       (0.064)            (0.083)
PetalWidth                                                      0.626***
                                                                 (0.123)
------------------------------------------------------------------------
SpeciesDummy                        Yes           Yes                Yes
------------------------------------------------------------------------
Estimator              OLS           IV           OLS                OLS
------------------------------------------------------------------------
N                      150          150           150                150
R2                   0.014        0.080         0.863              0.635
------------------------------------------------------------------------

(columns 3 and 4 should have the same widths)


------------------------------------------------------------
                        grp1                    grp2        
               ---------------------   ---------------------
               SepalWidth       SepalLength       SepalWidth
               ----------   -------------------   ----------
                      (1)        (2)        (3)          (4)
------------------------------------------------------------
SepalLength        -0.313                           0.378***
                  (0.239)                            (0.066)
PetalLength       1.048**              0.776***      -0.188*
                  (0.362)               (0.064)      (0.083)
(Intercept)                 6.526***                        
                             (0.479)                        
SepalWidth                    -0.223   0.432***             
                             (0.155)    (0.081)             
PetalWidth                                          0.626***
                                                     (0.123)
------------------------------------------------------------
SpeciesDummy          Yes                   Yes          Yes
------------------------------------------------------------
Estimator              IV        OLS        OLS          OLS
------------------------------------------------------------
N                     150        150        150          150
R2                  0.080      0.014      0.863        0.635
------------------------------------------------------------

(the SepalLength label should be split and show up in each group once)

The current behaviour is tested in ftest8.txt and ftest9.txt. The tests should be adjusted once this issue is fixed.

`labels` fails when printing categorical levels?

Not sure I'm missing anything here, but it seems to me the replacement of column names with labels fails when the printed coefficients are labels of categoricals:

julia> using DataFrames, FixedEffectModels, RegressionTables

julia> df = DataFrame(mycol1 = rand(["a", "b", "c"], 10), mycol2 = rand(10), y = rand(10));

julia> labeldict = Dict("mycol1" => "Column 1", "mycol2" => "Column 2", "y" => "depvar");

julia> regtable(reg(df, @formula(y ~ mycol1 + mycol2)), labels = labeldict)

---------------------
               depvar
              -------
                  (1)
---------------------
(Intercept)     0.186
              (0.175)
mycol1: b      -0.117
              (0.166)
mycol1: c      -0.058
              (0.186)
Column 2       0.568*
              (0.227)
---------------------
Estimator         OLS
---------------------
N                  10
R2              0.543
---------------------

(jl_Avs96t) pkg> st
      Status `C:\Users\ngudat\AppData\Local\Temp\jl_Avs96t\Project.toml`
  [9d5cd8c9] FixedEffectModels v1.6.1
  [d519eb52] RegressionTables v0.5.1

Defining additional controls to be displayed with fixed effects

I'm not sure if this is already possible:
Assume some of your explanatory variables form a group. For example: age, sex, and income are all information on demographics. I do not want to show each of the coefficients, which is achieved by not including them in regressors. However, I would like to have an entry in the lower part of the table, together with the fixed effects, saying Demographics Yes.
Is there a way to achieve that in RegressionTables?

Document `make_estim_decorator()`

Thanks a lot for this useful package!

I cannot figure out two (newbie) questions related to estim_decoration from the readme and closed issues:

  • How can I disable this function so that no stars are printed? I tried estim_decoration = false but this didn't work for me.

  • How can I adjust the significance rules (so that e.g. * p<0.1)? I checked the source code and I am wondering if there is a direct way to pass the adjusted rules.

Many thanks for helping with this.
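Both questions reduce to a configurable rule that maps a p-value to a decoration. A minimal sketch of such a rule as a hypothetical standalone function `stars` (not the package's API): the thresholds are passed in, and an empty list disables stars entirely. How a rule like this is wired into regtable depends on the package version.

```julia
# Hypothetical helper (not the package's API): map a p-value to stars.
# Each threshold in `breaks` that p falls below adds one `symbol`, so the
# default gives *** for p < 0.01, ** for p < 0.05 and * for p < 0.1.
# Passing an empty `breaks` disables stars.
function stars(p::Real; breaks = [0.01, 0.05, 0.1], symbol = "*")
    return repeat(symbol, count(b -> p < b, breaks))
end

println("0.568", stars(0.03))                        # prints 0.568**
println("0.568", stars(0.03; breaks = Float64[]))    # prints 0.568
```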

Error when using RegressionTables

Using [d519eb52] RegressionTables v0.2.1, I get the following error message on Julia-1.1.0

julia> using RegressionTables
[ Info: Precompiling RegressionTables [d519eb52-b820-54da-95a6-98e1306fdade]
WARNING: could not import StatsModels.DataFrameRegressionModel into RegressionTables
ERROR: LoadError: LoadError: UndefVarError: DataFrameRegressionModel not defined
Stacktrace:
 [1] top-level scope at none:0
 [2] include at ./boot.jl:326 [inlined]
 [3] include_relative(::Module, ::String) at ./loading.jl:1038
 [4] include at ./sysimg.jl:29 [inlined]
 [5] include(::String) at /home/holub/.julia/packages/RegressionTables/HPSo8/src/RegressionTables.jl:3
 [6] top-level scope at none:0
 [7] include at ./boot.jl:326 [inlined]
 [8] include_relative(::Module, ::String) at ./loading.jl:1038
 [9] include(::Module, ::String) at ./sysimg.jl:29
 [10] top-level scope at none:2
 [11] eval at ./boot.jl:328 [inlined]
 [12] eval(::Expr) at ./client.jl:404
 [13] top-level scope at ./none:3
in expression starting at /home/holub/.julia/packages/RegressionTables/HPSo8/src/util/util.jl:7
in expression starting at /home/holub/.julia/packages/RegressionTables/HPSo8/src/RegressionTables.jl:67
ERROR: Failed to precompile RegressionTables [d519eb52-b820-54da-95a6-98e1306fdade] to /home/holub/.julia/compiled/v1.1/RegressionTables/cYvie.ji.
Stacktrace:
 [1] error(::String) at ./error.jl:33
 [2] compilecache(::Base.PkgId, ::String) at ./loading.jl:1197
 [3] _require(::Base.PkgId) at ./loading.jl:960
 [4] require(::Base.PkgId) at ./loading.jl:858
 [5] require(::Module, ::Symbol) at ./loading.jl:853

I guess this is the same issue as #24.

latexOutput with interaction terms

Example output, where one of the model definitions included an interaction term (model_NMethod = fit(LinearModel, @formula(ADStat ~ Method + PropPoint + log2N + log2N * Method), df)):

\begin{tabular}{lrr}
\toprule
                  & \multicolumn{2}{c}{ADStat} \\ 
\cmidrule(lr){2-3} 
                  &       (1) &            (2) \\ 
\midrule
(Intercept)       & 11.705*** &      12.901*** \\ 
                  &   (0.014) &        (0.035) \\ 
Method: 2         &  0.308*** &      -0.867*** \\ 
                  &   (0.009) &        (0.049) \\ 
Method: 3         &  0.477*** &      -1.293*** \\ 
                  &   (0.009) &        (0.049) \\ 
Method: 4         &  0.576*** &      -1.504*** \\ 
                  &   (0.009) &        (0.049) \\ 
Method: 5         &  0.647*** &      -1.569*** \\ 
                  &   (0.009) &        (0.049) \\ 
Method: 6         &  0.736*** &      -1.646*** \\ 
                  &   (0.009) &        (0.049) \\ 
Method: 7         & -1.788*** &      -0.619*** \\ 
                  &   (0.009) &        (0.049) \\ 
Method: 8         & -2.254*** &      -3.370*** \\ 
                  &   (0.009) &        (0.049) \\ 
PropPoint: 7      & -0.089*** &      -0.089*** \\ 
                  &   (0.007) &        (0.007) \\ 
PropPoint: 8      & -0.160*** &      -0.160*** \\ 
                  &   (0.007) &        (0.007) \\ 
PropPoint: 9      & -0.320*** &      -0.320*** \\ 
                  &   (0.007) &        (0.007) \\ 
PropPoint: 10     & -0.551*** &      -0.551*** \\ 
                  &   (0.007) &        (0.007) \\ 
log2N             & -1.895*** &      -2.023*** \\ 
                  &   (0.001) &        (0.004) \\ 
Method: 2 & log2N &           &       0.126*** \\ 
                  &           &        (0.005) \\ 
Method: 3 & log2N &           &       0.190*** \\ 
                  &           &        (0.005) \\ 
Method: 4 & log2N &           &       0.224*** \\ 
                  &           &        (0.005) \\ 
Method: 5 & log2N &           &       0.238*** \\ 
                  &           &        (0.005) \\ 
Method: 6 & log2N &           &       0.256*** \\ 
                  &           &        (0.005) \\ 
Method: 7 & log2N &           &      -0.126*** \\ 
                  &           &        (0.005) \\ 
Method: 8 & log2N &           &       0.120*** \\ 
                  &           &        (0.005) \\ 
\midrule
Estimator         &       OLS &            OLS \\ 
\midrule
$N$               & 3,490,440 &      3,490,440 \\ 
$R^2$             &     0.409 &          0.410 \\ 
\bottomrule
\end{tabular}

This causes a LaTeX error because of the unescaped & in labels such as Method: 2 & log2N. These should be escaped, i.e. Method: 2 \& log2N; without the escape, LaTeX treats each & as an additional column separator.
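A minimal sketch of the escaping step, as a hypothetical helper `latex_escape` applied to label text before it is placed in a tabular cell. It deliberately leaves `$` and `\` alone, on the assumption that those are intentional markup (the table above uses `$R^2$`):

```julia
# Hypothetical helper: escape characters that LaTeX treats specially inside a
# tabular cell. `$` and `\` are not escaped, since labels may contain
# intentional math or commands such as $R^2$.
const LATEX_SPECIALS = Dict('&' => "\\&", '%' => "\\%", '#' => "\\#", '_' => "\\_")

latex_escape(s::AbstractString) = join(get(LATEX_SPECIALS, c, string(c)) for c in s)

println(latex_escape("Method: 2 & log2N"))   # prints Method: 2 \& log2N
```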

Cannot handle FixedEffectTerm programmatically created with FixedEffectModels

FixedEffectModels allows creating formulas programmatically, but RegressionTables cannot handle programmatically created formulas that contain fixed effects.

using DataFrames, RegressionTables, FixedEffectModels
df = DataFrame(y=rand(4), x=rand(4), cat = [1,1,2,2])

In what follows, rr1 and rr2 are equivalent, as are rr3 and rr4.

rr1 = reg(df, @formula(y~x))
rr2 = reg(df, Term(:y)~Term(:x))
rr3 = reg(df, @formula(y~x+fe(cat)))
rr4 = reg(df, Term(:y)~Term(:x)+fe(Term(:cat)))

Applying regtable to rr4 throws an error:

MethodError: no method matching name(::FixedEffectModels.FixedEffectTerm)
Closest candidates are:
  name(!Matched::InteractionTerm) at /home/felix/.julia/packages/RegressionTables/iLitA/src/util/util.jl:14
  name(!Matched::FunctionTerm) at /home/felix/.julia/packages/RegressionTables/iLitA/src/util/util.jl:22
#regtable#22(::Array{String,1}, ::Array{String,1}, ::Dict{String,String}, ::String, ::typeof(RegressionTables.default_ascii_estim_decoration), ::String, ::Symbol, ::RegressionTables.var"#25#37", ::Array{Symbol,1}, ::Bool, ::RegressionTables.var"#26#38", ::Bool, ::Bool, ::Bool, ::Base.GenericIOBuffer{Array{UInt8,1}}, ::typeof(identity), ::RenderSettings, ::typeof(regtable), ::FixedEffectModel) at regtable.jl:218
regtable(::FixedEffectModel) at regtable.jl:93
top-level scope at untitled-235f9d61d03d4233bbbdcc63e4cb078a:9

Update to FixedEffectModels 0.8.2

The type AbstractRegressionResult is no longer defined in FixedEffectModels 0.8.2.
reg now returns a FixedEffectModel that inherits from RegressionModel.

I think the main thing you need to do is to update the code here, using the newly exported functions has_iv and has_fe:

isFERegressionResult(r::FixedEffectModel) = has_fe(r)
isIVRegressionResult(r::FixedEffectModel) = has_iv(r)
isOLSRegressionResult(r::FixedEffectModel) = !has_iv(r)

Roadmap for Econometrics.jl

I wanted to get some feedback and thoughts on how best to port the information I give for the estimators in Econometrics.jl (see examples). These models cover the ones supported by FixedEffectModels (e.g., instrumental variables, absorption of features, robust variance-covariance estimators) as well as some of the nonlinear models in GLM.jl (e.g., discrete choice models). Would it be best to work on implementing the API?

RegressionTables throws a "no field feformula" error

I am using Julia 1.2.0 on Windows 10 and have updated all my packages. I attempted to replicate your example in the README.

using RegressionTables, DataFrames, FixedEffectModels, RDatasets

df = dataset("datasets", "iris")
df[:SpeciesDummy] = categorical(df[:Species])

rr1 = reg(df, @formula(SepalLength ~ SepalWidth + fe(SpeciesDummy)))
rr2 = reg(df, @formula(SepalLength ~ SepalWidth + PetalLength + fe(SpeciesDummy)))
rr3 = reg(df, @formula(SepalLength ~ SepalWidth + PetalLength + PetalWidth + fe(SpeciesDummy)))
rr4 = reg(df, @formula(SepalWidth ~ SepalLength + PetalLength + PetalWidth + fe(SpeciesDummy)))

regtable(rr1,rr2,rr3,rr4; renderSettings = asciiOutput())

I get the following error:

ERROR: type FixedEffectModel has no field feformula
Stacktrace:
 [1] getproperty(::Any, ::Symbol) at .\Base.jl:20
 [2] #regtable#22(::Array{String,1}, ::Array{String,1}, ::Dict{String,String}, ::String, ::typeof(RegressionTables.default_ascii_estim_decoration), ::String, ::Symbol, ::getfield(RegressionTables, Symbol("##27#39")), ::Array{Symbol,1}, ::Bool, ::getfield(RegressionTables, Symbol("##28#40")), ::Bool, ::Bool, ::Bool, ::Base.GenericIOBuffer{Array{UInt8,1}}, ::typeof(identity), ::RenderSettings, ::typeof(regtable), ::FixedEffectModel, ::Vararg{FixedEffectModel,N} where N) at C:\Users\David\.julia\packages\RegressionTables\rDg5b\src\regtable.jl:220
 [3] (::getfield(RegressionTables, Symbol("#kw##regtable")))(::NamedTuple{(:renderSettings,),Tuple{RenderSettings}}, ::typeof(regtable), ::FixedEffectModel, ::Vararg{FixedEffectModel,N} where N) at .\none:0
 [4] top-level scope at none:0

Thanks.

Output to string variable

Is there a way to output the generated LaTeX/HTML code to a string?
I want to call
display("text/html", str)
in my Jupyter notebook.

Right now, I save to a file and read it back in, but that is inconvenient.
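In general Julia, anything that writes to an `IO` can be captured as a `String` without a temporary file, using `sprint` or an `IOBuffer` with `take!`. A minimal sketch, assuming the table renderer can be pointed at an arbitrary `IO` (another issue in this list mentions an out_buffer option); `write_table` is a hypothetical stand-in for the renderer:

```julia
# Hypothetical stand-in for a table renderer that writes to any IO.
function write_table(io::IO, rows)
    for r in rows
        println(io, join(r, " & "), " \\\\")   # LaTeX-style row ending
    end
end

rows = [["(Intercept)", "0.186"], ["", "(0.175)"]]

# Option 1: sprint passes a fresh IO to the function and returns its contents.
str = sprint(io -> write_table(io, rows))

# Option 2: an explicit IOBuffer, emptied into a String with take!.
buf = IOBuffer()
write_table(buf, rows)
str2 = String(take!(buf))

# In a Jupyter notebook an HTML string could then be shown with
# display("text/html", str).
```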

TagBot trigger issue

This issue is used to trigger TagBot; feel free to unsubscribe.

If you haven't already, you should update your TagBot.yml to include issue comment triggers.
Please see this post on Discourse for instructions and more details.

If you'd like for me to do this for you, comment TagBot fix on this issue.
I'll open a PR within a few hours, please be patient!

adjr2 not working for LinearModel

It seems that :adjr2 and :dof_residual do not work for GLM.LinearModel (the output is blank). My guess is that isdefined(blah, :adjr2) is returning false. Perhaps more functions in the same spirit as the r2 function are needed to handle these statistics as well? I can give that a shot at some point if you think that is the proper solution.
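StatsAPI defines generic functions `r2`, `adjr2`, `nobs` and `dof_residual`, so a table can call those functions instead of checking for struct fields. For a model type that exposes only R², the adjusted value can be derived. A minimal sketch, assuming the model has an intercept and `dof_residual` is n minus the number of estimated coefficients (intercept included):

```julia
# Adjusted R² from R², the number of observations n and the residual degrees
# of freedom:  adjR² = 1 - (1 - R²) * (n - 1) / dof_residual
# Assumes the model includes an intercept.
adjusted_r2(r2::Real, n::Integer, dof_residual::Real) = 1 - (1 - r2) * (n - 1) / dof_residual

# Example: R² = 0.5, 11 observations, 3 coefficients -> 8 residual dof.
println(adjusted_r2(0.5, 11, 8))
```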

Update to Distributions 0.25

This is related to Pull Request #92. When I downloaded this package and manually ran the tests, everything seemed to work fine. The only exception is that GLFixedEffectModels.jl requires Distributions 0.24. Does that prevent this package from using 0.25?

RegressionTables is not working on v1.0.4

Hi,

I tried to load RegressionTables in Julia v1.0.4, but precompilation fails with an error. Neither reinstalling the package nor restarting Julia resolves the problem.

StatsModels.DataFrameRegressionModel should be StatsModels.TableRegressionModel

This doesn't exist anymore. It's now called TableRegressionModel. See, e.g., what happens when you run the first example here:

julia> data = DataFrame(X=[1,2,3], Y=[2,4,7])
3×2 DataFrame
│ Row │ X     │ Y     │
│     │ Int64 │ Int64 │
├─────┼───────┼───────┤
│ 1   │ 1     │ 2     │
│ 2   │ 2     │ 4     │
│ 3   │ 3     │ 7     │

julia> ols = lm(@formula(Y ~ X), data)
StatsModels.TableRegressionModel{LinearModel{GLM.LmResp{Array{Float64,1}},GLM.DensePredChol{Float64,LinearAlgebra.Cholesky{Float64,Array{Float64,2}}}},Array{Float64,2}}

Note: This issue currently causes precompilation to fail.
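Besides simply renaming the reference, a package can support both names during such a transition by checking with `isdefined` which one the dependency provides. A minimal sketch of that pattern; `FakeStatsModels` is a hypothetical stand-in module so the example runs without StatsModels installed:

```julia
# Hypothetical stand-in for StatsModels after the rename: only the new name exists.
module FakeStatsModels
struct TableRegressionModel end
end

# Pick whichever wrapper type the dependency actually defines.
const WrappedModel = if isdefined(FakeStatsModels, :DataFrameRegressionModel)
    getfield(FakeStatsModels, :DataFrameRegressionModel)
else
    getfield(FakeStatsModels, :TableRegressionModel)
end

println(WrappedModel)
```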

Allow for custom `vcov` matrices

Currently, RegressionTables.jl doesn't support custom covariance matrices. This is a pain, because GLM doesn't allow the user to cluster errors or use heteroskedasticity-robust errors in the glm command itself; people have to use CovarianceMatrices.jl to do this.

It would be nice if the regtable command supported adding vcov as a keyword argument to make this work.
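The statistics shown under each coefficient follow directly from the covariance matrix: standard errors are the square roots of its diagonal, and t-statistics are coefficients divided by those. A minimal sketch, assuming the user supplies the coefficient vector and a `vcov` matrix computed elsewhere (e.g. with CovarianceMatrices.jl); the numbers below are toy values, not from a real fit:

```julia
using LinearAlgebra

# Standard errors from any coefficient covariance matrix, e.g. a robust or
# clustered one supplied by the user.
stderrors(V::AbstractMatrix) = sqrt.(diag(V))

coefs = [1.0, 6.0]              # toy coefficients
V = [4.0 0.5;
     0.5 9.0]                   # toy covariance matrix
se = stderrors(V)               # [2.0, 3.0]
tstats = coefs ./ se            # [0.5, 2.0]
```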

if out_buffer is supplied, should not also println to stdout

There is an implicit assumption in the way regtable writes output. In my case, it causes all of my tables (and additional computations) to be printed to the screen twice.

If I supply an out_buffer, you write to the buffer AND to stdout. The problem is that I eventually write the buffer I supplied to stdout myself. I think that if you offer the out_buffer option, the output should go only to this IOBuffer.
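A common Julia convention for the behaviour requested here is to take the destination as an `IO` argument that defaults to `stdout`, and to write only to that argument. A minimal sketch of the pattern; `render_table` is a hypothetical stand-in, not the package's function:

```julia
# Render into whichever IO the caller passes and nowhere else;
# fall back to stdout only when no IO is given.
function render_table(rows; io::IO = stdout)
    for r in rows
        println(io, join(r, "  "))
    end
    return nothing
end

buf = IOBuffer()
render_table([["N", "10"], ["R2", "0.543"]]; io = buf)   # nothing reaches stdout
out = String(take!(buf))
```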

API to adhere to?

Is there a public API that if we implement (e.g. coeffs/stderr/rsq/lklhd/bic/aic/etc. functions), we can send our models to your function and have everything work?

I suspect it is ideal for you and users if you don't go down the path of stargazer and try to implement extraction code for every model type out there. Rather, I suspect it is better if you declare a minimum/suggested/full set of functions that if they work with the model object, your code can still produce nice tables.

I write a fair number of custom estimators, and one of the frustrating things is that, after putting in all that work to get an estimator to work, there isn't a standard way to output the results to a nice table. This kind of solution would ostensibly mean I just have to implement a handful of functions for my output object and be good to go.

In any case, thanks for the package!

Installation fails

The package seems great! I can't install it though. I get the following message:

ERROR: RegressionTables's requirements can't be satisfied because of the following fixed packages: DataFrames
resolve(::Dict{String,Base.Pkg.Types.VersionSet}, ::Dict{String,Dict{VersionNumber,Base.Pkg.Types.Available}}, ::Dict{String,Tuple{VersionNumber,Bool}}, ::Dict{String,Base.Pkg.Types.Fixed}, ::Dict{String,VersionNumber}, ::Set{String}) at .\pkg\entry.jl:490
resolve(::Dict{String,Base.Pkg.Types.VersionSet}, ::Dict{String,Dict{VersionNumber,Base.Pkg.Types.Available}}, ::Dict{String,Tuple{VersionNumber,Bool}}, ::Dict{String,Base.Pkg.Types.Fixed}) at .\pkg\entry.jl:479
edit(::Function, ::String, ::Base.Pkg.Types.VersionSet, ::Vararg{Base.Pkg.Types.VersionSet,N} where N) at .\pkg\entry.jl:30
(::Base.Pkg.Entry.##1#3{String,Base.Pkg.Types.VersionSet})() at .\task.jl:335
Stacktrace:
 [1] sync_end() at .\task.jl:287
 [2] macro expansion at .\task.jl:303 [inlined]
 [3] add(::String, ::Base.Pkg.Types.VersionSet) at .\pkg\entry.jl:51
 [4] (::Base.Pkg.Dir.##3#6{Array{Any,1},Base.Pkg.Entry.#add,Tuple{String}})() at .\pkg\dir.jl:33
 [5] cd(::Base.Pkg.Dir.##3#6{Array{Any,1},Base.Pkg.Entry.#add,Tuple{String}}, ::String) at .\file.jl:59
 [6] withenv(::Base.Pkg.Dir.##2#5{Array{Any,1},Base.Pkg.Entry.#add,Tuple{String},String}, ::Pair{String,String}, ::Vararg{Pair{String,String},N} where N) at .\env.jl:157
 [7] #cd#1(::Array{Any,1}, ::Function, ::Function, ::String, ::Vararg{String,N} where N) at .\pkg\dir.jl:32
 [8] add(::String) at .\pkg\pkg.jl:117

Do you know what the conflict might be?

Enable single line output

Currently, LaTeX tables are printed using two lines per estimated coefficient (estimate + standard error in the default setting), which takes up quite a lot of space depending on the formatting of the document. Especially if below_statistic=:blank, this is wasteful. I think it would be nice to have an option to use just a single output line in that case.

I think it would be particularly great to have something like $\underset{below_statistic}{estim}$ if below_statistic != :blank, and just estim otherwise (with decorations, obviously), in single-line mode.
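A minimal sketch of how such a single-line cell could be built, as a hypothetical helper `single_line_cell`. It assumes the below statistic is shown in parentheses and that the document loads amsmath, which provides \underset; with no below statistic, the bare (decorated) estimate is returned:

```julia
# Hypothetical helper: one LaTeX cell with the below statistic stacked under
# the estimate via \underset (requires amsmath), or just the estimate when
# there is no below statistic.
function single_line_cell(estim::AbstractString, below::Union{Nothing,AbstractString})
    below === nothing && return String(estim)
    return string("\$\\underset{(", below, ")}{", estim, "}\$")
end

println(single_line_cell("0.568*", "0.227"))   # $\underset{(0.227)}{0.568*}$
println(single_line_cell("0.568*", nothing))   # 0.568*
```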

Labels are not used

Recently a change was made regarding the type of dictionary accepted as labels; now a Dict{Symbol,String} is required. But any value provided is now simply ignored. For example:

using RegressionTables, DataFrames, FixedEffectModels, RDatasets

df = dataset("datasets", "iris")
df[:SpeciesDummy] = categorical(df[:Species])

rr1 = reg(df, @model(SepalLength ~ SepalWidth   , fe = SpeciesDummy))

regtable(rr1; renderSettings = asciiOutput(), 
        labels = Dict(:SpeciesDummy => "My categorical variable"))

yields:

--------------------------
               SepalLength
               -----------
                       (1)
--------------------------
SepalWidth        0.804***
                   (0.106)
--------------------------
SpeciesDummy           Yes
--------------------------
Estimator              OLS
--------------------------
N                      150
R2                   0.726
--------------------------

decimal places

Would it be possible to add a function argument to specify the number of decimal places which are shown for the regression coefficients?

This is my first time using the package; thanks so much for the great work!
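The formatting step itself can be sketched with Julia's Printf standard library. Unlike `round(x; digits = d)`, a printf-style format keeps trailing zeros, which matters for aligned table columns. `format_coef` is a hypothetical helper; `Printf.Format` and `Printf.format` are available in Julia 1.6 and later:

```julia
using Printf

# Format a number with a fixed number of decimal places, keeping trailing zeros.
format_coef(x::Real, digits::Integer) = Printf.format(Printf.Format("%.$(digits)f"), x)

println(format_coef(1.23456789, 4))   # 1.2346
println(format_coef(0.5, 3))          # 0.500
```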

String Type restrictions

Currently, many arguments are restricted to type String. This is inconvenient, particularly when used together with LaTeXString. Why not relax the restriction to AbstractString?
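The difference is easy to see with the standard library alone: `LaTeXString`, like the `SubString` returned by `split`, is a subtype of `AbstractString` but not of `String`, so a method annotated `::String` does not accept it. A minimal sketch with two hypothetical functions:

```julia
# A method restricted to String rejects other string types (SubString here,
# LaTeXString in the same way); one taking AbstractString accepts them all.
label_strict(s::String) = s
label_generic(s::AbstractString) = String(s)

sub = split("Column 1: b", ": ")[1]      # a SubString{String}
println(applicable(label_strict, sub))   # false
println(applicable(label_generic, sub))  # true
```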

Wrong interacted FE

Not sure whether this is an error or incorrect usage:

using RegressionTables, DataFrames, FixedEffectModels, RDatasets

df = dataset("datasets", "iris")
df[:SpeciesDummy] = categorical(df[:Species])
df[:PetalLengthDummy] = categorical(df[:PetalLength])

rr1 = reg(df, @model(SepalLength ~ SepalWidth   , fe = SpeciesDummy))
rr2 = reg(df, @model(SepalLength ~ SepalWidth   , fe = SpeciesDummy * PetalLengthDummy))

regtable(rr1, rr2; renderSettings = asciiOutput())

yields:

----------------------------------
                   SepalLength    
               -------------------
                    (1)        (2)
----------------------------------
SepalWidth     0.804***   0.442***
                (0.106)    (0.125)
----------------------------------
SpeciesDummy        Yes           
----------------------------------
Estimator           OLS        OLS
----------------------------------
N                   150        150
R2                0.726      0.915
----------------------------------

as if there were no fixed effects in the second regression.

HTML Output Option

I'm an R user (using stargazer), and my workflow is built around HTML output so that I can convert it into MS Word for my collaborators. Any possibility of this?
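Producing HTML mostly means emitting `<table>`/`<tr>`/`<td>` markup and escaping the characters HTML treats specially. A minimal sketch with two hypothetical helpers, `html_escape` and `html_table`, using the cells from the example output further down this page:

```julia
# Escape characters that are special in HTML; & must be replaced first.
html_escape(s::AbstractString) =
    replace(replace(replace(s, "&" => "&amp;"), "<" => "&lt;"), ">" => "&gt;")

# Write rows of cells as a plain HTML table to any IO.
function html_table(io::IO, rows)
    println(io, "<table>")
    for r in rows
        cells = join("<td>" * html_escape(string(c)) * "</td>" for c in r)
        println(io, "  <tr>", cells, "</tr>")
    end
    println(io, "</table>")
end

html = sprint(io -> html_table(io, [["SepalWidth", "0.804***"], ["", "(0.106)"]]))
print(html)
```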

Error in readme example

dobson = DataFrame(Counts = [18.,17,15,20,10,20,25,13,12],
    Outcome = pool(repeat(["A", "B", "C"], outer = 3)),
    Treatment = pool(repeat(["a","b", "c"], inner = 3))
    );

Gives: UndefVarError: pool not defined.
The following works:

dobson = DataFrame(Counts = [18.,17,15,20,10,20,25,13,12],
    Outcome = categorical(repeat(["A", "B", "C"], outer = 3)),
    Treatment = categorical(repeat(["a","b", "c"], inner = 3))
    );

Also, for completeness, before the Dobson example you might add using GLM.

Proper documentation

As the set of features increase the README is getting longer and longer.

Wouldn't it be time to set up proper multi-page documentation at some point? Or is there a reason to keep it as it is?

How to adjust number of decimal places output by `latexOutput()`

I would like to create a LaTeX regression table from some GLM and/or FixedEffectModels regressions.
I have regtable(sm1a, sm1b; renderSettings = latexOutput()). This prints out the regression table with 3 digits after the decimal point. I would like 4 digits after the decimal point instead. How can I do this?
Thanks.

Number of clusters argument to regression_statistics

Following FixedEffects/FixedEffectModels.jl#65, we can now extract the number of clusters from a FixedEffectModels.jl object using its "nclusters" field.

using DataFrames, RDatasets, FixedEffectModels
df = dataset("plm", "Cigar")
mod1 = reg(df, @formula(Sales ~ NDI + fe(State) + fe(Year)), Vcov.cluster(:State))
mod1.nclusters
# (State = 46,)

I've found that some journals are quite insistent about incorporating information about the number of clusters into regression tables. Can we therefore add :nclusters as a valid symbol to regression_statistics?
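Since `nclusters` is a NamedTuple keyed by clustering variable, turning it into table rows is straightforward. A minimal sketch with a hypothetical helper `cluster_rows`, producing one (label, value) pair per clustering variable:

```julia
# Hypothetical helper: one (label => count) pair per clustering variable,
# built from the `nclusters` NamedTuple shown above.
cluster_rows(nc::NamedTuple) = ["Clusters ($k)" => v for (k, v) in pairs(nc)]

println(cluster_rows((State = 46,)))   # ["Clusters (State)" => 46]
```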

For consideration: multiway clustering

A potential complication, or at least an issue that requires some thought, involves the case of multiway clustering. In this case, the "nclusters" field will produce a named tuple containing the number of clusters for each cluster variable. For example:

mod2 = reg(df, @formula(Sales ~ NDI + fe(State) + fe(Year)), Vcov.cluster(:State, :Year))
mod2.nclusters
# (State = 46, Year = 30)

Naturally, it's possible to report the number of clusters for both the State and Year variables. However, FixedEffectModels.jl defaults to using the smallest cluster count for the SE degrees-of-freedom adjustment (see: FixedEffects/FixedEffectModels.jl#50). My own feeling, then, is that RegressionTables need only report (and name) the cluster variable with the fewest clusters (here: Year = 30).
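Selecting that variable from the NamedTuple takes one `argmin`. A minimal sketch with a hypothetical helper `min_cluster` that returns the variable with the fewest clusters together with its count:

```julia
# Hypothetical helper: the clustering variable with the fewest clusters,
# returned as `name => count`.
function min_cluster(nc::NamedTuple)
    i = argmin(collect(values(nc)))
    return keys(nc)[i] => nc[i]
end

println(min_cluster((State = 46, Year = 30)))   # :Year => 30
```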
