
aclpub2's People

Contributors

anaistack, cimeister, crux82, danielhers, elenacabrio, haroldrubio, marcopoli, mjpost, rswilkens, ryancotterell, texttheater, valeriobasile, zhzhang

aclpub2's Issues

escaping of names

How should names be escaped to build, and in which files does this need to be done?
I had previously built my proceedings without escaping names in .yml files; I didn't change the yml file but now have a breaking error as some values are saved as None

  File "IWSLT/aclpub2/aclpub2/generate.py", line 528, in normalize_latex_string
    return text.replace("’", "'").replace("&", "\\&").replace("_", "\\_")
AttributeError: 'NoneType' object has no attribute 'replace'

Should we be using hex codes like institution: "Universit\xE4t Potsdam", and is this needed in all yml files?
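
The traceback above shows `.replace` being called on a `None` value, which is what an empty YAML field (for example a bare `institution:`) loads as. A minimal sketch of a guard, assuming only the three replacements visible in the traceback are needed; the name `normalize_latex_string_safe` is hypothetical and is not part of aclpub2:

```python
from typing import Optional


def normalize_latex_string_safe(text: Optional[str]) -> str:
    """Apply the replacements from the traceback, treating None as empty."""
    if text is None:
        # An empty YAML value is loaded as None rather than "".
        return ""
    return text.replace("’", "'").replace("&", "\\&").replace("_", "\\_")
```

With a guard like this, a name such as "Universität Potsdam" passes through unchanged, so the failure would come from the empty field rather than from the unescaped character.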

single-day workshop

If a workshop lasts one day, the build gives the following on the front page:

image

Is there a way to remedy this behaviour in conference_details.yml?

list names of sponsors in addition to logos

In addition to "The X organizers gratefully acknowledge the support from the following sponsors" it would be nice to list the names of the sponsors (and possibly additional partners the workshop has been put on in cooperation with, in the case of SIGs which are affiliated with ACL and also other organizations).

This could be done on a previous page using the preface tex, but ideally the information would be enterable via yml such that it appears on the sponsors page.

An example from past proceedings is on 2020 SIGDIAL page 2:
image (13)

consistency checks

@mjpost pointed me to this repo as the place for the next-gen ACL publications software. Some feature requests based on experience with the Anthology:

  • Warn about missing abstracts
  • Specify SIGs for workshops
  • For colocated events, ensure consistency of the address field by default. Currently this is a source of confusion (e.g. BlackboxNLP 2021 has "Punta Cana, Dominican Republic" in the metadata and "Online" in the PDF footer, as opposed to the main conference (EMNLP) which has "Online and Punta Cana, Dominican Republic").
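
As an illustration of the first request, a check for missing abstracts could look roughly like the sketch below. It assumes each paper record is a dict with `id` and `abstract` keys; those field names are assumptions, not a confirmed aclpub2 schema.

```python
def papers_missing_abstract(papers):
    """Return the IDs of papers whose abstract is absent, None, or blank."""
    missing = []
    for paper in papers:
        abstract = paper.get("abstract") or ""
        if not abstract.strip():
            missing.append(paper.get("id"))
    return missing
```

The returned IDs could then be printed as warnings before the build starts.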

Make the paper IDs unique

When combining the papers.yml files of multiple events (e.g., when building the program), paper IDs may clash.

We should find a way to avoid such clashes (e.g., for NAACL).
Should this be handled when downloading the information or during compilation?
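
One possible approach is to qualify each ID with its event when the files are merged. This is only a sketch: the `id` field name and the `event-id` format are assumptions, and `qualify_paper_ids` is a hypothetical helper.

```python
def qualify_paper_ids(event_id, papers):
    """Return copies of the paper records with IDs prefixed by the event."""
    return [{**paper, "id": f"{event_id}-{paper['id']}"} for paper in papers]


main_papers = [{"id": 1, "title": "A"}]
workshop_papers = [{"id": 1, "title": "B"}]
merged = qualify_paper_ids("main", main_papers) + qualify_paper_ids("ws", workshop_papers)
```

Because the helper copies each record, the original per-event files keep their local IDs and only the merged view carries the prefix.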

allow sub-blocks for organizing committee, sponsors

It would be nice to support (one level of) sub-blocks in the yml for the organizing committee and sponsors.
This would allow organizers of multiple shared tasks to be grouped within one header as below on page 5 of the 2021 IWSLT proceedings, or if there is more than one set of sponsorship tiers (for $ and donations, for example) these to be entered under subheaders instead of flattening them.

Desired output:
Screen Shot 2022-03-15 at 4 43 48 PM

Need page ranges in the metadata

BibTeX exports in the Anthology include page numbers as metadata (for example). This is something of an anachronism, but we should still support it, so we need this information as fields in papers.yaml.
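
If the page count of each paper is known, the ranges follow from a running total. A minimal sketch, assuming papers appear in proceedings order and numbering starts at page 1; `assign_page_ranges` is a hypothetical helper:

```python
def assign_page_ranges(page_counts, first_page=1):
    """Map per-paper page counts to inclusive (start, end) page ranges."""
    ranges = []
    start = first_page
    for count in page_counts:
        end = start + count - 1
        ranges.append((start, end))
        start = end + 1
    return ranges
```

For example, papers of 24, 10 and 5 pages would get the ranges (1, 24), (25, 34) and (35, 39).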

program on multiple pages may overlap page #s, watermark

If a section in program.yml overflows onto a new page, such as a poster session listing many papers that continues onto one or two further pages, the last entry on each page will overlap with the page numbers, as shown below:

image

Division in Syllables

Building proceedings for NLP conferences, we should carefully select which kind of language is used when dividing in syllables :-)

For now, I set in proceedings.tex

\usepackage[english,latin]{babel}

Not really sure if this is the correct decision.

@marcopoli : you should align the template of the booklet.

Add the volume_name in the conference.yml

I must remember to add the field volume_name in the conference.yaml

for workshop the value is 1

for the main conference this value reflects the volume:
long | short | tutorials | ...

I must remember to update the sigdial example and the readme

aclpub2/generate hangs up

Hi,

Following Danilo's advice, I updated the aclpub2 version and changed the conference_details.yml from

name: Workshop on Computational Approaches to Subjectivity, Sentiment \& Social Media Analysis
volume: Proceedings of the Twelfth Workshop
abbreviation: WASSA
start_date: 2022-05-26
end_date: 2022-05-26
isbn:
location: Dublin, Ireland
publisher: Association for Computational Linguistics

to

book_title: "Proceedings of the 12th Workshop on Computational Approaches to Subjectivity, Sentiment & Social Media Analysis (WASSA 2022)"
event_name: "The 12th Workshop on Computational Approaches to Subjectivity, Sentiment & Social Media Analysis"
cover_subtitle: Proceedings of the Workshop
anthology_venue_id: WASSA
start_date: 2022-05-26
end_date: 2022-05-26
isbn: XXX-X-XXXXXX-XX-X (you should replace this with the real ISBN)
location: Dublin, Ireland
editors:
  - first_name: Jeremy
    last_name: Barnes
  - first_name: Orphée
    last_name: De Clercq
  - first_name: Valentin
    last_name: Barriere
  - first_name: Shabnam
    last_name: Tafreshi
  - first_name: Sawsan
    last_name: Alqahtani
  - first_name: João
    last_name: Sedoc
  - first_name: Roman
    last_name: Klinger
  - first_name: Alexandra
    last_name: Balahur
publisher: Association for Computational Linguistics

However, when I run ./bin/generate ../Wassa --proceedings --overwrite to generate the proceedings, the script hangs up after processing 3 papers. It doesn't give any error that I can find. Any advice?

Trailing / is important

OK, maybe I am the only one, but I just spent quite some time figuring out that the last '/' is not part of the ID of the workshop/conference.

E.g., if you run the scripts with aclweb.org/ACL/2022/Conference/ instead of aclweb.org/ACL/2022/Conference, then the openreview-py API returns empty lists.

My proposal would be to add a .strip('/') inside the script.
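
A sketch of that normalization follows. It uses `rstrip` rather than `strip` so that only trailing slashes are removed; `normalize_venue_id` is a hypothetical helper, not something in the existing scripts.

```python
def normalize_venue_id(venue_id: str) -> str:
    """Drop surrounding whitespace and any trailing slashes from a venue ID."""
    return venue_id.strip().rstrip("/")
```

Both spellings of the venue ID would then resolve to aclweb.org/ACL/2022/Conference before being passed on.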

Add scripts for downloading material from Softconf

@valeriobasile @ElenaCabrio I think it would be very useful to add your scripts to export information from Softconf into the ACLPUB2 format.

Then, I would kindly ask you to extend the guide, as was done for exporting information from OpenReview. It would be really helpful for workshop organizers who are using Softconf ;-)

TNX!

Handling Author's name in uppercase

When name/family names in the papers.yml are uppercase, this is propagated in the author's index.

@zhzhang What about processing this info so that only the first letter is uppercase?

@rswilkens Is it correct to apply this processing during your export?
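
A minimal sketch of such processing, touching only names that are entirely uppercase so that mixed-case names are left alone. `normalize_name_case` is a hypothetical helper, and the approach is a heuristic: spellings such as "McDonald" or lowercase particles cannot be recovered from an all-caps input.

```python
def normalize_name_case(name: str) -> str:
    """Title-case a name only if it is written entirely in uppercase."""
    if name.isupper():
        # str.title() capitalizes the first letter of each word, including
        # after spaces and hyphens, and lowercases the rest.
        return name.title()
    return name
```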

Error NoneType object is not iterable

./bin/generate examples/sigdial --proceedings --handbook

Traceback (most recent call last):
  File "/home/bharathi/Desktop/aclpub2/./bin/generate", line 32, in <module>
    generate_handbook(args.path, args.overwrite)
  File "/home/bharathi/Desktop/aclpub2/aclpub2/generate.py", line 176, in generate_handbook
    ) = load_configs_handbook(root)
  File "/home/bharathi/Desktop/aclpub2/aclpub2/generate.py", line 573, in load_configs_handbook
    for workshop in workshops:
TypeError: 'NoneType' object is not iterable
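
The error means `workshops` was `None` when the loop started, most likely because the corresponding configuration was missing or empty. A sketch of a defensive pattern follows; the `workshops` key and the helper name are assumptions, since the actual structure read by `load_configs_handbook` is not shown here.

```python
def get_workshops(config):
    """Return the configured workshops, or an empty list if none are set."""
    # A missing key and an explicit empty value (loaded as None) both
    # fall back to an empty list, so a for-loop over the result is safe.
    return config.get("workshops") or []
```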

or2papers.py does not extract accepted papers only

I feel like I'm being a bit of a bore, but I have another issue for you, sorry!

or2papers.py currently extracts all submitted papers with both 'accept' and 'reject' status. However, README says that it should extract only accepted papers.

Title format in camera-ready version of ACL2022

Following the instruction, I protect some letters with curly braces when I fill the title box, but after submitting, I notice that the curly braces also show on the title of the openreview page.

image

The instruction says "These curly braces will not appear in the online conference program or in the proceedings. They will only appear in the BibTeX file ..." So I wonder if it is ok that these curly braces also show in the openreview page.

Edge cases

  • Line break for authors in ToC needs to be indented in multiline.
  • Left justify the sponsor logos.
  • Line break for wrapped program committee lines.

Software should not be automatically submitted

There are occasional instances (e.g., acl-org/acl-anthology#1921) where software is submitted to the Anthology against the author's expectations. Submission policy should clarify that submitted software will be delivered for archival, or the final copy form should be updated to explicitly ask authors, and mention the benefits.

integration of previous instructions

For workshop proceedings generation, are we still following the recommendations from previous years that were created for START? For example, for the title page it is recommended to put the name of the conference on top (ACL 2022), title without "workshop of" and so on.

Escaping strings in yml

Including & in program.yml breaks the proceedings generation due to usage of a character that must be escaped in LaTeX.

E.g., consider the following program.yml:

- title: Opening Remarks
  start_time: 2022-07-14 08:30:00
  end_time: 2022-07-14 08:40:00

- title: "Keynote"
  start_time: 2022-07-14 08:40:00
  end_time: 2022-07-14 09:25:00
  subsessions:
    - title: "Keynote Talk"
      start_time: 2022-07-14 08:40:00
      end_time: 2022-07-14 09:25:00
    - title: "Keynote Q&A"
      start_time: 2022-07-14 09:25:00
      end_time: 2022-07-14 09:40:00

This will currently break proceedings generation unless the & is escaped (i.e. "Keynote Q\\&A") despite the original string "Keynote Q&A" being a valid yml string.

Two potential fixes might be:

  • Escape strings that are ingested from yml
  • Document that strings appearing in yml must be escaped for downstream LaTeX

Either would reduce the pain of hitting an error that gives no descriptive information on how to fix it, and of subsequent attempts to escape characters using HTML syntax (i.e. &amp;), which is what most StackOverflow answers suggest.
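
The first option could be sketched as follows. This is an illustration only, not the escaping aclpub2 performs; `escape_latex` and the mapping are hypothetical. It works character by character, so the backslash inserted for one replacement is never escaped again by a later one.

```python
LATEX_SPECIALS = {
    "\\": r"\textbackslash{}",
    "&": r"\&",
    "%": r"\%",
    "$": r"\$",
    "#": r"\#",
    "_": r"\_",
    "{": r"\{",
    "}": r"\}",
    "~": r"\textasciitilde{}",
    "^": r"\textasciicircum{}",
}


def escape_latex(text: str) -> str:
    """Escape characters that LaTeX treats specially in running text."""
    return "".join(LATEX_SPECIALS.get(ch, ch) for ch in text)
```

A caveat of this design is that input which has already been escaped by hand (such as "Keynote Q\\&A") would be escaped a second time, so it only makes sense if the yml files are expected to hold plain text.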

LaTeX escapes in metadata should raise an error

Occasionally users use LaTeX escape codes in names (e.g., Vuli\'{c}) or in titles (\\). It would be nice if the metadata-reading code flagged these and asked that Unicode be used instead.
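
A rough sketch of such a check, using a regular expression that matches a backslash followed either by a run of letters (a command such as \v) or by a single non-letter character (an accent such as \' or a line break \\). The function names are hypothetical.

```python
import re

# Backslash + letters (control word) or backslash + one non-letter symbol.
LATEX_ESCAPE = re.compile(r"\\[A-Za-z]+|\\[^A-Za-z\s]")


def find_latex_escapes(value: str) -> list:
    """Return every LaTeX-style escape sequence found in a metadata value."""
    return LATEX_ESCAPE.findall(value)


def check_metadata_value(field: str, value: str) -> None:
    """Raise ValueError if the value appears to contain LaTeX escapes."""
    found = find_latex_escapes(value)
    if found:
        raise ValueError(
            f"{field}: found LaTeX escapes {found}; please use Unicode instead"
        )
```

Note that a check like this would also flag a hand-escaped ampersand (\&), so it would need to be reconciled with whatever escaping policy is chosen for the yml files.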

Problems with pax and proceedings with hundred of papers

@zhzhang @ryancotterell I think I found the issue about the ACL compile error.

It was not a problem only tied to save_size, but also to pool_size.

Here is what I did on my mac.
First I raised the limit on the number of files I could open:

ulimit -S -n 2048

I searched for the place to add my parameters:

kpsewhich --all texmf.cnf

It pointed to the two files

/usr/local/texlive/2016/texmf.cnf
/usr/local/texlive/2016/texmf-dist/web2c/texmf.cnf

The first one needed to be modified. I just added a larger

pool_size = 20000000
save_size = 800000

(maybe too large, but I was not sure about the limit) and it seems to work!

I think that now we should re-add \usepackage{pax} to aclpub2/templates/proceedings.tex and invoke pdflatex twice from aclpub2/generate.py.

Unfortunately, I fear we cannot apply Ryan's suggestion to merge smaller parts of the proceedings, as the reference to one page is "absolute" and it will be lost

Adding list of names/middle_name/family_name in the organizing committee or program committee section

I just received the list of reviewers. But aclpub2 does not work.

@rswilkens : reviewers should be printed grouped by names, without institutions.
Actually aclpub2 has a specific flag name_block you should add for action editors and reviewers

- role: Reviewers
  type: name_block  # By adding the name_block type in the role, names will be output in alphabetized blocks.
  entries:
    - Committee Member Name

@zhzhang : however it does not work... it seems it tries to both generate the compact version and the other one

infinite copy loop if calling aclpub2 on the working directory

The way my directory was structured, acl-pub2 was a submodule, and I was calling the generate command on my working directory.

thus Line 129 got into a problematic loop:
shutil.copytree(input_path, Path(output_dir, "inputs"))
(https://github.com/rycolab/aclpub2/blob/main/aclpub2/generate.py#L129)
trying to copy a path that contains the output_dir into the output_dir.

Basically, the current code breaks when the input path is the same path as the working directory.
i.e.
aclpub2/bin/generate ./ --proceedings --overwrite
breaks
But

cd ..
aclpub2/bin/generate ./SPA-NLP/. --proceedings --overwrite

Is ok.

Please fix or make it clear that you cannot call generate on the working directory.
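
A guard of the following kind could detect the situation before copying starts. This is a sketch under the assumption that the output directory is created relative to the current working directory; the directory name `output` in the usage note is illustrative, and `output_inside_input` is a hypothetical helper.

```python
from pathlib import Path


def output_inside_input(input_path, output_dir) -> bool:
    """Return True if output_dir is input_path itself or lies beneath it."""
    src = Path(input_path).resolve()
    dst = Path(output_dir).resolve()
    return dst == src or src in dst.parents
```

For instance, `output_inside_input(".", "output")` is true, which is the case where `shutil.copytree` ends up copying the destination into itself, while `output_inside_input("SPA-NLP", "output")` is false. When the check is true, the tool could either stop with a clear message or skip the output directory while copying.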

No sponsors

I added a bit of code to handle the case when the workshop declares no sponsors. However, we also need to remove "The HumEval organizers gratefully acknowledge the support from the following sponsors." on the second page. Do you have any suggestions on how to do this cleanly, @zhzhang?

Adding slides and websites of tutorials

I (as a tutorial co-chair of ACL 2022) am wondering how we can associate tutorial slides with entries in the ACL Anthology. There are probably two ways to do this:

  1. Collect tutorial slides from instructors and include them as an attachment. I found one instance in https://aclanthology.org/events/acl-2019/#p19-4

In order to realize this, I may need to specify a file in papers.yml:

  badges:
    - type: Presentation
      url: presentations/1.pdf
  2. Add a link to the website of a tutorial and ask tutorial instructors to upload the slides to the website. I've never seen an instance of this on the ACL Anthology. I guess that the corresponding papers.yml could include something like this:
  badges:
    - type: Web
      url: https://some-url-of-the-tutorial/

I think option 2 would be convenient because tutorial instructors can update their slides at the last minute or even after a presentation. So, my questions are:

  • Is it allowed to put type: Web in badges?
  • Can ACL Anthology show a URL with this badge?

Book title too long for the watermark

What should we do if the book title is too long and cannot fit into a single line in the watermark?

On START this would be solved by customising the placement around the following line in proceedings.tex:

\put(105,16){\makebox(0,0){\emph{Proceedings of Deep Learning Inside Out (DeeLIO 2022):\ The 3rd Workshop on Knowledge Extraction and Integration for Deep Learning Architectures}, pages 1 - 24}}

Thanks!

The program disappeared

We accidentally removed the possibility to add a program to the proceedings (required in some workshops).
It should be re-added

missing python scripts for OpenReview export

Your README mentions two python scripts for OpenReview exports: or2papers.py and or2program_committee.py. However, I couldn't find them in this repository.

Could you please indicate where they are?

Cleanup or2papers.py

A few things top of mind:

  • Use argparse rather than attempting to read directly from sys.argv for username and password
  • Throw error rather than assume ACL 2022 for connection string
  • Wrap primary execution path in main function
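
The three points above could be sketched as follows. The argument names and the printed message are illustrative assumptions, not the script's actual interface; the venue is a required positional argument, so omitting it produces an argparse error instead of silently falling back to ACL 2022.

```python
import argparse


def parse_args(argv=None):
    """Parse command-line arguments; every argument is required."""
    parser = argparse.ArgumentParser(
        description="Export accepted papers from OpenReview (sketch)."
    )
    parser.add_argument("username", help="OpenReview username")
    parser.add_argument("password", help="OpenReview password")
    parser.add_argument("venue", help="venue ID, e.g. aclweb.org/ACL/2022/Conference")
    return parser.parse_args(argv)


def main(argv=None):
    """Primary execution path, kept out of module scope."""
    args = parse_args(argv)
    # The real export logic would go here; this sketch only echoes the input.
    print(f"Would export papers for {args.venue} as {args.username}")

# A script would normally end with:
#   if __name__ == "__main__":
#       main()
```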

Packaging for Accessibility

Given the mix of Python and non-Python dependencies (java, mactex), it would be difficult to host this on PyPI. Nevertheless, it would be useful to package in such a way that it can be more easily called as a command-line executable in order to improve ease of use. Similarly, it would be useful to create a GitHub action that runs this tool against a repository in order to generate the required output.

Cannot compile proceedings

We need additional information on how to fix a paper that cannot be compiled. Specifically, we are getting an error suggesting that we open the PDF and export it again, but trying this across 4 different PDF programs fails to solve the problem. Neither we nor the authors know how to fix the problem so that their paper can be included in the proceedings. Please let us know how to proceed.

Error message:

multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/opt/homebrew/Cellar/python@3.9/3.9.13_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/Users/tjn/tmp/aclpub2/aclpub2/generate.py", line 399, in create_watermarked_pdf
    raise Exception(
Exception: Sorry but it seems I cannot compile paper 6.pdf and it will not be added to the output folder!
It is generally due to a PDF with a problematic internal links.
A "possible" solution is to open the PDF with any preview system and export it again.
"""

The above exception was the direct cause of the following exception:

Exception: Sorry but it seems I cannot compile paper 6.pdf and it will not be added to the output folder!
It is generally due to a PDF with a problematic internal links.
A "possible" solution is to open the PDF with any preview system and export it again.
None

Sorry. I have problems compiling the watermarked papers. Press Enter to process another paper or Ctrl+C to quit.

Add some automatic checks

It would be useful to add some checks at the start of the generation process.

When loading the configuration files (especially the manually defined ones), the software should raise some warnings about their consistency.

For example, we should check that the conference.yml is consistent with the one agreed with ACL anthology discussed here:
#33
For example, some fields (e.g., location or editors) are not used in the generation of the proceedings but are important to publish them on the ACL anthology.

This would reduce many issues in the end-to-end process
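
A start-up check along these lines might look like the sketch below. The field names are taken from the conference_details.yml example quoted in an earlier issue on this page; which of them are actually mandatory for the Anthology is an assumption, and `check_conference_details` is a hypothetical helper.

```python
# Assumed to be required; adjust to the schema agreed with the ACL Anthology.
REQUIRED_FIELDS = (
    "book_title",
    "event_name",
    "anthology_venue_id",
    "start_date",
    "end_date",
    "location",
    "editors",
    "publisher",
)


def check_conference_details(details: dict) -> list:
    """Return one warning per required field that is missing or empty."""
    warnings = []
    for field in REQUIRED_FIELDS:
        if not details.get(field):
            warnings.append(
                f"conference_details.yml: missing or empty field '{field}'"
            )
    return warnings
```

Returning a list of warnings rather than raising on the first problem lets the organizer see every missing field in a single run.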

Some modifications in the conference_details.yml for the publication in ACL anthology

@zhzhang: for the publication in the ACL anthology, the conference_details.yml file should contain the following fields:

  • editors: the list of the volume editors (i.e., the Program Chairs in the main conference and the Workshop Organizers in the workshop). It should be a list of entities with name/family name/middle name as the authors of the papers.
  • the address of the conference: it should be a string

@mjpost: is it correct? are we missing anything?

special characters in names

in papers.yml, one author last name contains a special Vietnamese character: Lưu

So the proceedings don't compile with the following error:

! Package inputenc Error: Unicode character ư (U+01B0) (inputenc) not set up for use with LaTeX.

Can author names contain such characters, or should the name be modified? The ACL Anthology allows for diacritics in surnames (like Müller) but I'm not sure about ư.

Related to acl-org/aclpubcheck#25

problem when generating program committee

When the first author of the entries in a block misses the institution field, the entire block is not displayed (i.e., it is not reported in the latex file).

@rswilkens : can you check with the person(s) responsible for the program committee (I imagine the Program Chair) about this missing information?

@zhzhang : anyway, I suggest that, when the institution is missing, the software should not display:
John Doe,
but
John Doe (without the comma).
There may be a rare case that some institutions are actually missing....
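
The suggested behaviour can be sketched as a small formatting helper. `format_committee_entry` is a hypothetical name, and the real template logic may be structured differently.

```python
def format_committee_entry(name, institution=None):
    """Format a committee member, adding the comma only with an institution."""
    if institution:
        return f"{name}, {institution}"
    return name
```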
