citation-style-language / csl-evolution

Central repository for coordinating CSL development

csl-evolution's Introduction

csl-evolution

Former central repository for coordinating CSL development. Archived after the release of CSL 1.0.2, January 5, 2022.

csl-evolution's Issues

Add `hereinafter` variable

CSL-M has a hereinafter variable for special shorttitles/abbreviations/short citation forms of particular items. Wouldn't it make sense to include this in CSL? (I don't think that adding the variable poses particular challenges in itself. The main challenge is probably how this variable will be populated by users...)

deprecate `entry` and `article`

Since article and entry seem to have been abandoned in favor of more specific item types (entry-dictionary, article-journal, etc.), it would seem appropriate to deprecate entry and article. They don't seem to be used by reference managers.

Migrate CSL issues from Zotero-bits and csl/schema to this repo?

I just went through the issues over at https://github.com/citation-style-language/zotero-bits/. It's a bit problematic that some of these issues are Zotero issues while others require CSL changes. Is there a way to migrate the CSL issues to this repo to make things less confusing? The same applies to the issues at https://github.com/citation-style-language/schema/. It seems to me that a lot of questions are being discussed in different places...

add new creator types

There are some new creator types we should add, primarily for media item types (television broadcasts, films, etc.), along with accompanying terms.

producer
writer (as in a screenwriter)
executive-producer
contributor (e.g., 'Smith, Adam (with John Paul Jones)')
narrator (as for an audiobook)

APA style would also like

host (as for a podcast or other regular broadcast)
guest (for a broadcast)
principal-investigator (for a grant)
co-investigator (for a grant)

There have also been requests before for:

compiler
cartographer
performer

Alternative to many new variables

@bdarcus has rightly expressed annoyance at what are essentially type-specific forms of "author". These make writing styles more complex. An approach to streamline the complexity would be to adopt the names data model described here that includes a label field.

With that model, producer, writer, narrator, host, guest, principal-investigator, co-investigator, cartographer, and performer could all be author, with a label carrying the appropriate term.

This could be implemented in clients by adding the appropriate term label to their CSL JSON (e.g., Zotero could add term="producer" to an author entered as Producer for a TV Broadcast item). A text field could allow users to override the label for more complex descriptions (e.g., "Guest Expert", perhaps "Principal Investigator" and "Co-Investigator").

If we did that, only contributor and executive-producer would need to be added as distinct creator types (even executive-producer could be editor (term="executive-producer")). director could be deprecated and interpreted as author (term="director").
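To make the label idea concrete, here is a minimal Python sketch of what such CSL-JSON could look like and how a processor might resolve the label. The `term` and `label` keys on a name, the `TERMS` table, and the `role_label` helper are all assumptions for illustration; none of them exist in CSL-JSON today.

```python
# Hypothetical CSL-JSON item: every creator is stored as an "author",
# and an assumed "term" key on each name records the specific role.
item = {
    "type": "broadcast",
    "title": "A great TV episode",
    "author": [
        {"family": "Jones", "given": "A.", "term": "producer"},
        {"family": "Smith", "given": "B.", "term": "director"},
        {"family": "Allen", "given": "C."},  # plain author, no role term
    ],
}

# Assumed locale terms; a real processor would read these from a locale file.
TERMS = {"producer": "Producer", "director": "Director"}


def role_label(name):
    """Return the display label for a name, or None for a plain author.

    A free-text "label" (also hypothetical) overrides the term lookup.
    """
    if "label" in name:
        return name["label"]
    term = name.get("term")
    return TERMS.get(term) if term else None


labels = [role_label(n) for n in item["author"]]
```

With this shape, a style would only need to render author plus an optional label, rather than one macro per creator type.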

Split main-title and sub-title

Citeproc-js has support for splitting title variables into title-main and title-sub. I think we should formally adopt this in CSL.

There's been some discussion about this here.

I'd particularly like to highlight @georgd's suggestion to use an inheritable attribute like title-sub-delimiter.
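As a rough sketch of how a delimiter-driven split could behave (this is not how citeproc-js does it; the function name and the default delimiter are assumptions), the title would be cut at the first occurrence of the delimiter:

```python
def split_title(title, delimiter=": "):
    """Split a title at the first occurrence of the delimiter.

    Returns (main, sub); sub is None when the delimiter is absent.
    The delimiter stands in for a hypothetical title-sub-delimiter value.
    """
    main, sep, sub = title.partition(delimiter)
    return (main, sub) if sep else (title, None)
```

Splitting only at the first delimiter keeps any later colons inside the subtitle, which seems like the least surprising default.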

New condition: Test for variable content

In the medium or long term we might want to test for variable content.

<if variable="title" content="asdf">
</if>

Proposed change to <names> behavior

The current behavior if multiple creator variables are passed inside <names> is to render a <name> object for each name type, separated by the delimiter argument passed to <names>. This behavior is limiting, because it makes it impossible to include multiple creator types in one name string. This means that rules for and and et-al cannot be applied to lists containing multiple author types.

For example, in APA, if a book has both an editor and a translator, chapters would be cited as:
… In A. Jones (Ed.), & B. Smith (Trans.), A great book

or, in the case of multiple editors/translators:
… In A. Jones (Ed.), B. Johnson (Ed.), & B. Smith (Trans.), A great book

Another APA example--it also regards authors and illustrators as both "authors" and asks that both be listed together (without labels).

This gets particularly important if we want to fully support media citations in APA. There, APA wants something like the second example, listing each creator with their role as one list (and following and and et-al rules):
Jones, A. (Producer), Johnson, B. (Producer), Allen, C. (Writer), & Smith, B. (Director). (2018). A great TV episode …

These structures aren't currently possible in CSL. You can hard-code ", & " as the delimiter on names, but this won't consistently produce a list of (potentially labeled) names with only one &.

A more flexible approach would be for each <names> node to render all of the creators given in variable in one string. That is, instead of variable="editor translator" producing two name strings (one for editors and one for translators), it would produce one name string including all of the creators that are either editors or translators. Collapsing of creators (like for editortranslator) could be done on a name-by-name basis. Besides the additional flexibility, I personally would find this behavior much more intuitive. (The current behavior can be easily replicated if needed using <group>.)
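To illustrate the proposed behavior, here is a small Python sketch that joins an already merged creator list into one string with a single "and" term. The `render_names` function and the (name, label) pair representation are assumptions for illustration, and the comma before "&" follows the APA examples above rather than any existing CSL attribute.

```python
def render_names(creators, and_term="&"):
    """Join all creators into one list with a single 'and' before the last.

    `creators` is a list of (display_name, label_or_None) pairs, already
    merged across variables such as editor and translator.
    """
    parts = [f"{name} ({label})" if label else name for name, label in creators]
    if len(parts) <= 1:
        return "".join(parts)
    # One delimiter-separated list, with the 'and' term only before the last name.
    return ", ".join(parts[:-1]) + f", {and_term} " + parts[-1]
```

For example, `render_names([("A. Jones", "Ed."), ("B. Smith", "Trans.")])` gives the first APA string quoted above.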

add text-case="capitalize-subtitle"

APA title formatting rules call for the first letter of words following a period or colon to be in uppercase. citeproc-js has a workaround for this by having an uppercaseSubtitles flag, which Zotero switches on for styles with ID starting with "apa" and a whitelist of other styles. It would be good to incorporate this formally into CSL.

This could be combined with the title-sub-delimiter attribute here to control how to parse subtitles/where to uppercase them.
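A minimal Python sketch of the rule as described (uppercase the first letter after a period or colon). This is an assumption about the intended behavior, not the citeproc-js implementation, and it would also capitalize after abbreviations such as "e.g. ", which a real implementation would need to handle.

```python
import re


def capitalize_subtitle(title):
    """Uppercase the first word character that follows a colon or period plus whitespace."""
    return re.sub(
        r"([:.]\s+)(\w)",
        lambda m: m.group(1) + m.group(2).upper(),
        title,
    )
```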

Change description of `annote` in the specification

I think we ought to update the spec to define annote as annotations on the bibliography entry. note is for things like user notes about the item. As currently defined, note and annote are redundant, and clients already use note in a way that more aligns with the current annote definition.

Originally posted by @bwiernik in #16 (comment)

So, other opinions on this. Shall we change the description of annote?

add new terms for APA/MLA

APA has a variety of standard terms that should be added to enable better use of apa.csl across locales. MLA has a similar set of terms, but I'm less familiar with that style.

In addition to terms for new types of creators and a general term for each item type (e.g., patent, map, song [which might be localized in English as "Audio recording"]), I think the following should be added:

album
advance-online-publication
- This is distinct from in-press or forthcoming
personal-communication
article
- for "Article No. e012345"
preprint
working-paper
original-work-published
on
- for a track on an album (rather than in an album)
review-of
film
video
television-series
radio-broadcast
podcast
special-issue
special-section

Perhaps also

number
- currently term="issue" (localized in English as "No.") is used, which works, so perhaps this one is unnecessary

Adding these would go a long way toward making APA transportable across locales, and it could even eliminate some of the need for users to manually enter genre into their data (e.g., a client could automatically add term:podcast to a Podcast broadcast item). (The term:term behavior is a new proposal.)

Institutional author abbreviation or short form

The journal Water Alternatives (http://www.water-alternatives.org/index.php/guide) has in guide for authors:

Example of organisation as author or government publications:
IIMI (International Irrigation Management Institute). 1993. Advancements in IIMI's research 1992. Colombo, Sri Lanka: IIMI.
[in the text, refer to IIMI (1993)]

Name of an author is "Short-form/abbreviation (full-name)" but reference must be "short-form/abbreviation".
https://forums.zotero.org/discussion/57035/institutional-author-abbreviation-or-short-form

Rename `annote` to `annotation`

I think "annote" (which doesn't seem to be a word: https://www.merriam-webster.com/dictionary/annote) should be changed to "annotation" (https://www.merriam-webster.com/dictionary/annotation).

Originally posted by @rmzelle in citation-style-language/documentation#73 (comment)

Yes, annotation is a much better name for the variable.

But wouldn't that be backwards incompatible, and therefore something for a later version?

On the other hand, I don't know how severe that would be. Changing the styles to use the new variable should be trivial, and GUI applications should just change how they map their own fields to CSL variables. (CSL does not know aliases for variables, right?)

add more international identifiers?

Currently CSL has two types of ISO identifiers: ISSN and ISBN. It also includes major identifiers used in citations: DOI, PMCID, PMID.

There are several other ISO identifiers that are analogous to ISSN and ISBN. These appear to be important for citation and cataloging in fields like musicology (e.g., zotero/zotero-bits#28).

They are:

ISWC (for a musical work itself)
ISMN (for a musical notation/score)
ISRC (for an audio or visual recording)
ISAN (for an audiovisual work itself)
ISCI (an identifier for an archival collection)

(Page 25 here is helpful: https://www.ismn-international.org/files/Web_ISMN_Users_Manual_2016.pdf)

Should we add these other 5 identifier variables?

New condition: Test for equality

Another new condition we might want to add later: Test if variable a is equal to variable b.

<if variable="bookauthor" equals="author">
</if>

add condition: year < 2000

As per @georgd here:

At some places, years should be formatted with two digits if year < 2000 and with four digits otherwise. Am I right in assuming that that’s not possible? At least I haven’t found it in the documentation. It’s a requirement that I met in various (legal and non-legal) styles already, especially when dealing with EU sources.

I think that sounds like a reasonable suggestion. Other opinions about this? Has anyone else seen that requirement before?
If we add this, perhaps it should be done in a less specific way: rather than if after-2000, perhaps something like if date-range="1900-1999", which would be adaptable to other requirements.
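A quick Python sketch of what a date-range condition would amount to on the processor side. The `format_year` name and the `short_range` parameter are hypothetical; the default range mirrors the "1900-1999" value suggested above.

```python
def format_year(year, short_range=(1900, 1999)):
    """Render a two-digit year inside short_range, four digits otherwise."""
    low, high = short_range
    if low <= year <= high:
        return f"{year % 100:02d}"
    return str(year)
```

Keeping the range as data rather than a fixed "before 2000" test is what makes the condition reusable for other cut-offs.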

add `citation-label` style syntax

Currently, citation-label is either provided in item metadata or generated using a pattern specified by the citation processor. This makes it difficult for users to change label formats across different label-based CSL styles.

Here is some discussion between @cormacrelf and me on an interface for specifying citation-label formats: https://discourse.citationstyles.org/t/citation-label-formatting/1585/2

citeproc-js provides this format for automatically-generated labels: Aaaa00:AaAa00:AaAA00:AAAA00

@cormacrelf implemented that format into citeproc-rs, generalizing it to any number of authors.

I think this is a good format to build on. The one question is how to specify which variables to use, or fall back to, when producing the citation-label. Here is what I propose.

A new element cs:citation-label sibling to cs:citation with a required format attribute (we could also use value).

format will look something like this format="Aaaa0:AaAa0:AaAA0:AAAA0"
- format is a colon-delimited list giving the format to use for an item whose names input has length 1, length 2, etc. The last value applies to that many names or more (e.g., here for 4+ names).
- format can be any length
- A indicates first letter of creator’s family name
- a indicates subsequent letters of that creator’s family name
- 0 indicates the output of the date element
- Other characters are treated literally. Literal A, a, 0 can be given by escaping with \ (e.g., \A)

cs:citation-label can have two children

A <names> element used to provide the input for the Aa parts of the format
- Can contain a substitute child. Other cs:names children are ignored.
An optional <date> element to provide the input for the 0 parts of the format
- Output is inserted into the label
- Can take all of the normal children of cs:date
- e.g., for two digits, specify <date-part name="year" form="short"/>

If the item data contains a citation-label value, this overrides the automatically-generated value from cs:citation-label.

This approach rests on existing cs:names and cs:date structures while still being flexible for users.
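To check that the format rules above are workable, here is a Python sketch of one possible reading of them. It is not the citeproc-js or citeproc-rs implementation; the `citation_label` function is hypothetical, it takes the family names and the already rendered date text as plain inputs, and it assumes literal colons are not needed inside a pattern.

```python
def citation_label(fmt, families, date_text):
    """Build a label from a colon-delimited format such as 'Aaaa0:AaAa0:AaAA0:AAAA0'.

    families:  family names, in order (the output of the names input)
    date_text: the rendered output of the date element
    """
    patterns = fmt.split(":")
    # Pattern N is for N names; the last pattern covers that many or more.
    index = max(min(len(families), len(patterns)), 1) - 1
    pattern = patterns[index]

    out = []
    creator = -1  # index of the creator currently being spelled out
    letter = 0    # position within that creator's family name
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            out.append(pattern[i + 1])  # escaped character, taken literally
            i += 2
            continue
        if ch == "A":  # first letter of the next creator
            creator += 1
            letter = 0
            name = families[creator] if creator < len(families) else ""
            out.append(name[:1])
        elif ch == "a":  # next letter of the current creator
            letter += 1
            name = families[creator] if 0 <= creator < len(families) else ""
            out.append(name[letter:letter + 1])
        elif ch == "0":  # date output
            out.append(date_text)
        else:
            out.append(ch)
        i += 1
    return "".join(out)
```

Under this reading, `citation_label("Aaaa0:AaAa0:AaAA0:AAAA0", ["Smith", "Jones"], "19")` picks the second pattern and takes two letters from each name; names shorter than the pattern simply contribute fewer letters.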

Thoughts?

Testing for date-parts

The ability to test for the presence of specific date parts (e.g., for a month or day) comes up relatively frequently. Without this capability, some styles produce some weird output. For example, Chicago (author-date) shows both the year and the full date. If month and day are not provided, then this results in only the year being shown twice, which is entirely redundant. See here for a discussion: citation-style-language/styles#3638 (comment)

It seems like it would be fairly easy to add the option to test for specific date parts. I can see two options:

Add issued-year, issued-month, accessed-year, etc. as separate variables that can be tested for in if and else-if.
Add date-part="year month day" as a new attribute for if and else-if that, when specified along with a date variable in variable= will test for the presence of the relevant date part(s) in the specified date variable(s).

Number 1 seems like an easier and more consistent approach. I think that it would be good to still require all date parts to be rendered using date.
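For reference, either option boils down to a check like the following against CSL-JSON date-parts. This is a Python sketch only; `has_date_parts` is a hypothetical helper, and it looks at just the first date of a range.

```python
PART_INDEX = {"year": 0, "month": 1, "day": 2}


def has_date_parts(item, variable, parts):
    """Return True if every listed date part is present in the date variable.

    `parts` is a space-separated string such as "year month day".
    """
    date = item.get(variable) or {}
    date_parts = date.get("date-parts") or [[]]
    first = date_parts[0]
    return all(len(first) > PART_INDEX[p] for p in parts.split())
```

With option 1, a test for issued-month would correspond to `has_date_parts(item, "issued", "month")`; with option 2, the date-part attribute value would be passed through as `parts`.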

define versioning process for styles, schema, etc.

Creating new versions of the schema and spec is technically straightforward.

But how are we thinking to deal with style versioning related to the above, given the thousands of styles we have to maintain?

Will a 1.1 release require completely new style files, that we maintain in parallel with 1.0 styles?

What about 2.0?

@rmzelle suggested here that master would be the current schema, and we'd branch for earlier style versions.

That makes sense, but I think the below is still relevant.

Could we possibly avoid or minimize the need for separate branches?

Maybe we could add a compatibility section to the spec ahead of time to make this much easier for us to manage?

This issue should settle and describe whatever the strategy will be.

test for date parts

For ISO 690 it is important to be able to test whether only the year is filled in the date field.
https://forums.zotero.org/discussion/29298/the-conditioned-date-display

add "archival-collection" item type

See zotero/zotero-bits#27

Accepted/in print flag and term

A lot of journals allow citing documents (typically articles or conference papers) that have been accepted for publication but are not yet published. We need a term for these cases (something like "in press"/"in print"/"accepted for publication") and a flag to trigger this term.
Currently, text like "in print" can be entered in the date field in Zotero, but if the guide for authors prescribes another term, the CSL processor has no way to change it. CSL 1.0.1 has a similar construct, is-uncertain-date.

Retire repository

@bdarcus Can you retire/archive this repository?

Figure out development process

(work-in-progress)

One big reason CSL development has been at a standstill for the last few years is that we don't have a good open and transparent development process. I would like to figure out the following (we can create additional issues here to discuss these things in more detail):

We need to figure out communication and flow for making changes to CSL. What does it mean to have consensus, and how do we prevent information overload for people who can't follow everything? We can't have everybody track every CSL repository (e.g. "styles" is way too busy), and even just "schema" and "documentation" might be too much for people outside the core team. The mailing list sucks (hard to search and archive, things go off-track easily, hard to show code, etc.). I think we need something like:
- Core team creates and/or triages incoming issues on CSL development. I think we should centralize everything into this issue tracker, linking to issues in other repos as appropriate, so that it's enough for non-core folks to keep an eye on this repo.
- Core team makes sure issues contain the minimum amount of information needed to support the request for change. I.e., it's really helpful if we determine early on whether there are, e.g., style guides that describe certain requirements we don't yet support.
- Once the need has been established, we can start coming up with one or more solutions.
- Around this point, we should consult a broader audience, in particular active CSL processor devs, on whether the issue is important enough to address, which solutions are workable, and which one is best.
- Once some sort of consensus has been achieved, pull requests to the specification and schema can be prepared, reviewed, and merged.
- Close to release/feature freeze, we should make sure that the total set of changes is consistent, and have another final call for feedback.
I really think we should switch to a more structured proposal model. Bruce wrote up a template (which I need to dig up), and Apple provides a very nice example with Swift at https://github.com/apple/swift-evolution.
- See https://github.com/apple/swift-evolution/tree/master/proposals for Swift proposals.
- See https://swift.org/blog/swift-3-1-release-process/ for Swift's development process.
We need to make an inventory of what tools need to be updated for a new CSL release, and determine how much time we need to allocate for this, and who will do what. We have our internal tools (validator, visual editor, Travis CI, formatter, etc.) and external ones (CSL processors). We should at least have our own tools updated before we release.
How to best keep the changelog.
- Currently, we have one changelog per release (e.g. https://github.com/citation-style-language/documentation/blob/1.0.1/release-notes.txt). Do we want to continue this (starting fresh with each release), or should we switch to a cumulative changelog?
- Do we need to make any changes to the format? http://keepachangelog.com/en/0.3.0/ has some interesting pointers (e.g. clearly indicating addition, deletions, changes, etc.)
- Shall we keep the changelog up to date during development? In the past I wrote it close to release, but that's less transparent.
We should see if we can make use of any of GitHub's features, that we haven't used in the past (project boards, milestones, etc.).
We need to figure out how to work unit tests into our workflow. Maybe we can add tests to the proposal folders in this repo?

There might be more. I would really like to work this out a bit more, and e.g. write out a development workflow we can follow.

adopt csl-m `alternative`?

CSL-M has a special variable alternative: https://citeproc-js.readthedocs.io/en/latest/csl-m/#cs-alternative-extension

This can be used "to add supplementary reference information to a cited item, such as a translation or reprint." The idea is that you can prefix ordinary variables with alt-. Some of those are available as normal variables, but you can also render those with:

<alternative prefix="（" suffix="）">
  <alternative-text/>
</alternative>

Should we adopt this?

Some instances of outdated spec text

There are a number of places where the specification is unclear about or inconsistent with what people are expected to do, and I think it would be a good idea to canonicalize those deviations.

the significance of the order of cs:label in cs:names: see this post
the meaning of genre and medium: see this post
descriptions for the variables container & original-author
clarification that a variable suppressed by cs:substitute is considered empty for the purposes of determining whether to suppress the enclosing cs:group (that's what citeproc-js does, anyway)

test

test...

Evolution process

Every proposal is in a separate markdown file and follows a template (details tbd) but based on https://docs.google.com/document/d/1GTTOl0_Yj9JidrTmOI_pXiKGUqpenFERTP7S4lIBbwo/edit
Files should be consecutively numbered (001, 002, 003, etc.) and contain a short summary. E.g. 003-add-standards-item-type.md
Every proposal has an issue linked to it, which is used for discussion
Every issue has a milestone assigning it to a release
Proposed changes to the proposal should be in the form of a pull request
Before a CSL release is issued, there will be a four-week comment period.

add fields to `name` data model, new `names` behavior

Following up on https://discourse.citationstyles.org/t/extensions-to-name-data-label-or-alternate-name-role/1609/4

I propose three additions to the name data model used in CSL-JSON and accessible in styles:

Changes for 1.1

alternate: For things like names in alternative scripts, online usernames, inferred authors
label: To override the default label used for a name
1. This would permit complex descriptive role information used in styles like APA and MLA
is-uncertain: Indicates that a name is uncertain or inferred.
1. This would be testable using is-uncertain-name (analogous to is-uncertain-date)

Changes for 2.0?

APA has distinct styling rules for personal/individual versus non-personal/organizational authors. In-text citations for individual authors are always short-form. Group authors should be long form with short form in parentheses the first time the name is used, then short form thereafter (e.g., (American Psychological Association [APA], 2019) first, but (APA, 2019) subsequently). For that, I suggest:

Add short to the name data model
1. A short form of the name for single-field/non-personal names (e.g., "APA" for "American Psychological Association")
Add individual and organization as children of name providing separate formatting for individual and organizational authors.
Allow form="short" for organizational authors.
Add form-subsequent to permit different first and subsequent formats for appearance of the name.
Add form="long-short" to render a non-personal name as Long Name [Short Name].
Update the spec to indicate that personal names with only one part (e.g., Plato) should be entered in "family", not like non-personal names.
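A small Python sketch of the first/subsequent behavior described above for organizational names. The `short` key on a name and the `render_org` helper are assumptions taken from this proposal, not existing CSL-JSON; `seen` stands in for whatever position tracking a processor already does.

```python
def render_org(name, seen):
    """Render an organizational name; `seen` is a set of names already cited.

    First use gives "Long Name [Short]", later uses give the short form.
    """
    long_form = name["literal"]
    short_form = name.get("short")  # hypothetical field from this proposal
    if short_form is None:
        return long_form
    if long_form in seen:
        return short_form
    seen.add(long_form)
    return f"{long_form} [{short_form}]"
```

This corresponds to form="long-short" on first appearance combined with a form-subsequent of "short".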

Edit: Move is-uncertain here.

make style ids immutable

We should resolve this soon.

My thinking is here, which means no schema change; just documentation, etc.

But I have not followed that thread, so may have missed something.

We could also tighten the constraints on the id element in 1.1. Not sure that's necessary though.

add "addendum" and "howpublished" free-text annotation variables

Biblatex has a couple of fields that can be used to add arbitrary pieces of information to an item: howpublished can be used to supply publishing information in a free format, addendum is used for the same but printed at the end of a reference. When I've used biblatex in the past these fields have been very useful as there will always be cases that can't be cited properly otherwise.

Information given in that field can't be modified afterwards, of course. But I don't think that this is a problem, as using such mechanisms should be the last resort anyway.

I don't think there will be much impact on existing styles. We could either add those fields to the schema and not use them in the standard styles. Or, we could instruct processors to always print addendum at the end of each reference.

deprecate `container`

I am not sure what the intended function of container is, but it seems clearly misclassified as a date variable. I don't see its value alongside the various container- variables. I suggest it be deprecated and eventually removed.

(Mentioned here: #9)

Wiki

@rmzelle Can you make a Wiki for csl-evolution as a central location for putting consensus changes to CSL and/or the Zotero data model?

support journal special issues

See: zotero/zotero-bits#36 (comment)

We should somehow support journal special issues, but the exact details of what that would entail for CSL are still unclear.

In the original issue, @adam3smith gave the following examples (that should probably be updated according to current style guides):

Specifically:
APA:
Beasley, E. (Ed.). (2001). The new logic [Special issue]. Journal of Contemporary Philosophy, 9(6).

MLA:
Burgess, Anthony. “Politics in the Novels of Graham Greene.” Literature and Society. Spec. issue of Journal of Contemporary History 2.2 (1967): 93-99. Print.

CMoS:
Good, Thomas L., ed. “Non-Subject-Matter Outcomes of Schooling.” Special issue, Elementary School Journal 99, no. 5 (1999).

CMoS for an article in a special issue:
Sassler, Sharon, “Learning to Be an ‘American Lady’? Ethnic Variation in Daughters’ Pursuits in the Early 1900s,” in “Emergent and Reconfigured Forms of Family Life,” ed. Lora Bex Lempert and Marjorie L. DeVault, special issue, Gender and Society 14, no. 1 (2000): 201–202.

Journal of Marketing for an article in a special issue:
Simonson, Itamar, Allen M. Weiss, and Shantanu Dutta (1999), “Marketing in Technology-Intensive Markets: Toward a Conceptual Framework,” Journal of Marketing, 63 (Special Issue), 78–91.

add "periodical" item type

See zotero/zotero-bits#23

get rid of `choose`

I think we should get rid of choose elements. They are unnecessary and removing them would do no harm. Instead, styles would be more readable. (We could still leave choose optional until 2.0 to make the transition easier.)

add generic type `document`

We don't have a generic item type at the moment. Should we add one? @bdarcus @bwiernik

Dash-normalization

In the issue about splitting title-main and title-sub, one question was about em vs en dashes.
Should we add an option to normalize dashes in a textual context, as we already change hyphens to en-dashes in a numerical context (we do that, right?)?
I guess that should most likely be locale-dependent, e.g., em dashes for US English, en dashes for most other locales?
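A rough Python sketch of what such normalization might do. Everything here is an assumption: the `normalize_dashes` name, the locale table (which just encodes the guess above), treating a spaced hyphen or double hyphen as a "textual" dash, and keeping the surrounding spaces.

```python
import re

EN_DASH = "\u2013"
EM_DASH = "\u2014"

# Assumed locale preferences; anything not listed falls back to the en dash.
TEXT_DASH = {"en-US": EM_DASH}


def normalize_dashes(text, locale="en-US"):
    """Normalize hyphens used as dashes.

    Numeric context: a hyphen between two digits becomes an en dash.
    Textual context: a spaced hyphen or double hyphen becomes the locale's dash.
    """
    text = re.sub(r"(?<=\d)-(?=\d)", EN_DASH, text)
    dash = TEXT_DASH.get(locale, EN_DASH)
    return re.sub(r"\s(?:--|-)\s", f" {dash} ", text)
```

Ordinary hyphenated words such as "well-known" are left alone, since the hyphen there is neither between digits nor surrounded by spaces.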

add support for in-text author or "narrative" citation configuration

Brief Abstract

Arguably the biggest outstanding feature request for CSL is to support in-text "narrative" citations, where the author is moved outside the citation proper. Examples: "Doe (2018)" and "Doe [1]".

In the LaTeX world, this feature is represented in the natbib \citet command.

Style Requirements

Beyond the general support for the feature requested by a lot of users, APA requires different author rendering when multiple authors are within parentheses compared to outside of them.

Examples from APA (note the difference in how the "and" is represented in each):

Research by Wegener and Petty (1994) supported...
(Wegener & Petty, 1994)

Level of Current CSL Support

Partial.

It is possible to build on the suppress author functionality to offer simple author-in-text cases in CSL 1.0. Indeed, pandoc-citeproc already does this.

Most CSL implementations, however, require authors to manually write the author name in-text, and then to suppress the author output. Many authors consider this a hassle and prone to errors.

But it is not possible with CSL 1.0 to support any case where the author rendering within the citation is different from when it's outside of it (as with the APA example above).

Implementation

Suggested

We could add a new cs:intext element, a sibling of cs:citation with the same RNC pattern, to configure the rendering of the intext author rendering, and then use that output to assemble the narrative citation.
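To show what the two renderings need to produce, here is a Python sketch using the APA example above. The function names are hypothetical and the name joining is deliberately simplified (no et-al handling); the point is only that the in-text author string and the parenthetical citation are configured separately and then assembled.

```python
def join_names(families, and_term):
    """Join family names, putting the 'and' term before the last one."""
    if len(families) < 3:
        return f" {and_term} ".join(families)
    return ", ".join(families[:-1]) + f", {and_term} " + families[-1]


def parenthetical(families, year):
    """Rendering a style's citation layout would give, e.g. (Wegener & Petty, 1994)."""
    return f"({join_names(families, '&')}, {year})"


def narrative(families, year):
    """Narrative form: in-text author rendering plus an author-suppressed citation."""
    return f"{join_names(families, 'and')} ({year})"
```

Here `join_names(..., 'and')` plays the role the proposed cs:intext output would have, while the "(1994)" part is what suppress-author already yields.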

Alternatives

There are other ways to configure this (multiple cs:layout elements per citation, for example), but they introduce other problems that in the end would suggest more invasive CSL changes.

Compatibility Implications

Processors could ignore the new element, but would have to add new code to support it. Such new code should be relatively uncomplicated, as it would build on existing functionality (like suppress author and substitution).

Some styles (those where the author rendering is different outside of the citation) would need to be updated.