europeanuniversityfoundation / eche-api
ECHE API
The API includes fields for verified data but most entries do not have any content there. As such, exposing verified data for all entries is not desirable at this point.
However, it may be useful to indicate whether a given entry has verified data without requiring the inclusion of verified data for all entries. This way, client applications can perform an additional scoped request when desirable.
A new processed API field hasVerifiedData should return true when a given entry has at least one non-empty verified data field, or false otherwise.
Update OpenAPI specification and API documentation.
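A minimal sketch of how hasVerifiedData could be computed for one entry. The names in VERIFIED_FIELDS are an assumption borrowed from the example keys in the verified data proposal, and has_verified_data is a hypothetical helper, not existing project code.

```python
from typing import Any, Mapping

# Assumed verified data keys; adjust to whatever the API actually exposes.
VERIFIED_FIELDS = (
    "verifiedName",
    "verifiedNameLang",
    "verifiedCity",
    "verifiedCityLang",
)


def has_verified_data(entry: Mapping[str, Any]) -> bool:
    """Return True when at least one verified field holds a non-empty value."""
    for field in VERIFIED_FIELDS:
        value = entry.get(field)
        if isinstance(value, str):
            # Whitespace-only strings count as empty.
            value = value.strip()
        if value:
            return True
    return False
```

A client can then check the flag first and request the verified data only for entries where it is true.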
As of March 2023, significant changes have been introduced in the source ECHE list, especially at the level of the country data format.
Before: country values were country names.
After: country values are country codes based on Annex A6 of the Interinstitutional Style Guide (ISG) of the Publications Office of the European Union.
Given the differences between the ISG country codes and the ISO 3166-1 alpha-2 standard, the API should provide both, as well as the country names as per the ISG.
country contains the raw value provided in the ECHE list.
country value contains an ISG country code or country name.
countryCode should henceforth contain the ISG country code by:
countryName should contain the ISG country name by:
countryCodeIso should contain the corresponding ISO 3166-1 alpha-2 country code.
erasmusCodeCountryCode should henceforth contain the ISG country code.
erasmusCodeCountryCodeIso should contain the ISO 3166-1 alpha-2 country code.
Update OpenAPI specification and API documentation.
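To illustrate the difference between the two code sets, here is a small sketch that derives an ISO 3166-1 alpha-2 code from an ISG code. It assumes the only divergences are the two well-known ones (EL for Greece and UK for the United Kingdom); the mapping should be checked against Annex A6 before relying on it. ISG_TO_ISO and isg_to_iso are hypothetical names.

```python
# Assumed exceptions: ISG uses EL and UK where ISO 3166-1 alpha-2 uses GR and GB.
ISG_TO_ISO = {"EL": "GR", "UK": "GB"}


def isg_to_iso(isg_code: str) -> str:
    """Map an ISG country code to its ISO 3166-1 alpha-2 counterpart."""
    code = isg_code.strip().upper()
    # Codes without an entry in the table are assumed identical in both sets.
    return ISG_TO_ISO.get(code, code)
```

With such a helper, countryCodeIso and erasmusCodeCountryCodeIso could both be derived from their ISG counterparts rather than stored separately.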
Currently, there are issues with the data found in the ECHE list related to HEI names and city names:
organisationLegalName refers to the ECHE holder and not necessarily the HEI; in some cases, the legal entity that holds the ECHE is the owner of the HEI, not the HEI itself;
organisationLegalName sometimes appears in UPPERCASE or with wrong capitalisation when language is taken into consideration;
city is, in fact, part of the postal address, which means it often carries additional information, such as a district number and other postal related words (e.g. CEDEX in France);
city sometimes appears in a native language and sometimes in a different language (e.g. a regional language or English), so the same city may appear in more than one form.
Add verified data sources to the application, attach the verified data when available and expose it in the API.
Given the known limitations of the data in the ECHE list and the foreseeable difficulties in collecting information at the individual HEI level, the best option would be to source verified data from either National Agencies or the relevant Ministries.
The verified data should include the HEI name as presented to the public, not necessarily the legal name of the ECHE holder, and the correct spelling of the city name without additional postal indications. If possible, these should be accompanied by an ISO 639-1 language code or even a complete IETF language tag when relevant.
Besides the data points to be attached, it is also necessary for the verified data to include some unique identifiers of each HEI so that data can be correctly matched. Ideally, both erasmusCode and pic, and even oid when available, would provide the ability to match data between sources, correctly and with redundancy. For faster results, a country code can be included as well, so that the matching may occur on a subset of the ECHE list.
The verified data should be attached to the ECHE list data after the existing cleaning operations (so that normalized identifiers are available for matching) and before the database is populated.
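The matching step described above could look roughly like the following sketch. It assumes both the cleaned ECHE entries and the verified rows are plain dictionaries carrying erasmusCode and pic, and that erasmusCode is tried first with pic as the fallback. attach_verified and VERIFIED_KEYS are hypothetical names, and the oid and country code refinements are left out for brevity.

```python
from typing import Any, Dict, Iterable, List

# Assumed names for the data points copied from a verified row.
VERIFIED_KEYS = ("verifiedName", "verifiedNameLang", "verifiedCity", "verifiedCityLang")


def attach_verified(
    entries: Iterable[Dict[str, Any]],
    verified_rows: Iterable[Dict[str, Any]],
) -> List[Dict[str, Any]]:
    """Return copies of the entries with verified data attached when a match exists."""
    by_erasmus: Dict[str, Dict[str, Any]] = {}
    by_pic: Dict[str, Dict[str, Any]] = {}
    for row in verified_rows:
        # Index each row under every identifier it provides.
        if row.get("erasmusCode"):
            by_erasmus[row["erasmusCode"]] = row
        if row.get("pic"):
            by_pic[row["pic"]] = row

    result = []
    for entry in entries:
        merged = dict(entry)
        # Try erasmusCode first, then fall back to pic.
        match = by_erasmus.get(entry.get("erasmusCode")) or by_pic.get(entry.get("pic"))
        if match:
            for key in VERIFIED_KEYS:
                # Only attach keys that carry a value, so absent data stays absent.
                if match.get(key):
                    merged[key] = match[key]
        result.append(merged)
    return result
```

Entries without a match are returned unchanged, which is consistent with treating the verified keys as non-required in the specification.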
In order to expose the verified data in addition to the ECHE list data, new API keys will be required, for example:
verifiedName
verifiedNameLang
verifiedCity
verifiedCityLang
The new API keys must be added to the specification as non-required, since it is not guaranteed that such data will exist.
There should be a distinction between the Organisation Legal Name, as published in the ECHE list and used in legally binding documents, and the Display Name, which is a much more useful data point for user facing applications.
For example, while an IIA may be established by THE PROVOST, FELLOWS, FOUNDATION SCHOLARS & THE OTHER MEMBERS OF BOARD, OF THE COLLEGE OF THE HOLY & UNDIVIDED TRINITY OF QUEEN ELIZABETH NEAR DUBLIN, one could argue that Trinity College Dublin is easier to identify in a colloquial setting.
Other examples include cases where an Erasmus Charter is awarded to a legal entity that owns an educational institution of a different name, just like EIA - ENSINO E INVESTIGACAO E ADMINISTRACAO SA owns Atlântica - Instituto Universitário.
Because this data cannot be drawn from the ECHE list, it should be an optional component of verified data.
This project could use a clean up, so the following packages are suggested:
flake8
isort
Using an online JSON Formatter & Validator, one gets the following error: Invalid encoding, expecting UTF-8. There is a suspected byte order mark character at the start of the JSON response which may be the root cause of the problem.
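The suspected cause can be reproduced on the client side with the standard library alone. The sketch below assumes the response body is available as raw bytes; parse_json_bytes is a hypothetical helper and the payload is made up for illustration.

```python
import json
from typing import Any


def parse_json_bytes(raw: bytes) -> Any:
    """Decode a JSON payload, tolerating a leading UTF-8 byte order mark."""
    # "utf-8-sig" decodes UTF-8 and drops a leading BOM when one is present.
    return json.loads(raw.decode("utf-8-sig"))


# Illustrative payload: a UTF-8 BOM followed by otherwise valid JSON.
payload = b"\xef\xbb\xbf" + b'[{"erasmusCode": "XX TEST01"}]'

try:
    # Plain "utf-8" keeps the BOM as U+FEFF, which json.loads rejects.
    json.loads(payload.decode("utf-8"))
except json.JSONDecodeError as error:
    print("Rejected:", error)

print(parse_json_bytes(payload))
```

If the BOM is confirmed, the cleaner fix is likely on the server side: emit the response as plain UTF-8 so that strict parsers do not need this workaround.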
Since the API processed fields are named by appending descriptions to the canonical field names, it is safe to present both canonical and processed fields at the same level. This reduces the complexity of API calls when dealing with canonical and processed fields.
The bandwidth impact is negligible: at the time of this writing, the unfiltered output with canonical fields was 389 kB and the unfiltered output with both canonical and processed fields was 470 kB.
All ECHE list entries are processed, but only a subset will have verified data attached. As such, this flattening proposal does not include verified data fields in the API.
Update OpenAPI specification and API documentation.
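The flattening proposed above amounts to merging two dictionaries per entry. The sketch below assumes canonical and processed fields arrive as separate dictionaries and adds a guard for the naming assumption the proposal relies on (no key appears in both); flatten_entry is a hypothetical name.

```python
from typing import Any, Dict


def flatten_entry(canonical: Dict[str, Any], processed: Dict[str, Any]) -> Dict[str, Any]:
    """Merge canonical and processed fields into one flat dictionary."""
    # Processed names are expected to extend canonical names, never repeat them.
    overlap = canonical.keys() & processed.keys()
    if overlap:
        raise ValueError(f"Field names present in both groups: {sorted(overlap)}")
    return {**canonical, **processed}
```

Raising on overlap, rather than silently letting one group win, keeps a future naming collision from going unnoticed.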
The ECHE List with filename 20231220_List_of_Accredited_HEIs_within_the_Erasmus+_Programme_2021-2027_0.xlsx contains incorrect values in the ECHE Start Date column (where the ECHE End Date is 31-07-2024).
This causes the current processing to produce NaN entries, which in turn are not being correctly output in JSON, resulting in invalid API output.
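One possible safeguard, independent of how the source values get corrected, is to turn NaN into null before serialising and to let the serialiser refuse any NaN that slips through. This sketch uses only the standard library; nan_to_none and dumps_strict are hypothetical helpers and echeStartDate is an assumed field name.

```python
import json
import math
from typing import Any


def nan_to_none(value: Any) -> Any:
    """Recursively replace float NaN values with None."""
    if isinstance(value, float) and math.isnan(value):
        return None
    if isinstance(value, dict):
        return {key: nan_to_none(item) for key, item in value.items()}
    if isinstance(value, list):
        return [nan_to_none(item) for item in value]
    return value


def dumps_strict(data: Any) -> str:
    """Serialise to JSON, failing loudly if a NaN is still present."""
    # allow_nan=False makes json.dumps raise ValueError instead of emitting NaN.
    return json.dumps(nan_to_none(data), allow_nan=False)


print(dumps_strict({"echeStartDate": float("nan")}))
```

By default, json.dumps writes a bare NaN token, which is not valid JSON; passing allow_nan=False turns that silent failure into an explicit error.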