This is the main repository for the [Qualidata project](http://www.opendatafrance.net/outil-de-qualification-des-donnees-ouvertes-qualidata/) (page in French). The goal of Qualidata is to give users feedback on the _validity_ of the datasets they produce, in order to help them improve data quality.
_Validity_ as understood in this project means:
- Absence of general errors in file structure or content,
- Conformity to a data schema (e.g. within the [French Socle commun des données locales](http://opendatalocale.net/scdl/)).
In this project we intend to rely on well-known communities and existing projects based on state-of-the-art technologies. In particular, we share [Frictionless Data's vision](https://frictionlessdata.io/specs/) and choose to use some of its technical building blocks (e.g. [Good Tables](http://goodtables.io/) and [Data Package](https://frictionlessdata.io/data-packages/)).
Here are some of the principles we adhere to:
- Manage data in Git repositories with native versioning,
- Rely on continuous integration to validate or transform data,
- Add metadata to describe the datasets and their schema.
## Iteration 0
The deliverable released here is a script that takes a CSV file and a JSON file as input and prints any errors it finds to the terminal.
The JSON file must be a Table Schema.
For example, let's validate a file containing _prénoms_ (first names), published as open data by the _commune_ of Digne-les-Bains for 2017, against [the relevant JSON schema](https://github.com/CharlesNepote/liste-prenoms-nouveaux-nes/) created by Charles Nepote (based on [OpenData France's work](http://opendatalocale.net/wp-content/uploads/2018/02/3.7-Sp%C3%A9cifications-SCDL-Pr%C3%A9noms-des-nouveaux-n%C3%A9s.pdf)).
```
samples/DIGNE-PRENOMS-2017.csv:135:4: The value "Lawai'a" does not conform to the "pattern" constraint for column "ENFANT_PRENOM"
```
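
To make the message above more concrete, here is a minimal, self-contained sketch of how a `pattern` constraint check can produce such a line. It does not use tableschema-js and is not the project's script; the regular expression, the `checkPattern` helper and the sample values are assumptions made for illustration only.

```javascript
// Hypothetical field descriptor: the real pattern in the prénoms schema may differ.
const field = {
  name: 'ENFANT_PRENOM',
  type: 'string',
  constraints: { pattern: '[A-Za-zÀ-ÖØ-öø-ÿ -]+' },
};

// Return an error line in the `file:row:column: message` shape shown above,
// or null when the value satisfies the constraint.
function checkPattern(file, row, column, value, field) {
  // The pattern has to match the whole value, so anchor both ends.
  const regex = new RegExp(`^(?:${field.constraints.pattern})$`);
  if (regex.test(value)) return null;
  return `${file}:${row}:${column}: The value "${value}" does not conform ` +
    `to the "pattern" constraint for column "${field.name}"`;
}

const file = 'samples/DIGNE-PRENOMS-2017.csv';
console.log(checkPattern(file, 135, 4, "Lawai'a", field)); // the apostrophe is rejected
console.log(checkPattern(file, 2, 4, 'Jean-Luc', field)); // prints null
```

With this assumed pattern, the apostrophe in "Lawai'a" is the character that fails the check, which suggests that such errors may point to a schema that is too strict as often as to bad data.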
### Specifications & libraries
- [Table Schema](http://specs.frictionlessdata.io/table-schema/): A JSON file describing a CSV file.
- [tableschema-js](https://github.com/frictionlessdata/tableschema-js): A JavaScript library that implements the Table Schema specification.
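
As a rough illustration of the first item, a Table Schema descriptor lists the columns of a CSV file with their types and constraints. The sketch below builds one as a plain JavaScript object and serializes it to JSON. The column `ENFANT_PRENOM` comes from the Digne-les-Bains example; the `ANNEE` column, the types, the pattern and the `expectedHeader` helper are hypothetical and are not taken from the actual prénoms schema.

```javascript
// Hypothetical descriptor: ANNEE, the types and the pattern are illustrative,
// not copied from the real schema.
const schema = {
  fields: [
    { name: 'ANNEE', type: 'year', constraints: { required: true } },
    {
      name: 'ENFANT_PRENOM',
      type: 'string',
      constraints: { required: true, pattern: '[A-Za-zÀ-ÖØ-öø-ÿ -]+' },
    },
  ],
};

// List the column names a conforming CSV header is expected to contain.
function expectedHeader(schema) {
  return schema.fields.map((f) => f.name);
}

console.log(expectedHeader(schema).join(',')); // ANNEE,ENFANT_PRENOM
console.log(JSON.stringify(schema, null, 2)); // what the JSON file would hold
```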
### Install
Required software:
...
...
Install dependencies:
```sh
npm install
```
## What we learned & what we want to do next
- tableschema-js outputs low-level errors as strings.
- [Good Tables UI](https://frictionlessdata.github.io/goodtables-ui/) displays errors in the context of the CSV file. The next step is to explore it and reproduce the results of our current script in this UI.
- Then we want the end user to be able to select a schema in a dropdown list (e.g. "Schéma prénoms").
- Also, stakeholders (Etalab, OpenData France, la Fing...) should be onboarded on repositories (under [git.opendatafrance.net/scdl](https://git.opendatafrance.net/scdl)) dedicated to each JSON schema, in order to converge towards a common vision. The end goal is to build shared resources (reference schemas of the SCDL) with stable URLs.
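
One possible way to bridge the first two points is to turn error strings into structured objects that a UI can place next to the offending cell. The sketch below assumes the `file:row:column: message` shape printed by the current script in the Iteration 0 example; `parseErrorLine` is a hypothetical helper, not part of tableschema-js or Good Tables UI.

```javascript
// Split a `file:row:column: message` line into its parts.
// Returns null when the line does not follow that shape.
function parseErrorLine(line) {
  const match = /^(.+?):(\d+):(\d+): (.*)$/.exec(line);
  if (match === null) return null;
  const [, file, row, column, message] = match;
  return { file, row: Number(row), column: Number(column), message };
}

const parsed = parseErrorLine(
  'samples/DIGNE-PRENOMS-2017.csv:135:4: The value "Lawai\'a" does not ' +
  'conform to the "pattern" constraint for column "ENFANT_PRENOM"'
);
console.log(parsed.row, parsed.column); // 135 4
```

Parsing strings like this is fragile; if the library can expose row and column numbers directly, that would be the better source.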