Commit bef83721 authored by johanricher

updated README

parent fa90f3ac
Pipeline #4 failed with stage
# Qualidata
## Project
[Wiki](https://git.opendatafrance.net/qualidata/qualidata-ui/wikis/home)
This is the main repository for the [Qualidata project](http://www.opendatafrance.net/outil-de-qualification-des-donnees-ouvertes-qualidata/) (FR). The goal of Qualidata is to give users feedback on the _validity_ of the datasets they produce, to help them improve the quality of their data.
_Validity_ as understood in this project means:
- Absence of general errors in file structure or content,
- Conformity to a data schema (e.g. within the [French Socle commun des données locales](http://opendatalocale.net/scdl/)).
In this project we intend to rely on well-known communities and existing projects based on state-of-the-art technologies. In particular, we share [Frictionless Data's vision](https://frictionlessdata.io/specs/) and choose to use some of its technical building blocks (e.g. [Good Tables](http://goodtables.io/) and [Data Package](https://frictionlessdata.io/data-packages/)).
Here are some of the principles we adhere to:
- Manage data in Git repositories with native versioning,
- Rely on continuous integration to validate or transform data,
- Add metadata to describe the datasets and their schema.
## Iteration 0
The deliverable released here is a script that takes a CSV file and a JSON file as input and prints any validation errors it finds to the terminal.
The JSON file must be a Table Schema.
For example, let's validate a file of _prénoms_ (first names) published as open data by the _commune_ of Digne-les-Bains for 2017, using [the relevant JSON schema](https://github.com/CharlesNepote/liste-prenoms-nouveaux-nes/) created by Charles Nepote (based on [OpenData France's work](http://opendatalocale.net/wp-content/uploads/2018/02/3.7-Sp%C3%A9cifications-SCDL-Pr%C3%A9noms-des-nouveaux-n%C3%A9s.pdf)).
```
npm start -- samples/DIGNE-PRENOMS-2017.csv --schema schemas/prenom-schema.json
samples/DIGNE-PRENOMS-2017.csv:135:4: The value "Lawai'a" does not conform to the "pattern" constraint for column "ENFANT_PRENOM"
```
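To make the error above more concrete, here is a minimal sketch of what a `pattern` constraint check amounts to, written in plain JavaScript without tableschema-js. The field name `ENFANT_PRENOM` comes from the output above; the pattern itself, the helper names `matchesPattern` and `checkRows`, and the row numbering (header counted as row 1) are assumptions made for illustration, not the actual schema or the script's implementation.

```javascript
// Hypothetical Table Schema descriptor. The field name is taken from the
// error message above; the pattern is an assumption, not the real schema's.
const schema = {
  fields: [
    {
      name: 'ENFANT_PRENOM',
      type: 'string',
      constraints: { pattern: '[A-Za-zÀ-ÖØ-öø-ÿ -]+' },
    },
  ],
};

// A Table Schema pattern must match the whole value, so anchor the regex.
function matchesPattern(value, pattern) {
  return new RegExp(`^(?:${pattern})$`).test(value);
}

// Check every data row against the pattern constraints of the schema.
// `rows` is an array of arrays of strings, without the header row.
// Returns messages shaped like the script's output: file:row:column: text.
function checkRows(fileName, rows, schema) {
  const errors = [];
  rows.forEach((row, rowIndex) => {
    schema.fields.forEach((field, colIndex) => {
      const pattern = field.constraints && field.constraints.pattern;
      const value = row[colIndex] ?? '';
      if (pattern && !matchesPattern(value, pattern)) {
        // Assumption: the header is row 1, so the first data row is row 2.
        errors.push(
          `${fileName}:${rowIndex + 2}:${colIndex + 1}: ` +
            `The value "${value}" does not conform to the "pattern" ` +
            `constraint for column "${field.name}"`
        );
      }
    });
  });
  return errors;
}

// Usage: only the value containing an apostrophe is reported.
const sampleRows = [['Marie'], ["Lawai'a"], ['Jean-Luc']];
checkRows('sample.csv', sampleRows, schema).forEach((e) => console.log(e));
```

With this hypothetical pattern, the apostrophe in `Lawai'a` is what makes the value fail; the real schema may reject it for a different reason.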
### Specifications & libraries
- [Table Schema](http://specs.frictionlessdata.io/table-schema/): A JSON file describing a CSV file.
- [tableschema-js](https://github.com/frictionlessdata/tableschema-js): A JavaScript library that implements the Table Schema specification.
## Install
Required software:
@@ -53,10 +22,3 @@ Install dependencies:
```sh
npm install
```
## What we learned & what we want to do next
- tableschema-js outputs low-level errors as strings.
- [Good Tables UI](https://frictionlessdata.github.io/goodtables-ui/) displays errors in the context of the CSV file. The next step would be to explore it and reproduce the results of our current script in this UI.
- Then we want the end user to be able to select a schema in a dropdown list (e.g. "Schéma prénoms").
- Also, stakeholders (Etalab, OpenData France, la Fing...) should be onboarded on repositories (under [git.opendatafrance.net/scdl](https://git.opendatafrance.net/scdl)) dedicated to each JSON schema, in order to converge towards a common vision. The end goal is to build shared resources (reference schemas of the SCDL) with stable URLs.
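Since errors currently come out as plain strings, one possible intermediate step before plugging in a UI is to turn each output line into a structured object. The sketch below assumes the `file:row:column: message` layout seen in the single sample line of the Iteration 0 example; `parseErrorLine` and `groupByRow` are hypothetical helpers, not part of tableschema-js or of the current script.

```javascript
// Parse one "file:row:column: message" line into a structured object.
// Returns null for lines that do not follow that layout.
function parseErrorLine(line) {
  const match = /^(.+?):(\d+):(\d+): (.*)$/.exec(line);
  if (!match) return null;
  const [, file, row, column, message] = match;
  return { file, row: Number(row), column: Number(column), message };
}

// Group parsed errors by row number, so that a UI could highlight the
// offending cells of each CSV row. Unparseable lines are skipped.
function groupByRow(lines) {
  const byRow = new Map();
  for (const line of lines) {
    const error = parseErrorLine(line);
    if (!error) continue;
    if (!byRow.has(error.row)) byRow.set(error.row, []);
    byRow.get(error.row).push(error);
  }
  return byRow;
}

// Usage with the sample output line from the Iteration 0 example.
const sampleLine =
  'samples/DIGNE-PRENOMS-2017.csv:135:4: The value "Lawai\'a" does not ' +
  'conform to the "pattern" constraint for column "ENFANT_PRENOM"';
console.log(parseErrorLine(sampleLine));
```

Parsing strings like this is fragile if the message format changes; getting structured errors directly from the validation library would be preferable where it is possible.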