About the Project
HUDOC Researcher — search, analytics, and open data for human rights research
Authors
What Is This?
HUDOC Researcher is a paragraph-level companion to HUDOC, the case-law database of the European Court of Human Rights. It indexes every visible paragraph across 19,720 judgments and lets researchers find the precise passage they need, then takes them straight into HUDOC for the authoritative full text.
A typical workflow:
- Search a phrase or concept (e.g. "reasonable suspicion", article:8 "right to be forgotten") across the whole corpus.
- Filter by the part of the judgment you care about — Facts, Admissibility + Merits, Just Satisfaction, Operative Part, or Individual Opinions.
- Refine with state, Convention article, importance, date, outcome.
- Each hit shows the paragraph number, section label, and a one-click HUDOC ↗ link that opens the case in the Court's official viewer at the right paragraph anchor.
- Use Cite ¶ to copy a ready-to-paste citation in the form Smith v. Croatia, App. no. 12345/05, § 47 (ECtHR 2024).
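The citation format produced by Cite ¶ can be sketched in a few lines of Python. This is a minimal illustration, not the project's code: the `Hit` class and its field names are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One search hit. Field names are illustrative, not the project's schema."""
    title: str      # case name, e.g. "Smith v. Croatia"
    app_no: str     # application number, e.g. "12345/05"
    paragraph: int  # HUDOC paragraph number of the hit
    year: int       # year of the judgment


def cite(hit: Hit) -> str:
    """Build a pinpoint citation in the form shown in the workflow above."""
    return f"{hit.title}, App. no. {hit.app_no}, § {hit.paragraph} (ECtHR {hit.year})"


example = cite(Hit("Smith v. Croatia", "12345/05", 47, 2024))
print(example)
```

Keeping the paragraph number as a separate field is what makes pinpoint citations and paragraph-level links possible.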
The dashboard does not try to re-render the full text of each judgment — HUDOC is and remains the authoritative source. What we add is findability across the corpus at paragraph granularity, which HUDOC does not expose.
HUDOC Researcher is free and open source, and combines paragraph-level search with analytics for ECHR case law. The goal is to make Strasbourg jurisprudence more findable for researchers, legal practitioners, students, and anyone working on human rights in Europe.
Scope and Limits (v1)
- Corpus: 19,720 ECHR judgments, covering every Court / Grand Chamber judgment available from HUDOC's public DOCX endpoint as of the last data refresh. Press releases and inadmissibility decisions are excluded.
- Segmentation: paragraphs are extracted with their HUDOC paragraph numbers preserved, plus a section label (Facts / Admissibility / Merits / Just Satisfaction / Operative / Separate Opinion / Appendix). Quotes, headings and operative-list items are tagged separately so the search can filter them in or out.
- Where we differ from HUDOC: HUDOC searches whole judgments; we search individual paragraphs. HUDOC renders the canonical text; we always link you back to HUDOC for that. We do not host PDFs, audio, language versions, or executions data — HUDOC does that better.
- Known caveats: very old (pre-1995) Court and Commission judgments use a different DOCX template. Segmentation is correct on a stratified 50-case test battery, but edge cases (e.g. multi-applicant pilot judgments with 12,000+ applicant rows, like Burmych and Others v. Ukraine) may show occasional drift. For canonical work always cross-check against HUDOC.
How the Data Is Built
Each judgment is fetched from HUDOC as a DOCX file, parsed into numbered paragraphs, and segmented into sections (Facts / Admissibility + Merits / Just Satisfaction / Operative / Separate Opinion / Appendix). Headings, quotes and operative-list items are tagged so search can filter them.
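One way to picture the result of this step is a flat list of paragraph records that search can filter by section and by tag. The sketch below is an assumption about shape only: the `Paragraph` class, the `kind` values, and the `search` helper are hypothetical names, and the sample texts are invented.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass(frozen=True)
class Paragraph:
    """A segmented paragraph. All field names here are hypothetical."""
    case_id: str
    number: Optional[int]  # HUDOC paragraph number; None for unnumbered text
    section: str           # e.g. "Facts", "Admissibility + Merits", "Operative"
    kind: str              # "body", "heading", "quote" or "operative_item"
    text: str


def search(paragraphs: Iterable[Paragraph], phrase: str,
           sections: Optional[set] = None,
           exclude_kinds: tuple = ("heading",)) -> Iterator[Paragraph]:
    """Yield paragraphs containing `phrase`, filtered by section and tag."""
    needle = phrase.casefold()
    for p in paragraphs:
        if sections is not None and p.section not in sections:
            continue
        if p.kind in exclude_kinds:
            continue
        if needle in p.text.casefold():
            yield p


corpus = [  # invented sample data with a placeholder case id
    Paragraph("001-000000", None, "Facts", "heading", "THE FACTS: reasonable suspicion"),
    Paragraph("001-000000", 12, "Facts", "body", "The police had a reasonable suspicion."),
    Paragraph("001-000000", 47, "Admissibility + Merits", "body",
              "The Court reiterates that reasonable suspicion presupposes facts."),
]
hits = list(search(corpus, "Reasonable Suspicion", sections={"Admissibility + Merits"}))
print([h.number for h in hits])
```

Tagging headings and quotes as a separate `kind` rather than dropping them keeps the text complete while letting each query decide what to include.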
Segmentation accuracy is checked with a recurring LLM-as-judge audit: a large language model reviews a stratified sample of ~1,000 paragraphs against the assigned section labels, and any systematic errors it surfaces are corrected by targeted, reviewable cleanup passes. The most recent audit puts the effective segmentation error rate near 1%. For canonical work always cross-check against HUDOC.
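The sampling and the error-rate estimate behind such an audit can be sketched as follows. This is an assumption-laden illustration: proportional allocation per section and the Wilson score interval are choices made for the example, not a description of the project's audit, and the judging step itself (the LLM call) is left out.

```python
import math
import random
from collections import defaultdict


def stratified_sample(items, key, total, seed=0):
    """Draw roughly `total` items, allocated to each stratum in proportion to its size."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for item in items:
        strata[key(item)].append(item)
    sample = []
    for members in strata.values():
        k = max(1, round(total * len(members) / len(items)))
        sample.extend(rng.sample(members, min(k, len(members))))
    return sample


def error_rate_with_interval(n_errors, n_checked, z=1.96):
    """Observed error rate plus a Wilson score interval (95% for z=1.96)."""
    p = n_errors / n_checked
    denom = 1 + z ** 2 / n_checked
    centre = (p + z ** 2 / (2 * n_checked)) / denom
    half = z * math.sqrt(p * (1 - p) / n_checked + z ** 2 / (4 * n_checked ** 2)) / denom
    return p, centre - half, centre + half


# Toy population: (section label, paragraph index) pairs.
population = [("Facts", i) for i in range(80)] + [("Operative", i) for i in range(20)]
sample = stratified_sample(population, key=lambda item: item[0], total=10)

# Illustrative count only: 10 flagged labels out of 1,000 checked paragraphs.
rate, low, high = error_rate_with_interval(10, 1000)
print(f"{rate:.1%} (95% interval {low:.1%} to {high:.1%})")
```

With a sample of about 1,000 paragraphs, an observed rate near 1% still carries an interval of roughly 0.5% to 1.8%, which is one reason to keep cross-checking against HUDOC.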
Data Sources
The dataset is assembled from public ECHR sources and enriched with structured citation-network data from the research community.
| Source | Description | Type |
|---|---|---|
| HUDOC | Official database of the European Court of Human Rights. Primary source for all judgments, metadata (respondent state, articles, conclusions, importance, keywords, chamber composition), and full-text paragraphs with section tags. | Primary |
| ECTHR-PCR | Prior Case Retrieval dataset by Rashid Haddad et al. (TUM Legal Tech). Provides structured citation networks mapping each case to its cited precedents (15,729 cases). Used to enrich the pcr_citations and pcr_cited_by fields in our dataset, enabling precedent-graph analysis that goes beyond the free-text strasbourg_caselaw field from HUDOC. Reference: Haddad, R., Bayer, S. and Habernal, I. (2024). "ECHR-PCR: A Dataset for Precedent Understanding and Prior Case Retrieval in the European Court of Human Rights." Proceedings of LREC-COLING 2024. | Enrichment |
Dataset Fields
Each case record contains the following structured data:
| Field | Description |
|---|---|
| case_id | Unique HUDOC identifier (e.g. 001-57516) |
| case_no | Application number(s) |
| title | Full case name |
| judgment_date | Date of judgment |
| respondent_state | Respondent country |
| originating_body | Chamber, Grand Chamber, etc. |
| importance | Case importance level |
| article_no | Convention articles at issue |
| violation / non-violation | Articles found violated or not |
| conclusion | Operative conclusion text |
| separate_opinion | Whether the case includes a separate opinion |
| keywords | HUDOC subject keywords |
| chamber_composed_of | Names of judges |
| strasbourg_caselaw | Free-text citations to Strasbourg case law |
| paragraphs | Full text, section-tagged at paragraph level |
| pcr_citations | Structured precedent links (from ECTHR-PCR + self-resolved) |
| pcr_cited_by | Cases that cite this one (from ECTHR-PCR) |
| pcr_citation_count | Number of cited precedents |
| pcr_cited_by_count | Influence score (how often cited) |
| resolved_citations | Self-resolved citations with appno, title, and original text |
| resolved_citation_count | Number of successfully resolved citations |
| resolution_rate | Fraction of free-text citations that were resolved |
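The derived citation fields relate to the list fields in a simple way, sketched below under stated assumptions: the counts are taken to be plain list lengths, the free-text citations are assumed to be already split into a list, and the record contents are invented placeholders.

```python
def citation_summary(record, free_text_citations):
    """Recompute the derived citation fields from the list fields of one record."""
    resolved = record["resolved_citations"]
    total = len(free_text_citations)
    return {
        "pcr_citation_count": len(record["pcr_citations"]),
        "pcr_cited_by_count": len(record["pcr_cited_by"]),
        "resolved_citation_count": len(resolved),
        # Fraction of free-text citations that were resolved; 0.0 if there are none.
        "resolution_rate": len(resolved) / total if total else 0.0,
    }


record = {  # invented placeholder values, not a real case
    "case_id": "001-000000",
    "pcr_citations": ["001-000001", "001-000002"],
    "pcr_cited_by": ["001-000009"],
    "resolved_citations": [
        {"appno": "12345/05", "title": "Smith v. Croatia", "text": "Smith v. Croatia, no. 12345/05"},
        {"appno": "9999/99", "title": "Doe v. Italy", "text": "Doe v. Italy, no. 9999/99"},
        {"appno": "10000/99", "title": "Roe v. Italy", "text": "Roe v. Italy, no. 10000/99"},
    ],
}
free_text = ["Smith v. Croatia", "Doe v. Italy", "Roe v. Italy", "An unresolved citation"]
summary = citation_summary(record, free_text)
print(summary)
```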
Methodology
Judgments are sourced from HUDOC and parsed into paragraph-level records with section tags (Facts, Legal Framework, Merits, etc.). Violation and non-violation findings are extracted from the structured HUDOC metadata where available and supplemented by text-based inference where metadata is incomplete.
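The "metadata first, text as fallback" idea can be sketched as below. Everything specific here is an assumption: the regular expression, the function name, and the rule of falling back only when both metadata lists are empty are illustrative choices, not the project's inference logic.

```python
import re

# Matches phrases such as "Violation of Article 6" or "No violation of Art. 8".
FINDING = re.compile(
    r"\b(no\s+|non-)?violation\s+of\s+(?:Article|Art\.)\s*(\d+)",
    re.IGNORECASE,
)


def findings(meta_violation, meta_non_violation, conclusion):
    """Return (violated, not_violated, source) for one judgment.

    Structured metadata wins when present; otherwise the conclusion text is scanned.
    """
    if meta_violation or meta_non_violation:
        return set(meta_violation), set(meta_non_violation), "metadata"
    violated, not_violated = set(), set()
    for match in FINDING.finditer(conclusion):
        target = not_violated if match.group(1) else violated
        target.add(match.group(2))
    return violated, not_violated, "text"


from_text = findings([], [], "Violation of Article 6; No violation of Article 8")
from_meta = findings(["3"], [], "Violation of Article 6")
print(from_text, from_meta)
```

Returning the source alongside the findings makes it possible to report how many results rest on inference rather than on HUDOC metadata.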
Citation-network enrichment uses a two-stage pipeline:
- Stage 1: ECTHR-PCR merge (scripts/merge_ecthr_pcr.py). Joins the ECTHR-PCR dataset on application number, adding structured forward and reverse citation links for 12,500+ cases.
- Stage 2: Self-resolution (scripts/resolve_citations.py). Parses the free-text strasbourg_caselaw field to extract case names and application numbers, then resolves them against a lookup index built from the dataset itself. This extends citation coverage to 760+ additional cases (mostly post-2022) not in ECTHR-PCR. Validated against PCR ground truth: 94% precision, 79% recall, 86% F1.
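The self-resolution step can be illustrated with a small sketch. It is not the contents of scripts/resolve_citations.py: the application-number pattern, the function names, and the sample data are assumptions, and it matches on application numbers only, whereas the pipeline described above also uses case names.

```python
import re

# Application numbers look like "12345/05"; this pattern is an assumption.
APP_NO = re.compile(r"\b\d{1,6}/\d{2}\b")


def build_index(cases):
    """Map every application number found in the dataset to its case_id."""
    index = {}
    for case in cases:
        for app_no in APP_NO.findall(case["case_no"]):
            index.setdefault(app_no, case["case_id"])
    return index


def resolve(strasbourg_caselaw, index):
    """Split cited application numbers into resolved case_ids and unresolved numbers."""
    resolved, unresolved = [], []
    for app_no in APP_NO.findall(strasbourg_caselaw):
        if app_no in index:
            resolved.append(index[app_no])
        else:
            unresolved.append(app_no)
    return resolved, unresolved


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


cases = [  # invented placeholder records
    {"case_id": "001-000001", "case_no": "12345/05"},
    {"case_id": "001-000002", "case_no": "9999/99;10000/99"},
]
index = build_index(cases)
text = "Smith v. Croatia, no. 12345/05, § 47, 1 January 2010; Doe v. Italy, no. 7777/01"
resolved, unresolved = resolve(text, index)
print(resolved, unresolved, round(f1_score(0.94, 0.79), 2))
```

The reported figures are internally consistent: with 94% precision and 79% recall, the harmonic mean is 2 × 0.94 × 0.79 / (0.94 + 0.79) ≈ 0.86.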
Press releases are identified by document_type and excluded from all judgment-related statistics (case counts, violation rates, article breakdowns, country rankings). They are tracked separately for completeness.
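A minimal sketch of that exclusion, assuming records are dictionaries and that press releases carry a document_type value of "press_release" (a hypothetical value; the actual label in the dataset may differ):

```python
from collections import defaultdict

PRESS_RELEASE = "press_release"  # hypothetical document_type value


def judgments_only(records):
    """Drop press releases before any judgment-related statistic is computed."""
    return [r for r in records if r.get("document_type") != PRESS_RELEASE]


def violation_rate_by_state(records):
    """Share of judgments per respondent state with at least one violation."""
    totals, with_violation = defaultdict(int), defaultdict(int)
    for r in judgments_only(records):
        state = r["respondent_state"]
        totals[state] += 1
        if r.get("violation"):
            with_violation[state] += 1
    return {state: with_violation[state] / totals[state] for state in totals}


records = [  # invented sample data
    {"document_type": "judgment", "respondent_state": "Croatia", "violation": ["8"]},
    {"document_type": "judgment", "respondent_state": "Croatia", "violation": []},
    {"document_type": PRESS_RELEASE, "respondent_state": "Croatia", "violation": ["8"]},
]
rates = violation_rate_by_state(records)
print(rates)
```

Filtering once, in a single helper that every statistic calls, avoids press releases leaking into one chart but not another.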
Open Source
The source code, build scripts, and dataset are available on GitHub:
github.com/lszoszk/ECHR-Dashboard
Contributions, bug reports, and suggestions are welcome.
Acknowledgements
- HUDOC — the European Court of Human Rights for making case law publicly accessible.
- ECTHR-PCR — Rashid Haddad, Sean Bayer, and Ivan Habernal (TUM Legal Tech) for the structured citation-network dataset.
- HURIDOCS — for inspiration and leadership in human rights information management.