About the Project

HUDOC Researcher — search, analytics, and open data for human rights research

Authors

Łukasz Szoszkiewicz

Adam Mickiewicz University in Poznan

Sebastian Marcisz

Adam Mickiewicz University in Poznan

What Is This?

HUDOC Researcher is a paragraph-level companion to the case law of the European Court of Human Rights. It indexes every visible paragraph across 19,720 judgments and lets researchers find the precise passage they need, then sends them straight to HUDOC for the authoritative full text.

A typical workflow:

  1. Search a phrase or concept (e.g. "reasonable suspicion", article:8 "right to be forgotten") across the whole corpus.
  2. Filter by the part of the judgment you care about — Facts, Admissibility + Merits, Just Satisfaction, Operative Part, or Individual Opinions.
  3. Refine by respondent state, Convention article, importance level, judgment date, and outcome.
  4. Each hit shows the paragraph number, section label, and a one-click HUDOC ↗ link that opens the case in the Court's official viewer at the right paragraph anchor.
  5. Use Cite ¶ to copy a ready-to-paste citation in the form Smith v. Croatia, App. no. 12345/05, § 47 (ECtHR 2024).
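The citation format in step 5 is regular enough to assemble from a case record. A minimal sketch, assuming a hypothetical format_citation helper fed from the title, case_no and judgment_date fields; the "App. nos." plural for multi-application cases is an assumption, not confirmed behaviour of the Cite ¶ button.

```python
from datetime import date
from typing import Sequence


def format_citation(title: str, app_nos: Sequence[str], paragraph: int,
                    judgment_date: date) -> str:
    """Build 'Title, App. no. N, § P (ECtHR YEAR)'.

    Hypothetical helper: argument names and the plural label for
    multi-application cases are illustrative assumptions.
    """
    if not app_nos:
        raise ValueError("at least one application number is required")
    label = "App. no." if len(app_nos) == 1 else "App. nos."
    numbers = ", ".join(app_nos)
    return f"{title}, {label} {numbers}, § {paragraph} (ECtHR {judgment_date.year})"


citation = format_citation("Smith v. Croatia", ["12345/05"], 47, date(2024, 5, 14))
# → 'Smith v. Croatia, App. no. 12345/05, § 47 (ECtHR 2024)'
```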

The dashboard does not try to re-render the full text of each judgment — HUDOC is and remains the authoritative source. What we add is findability across the corpus at paragraph granularity, which HUDOC does not expose.

Free, open-source, paragraph-level search + analytics for ECHR case law. The goal: make Strasbourg jurisprudence more findable for researchers, legal practitioners, students, and anyone working on human rights in Europe.

Scope and Limits (v1)

  • Corpus: 19,720 ECHR judgments, covering every Court / Grand Chamber judgment available from HUDOC's public DOCX endpoint as of the last data refresh. Press releases and inadmissibility decisions are excluded.
  • Segmentation: paragraphs are extracted with their HUDOC paragraph numbers preserved, plus a section label (Facts / Admissibility / Merits / Just Satisfaction / Operative / Separate Opinion / Appendix). Quotes, headings and operative-list items are tagged separately so the search can filter them in or out.
  • Where we differ from HUDOC: HUDOC searches whole judgments; we search individual paragraphs. HUDOC renders the canonical text; we always link you back to HUDOC for it. We do not host PDFs, audio, language versions, or data on the execution of judgments; HUDOC does that better.
  • Known caveats: very old (pre-1995) Court and Commission judgments use a different DOCX template. Segmentation has been verified on a stratified 50-case test set, but edge cases (e.g. multi-applicant pilot judgments with 12,000+ applicant rows, such as Burmych and Others v. Ukraine) may show occasional drift. For canonical work, always cross-check against HUDOC.
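Filtering by section label, with quotes, headings and operative-list items tagged separately, can be pictured as a paragraph record plus a filter. A minimal sketch: the Paragraph fields, the tag names ("quote", "heading", "operative_item") and the plain substring match are assumptions for illustration; the dashboard's actual index and query syntax are not described here.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Paragraph:
    case_id: str
    number: Optional[int]   # HUDOC paragraph number; None for unnumbered text
    section: str            # e.g. "Facts", "Merits", "Operative"
    text: str
    tags: frozenset = frozenset()  # hypothetical tags: "quote", "heading", "operative_item"


def search(paragraphs: Iterable[Paragraph], phrase: str,
           sections: Optional[set] = None,
           exclude_tags: frozenset = frozenset({"quote", "heading"})) -> List[Paragraph]:
    """Case-insensitive substring search limited to the given sections,
    skipping paragraphs that carry any excluded tag."""
    needle = phrase.casefold()
    hits = []
    for p in paragraphs:
        if sections is not None and p.section not in sections:
            continue
        if p.tags & exclude_tags:
            continue
        if needle in p.text.casefold():
            hits.append(p)
    return hits


corpus = [
    Paragraph("case-1", 12, "Facts", "The police said they had a reasonable suspicion."),
    Paragraph("case-1", 47, "Merits", "There was no reasonable suspicion within the meaning of Article 5."),
    Paragraph("case-1", 48, "Merits", "... a reasonable suspicion ...", frozenset({"quote"})),
]
hits = search(corpus, "Reasonable Suspicion", sections={"Merits"})  # only paragraph 47
```

Excluding quotes by default keeps hits to the Court's own reasoning; passing an empty exclude_tags brings quoted material back in.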

How the Data Is Built

Each judgment is fetched from HUDOC as a DOCX file, parsed into numbered paragraphs, and segmented into sections (Facts / Admissibility + Merits / Just Satisfaction / Operative / Separate Opinion / Appendix). Headings, quotes and operative-list items are tagged so search can filter them.
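A simplified, heading-driven segmenter illustrates the idea: walk the paragraphs in order, switch the current section whenever an upper-case heading such as THE FACTS or FOR THESE REASONS appears, and read the paragraph number from the leading "N." prefix. This is a sketch under stated assumptions, not the project's parser: it starts from already-extracted paragraph strings (DOCX parsing is out of scope), and the heading patterns and label strings are hypothetical. Older templates (see the pre-1995 caveat above) would need different rules.

```python
import re
from typing import Iterable, List, Optional

# "47.  The Court reiterates ..." -> paragraph number 47
NUMBERED = re.compile(r"^(\d+)\.\s+(.*)$", re.S)

# Hypothetical heading rules, checked in order against upper-case lines.
HEADING_RULES = [
    (re.compile(r"^APPENDIX\b"), "Appendix"),
    (re.compile(r"\b(CONCURRING|DISSENTING|SEPARATE) OPINION\b"), "Separate Opinion"),
    (re.compile(r"^FOR THESE REASONS"), "Operative"),
    (re.compile(r"\bAPPLICATION OF ARTICLE 41\b"), "Just Satisfaction"),
    (re.compile(r"^THE LAW\b"), "Admissibility + Merits"),
    (re.compile(r"^THE FACTS\b"), "Facts"),
]


def segment(lines: Iterable[str]) -> List[dict]:
    section: Optional[str] = None  # None until the first recognised heading
    records = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.isupper():
            # Upper-case line: treat it as a heading and switch section if a rule matches.
            for pattern, label in HEADING_RULES:
                if pattern.search(line):
                    section = label
                    break
            records.append({"number": None, "section": section, "text": line, "tags": {"heading"}})
            continue
        m = NUMBERED.match(line)
        if m:
            # Operative items restart at 1., 2., ... independently of the
            # judgment's paragraph numbering, so they get their own tag.
            tags = {"operative_item"} if section == "Operative" else set()
            records.append({"number": int(m.group(1)), "section": section,
                            "text": m.group(2), "tags": tags})
        else:
            # Unnumbered body text (sub-headings, quotations) inherits the current
            # section; telling these apart needs DOCX style data, ignored here.
            records.append({"number": None, "section": section, "text": line, "tags": set()})
    return records


judgment = [
    "PROCEDURE",
    "1.  The case originated in an application against the respondent State.",
    "THE FACTS",
    "5.  The applicant was born in 1970.",
    "THE LAW",
    "20.  The applicant complained under Article 6 of the Convention.",
    "III. APPLICATION OF ARTICLE 41 OF THE CONVENTION",
    "40.  Article 41 of the Convention provides:",
    "FOR THESE REASONS, THE COURT, UNANIMOUSLY,",
    "1.  Declares the application admissible;",
    "CONCURRING OPINION OF JUDGE X",
    "1.  I agree with the majority.",
]
records = segment(judgment)
```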

Segmentation accuracy is checked with a recurring LLM-as-judge audit: a large language model reviews a stratified sample of ~1,000 paragraphs against the assigned section labels, and any systematic errors it surfaces are corrected by targeted, reviewable cleanup passes. The most recent audit puts the effective segmentation error rate near 1%. For canonical work always cross-check against HUDOC.
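The sampling and scoring half of such an audit can be sketched as below. It assumes strata are section labels (the actual stratification key is not stated) and wraps the LLM judgement in a hypothetical callable that returns True when the model disputes a paragraph's assigned label; the prompt, the model and the cleanup passes are out of scope.

```python
import random
from collections import defaultdict
from typing import Callable, Dict, List, Sequence


def stratified_sample(paragraphs: Sequence[dict], n: int, key: str = "section",
                      seed: int = 0) -> List[dict]:
    """Proportional stratified sample: each stratum contributes about
    n * (stratum size / corpus size) paragraphs, and at least one.
    Rounding can make the total differ slightly from n."""
    rng = random.Random(seed)
    strata: Dict[str, List[dict]] = defaultdict(list)
    for p in paragraphs:
        strata[p[key]].append(p)
    sample: List[dict] = []
    for items in strata.values():
        k = max(1, round(n * len(items) / len(paragraphs)))
        sample.extend(rng.sample(items, min(k, len(items))))
    return sample


def audit_error_rate(sample: Sequence[dict], judge: Callable[[dict], bool]) -> float:
    """Share of sampled paragraphs the judge flags as mislabelled.
    `judge` stands in for the (hypothetical) LLM call."""
    if not sample:
        raise ValueError("empty sample")
    return sum(1 for p in sample if judge(p)) / len(sample)


corpus = [{"id": i, "section": "Facts" if i < 900 else "Merits"} for i in range(1000)]
sample = stratified_sample(corpus, n=100)
rate = audit_error_rate(sample, judge=lambda p: p["id"] % 50 == 0)  # stand-in judge
```

A fixed seed makes the sample reproducible, so a later audit can re-check the same paragraphs after a cleanup pass.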

Data Sources

The dataset is assembled from public ECHR sources and enriched with structured citation-network data from the research community.

  • HUDOC (Primary): Official database of the European Court of Human Rights. Primary source for all judgments, metadata (respondent state, articles, conclusions, importance, keywords, chamber composition), and full-text paragraphs with section tags.
  • ECTHR-PCR (Enrichment): Prior Case Retrieval dataset by Rashid Haddad et al. (TUM Legal Tech). Provides structured citation networks mapping each case to its cited precedents (15,729 cases). Used to enrich the pcr_citations and pcr_cited_by fields in our dataset, enabling precedent-graph analysis that goes beyond the free-text strasbourg_caselaw field from HUDOC.
    Reference: Haddad, R., Bayer, S. and Habernal, I. (2024). "ECHR-PCR: A Dataset for Precedent Understanding and Prior Case Retrieval in the European Court of Human Rights." Proceedings of LREC-COLING 2024.

Dataset Fields

Each case record contains the following structured data:

Field                       Description
case_id                     Unique HUDOC identifier (e.g. 001-57516)
case_no                     Application number(s)
title                       Full case name
judgment_date               Date of judgment
respondent_state            Respondent country
originating_body            Chamber, Grand Chamber, etc.
importance                  Case importance level
article_no                  Convention articles at issue
violation / non-violation   Articles found violated or not
conclusion                  Operative conclusion text
separate_opinion            Whether the case includes a separate opinion
keywords                    HUDOC subject keywords
chamber_composed_of         Names of judges
strasbourg_caselaw          Free-text citations to Strasbourg case law
paragraphs                  Full text, section-tagged at paragraph level
pcr_citations               Structured precedent links (from ECTHR-PCR + self-resolved)
pcr_cited_by                Cases that cite this one (from ECTHR-PCR)
pcr_citation_count          Number of cited precedents
pcr_cited_by_count          Influence score (how often cited)
resolved_citations          Self-resolved citations with appno, title, and original text
resolved_citation_count     Number of successfully resolved citations
resolution_rate             Fraction of free-text citations that were resolved
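The reverse links and count fields are derived quantities of the forward links. Per the table, pcr_cited_by is taken from ECTHR-PCR; the sketch below only shows how such reverse links and counts relate to pcr_citations, assuming dict-based records keyed by case_id and a hypothetical freetext_citation_count holding the number of citations parsed from strasbourg_caselaw.

```python
from collections import defaultdict
from typing import Dict


def add_derived_fields(cases: Dict[str, dict]) -> None:
    """Fill reverse links and count/rate fields in place.

    Expects each record to hold:
      pcr_citations            list of cited case_ids
      resolved_citations       list of resolved citation records
      freetext_citation_count  hypothetical: citations parsed from strasbourg_caselaw
    """
    cited_by: Dict[str, set] = defaultdict(set)
    for case_id, rec in cases.items():
        for target in rec.get("pcr_citations", []):
            if target in cases:  # reverse links only for cases inside the corpus
                cited_by[target].add(case_id)
    for case_id, rec in cases.items():
        rec["pcr_cited_by"] = sorted(cited_by.get(case_id, set()))
        rec["pcr_citation_count"] = len(rec.get("pcr_citations", []))
        rec["pcr_cited_by_count"] = len(rec["pcr_cited_by"])
        rec["resolved_citation_count"] = len(rec.get("resolved_citations", []))
        total = rec.get("freetext_citation_count", 0)
        # Undefined when the free-text field yielded no citations at all.
        rec["resolution_rate"] = rec["resolved_citation_count"] / total if total else None


cases = {
    "A": {"pcr_citations": ["B", "C"], "resolved_citations": [{}, {}], "freetext_citation_count": 4},
    "B": {"pcr_citations": ["C"]},
    "C": {"pcr_citations": ["Z"]},  # "Z" lies outside the corpus
}
add_derived_fields(cases)
```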

Methodology

Judgments are sourced from HUDOC and parsed into paragraph-level records with section tags (Facts, Admissibility + Merits, Just Satisfaction, etc.). Violation and non-violation findings are extracted from the structured HUDOC metadata where available and supplemented by text-based inference where metadata is incomplete.
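The metadata-first, text-fallback order can be sketched as follows. Both patterns are assumptions rather than the project's rules: the conclusion field is taken to be a ';'-separated list of "Violation of Article N ..." / "No violation of Article N ..." items, and the fallback scans operative-part paragraphs for the Court's usual "there has been (no) violation of Article N" wording. Protocol articles ("Article 1 of Protocol No. 1") are not distinguished from Convention articles in this sketch.

```python
import re
from typing import Dict, Iterable, List

# Assumed metadata item shape, e.g. "No violation of Article 8 - Right to respect ..."
META_ITEM = re.compile(r"^(No violation|Violation) of Article ([\w+-]+)", re.I)
# Fallback on operative text, e.g. "Holds that there has been a violation of Article 3 ..."
OPERATIVE = re.compile(r"there has been (no|a) violation of Article ([\w+-]+)", re.I)


def extract_findings(conclusion: str,
                     operative_paragraphs: Iterable[str] = ()) -> Dict[str, List[str]]:
    found: Dict[str, List[str]] = {"violation": [], "non_violation": []}

    def add(negated: bool, article: str) -> None:
        key = "non_violation" if negated else "violation"
        if article not in found[key]:
            found[key].append(article)

    for item in conclusion.split(";"):
        m = META_ITEM.match(item.strip())
        if m:  # other items (e.g. "Remainder inadmissible") are skipped
            add(m.group(1).lower().startswith("no"), m.group(2))

    if not found["violation"] and not found["non_violation"]:
        # Metadata missing or unparseable: fall back to the operative part.
        for text in operative_paragraphs:
            for m in OPERATIVE.finditer(text):
                add(m.group(1).lower() == "no", m.group(2))
    return found


meta = ("Violation of Article 6 - Right to a fair trial (Article 6-1 - Fair hearing);"
        "No violation of Article 8 - Right to respect for private and family life;"
        "Remainder inadmissible")
from_meta = extract_findings(meta)
from_text = extract_findings("", [
    "Holds that there has been a violation of Article 3 of the Convention;",
    "Holds, by six votes to one, that there has been no violation of Article 8 of the Convention;",
])
```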

Citation-network enrichment uses a two-stage pipeline:

  • Stage 1 — ECTHR-PCR merge (scripts/merge_ecthr_pcr.py): joins the ECTHR-PCR dataset on application number, adding structured forward and reverse citation links for 12,500+ cases.
  • Stage 2 — Self-resolution (scripts/resolve_citations.py): parses the free-text strasbourg_caselaw field to extract case names and application numbers, then resolves them against a lookup index built from the dataset itself. This extends citation coverage to 760+ additional cases (mostly post-2022) not in ECTHR-PCR. Validated against PCR ground truth: 94% precision, 79% recall, 86% F1.
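Stage 2 and its validation can be sketched as follows. This is a minimal sketch, assuming citations are recognised purely by application number (the script described above also extracts case names, which is omitted here) and that precision and recall are micro-averaged over (citing, cited) pairs compared against PCR links; resolve_citations and precision_recall_f1 are hypothetical names, not the functions in scripts/resolve_citations.py.

```python
import re
from typing import Dict, Set, Tuple

# Application numbers look like "12345/05" (serial number / two-digit year).
APPNO = re.compile(r"\b(\d{1,6}/\d{2})\b")


def resolve_citations(caselaw_text: str, appno_index: Dict[str, str]) -> Set[str]:
    """Map a free-text strasbourg_caselaw field to the case_ids it cites.
    `appno_index` (application number -> case_id) is built from the dataset;
    numbers missing from the index are dropped."""
    return {appno_index[a] for a in APPNO.findall(caselaw_text) if a in appno_index}


Pair = Tuple[str, str]  # (citing case_id, cited case_id)


def precision_recall_f1(predicted: Set[Pair], gold: Set[Pair]) -> Tuple[float, float, float]:
    """Micro-averaged scores over citation pairs (averaging scheme assumed)."""
    tp = len(predicted & gold)
    p = tp / len(predicted) if predicted else 0.0
    r = tp / len(gold) if gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1


index = {"12345/05": "c1", "999/10": "c2"}
text = "Smith v. Croatia, no. 12345/05, § 47\nDoe v. Ruritania, nos. 999/10 and 1000/10"
cited = resolve_citations(text, index)  # contains c1 and c2; 1000/10 is not indexed
```

Plugging the reported figures into the F1 formula, 2 × 0.94 × 0.79 / (0.94 + 0.79) ≈ 0.859, which is consistent with the stated 86%.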

Press releases are identified by document_type and excluded from all judgment-related statistics (case counts, violation rates, article breakdowns, country rankings). They are tracked separately for completeness.
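Excluding press releases before aggregating can be sketched for one of the statistics mentioned, a per-country violation rate. The document_type value "press_release" and the rate definition (share of judgments with at least one violation finding) are assumptions for illustration.

```python
from collections import Counter
from typing import Dict, Iterable


def violation_rate_by_state(records: Iterable[dict]) -> Dict[str, float]:
    """Share of judgments per respondent state with at least one violation.

    Press releases are dropped first; the "press_release" label is hypothetical.
    """
    judgments = [r for r in records if r.get("document_type") != "press_release"]
    totals = Counter(r["respondent_state"] for r in judgments)
    with_violation = Counter(r["respondent_state"] for r in judgments if r.get("violation"))
    return {state: with_violation[state] / n for state, n in totals.items()}


records = [
    {"respondent_state": "Croatia", "document_type": "judgment", "violation": ["6"]},
    {"respondent_state": "Croatia", "document_type": "judgment", "violation": []},
    {"respondent_state": "Croatia", "document_type": "press_release", "violation": ["6"]},
]
rates = violation_rate_by_state(records)  # {'Croatia': 0.5}; the press release is not counted
```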

Open Source

The source code, build scripts, and dataset are available on GitHub:

github.com/lszoszk/ECHR-Dashboard

Contributions, bug reports, and suggestions are welcome.

Acknowledgements