About — HUDOC Researcher

Authors

Łukasz Szoszkiewicz

Adam Mickiewicz University in Poznan

Sebastian Marcisz

Adam Mickiewicz University in Poznan

What Is This?

HUDOC Researcher is a paragraph-level companion to HUDOC, the case-law database of the European Court of Human Rights. It indexes every visible paragraph across 19,822 judgments (as of May 2026) and lets researchers find the precise passage they need, then takes them straight to HUDOC for the authoritative full text.

A typical workflow:

Search a phrase or concept (e.g. "reasonable suspicion", article:8 "right to be forgotten") across the whole corpus.
Filter by the part of the judgment you care about — Facts, Admissibility + Merits, Just Satisfaction, Operative Part, or Individual Opinions.
Refine with state, Convention article, importance, date, outcome.
Read the hits: each one shows the paragraph number, section label, and a one-click HUDOC ↗ link that opens the case in the Court's official viewer at the right paragraph anchor.
Use Cite ¶ to copy a ready-to-paste citation in the form Smith v. Croatia, App. no. 12345/05, § 47 (ECtHR 2024).
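
As a rough illustration of the citation form in the last step, the sketch below assembles the string from four values. The `ParagraphHit` class and its field names are hypothetical and chosen for this example only; they are not the dashboard's actual data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParagraphHit:
    """Hypothetical container for one search hit (not the project's own type)."""
    title: str           # case name, e.g. "Smith v. Croatia"
    application_no: str  # application number, e.g. "12345/05"
    paragraph_no: int    # HUDOC paragraph number
    year: int            # year of the judgment


def format_citation(hit: ParagraphHit) -> str:
    """Render a hit in the 'Title, App. no. N, § P (ECtHR YYYY)' form."""
    return (
        f"{hit.title}, App. no. {hit.application_no}, "
        f"§ {hit.paragraph_no} (ECtHR {hit.year})"
    )


hit = ParagraphHit("Smith v. Croatia", "12345/05", 47, 2024)
print(format_citation(hit))
# prints: Smith v. Croatia, App. no. 12345/05, § 47 (ECtHR 2024)
```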

The dashboard does not try to re-render the full text of each judgment — HUDOC is and remains the authoritative source. What we add is findability across the corpus at paragraph granularity, which HUDOC does not expose.

HUDOC Researcher offers free, open-source, paragraph-level search and analytics for ECHR case law. The goal is to make Strasbourg jurisprudence more findable for researchers, legal practitioners, students, and anyone working on human rights in Europe.

Scope and Limits (v1)

Corpus: 19,822 ECHR judgments (as of 13 May 2026) — Grand Chamber, Chamber and Committee judgments from HUDOC's public DOCX endpoint. Note that HUDOC's default search covers Grand Chamber + Chamber only, so counts here can legitimately exceed a default HUDOC search.
Judgments only: HUDOC's other collections — admissibility Decisions, Communicated Cases, Legal Summaries, Advisory Opinions, Commission decisions — are not included. A landmark decision (e.g. Banković) returns no results here; press releases are excluded as duplicates of the underlying judgments.
English texts only: judgments delivered only in French are not yet ingested. On Semantic Search, “describe your case in any language” refers to the query — results are always the English texts.
No judge filter yet: HUDOC can filter by judge; bench-composition metadata is not ingested here. Planned.
Segmentation: paragraphs are extracted with their HUDOC paragraph numbers preserved, plus a section label (Facts / Admissibility / Merits / Just Satisfaction / Operative / Separate Opinion / Appendix). Quotes, headings and operative-list items are tagged separately so the search can filter them in or out.
Where we differ from HUDOC: HUDOC searches whole judgments; we search individual paragraphs. HUDOC renders the canonical text; we always link you back to HUDOC for that. We do not host PDFs, audio, language versions, or executions data — HUDOC does that better.
Known caveats: very old (pre-1995) Court and Commission judgments use a different DOCX template. Segmentation has been verified on a stratified 50-case test battery, but edge cases (e.g. multi-applicant pilot judgments with 12,000+ applicant rows, like Burmych and Others v. Ukraine) may show occasional drift. For canonical work always cross-check against HUDOC.

How the Data Is Built

Each judgment is fetched from HUDOC as a DOCX file, parsed into numbered paragraphs, and segmented into sections (Facts / Admissibility + Merits / Just Satisfaction / Operative / Separate Opinion / Appendix). Headings, quotes and operative-list items are tagged so search can filter them.
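
The parsing step can be pictured with a minimal sketch. It assumes the DOCX has already been reduced to a list of text lines, and it recognises sections from a handful of upper-case headings typical of modern judgments. The heading patterns, the `segment` function and the record keys are illustrative assumptions, not the project's actual parser, and older templates would need further rules.

```python
import re

# Assumed heading patterns (modern judgment layout only).
HEADINGS = [
    (re.compile(r"^THE FACTS$"), "Facts"),
    (re.compile(r"^THE LAW$"), "Admissibility + Merits"),
    (re.compile(r"^(?:[IVXLC]+\.\s+)?APPLICATION OF ARTICLE 41 OF THE CONVENTION$"),
     "Just Satisfaction"),
    (re.compile(r"^FOR THESE REASONS\b"), "Operative"),
]
# A numbered paragraph starts with its number and a full stop, e.g. "47. The Court ..."
NUMBERED = re.compile(r"^(\d+)\.\s+(.+)$")


def segment(lines):
    """Return numbered paragraphs, each tagged with the most recent section heading."""
    section = None
    records = []
    for raw in lines:
        text = raw.strip()
        numbered = NUMBERED.match(text)
        if numbered:
            # Numbered paragraphs are checked first so that body text which
            # merely mentions a heading phrase is never treated as a heading.
            records.append({
                "paragraph_no": int(numbered.group(1)),
                "section": section,
                "text": numbered.group(2),
            })
            continue
        for pattern, label in HEADINGS:
            if pattern.match(text):
                section = label
                break
    return records


sample = [
    "THE FACTS",
    "1. The applicant was born in 1970.",
    "THE LAW",
    "2. The applicant complained under Article 8 of the Convention.",
    "II. APPLICATION OF ARTICLE 41 OF THE CONVENTION",
    "3. The applicant claimed compensation.",
    "FOR THESE REASONS, THE COURT, UNANIMOUSLY,",
    "1. Holds that there has been a violation of Article 8 of the Convention;",
]
for record in segment(sample):
    print(record["paragraph_no"], record["section"])
```

The original paragraph number is kept as found in the text rather than recomputed, which mirrors the stated aim of preserving HUDOC numbering.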

Segmentation accuracy is checked with a recurring LLM-as-judge audit: a large language model reviews a stratified sample of ~1,000 paragraphs against the assigned section labels, and any systematic errors it surfaces are corrected by targeted, reviewable cleanup passes. The most recent audit puts the effective segmentation error rate near 1%. For canonical work always cross-check against HUDOC.
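
A minimal sketch of the sampling and scoring side of such an audit follows. It assumes strata are the section labels and that each stratum contributes the same number of paragraphs; neither detail is specified above. The LLM judge itself is not modelled: its verdicts are taken as a plain list of booleans, and all function names are hypothetical.

```python
import random
from collections import defaultdict


def stratified_sample(paragraphs, per_section, seed=0):
    """Draw up to `per_section` paragraphs from each section label."""
    rng = random.Random(seed)  # fixed seed keeps the audit sample reproducible
    by_section = defaultdict(list)
    for paragraph in paragraphs:
        by_section[paragraph["section"]].append(paragraph)
    sample = []
    for section in sorted(by_section):
        group = by_section[section]
        sample.extend(rng.sample(group, min(per_section, len(group))))
    return sample


def error_rate(disagreements):
    """Share of sampled paragraphs where the judge rejected the assigned label."""
    return sum(disagreements) / len(disagreements) if disagreements else 0.0


# Illustrative numbers only: 10 rejected labels in a 1,000-paragraph sample.
verdicts = [True] * 10 + [False] * 990
print(f"{error_rate(verdicts):.1%}")
```

With these illustrative inputs, 10 disagreements out of 1,000 correspond to an error rate of 1%.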

Data Sources

The dataset is assembled from public ECHR sources and enriched with structured citation-network data from the research community.

Source	Description	Type
HUDOC	Official database of the European Court of Human Rights. Primary source for all judgments, metadata (respondent state, articles, conclusions, importance, keywords, chamber composition), and full-text paragraphs with section tags.	Primary
ECTHR-PCR	Prior Case Retrieval dataset by Rashid Haddad et al. (TUM Legal Tech). Provides structured citation networks mapping each case to its cited precedents (15,729 cases). Used to enrich the `pcr_citations` and `pcr_cited_by` fields in our dataset, enabling precedent-graph analysis that goes beyond the free-text `strasbourg_caselaw` field from HUDOC. Reference: Haddad, R., Bayer, S. and Habernal, I. (2024). "ECtHR-PCR: A Dataset for Precedent Understanding and Prior Case Retrieval in the European Court of Human Rights." Proceedings of LREC-COLING 2024.	Enrichment

Dataset Fields

Each case record contains the following structured data:

Field	Description
`case_id`	Unique HUDOC identifier (e.g. 001-57516)
`case_no`	Application number(s)
`title`	Full case name
`judgment_date`	Date of judgment
`respondent_state`	Respondent country
`originating_body`	Chamber, Grand Chamber, etc.
`importance`	Case importance level
`article_no`	Convention articles at issue
`violation` / `non-violation`	Articles found violated or not
`conclusion`	Operative conclusion text
`separate_opinion`	Whether the case includes a separate opinion
`keywords`	HUDOC subject keywords
`chamber_composed_of`	Names of judges
`strasbourg_caselaw`	Free-text citations to Strasbourg case law
`paragraphs`	Full text, section-tagged at paragraph level
`pcr_citations`	Structured precedent links (from ECTHR-PCR + self-resolved)
`pcr_cited_by`	Cases that cite this one (from ECTHR-PCR)
`pcr_citation_count`	Number of cited precedents
`pcr_cited_by_count`	Influence score (how often cited)
`resolved_citations`	Self-resolved citations with appno, title, and original text
`resolved_citation_count`	Number of successfully resolved citations
`resolution_rate`	Fraction of free-text citations that were resolved
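
To make the shape of a record concrete, here is a sketch of a subset of these fields as a Python `TypedDict`, with the two derived citation statistics computed from the others. The value types (for example, `strasbourg_caselaw` as a list of strings) are assumptions made for illustration; the published dataset may encode them differently.

```python
from typing import TypedDict


class ResolvedCitation(TypedDict):
    appno: str          # application number of the cited case
    title: str          # title of the cited case
    original_text: str  # citation as it appeared in the free text


class CaseRecord(TypedDict, total=False):
    case_id: str
    case_no: str
    title: str
    judgment_date: str
    respondent_state: str
    strasbourg_caselaw: list[str]              # assumed: one entry per citation
    resolved_citations: list[ResolvedCitation]
    resolved_citation_count: int
    resolution_rate: float


def with_resolution_stats(record: CaseRecord) -> CaseRecord:
    """Return a copy with resolved_citation_count and resolution_rate filled in."""
    cited = record.get("strasbourg_caselaw", [])
    resolved = record.get("resolved_citations", [])
    out = CaseRecord(**record)
    out["resolved_citation_count"] = len(resolved)
    # Fraction of free-text citations that were matched to a case in the dataset.
    out["resolution_rate"] = len(resolved) / len(cited) if cited else 0.0
    return out
```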

Methodology

Judgments are sourced from HUDOC and parsed into paragraph-level records with section tags (Facts, Legal Framework, Merits, etc.). Violation and non-violation findings are extracted from the structured HUDOC metadata where available and supplemented by text-based inference where metadata is incomplete.
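
One way to picture the metadata-first approach is sketched below. It assumes that text-based inference runs only when the metadata lists no findings at all, and that it looks for the standard operative wording "there has been a violation of Article N" or "there has been no violation of Article N". Both the fallback rule and the function name are assumptions for illustration; the project's real inference may be more elaborate.

```python
import re

VIOLATION = re.compile(r"there has been a violation of Article (\d+)", re.IGNORECASE)
NO_VIOLATION = re.compile(r"there has been no violation of Article (\d+)", re.IGNORECASE)


def extract_findings(metadata, operative_text):
    """Prefer structured metadata; fall back to the operative text if it is empty."""
    violated = list(metadata.get("violation") or [])
    not_violated = list(metadata.get("non-violation") or [])
    if violated or not_violated:
        return {"violation": violated, "non-violation": not_violated, "source": "metadata"}
    # Fallback: collect article numbers from the operative wording.
    violated = sorted(set(VIOLATION.findall(operative_text)), key=int)
    not_violated = sorted(set(NO_VIOLATION.findall(operative_text)), key=int)
    return {"violation": violated, "non-violation": not_violated, "source": "text"}


operative = (
    "1. Holds that there has been a violation of Article 6 § 1 of the Convention; "
    "2. Holds that there has been no violation of Article 13 of the Convention;"
)
print(extract_findings({}, operative))
# prints: {'violation': ['6'], 'non-violation': ['13'], 'source': 'text'}
```

Recording the `source` alongside each finding keeps inferred values distinguishable from those taken directly from HUDOC.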

Citation-network enrichment uses a two-stage pipeline:

Stage 1: ECTHR-PCR merge (`scripts/merge_ecthr_pcr.py`). Joins the ECTHR-PCR dataset on application number, adding structured forward and reverse citation links for 12,500+ cases.
Stage 2: Self-resolution (`scripts/resolve_citations.py`). Parses the free-text `strasbourg_caselaw` field to extract case names and application numbers, then resolves them against a lookup index built from the dataset itself. This extends citation coverage to 760+ additional cases (mostly post-2022) not in ECTHR-PCR. Validated against ECTHR-PCR ground truth, it reaches 94% precision, 79% recall, and 86% F1.
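
The two stages and the validation step can be sketched as follows. This is a simplified illustration, not the content of the scripts named above: it matches on application numbers only (the real Stage 2 also uses case names), it assumes the ECTHR-PCR data is available as a dictionary keyed by application number, and every function name is hypothetical.

```python
import re

# Application numbers look like "12345/05": a serial number, a slash, a two-digit year.
APPNO = re.compile(r"\b(\d{1,6}/\d{2})\b")


def merge_pcr(cases, pcr_by_appno):
    """Stage 1 sketch: attach PCR citation lists to cases sharing an application number."""
    for case in cases:
        for appno in APPNO.findall(case["case_no"]):
            if appno in pcr_by_appno:
                case["pcr_citations"] = pcr_by_appno[appno]
                break
    return cases


def build_index(cases):
    """Map every application number in the dataset to its case_id."""
    index = {}
    for case in cases:
        for appno in APPNO.findall(case["case_no"]):
            index[appno] = case["case_id"]
    return index


def resolve(free_text, index):
    """Stage 2 sketch: case_ids whose application numbers appear in the free text."""
    return {index[a] for a in APPNO.findall(free_text) if a in index}


def precision_recall_f1(predicted, gold):
    """Score a set of resolved case_ids against a ground-truth set."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


cases = [
    {"case_id": "A", "case_no": "12345/05"},
    {"case_id": "B", "case_no": "6789/06;1111/07"},
]
index = build_index(cases)
print(resolve("Smith v. Croatia, no. 12345/05, § 47; Doe v. X, no. 9999/99", index))
# prints: {'A'}
```

The reported figures are internally consistent: F1 is the harmonic mean of precision and recall, and 2 × 0.94 × 0.79 / (0.94 + 0.79) ≈ 0.859, which rounds to the stated 86%.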

Press releases are identified by `document_type` and excluded from all judgment-related statistics (case counts, violation rates, article breakdowns, country rankings). They are tracked separately for completeness.
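
In code, the exclusion amounts to splitting records before any statistic is computed. The sketch below assumes press releases carry the value `"press_release"` in `document_type`; that literal is a placeholder, as the actual value is not documented here, and the function names are hypothetical.

```python
PRESS_RELEASE = "press_release"  # assumed marker value, not confirmed by the dataset


def split_records(records):
    """Separate judgments from press releases so the latter can be tracked apart."""
    judgments, press_releases = [], []
    for record in records:
        if record.get("document_type") == PRESS_RELEASE:
            press_releases.append(record)
        else:
            judgments.append(record)
    return judgments, press_releases


def violation_rate(judgments):
    """Share of judgments with at least one article found violated."""
    if not judgments:
        return 0.0
    return sum(1 for j in judgments if j.get("violation")) / len(judgments)


records = [
    {"document_type": "judgment", "violation": ["6"]},
    {"document_type": "judgment", "violation": []},
    {"document_type": PRESS_RELEASE},
]
judgments, press_releases = split_records(records)
print(len(judgments), len(press_releases), violation_rate(judgments))
# prints: 2 1 0.5
```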

Open Source

The source code, build scripts, and dataset are available on GitHub:

github.com/lszoszk/ECHR-Dashboard

Contributions, bug reports, and suggestions are welcome.

Acknowledgements

HUDOC — the European Court of Human Rights for making case law publicly accessible.
ECTHR-PCR — Rashid Haddad, Sean Bayer, and Ivan Habernal (TUM Legal Tech) for the structured citation-network dataset.
HURIDOCS — for inspiration and leadership in human rights information management.
