Harshil Darji

Data Scientist & AI Researcher

I am a data scientist working at the intersection of natural language processing and law. My research focuses on building and evaluating models for Legal Named Entity Recognition and Relation Extraction, as well as constructing citation networks and structured datasets for German legal texts. I am particularly interested in how domain-adapted language models can support reliable, privacy-compliant processing of court decisions and legal documents.

Beyond model development, I work on turning research prototypes into usable systems, including anonymization pipelines, similarity search, and graph-based analysis of legal case relationships. My goal is to combine careful data modeling with robust machine learning to make complex legal texts more accessible and analyzable at scale.

Dec 2024 – Present

Data Scientist / AI Expert (KI-Experte)

Hochschule für Technik und Wirtschaft Berlin, Berlin, Germany

Built a citation network of German court decisions in Neo4j, enabling cross-referencing and analysis of relationships between cases.
Extended the existing Legal NER LLM to automate the anonymization of German legal documents in support of privacy-regulation compliance.
Implemented a legal text similarity search as the foundation for a RAG-based system supporting more accurate, context-aware legal research.

Sep 2021 – Sep 2024

Data Science Researcher

University of Passau, Passau, Germany

Fine-tuned a German BERT model for Legal Named Entity Recognition, achieving an F1 score of 99.49%, and compiled a dataset of 2,944 German legal references.
Co-developed a GDPR-compliant dataset of 44 privacy policies annotated with 33 entity types to support Named Entity Recognition (NER) and Relation Extraction (RE) models.
Independently fine-tuned and published LLMs for NER and RE on privacy policies, achieving F1 scores of 74% and 83%, respectively.

Oct 2016 – Jan 2017

Junior Android Developer

Profero Techno Pvt. Ltd., Mumbai, India

Developed an Android application that helps local businesses promote sales through hourly discounts, using Firebase as the real-time database.
Wrote REST web services backing both the Android and iOS versions of the app, and built an interface that lets businesses manage their discounts independently.

2025 – JURIX Legal Knowledge and Information Systems

Segmentation and Processing of German Court Decisions from Open Legal Data

Harshil Darji, Martin Heckelmann, Christina Kratsch, Gerard de Melo

2024 – OSSYM Open Search Symposium

A dataset of GDPR compliant NER for privacy policies

Harshil Darji, Stefan Becher, Jelena Mitrović, Armin Gerl, Michael Granitzer

2024 – JURIX Legal Knowledge and Information Systems

Challenges and Considerations in Annotating Legal Data: A Comprehensive Overview

Harshil Darji, Jelena Mitrović, Michael Granitzer

2023 – ICAIL International Conference on Artificial Intelligence and Law

A Dataset of German Legal Reference Annotations

Harshil Darji, Jelena Mitrović, Michael Granitzer

2023 – ICAART International Conference on Agents and Artificial Intelligence

German BERT Model for Legal Named Entity Recognition

Harshil Darji, Jelena Mitrović, Michael Granitzer

2021 – KDIR International Joint Conference on Knowledge Discovery, Knowledge Engineering, and Knowledge Management

Exploring Semantic Similarity Between German Legal Texts and Referred Laws

Harshil Darji, Jelena Mitrović, Michael Granitzer

2021 – LOD International Conference on Machine Learning, Optimization, and Data Science

Experiments on Properties of Hidden Structures of Sparse Neural Networks

Julian Stier, Harshil Darji, Michael Granitzer

2018 – 2021

MS Computer Science

University of Passau, Passau, Germany

Thesis: Investigating Sparsity in Recurrent Neural Networks

2012 – 2016

BS Information Technology

Gujarat Technological University, Gujarat, India

Thesis: Steganography

Demos

Models

JuraNER