JMdict

Japanese Multi-lingual Electronic Dictionary Project

Introduction

This document outlines the JMdict project, which set out to extend the structure and content of the EDICT Japanese-English Electronic Dictionary file so that it could contain additional information and provide an improved service to users.

Project Goals

The project has several broad goals:

  1. to convert the EDICT file to a new dictionary structure which overcomes the deficiencies in the basic EDICT structure. (completed)

    With regard to this goal, the particular structural and content aspects addressed include, but are not limited to:

    1. the handling of orthographical variation (e.g. in kanji usage, okurigana usage, readings) within a single entry;
    2. additional tagging of grammatical and other information, with each tag more appropriately associated with the part of the entry it describes;
    3. provision for separation of different senses (polysemy) in the translations;
    4. provision for the inclusion of translational equivalents from several languages;
    5. provision for inclusion of examples of the usage of words;
    6. provision for cross-references to related entries.

  2. to publish the dictionary in a standard format which is accessible by a wide range of software tools;

    It is proposed that this goal be addressed by developing the structure so that it can be released as an XML document, with an associated XML DTD.

  3. to retain backward compatibility with the original EDICT structure in order to enable legacy software systems to use later versions of the EDICT files.
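To make the structural goals above more concrete, the following is a minimal sketch, assuming a simplified and hypothetical entry whose element names (`k_ele`/`keb` for kanji forms, `r_ele`/`reb` for readings, `sense`, `pos`, `gloss`) are modelled on those of the JMdict DTD; the DTD of any particular release may differ in detail. It shows two okurigana variants held in one entry, two separate senses, and glosses in more than one language, read with Python's standard `xml.etree.ElementTree`. The names `SAMPLE`, `summarize` and `XML_LANG`, the sequence number, the language codes, and the use of plain `n` as a part-of-speech value (the distributed file may use entity references defined in its DTD) are all illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# ElementTree expands the predefined xml: prefix to this namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical, simplified entry: two okurigana variants of one word,
# a single reading, two senses, and glosses in English, German and French.
# Plain "n" stands in for the part-of-speech value (an assumption).
SAMPLE = """\
<entry>
  <ent_seq>9999999</ent_seq>
  <k_ele><keb>引き出し</keb></k_ele>
  <k_ele><keb>引出し</keb></k_ele>
  <r_ele><reb>ひきだし</reb></r_ele>
  <sense>
    <pos>n</pos>
    <gloss>drawer</gloss>
    <gloss xml:lang="ger">Schublade</gloss>
    <gloss xml:lang="fre">tiroir</gloss>
  </sense>
  <sense>
    <pos>n</pos>
    <gloss>withdrawal (of money)</gloss>
  </sense>
</entry>
"""


def summarize(entry: ET.Element) -> dict:
    """Collect headwords, readings and per-sense glosses from one entry."""
    senses = []
    for sense in entry.findall("sense"):
        glosses: dict[str, list[str]] = {}
        for gloss in sense.findall("gloss"):
            # A missing xml:lang is treated as English in this sketch.
            lang = gloss.get(XML_LANG, "eng")
            glosses.setdefault(lang, []).append(gloss.text)
        senses.append({
            "pos": [p.text for p in sense.findall("pos")],
            "glosses": glosses,
        })
    return {
        "seq": int(entry.findtext("ent_seq")),
        "kanji": [k.text for k in entry.findall("k_ele/keb")],
        "readings": [r.text for r in entry.findall("r_ele/reb")],
        "senses": senses,
    }


if __name__ == "__main__":
    info = summarize(ET.fromstring(SAMPLE))
    print(info["kanji"])                        # ['引き出し', '引出し']
    print(info["senses"][0]["glosses"]["ger"])  # ['Schublade']
    print(len(info["senses"]))                  # 2
```

Because `xml:lang` belongs to the predefined XML namespace, ElementTree reports it under the expanded name held in `XML_LANG`. Grouping glosses by language inside each sense keeps the separation of senses and the multilingual equivalents independent of one another.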

Project Status

The following has been achieved to date (June 2003):

  1. a new structure has been developed for the EDICT file, which has been called JMdict (Japanese Multi-lingual Dictionary). This structure has been described in an XML Document Type Definition (DTD), which may be viewed here. (Note: this DTD is not quite up-to-date. The latest DTD is incorporated into the distributed JMdict file.) Samples of some EDICT entries converted to XML in accordance with the DTD can be viewed here.

  2. the EDICT file has been converted into a new structure which is aligned with the XML DTD. While many of the EDICT entries were converted simply and automatically, a significant number of entries were variants of one another which had to be identified and combined. (Note that while this structure is aligned with the XML DTD, the XML format is not being used internally at the moment.)

  3. utility software has been developed which converts the new file structure back to the (old) EDICT format. All updates to the EDICT file are now taking place via the new structure;

  4. utility software has also been developed which converts the JMdict file in the new (internal) structure into the XML format for release;

  5. sets of translational equivalents in other languages are added to the JMdict file when it is released. These are:

    1. entries from Ulrich Apel's WaDokuJT Japanese-German dictionary project.
    2. the French glosses from Jean-Marc Desperrier's translation of the EDICT file (Jean-Marc's page.)
    3. Oleg Volkov's EDICT-format Japanese-Russian dictionary file.
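The backward-compatibility conversion described in item 3 can be approximated as follows. This is a minimal sketch, not the project's own utility: it assumes the same kind of simplified, hypothetical entry layout (`k_ele`/`keb`, `r_ele`/`reb`, `sense`/`pos`/`gloss`), keeps only English glosses, ignores reading restrictions and other tags, and writes an EDICT-style line of the form `KANJI [KANA] /(pos) (1) gloss/(2) gloss/` for every kanji/reading pair, since the original EDICT format carries one headword per line. The function name `to_edict_lines`, the `XML_LANG` constant, and the sample entry are hypothetical.

```python
import xml.etree.ElementTree as ET

# ElementTree expands the predefined xml: prefix to this namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def to_edict_lines(entry: ET.Element) -> list[str]:
    """Render one JMdict-style entry as EDICT-style lines (simplified)."""
    kanji = [k.text for k in entry.findall("k_ele/keb")]
    readings = [r.text for r in entry.findall("r_ele/reb")]

    # Keep only English glosses: EDICT is a Japanese-English file.
    senses = []
    for sense in entry.findall("sense"):
        eng = [g.text for g in sense.findall("gloss")
               if g.get(XML_LANG, "eng") == "eng"]
        if eng:
            pos = ",".join(p.text for p in sense.findall("pos"))
            senses.append((pos, eng))

    fields = []
    prev_pos = None
    for number, (pos, glosses) in enumerate(senses, start=1):
        prefix = ""
        # Emit the part-of-speech tag only when it changes between senses.
        if pos and pos != prev_pos:
            prefix = f"({pos}) "
            prev_pos = pos
        # Number the senses only when there is more than one.
        if len(senses) > 1:
            prefix += f"({number}) "
        fields.append(prefix + glosses[0])
        fields.extend(glosses[1:])
    body = "/" + "/".join(fields) + "/"

    # One headword per line: a line for every kanji/reading pair;
    # kana-only entries carry no [reading] field.
    if not kanji:
        return [f"{r} {body}" for r in readings]
    return [f"{k} [{r}] {body}" for k in kanji for r in readings]


# Hypothetical sample entry (simplified layout, plain "n" as the POS value).
SAMPLE = """\
<entry>
  <k_ele><keb>引き出し</keb></k_ele>
  <k_ele><keb>引出し</keb></k_ele>
  <r_ele><reb>ひきだし</reb></r_ele>
  <sense><pos>n</pos><gloss>drawer</gloss><gloss xml:lang="ger">Schublade</gloss></sense>
  <sense><pos>n</pos><gloss>withdrawal (of money)</gloss></sense>
</entry>
"""

if __name__ == "__main__":
    for line in to_edict_lines(ET.fromstring(SAMPLE)):
        print(line)
```

Run on the sample, this prints one line per spelling, `引き出し [ひきだし] /(n) (1) drawer/(2) withdrawal (of money)/` followed by the same body for `引出し`, with the German gloss dropped. A fuller converter would also honour any restrictions between particular readings and kanji forms, which this sketch does not model.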

Feedback

Comments are sought from anyone interested in this project. In particular, critical appraisal of the proposed structure, and constructive suggestions for its improvement, will be most welcome. Please feel free to send me email about this project.

Release Date

The first release of the XML format JMdict (UTF-8 Unicode) took place in May 1999. There have been several releases since then, with the most recent in October 2001. It is intended that JMdict releases take place at the same time as major EDICT releases.

Mailing List

There is a small closed mailing list for people seriously involved in JMdict. Email Jim if you want to be included.

Software

Some software that uses JMdict is under development:

The WWWJDIC dictionary server now uses an extended format for the main dictionary entries, which draws on the JMdict files.

Permission for Use

The JMdict file is now located within the Electronic Dictionary Research and Development Group at Monash University. Information about the Group is here, including the terms under which the file can be used.

Jim Breen
June 2003