Japanese Language Text Mining Workshop

University of Chicago, July 22 & 30, 2021

Paula R. Curtis, Hoyt Long, Mark Ravina

Welcome! Below you will find handout resources for our workshop, Japanese Language Text Mining. Please note that these materials are accessible only to registered participants, so the links below will not work for anyone else. Participants should not circulate them in any form beyond the workshop. We hope to make them open access in the future. We appreciate your understanding!

Course Syllabus

Resource Handouts

Session 1: Introduction to OCR
Session 1: Demos: ABBYY, KuroNet, & Tesseract

Session 2: Introduction to RegEx, Text Editors, & Corpus Prep
Session 2: Demo: OpenRefine; Intro to Metadata Structuring & Organization
Session 2: Demo: Tesseract: An integrated workflow with analysis