Subject: | Re: [ANNOUNCEMENT] Test: tesseract-ocr-4.0.0-0.4 |
To: | cygwin AT cygwin DOT com |
References: | <announce DOT 16d441c3-aa21-cdc5-949d-966a4acdb94d AT gmail DOT com> <625b4126-96cc-cd4e-c309-b0f9a1eb4895 AT weilnetz DOT de> <13b7b33e-e62f-0ce9-e313-0c0fa73051fe AT gmail DOT com> |
From: | Stefan Weil <sw AT weilnetz DOT de> |
Message-ID: | <dc956285-c28f-478c-999d-44a990fde238@weilnetz.de> |
Date: | Thu, 9 Aug 2018 11:19:13 +0200 |
User-Agent: | Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.9.1 |
MIME-Version: | 1.0 |
In-Reply-To: | <13b7b33e-e62f-0ce9-e313-0c0fa73051fe@gmail.com> |
X-IsSubscribed: | yes |
On 09.08.2018 at 10:19, Marco Atzeri wrote:
> My understanding is that the trained data "tessdata, tessdata_fast,
> tessdata_best" are coming from the same training data then version 3
>
> https://github.com/tesseract-ocr/langdata
>
> It is not that the languages raw data should be changed.
>
> Regards
> Marco

https://github.com/tesseract-ocr/langdata is valid for Tesseract 3.05.x and earlier versions.

Tesseract 4.0.0 still supports the old traineddata format, but adds new (and typically better) traineddata based on neural networks. There is currently no langdata available for the new traineddata.

tessdata_best contains only the new traineddata.

tessdata_fast also contains only new traineddata, but is faster and less accurate.

tessdata still contains the old traineddata for most languages and, in addition, new traineddata made from tessdata_best, but using integer instead of float models (which makes them faster).

tessdata_best, tessdata_fast and tessdata contain traineddata not only for many languages, but also for "scripts", for example in https://github.com/tesseract-ocr/tessdata/tree/master/script.

Those models support all languages using the same script, so https://github.com/tesseract-ocr/tessdata/blob/master/script/Latin.traineddata supports all languages which use Latin characters (English, French, Spanish, Italian, German, Danish, ...).

A selection of those script models would be useful for Cygwin, too.

Regards,
Stefan
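
To make the differences between the three repositories easier to compare, here is a minimal Python sketch that records what the message above says about each one and builds the link to a script model. The names `ModelRepo`, `REPOS` and `script_model_url` are hypothetical and exist only for this illustration. The URL pattern is taken from the tessdata link quoted above; that tessdata_best and tessdata_fast use the same `script/` layout is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelRepo:
    """Summary of one traineddata repository, as described in the message."""
    name: str
    has_legacy: bool   # old (Tesseract 3.05.x style) traineddata
    has_neural: bool   # new neural-network based traineddata
    note: str


# Hypothetical lookup table; the contents only restate the message above.
REPOS = {
    "tessdata_best": ModelRepo(
        "tessdata_best", False, True, "only new traineddata"),
    "tessdata_fast": ModelRepo(
        "tessdata_fast", False, True,
        "only new traineddata, faster and less accurate"),
    "tessdata": ModelRepo(
        "tessdata", True, True,
        "old traineddata for most languages, plus integer models "
        "made from tessdata_best"),
}


def script_model_url(repo: str, script: str) -> str:
    """Build the GitHub link to a script model such as Latin.

    Assumption: every repository uses the same script/ directory layout
    as the tessdata link quoted in the message.
    """
    if repo not in REPOS:
        raise ValueError(f"unknown repository: {repo}")
    return (f"https://github.com/tesseract-ocr/{repo}"
            f"/blob/master/script/{script}.traineddata")


if __name__ == "__main__":
    for repo in REPOS.values():
        print(f"{repo.name}: {repo.note}")
    print(script_model_url("tessdata", "Latin"))
```

Calling `script_model_url("tessdata", "Latin")` reproduces the Latin.traineddata link given above; whether a given script file actually exists in the other repositories would need to be checked against the repositories themselves.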