Mail Archives: cygwin/2018/08/09/05:19:35

Subject: Re: [ANNOUNCEMENT] Test: tesseract-ocr-4.0.0-0.4
To: cygwin AT cygwin DOT com
References: <announce DOT 16d441c3-aa21-cdc5-949d-966a4acdb94d AT gmail DOT com> <625b4126-96cc-cd4e-c309-b0f9a1eb4895 AT weilnetz DOT de> <13b7b33e-e62f-0ce9-e313-0c0fa73051fe AT gmail DOT com>
From: Stefan Weil <sw AT weilnetz DOT de>
Message-ID: <dc956285-c28f-478c-999d-44a990fde238@weilnetz.de>
Date: Thu, 9 Aug 2018 11:19:13 +0200

On 09.08.2018 at 10:19, Marco Atzeri wrote:
> My understanding is that the trained data "tessdata, tessdata_fast,
> tessdata_best" come from the same training data as version 3:
> 
> https://github.com/tesseract-ocr/langdata
> 
> It is not that the languages raw data should be changed.
> 
> Regards
> Marco

https://github.com/tesseract-ocr/langdata is valid for Tesseract 3.05.x
and earlier versions.

Tesseract 4.0.0 still supports the old traineddata format, but adds new
(and typically better) traineddata based on neural networks. There is
currently no langdata available for the new traineddata.

tessdata_best only contains the new traineddata.

tessdata_fast also contains only new traineddata, but is faster and less
accurate.

tessdata still contains the old traineddata for most languages and, in
addition, new traineddata derived from tessdata_best that use integer
instead of float models (which makes them faster).
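
To make the difference between the three repositories easier to compare,
here is a minimal Python sketch that encodes only what is described above.
The names TessdataRepo, REPOS and repos_with_legacy are made up for
illustration and are not part of Tesseract.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TessdataRepo:
    """Hypothetical summary record for one traineddata repository."""
    name: str
    has_legacy: bool  # old (Tesseract 3.x format) traineddata included
    has_neural: bool  # new neural-network traineddata included
    note: str


# Contents as described in this mail; not queried from GitHub.
REPOS = {
    "tessdata_best": TessdataRepo(
        "tessdata_best", False, True, "new traineddata only"),
    "tessdata_fast": TessdataRepo(
        "tessdata_fast", False, True, "new only; faster, less accurate"),
    "tessdata": TessdataRepo(
        "tessdata", True, True,
        "old traineddata for most languages plus integer models "
        "derived from tessdata_best"),
}


def repos_with_legacy() -> List[str]:
    """Return the repositories that still ship the old traineddata."""
    return [repo.name for repo in REPOS.values() if repo.has_legacy]
```

Calling repos_with_legacy() shows that only tessdata is a candidate when
the old recognizer is still needed.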

tessdata_best, tessdata_fast and tessdata not only contain traineddata
for many languages, but also for "scripts", for example in
https://github.com/tesseract-ocr/tessdata/tree/master/script. Those
models support all languages using the same script, so
https://github.com/tesseract-ocr/tessdata/blob/master/script/Latin.traineddata
supports all languages which use Latin characters (English, French,
Spanish, Italian, German, Danish, ...). A selection of those script
models would be useful for Cygwin, too.
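
As a rough illustration of how a script model could be located and used,
the following Python sketch builds the GitHub URL in the same pattern as
the Latin.traineddata link above and assembles a tesseract command line.
It assumes that all three repositories use the same script/ layout on the
master branch and that the model was copied to <tessdata-dir>/script/.
The helper names are hypothetical; --tessdata-dir and -l are existing
tesseract options.

```python
from typing import List

BASE_URL = "https://github.com/tesseract-ocr"


def script_model_url(repo: str, script: str) -> str:
    """Build the URL of a script model, following the link pattern above.

    Assumption: the repository keeps script models under script/ on master.
    """
    return f"{BASE_URL}/{repo}/blob/master/script/{script}.traineddata"


def tesseract_command(image: str, output_base: str,
                      tessdata_dir: str, script: str) -> List[str]:
    """Assemble (but do not run) a tesseract call that uses a script model.

    Assumption: <tessdata_dir>/script/<script>.traineddata exists locally.
    """
    return [
        "tesseract", image, output_base,
        "--tessdata-dir", tessdata_dir,
        "-l", f"script/{script}",
    ]


if __name__ == "__main__":
    print(script_model_url("tessdata", "Latin"))
    # The paths below are placeholders, not real Cygwin install locations.
    print(" ".join(tesseract_command("page.png", "page", "./tessdata", "Latin")))
```

The command is returned as a list so that it can be passed to
subprocess.run without shell quoting problems.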

Regards,
Stefan
