From: karl AT aspodata DOT se
To: geda-user AT delorie DOT com
Subject: Re: [geda-user] Re: pdf table extraction
Date: Fri, 4 Sep 2015 16:57:47 +0200 (CEST)

Jason:
> My absolute favorite for extracting data from tables in PDF datasheets is
> Tabula (http://tabula.technology/), it has a nice interface.

It seems to be some mix of Ruby, Java and JavaScript, and you run it
through your browser. Last time I checked there was no 64-bit Java;
it seems there is now:
  https://www.java.com/en/download/manual.jsp
but it's 68 MB for Java and 53 for the source, a little big.
Does it work with any free Java implementation, or does it require
the latest Sun/Oracle one?

Also, in the repository there are jar files and no Java source.
The PDF handling seems to be done by the Java code, which is only
available as binaries. So it's hard to get ideas from their code or
to contribute.

///

Looking at
  https://source.opennews.org/en-US/articles/introducing-tabula/
they use the same intermediary format (except that they also have the
rotation parameter present), or rather used it, since the Ruby script
mentioned there is no longer present.

But they at least point to
  http://www.tamirhassan.com/index.html#Publications
which points to these possibly useful articles:
  http://www.orsigiorgio.net/wp-content/papercite-data/pdf/gho*12.pdf
  http://www.dbai.tuwien.ac.at/staff/hassan/files/p47-hassan.pdf
  http://www.cvc.uab.es/icdar2009/papers/3725a631.pdf
  http://rewerse.net/publications/download/REWERSE-RP-2006-085.pdf

Regards,
/Karl Hammar

-----------------------------------------------------------------------
Aspö Data
Lilla Aspö 148
S-742 94 Östhammar
Sweden
+46 173 140 57