From: "Jason White (whitewaterssoftwareinfo AT gmail DOT com) [via geda-user AT delorie DOT com]"
To: geda-user AT delorie DOT com
Date: Fri, 4 Sep 2015 11:17:27 -0400
Subject: Re: [geda-user] Re: pdf table extraction

It works fine with open-source Java; just follow the installation directions...
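
Because the question was whether a free (and 64-bit) Java is enough, here is a minimal sketch for checking what the `java` on your PATH actually is. It runs `java -version` and applies simple string heuristics to the report. The function names are hypothetical. The checks for "OpenJDK" and "64-Bit" are assumptions about typical vendor output, not an official interface.

```python
import subprocess


def describe_java(version_text):
    """Classify the report printed by `java -version` (heuristic, not an API).

    Assumptions: OpenJDK builds mention "OpenJDK" somewhere in the report,
    and 64-bit VMs mention "64-Bit" in the VM name line.
    """
    lowered = version_text.lower()
    lines = version_text.strip().splitlines()
    return {
        "first_line": lines[0] if lines else "",
        "openjdk": "openjdk" in lowered,
        "64bit": "64-bit" in lowered,
    }


def java_version_text():
    """Run `java -version`; the JVM writes this report to stderr, not stdout."""
    result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    return result.stderr


if __name__ == "__main__":
    try:
        print(describe_java(java_version_text()))
    except FileNotFoundError:
        print("no `java` executable found on PATH")
```

The OpenJDK check scans the whole report rather than just the first line. Some older distribution builds print `java version` on the first line and mention OpenJDK only in the runtime line.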

Regarding your second point: thankfully, I have never needed to look at the source code of this particular tool. If the code is not available, try contacting the author; maybe they have a reason (e.g. licensing issues or something).
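
On the jar-only repository: a jar file is an ordinary zip archive. One quick way to see whether it ships Java sources or only compiled `.class` files is to list its entries. This is a minimal sketch using Python's standard `zipfile` module. `count_entries` and `jar_summary` are hypothetical helper names, and the jar paths are whatever you pass on the command line.

```python
import sys
import zipfile


def count_entries(names):
    """Count compiled classes and Java source files in a list of archive entry names."""
    return {
        "class_files": sum(1 for n in names if n.endswith(".class")),
        "java_sources": sum(1 for n in names if n.endswith(".java")),
    }


def jar_summary(path):
    """A jar is a zip archive, so zipfile can list its entries directly."""
    with zipfile.ZipFile(path) as jar:
        return count_entries(jar.namelist())


if __name__ == "__main__":
    # Usage (hypothetical script name): python jar_summary.py some.jar other.jar
    for jar_path in sys.argv[1:]:
        print(jar_path, jar_summary(jar_path))
```

A jar containing only compiled classes will report zero `java_sources`. Projects sometimes publish their sources separately, for example as a `-sources.jar`.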

On Fri, Sep 4, 2015 at 10:57 AM, <karl AT aspodata DOT se> wrote:
Jason:
> My absolute favorite for extracting data from tables in PDF datasheets is
> Tabula (http://tabula.technology/), it has a nice interface.

It seems to be some mix of Ruby, Java and JavaScript, and you run it
through your browser.

Last time I checked, there was no 64-bit Java; it seems there is now:

 https://www.java.com/en/download/manual.jsp

but it's 68 MB for Java and 53 MB for the source, which is a little big.
Does it work with any free Java implementation, or does it require
the latest Sun/Oracle one?

Also, in the repository there are jar files and no Java source.
The PDF handling seems to be done by the Java code, which is only
available in binary form.

So it's hard to get ideas from their code and to contribute.

///

Looking at
 https://source.opennews.org/en-US/articles/introducing-tabula/

they use the same intermediate format (except that they also include the
rotation parameter), or rather used it, since the Ruby script mentioned
there is no longer present.

But at least they point to
 http://www.tamirhassan.com/index.html#Publications

which points to these possibly useful articles:
 http://www.orsigiorgio.net/wp-content/papercite-data/pdf/gho*12.pdf
 http://www.dbai.tuwien.ac.at/staff/hassan/files/p47-hassan.pdf
 http://www.cvc.uab.es/icdar2009/papers/3725a631.pdf
 http://rewerse.net/publications/download/REWERSE-RP-2006-085.pdf

Regards,
/Karl Hammar

-----------------------------------------------------------------------
Aspö Data
Lilla Aspö 148
S-742 94 Östhammar
Sweden
+46 173 140 57

--
Jason White