Date: Fri, 4 Sep 2015 08:27:40 -0400
Subject: Re: [geda-user] Re: pdf table extraction
From: "Jason White (whitewaterssoftwareinfo AT gmail DOT com) [via geda-user AT delorie DOT com]"
To: geda-user AT delorie DOT com
Reply-To: geda-user AT delorie DOT com

My absolute favorite for extracting data from tables in PDF datasheets is Tabula (http://tabula.technology/); it has a nice interface.

On Fri, Sep 4, 2015 at 7:21 AM, <karl@aspodata.se> wrote:
Igor2:
> On Fri, 4 Sep 2015, karl AT aspodata.se wrote:
> > Igor2:
> > [ about tables in pdf's ]
> >
> > It's true that pdf doesn't have a table structure.
> >
> > I have some experimental code to extract tables from pdf's; it is in:
> >
> >  http://turkos.aspodata.se/git/openhw/pdftosym/Experimental/
>
> Thanx, will check it out. What you wrote suggests your script works
> similar to mine.

Yes, but I got the impression you used the graphical elements in the
file and that you possibly used pdftohtml in "html" mode, which doesn't
give you the text positions. I have been working purely on the textual
part.
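
A minimal sketch of the position-based idea, not taken from pdfextr.pl:
assuming the text fragments are already available as (x, y, text) tuples
(for instance parsed from the left/top attributes that pdftohtml -xml
writes), rows can be recovered by grouping fragments with nearly equal y
and sorting each group by x. The function name, the tuple layout, and the
tolerance value are illustrative assumptions.

```python
def group_into_rows(fragments, y_tolerance=2.0):
    """Group (x, y, text) fragments into table rows, top to bottom.

    Fragments whose y lies within y_tolerance of the first fragment
    of the current row are treated as belonging to that row.
    """
    rows = []  # each entry: [reference_y, [(x, text), ...]]
    for x, y, text in sorted(fragments, key=lambda f: f[1]):
        if rows and abs(y - rows[-1][0]) <= y_tolerance:
            rows[-1][1].append((x, text))
        else:
            rows.append([y, [(x, text)]])
    # Within each row, order the cells left to right by x.
    return [
        [text for _, text in sorted(cells, key=lambda c: c[0])]
        for _, cells in rows
    ]


# Hypothetical fragments, as they might come out of a datasheet table.
fragments = [
    (120, 50.5, "Min"), (40, 50.0, "Parameter"), (200, 50.0, "Max"),
    (40, 70.0, "Vcc"), (120, 70.4, "3.0"), (200, 69.8, "3.6"),
]
for row in group_into_rows(fragments):
    print(row)
```

Column boundaries could be estimated in a similar way by clustering the
x values, but that depends heavily on how the particular PDF lays out
its text.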

And beware that the code above is a big mess. Perhaps you can have a
look at:

 http://turkos.aspodata.se/computing/pdfextr.pl

which is a little less unpolished; it extracts things from an invoice
(sorry, I can't provide you with the input data example).

Regards,
/Karl Hammar

-----------------------------------------------------------------------
Aspö Data
Lilla Aspö 148
S-742 94 Östhammar
Sweden
+46 173 140 57





--
Jason White