Date: Thu, 3 Sep 2015 22:08:58 -0700
Subject: Re: [geda-user] Interesting blog post from a commercial EDA vendor - pdf
From: "Ouabache Designworks (z3qmtr45 AT gmail DOT com) [via geda-user AT delorie DOT com]"
To: geda-user AT delorie DOT com

On Thu, Sep 3, 2015 at 9:00 PM, <gedau AT igor2 DOT repo DOT hu> wrote:

> On Thu, 3 Sep 2015, Ouabache Designworks (z3qmtr45 AT gmail DOT com) [via
> geda-user AT delorie DOT com] wrote:
>
>> https://medium.com/@zakhomuth/disrupting-electronic-design-automation-8988f72299e3
>
> Btw, somewhat off-topic, the part not usually covered by geda-user
> discussions: pdf datasheets. I really like his rant on how useless
> distributing data in pdf is.
>
> I face that problem from time to time. Last December I had it with an ARM
> Cortex. I wanted to extract the register names, bit names and magic values
> (e.g. this bit in this register always has to be 1). C source and other
> stuff come with an EULA that doesn't let me do what I want. The datasheet
> is in pdf. Most of the relevant data are in almost uniform tables.
>
> I thought I'd just convert the pdf to html and extract <table> nodes... I
> laugh at this idea in retrospect. I tried with various tools and various
> settings. Never got a <table>. Turned out the pdf just draws the borders
> and draws the text separately. The render looks as if it were a table. The
> html some tools produce looks the same as the pdf. In practice, it's not a
> table in those htmls, just a big background bitmap with the lines, and the
> text printed onto it at pixel coords.
>
> I ended up with a "table mapping" script that takes the bitmap, scans
> lines and columns to map cell coordinates, then reads all the text from
> the html and determines which cell each piece is in.
>
> And this is only the first step in converting the data of a datasheet to a
> machine-readable form, on the lowest level... Upper levels in separate
> scripts took the table map and tried to read the header and convert the
> info into a register description.
>
> I agree with the Upverter guy. In the age of thousand-page datasheets, a
> non-machine-readable format is a bug that needs to be fixed. On the other
> hand, I'm highly sceptical about vendors being cooperative on this.
>
> Regards,
>
> Igor2

In the old days I would keep a printed copy of the datasheet for every IC
I was working on in a binder on my shelves. But as chips grew, that became
impossible. A single chip today could easily take up hundreds of feet of
shelf space, and searching it is impossible.

Upverter is a commercial vendor, so I understand that they do have to make
a buck, but Zak does bring up an interesting point. It is not open source
vs commercial that we are dealing with. It is Big EDA vs everybody. We have
to start talking with each other and come up with usable standards that do
not lock us into Big EDA tools.

John Eaton
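
For readers who want to try the "table mapping" idea Igor2 describes, here is a minimal sketch in Python. It is not his script, which was not posted; it only illustrates the two steps he names: scan the bitmap for ruling lines, then decide which cell each positioned text fragment falls in. The function names, the 0/1 bitmap representation, the min_fill threshold and the sample register text are all assumptions made for illustration.

```python
from bisect import bisect_right


def merge_adjacent(coords):
    """Collapse runs of consecutive pixels (a thick rule) to the run's first pixel."""
    merged = []
    prev = None
    for c in coords:
        if prev is None or c != prev + 1:
            merged.append(c)
        prev = c
    return merged


def find_rules(bitmap, min_fill=0.8):
    """Return (xs, ys): pixel columns and rows that are mostly dark.

    bitmap is a list of rows, each a list of 0 (light) or 1 (dark).
    min_fill is an assumed threshold, not a value from the original post.
    """
    height = len(bitmap)
    width = len(bitmap[0])
    ys = [y for y in range(height) if sum(bitmap[y]) >= min_fill * width]
    xs = [x for x in range(width)
          if sum(row[x] for row in bitmap) >= min_fill * height]
    return merge_adjacent(xs), merge_adjacent(ys)


def locate_cell(x, y, xs, ys):
    """Return (row, col) of the cell containing pixel (x, y), or None if outside."""
    col = bisect_right(xs, x) - 1
    row = bisect_right(ys, y) - 1
    if 0 <= col < len(xs) - 1 and 0 <= row < len(ys) - 1:
        return row, col
    return None


def map_text(fragments, xs, ys):
    """Group (text, x, y) fragments by cell; fragments outside the grid are dropped."""
    cells = {}
    for text, x, y in fragments:
        cell = locate_cell(x, y, xs, ys)
        if cell is not None:
            cells.setdefault(cell, []).append(text)
    return {cell: " ".join(parts) for cell, parts in cells.items()}


# Toy 11x7 bitmap with rules at x = 0, 5, 10 and y = 0, 3, 6 (a 2x2 table).
bitmap = [[1 if (x in (0, 5, 10) or y in (0, 3, 6)) else 0 for x in range(11)]
          for y in range(7)]
xs, ys = find_rules(bitmap)

# Made-up text fragments with pixel coordinates, as an html export might give.
fragments = [("CTRL", 2, 1), ("0x00", 7, 1), ("EN", 2, 4),
             ("bit", 7, 4), ("0", 8, 4), ("page 12", 20, 1)]
cells = map_text(fragments, xs, ys)
print(cells)
```

A real datasheet would need more than this: merged cells, tables split across pages, and text that straddles a rule all break the simple grid assumption, which is presumably why the upper-level scripts mentioned above were needed.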

