Date: Fri, 4 Sep 2015 06:00:42 +0200 (CEST)
From: gedau AT igor2 DOT repo DOT hu
To: "Ouabache Designworks (z3qmtr45 AT gmail DOT com) [via geda-user AT delorie DOT com]"
Subject: Re: [geda-user] Interesting blog post from a commercial EDA vendor - pdf

On Thu, 3 Sep 2015, Ouabache Designworks (z3qmtr45 AT gmail DOT com) [via geda-user AT delorie DOT com] wrote:

>
>https://medium.com/@zakhomuth/disrupting-electronic-design-automation-8988f72299e3

Btw, somewhat off-topic, the part usually not covered by geda-user discussions: PDF datasheets.

I really like his rant on how useless distributing data in PDF is. I face that problem from time to time. Last December I had it with an ARM Cortex. I wanted to extract the register names, bit names and magic values (e.g. this bit in this register always has to be 1). The C source and other stuff come with an EULA that doesn't let me do what I want. The datasheet is in PDF. Most of the relevant data is in almost uniform tables. I thought I'd just convert the PDF to HTML and extract nodes... I laugh at this idea in retrospect.

I tried various tools and various settings. Never got a <table>. Turned out the PDF just draws the borders and draws the text separately. The rendering only looks as if it were a table. The HTML some tools produce looks the same as the PDF, but in practice there is no table in that HTML, just a big background bitmap with the lines, and the text printed onto it at pixel coordinates.

I ended up with a "table mapping" script that takes the bitmap, scans lines and columns to map cell coordinates, then reads all the text from the HTML and determines which cell each piece is in. And this is only the first step of converting the data of a datasheet to a machine-readable form on the lowest level... Upper levels, in separate scripts, took the table map and tried to read the header and convert the info into a register description.

I agree with the upverter guy. In the age of thousand-page datasheets, a non-machine-readable format is a bug that needs to be fixed. On the other hand, I'm highly sceptical about vendors being cooperative on this.

Regards,

Igor2
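
The "table mapping" step described above could look roughly like the following. This is only a minimal Python sketch under stated assumptions, not the original script: it assumes the bitmap is already available as a list of rows of 0/1 pixels (1 = dark), that ruling lines span nearly the whole table, and that the text has already been pulled from the HTML as (x, y, string) tuples in the same pixel coordinates. The names find_rules, cell_of, map_table and the min_fill threshold are hypothetical.

```python
from bisect import bisect_right


def find_rules(bitmap, axis, min_fill=0.8):
    """Return pixel positions of ruling lines along one axis.

    axis=0 looks for horizontal rules (mostly dark pixel rows),
    axis=1 looks for vertical rules (mostly dark pixel columns).
    A thick line (several adjacent dark positions) is reported once,
    at its first pixel. min_fill is an assumed threshold.
    """
    height, width = len(bitmap), len(bitmap[0])
    if axis == 0:
        fills = [sum(row) / width for row in bitmap]
    else:
        fills = [sum(bitmap[y][x] for y in range(height)) / height
                 for x in range(width)]
    rules, prev_dark = [], False
    for pos, fill in enumerate(fills):
        dark = fill >= min_fill
        if dark and not prev_dark:
            rules.append(pos)
        prev_dark = dark
    return rules


def cell_of(x, y, col_rules, row_rules):
    """Map a pixel coordinate to (row, col), or None if outside the grid."""
    col = bisect_right(col_rules, x) - 1
    row = bisect_right(row_rules, y) - 1
    if 0 <= col < len(col_rules) - 1 and 0 <= row < len(row_rules) - 1:
        return row, col
    return None


def map_table(bitmap, texts):
    """Assign each (x, y, string) text item to the cell that contains it."""
    row_rules = find_rules(bitmap, axis=0)
    col_rules = find_rules(bitmap, axis=1)
    cells = {}
    for x, y, text in texts:
        cell = cell_of(x, y, col_rules, row_rules)
        if cell is not None:
            cells.setdefault(cell, []).append(text)
    return cells


# Toy 2x2 table: rules at pixel rows 0, 3, 6 and pixel columns 0, 4, 8.
bitmap = [[1 if (y % 3 == 0 or x % 4 == 0) else 0 for x in range(9)]
          for y in range(7)]
texts = [(2, 1, "REG"), (5, 1, "BIT"), (2, 4, "CTRL"), (6, 5, "7")]
cells = map_table(bitmap, texts)
print(cells)
# {(0, 0): ['REG'], (0, 1): ['BIT'], (1, 0): ['CTRL'], (1, 1): ['7']}
```

Real datasheet tables would need more than this (merged cells, partial rules, text that straddles a border), which is presumably where the upper-level scripts come in.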