Subject: Re: [geda-user] A fileformat library
From: John Doty
Date: Sun, 3 Jan 2016 07:48:52 -0700
To: geda-user AT delorie DOT com

Was the intended subject the *pcb* file format?

On Jan 3, 2016, at 12:53 AM, Britton Kerin (britton DOT kerin AT gmail DOT com) [via geda-user AT delorie DOT com] <geda-user AT delorie DOT com> wrote:

> On Sat, Jan 2, 2016 at 8:19 PM, John Doty <jpd AT noqsi DOT com> wrote:
>
> On Jan 2, 2016, at 9:27 PM, Britton Kerin (britton DOT kerin AT gmail DOT com) [via geda-user AT delorie DOT com] <geda-user AT delorie DOT com> wrote:
>
>> On Sat, Jan 2, 2016 at 6:07 PM, John Doty <jpd AT noqsi DOT com> wrote:
>>
>> On Jan 2, 2016, at 7:47 PM, Britton Kerin (britton DOT kerin AT gmail DOT com) [via geda-user AT delorie DOT com] <geda-user AT delorie DOT com> wrote:
>>
>>> On Sat, Jan 2, 2016 at 4:38 PM, John Doty <jpd AT noqsi DOT com> wrote:
>>>
>>> On Jan 2, 2016, at 6:07 PM, Britton Kerin (britton DOT kerin AT gmail DOT com) [via geda-user AT delorie DOT com] <geda-user AT delorie DOT com> wrote:

>>>> Personally I find formats like this:
>>>>
>>>>   device=RESISTOR
>>>>   T 44400 49300 5 10 1 1 90 0 1

So, the subject is the .sch file format?


>>>> substantially less readable than ones with field names, but they are indeed easy to parse.

>>> Personally, I rarely edit these things manually except for the text fields, which are not difficult to find. The fact that they're easy to parse is handy for automation.
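A minimal Python sketch of the readability-versus-parseability point: giving the positional fields of a text-object header names takes one tuple and a `zip`. The field order (x, y, color, size, visibility, show_name_value, angle, alignment, num_lines) is assumed from the gEDA file format spec linked later in this thread; `TEXT_FIELDS` and `parse_text_header` are hypothetical names, not part of libgeda.

```python
# Assumed field order for a gEDA .sch text object header ("T ..." line),
# following the gEDA file format spec; names are illustrative, not libgeda's.
TEXT_FIELDS = ("x", "y", "color", "size", "visibility",
               "show_name_value", "angle", "alignment", "num_lines")


def parse_text_header(line):
    """Return the integer fields of a 'T ...' line as a dict keyed by name."""
    tokens = line.split()
    # Check the count first so an empty line cannot raise IndexError.
    if len(tokens) != 1 + len(TEXT_FIELDS) or tokens[0] != "T":
        raise ValueError("not a text object header: %r" % line)
    return dict(zip(TEXT_FIELDS, (int(tok) for tok in tokens[1:])))


if __name__ == "__main__":
    header = parse_text_header("T 44400 49300 5 10 1 1 90 0 1")
    print(header["angle"], header["num_lines"])  # prints: 90 1
```

The check is deliberately strict: a line with the wrong type or field count raises instead of being silently misread.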

>>>> The pcb format is quite a bit more elaborate and the savings from not rolling your own parser are more significant.

>>>> I think your criteria for what should go in libgeda are spot-on btw. Nor do I have any problem with a C interface calling python or gschem or for that matter C++. I do think providing a clean C interface to libgeda gets by far the best return on investment, since it's so widely known and with a little care wrappers can then be provided almost automatically for a wide variety of languages (via SWIG or some other similar mechanism -- or maybe Xorn facilitates this, I'm a little unclear).

>>> I don't find deconstructing C data structures particularly easier than parsing the format above. Just another layer I have to penetrate to get to the data. I do significant processing with simple things like sed, which don't handle binary data.

>>> Wrappers CAN be provided, but will they? FFI programming is not the easiest thing. I hear complaints about the need for developers to maintain code. It seems to me that one way to address these concerns is to avoid and eliminate unnecessary code.

>>> Good question. It's a great result if you get it but a lot more work than using a serialization library, which is why the latter approach seems to me like a useful step in the right direction.
>>
>> Serialization library? Why do you want an extra, unnecessary, opaque interface? What, exactly, are you trying to accomplish?

>> Two things:
>>
>>     1. A human- and partial-parser-script-readable format

> We have that, I think. But you left out the most important virtue: *simple*.
>
> I agree that it's readable enough, though it could be better. I also agree that simplicity is good.
>
>>     2. Full parsers for as many languages as possible without writing them by hand

> So instead, you need to write an interface between a complicated parser and every application by hand. Where's the gain?
>
> Here's what YAML looks like from perl:
>
>      use YAML::XS;
>
>      my $yaml = Dump [ 1..4 ];
>      my $array = Load $yaml;

But you left out the next step: you have to deconstruct whatever it built to do anything with it. To do that, you have to understand the construction. Whereas if I simply read the file (the format is too trivial for a reader to deserve the name "parser"), I go directly to *my* application's model of the underlying data, on a trajectory that matches what I need. Your example seems to build some sort of complex data structure. What if a line-by-line data-driven approach is more natural?
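One way to make "a line-by-line data-driven approach" concrete, as a Python sketch: dispatch on each line's first token and build only the application's own model, here symbol basenames and single-line `name=value` attributes. The `C` field order (x, y, selectable, angle, mirror, basename), num_lines as the tenth field of `T`, and a path (`H`) line ending in its own line count are assumptions taken from the gEDA file format spec; `read_schematic` and the sample lines are hypothetical. Counted text and path lines are consumed so that a body line beginning with `C` is never mistaken for a component.

```python
def read_schematic(lines):
    """Data-driven, line-by-line reader for gEDA .sch text (sketch).

    Returns (symbols, attributes): symbol basenames from component ('C')
    lines and (name, value) pairs from single-line attribute text.
    All other object types are ignored.
    """
    symbols, attributes = [], []
    it = iter(lines)

    def take(count):
        # Consume the next `count` raw lines (text or path data) verbatim;
        # a truncated file yields empty strings instead of raising.
        return [next(it, "").rstrip("\n") for _ in range(count)]

    for line in it:
        fields = line.split()
        if not fields:
            continue
        kind = fields[0]
        if kind == "C":
            # C x y selectable angle mirror basename
            symbols.append(fields[6])
        elif kind == "T":
            # The 10th field (AWK's $10) is num_lines: how many text lines follow.
            text = take(int(fields[9]))
            if len(text) == 1 and "=" in text[0]:
                name, value = text[0].split("=", 1)
                attributes.append((name, value))
        elif kind == "H":
            # Path data lines may start with "C"; skip them by count (last field).
            take(int(fields[-1]))
    return symbols, attributes


if __name__ == "__main__":
    sample = [
        "C 40000 40000 1 0 0 resistor-1.sym",
        "{",
        "T 40300 40700 5 10 1 1 0 0 1",
        "refdes=R1",
        "}",
        "T 44400 49300 5 10 1 1 90 0 1",
        "device=RESISTOR",
    ]
    print(read_schematic(sample))
    # prints: (['resistor-1.sym'], [('refdes', 'R1'), ('device', 'RESISTOR')])
```

No generic tree is built first; a tool that needs another object type adds another branch.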


> The gain is that this is a vastly easier way to vivify a saved object than to write my own parser,

I disagree. If the format were more complicated, you'd have a point, but it's not.

> or even my own partial parser for non-trivial cases.
>> Now take a look at the design goals for YAML:
>>
>>     http://www.yaml.org/spec/1.2/spec.html#id2708649
>>
>> It's a good fit. If it was only a matter of the technical merits I would say as close to perfect as it gets with software.

> Compare it to http://wiki.geda-project.org/geda:file_format_spec
> YAML is enormously more complex to no advantage for us.
>
> The point is that you don't have to deal with any of that complexity

It becomes a dependency for *every* tool. It will break (they always do). It won't work with *every* language on every OS.

> (of which there really isn't all that much -- calling it enormously complex is a big overstatement). It's a library with approximately two entry points per language for modern languages, and not much more for C.

AWK?

> Parsing may be a non-issue for you if you only care about strings in .sch files, but for many useful operations on pcbs you need the whole thing, or most of it.

Pcb may need it, but that's a completely different issue. We're talking about .sch files.

Can we *please* separate the projects so that we don't keep going through this kind of thing?

>> Unfortunately there's the usual good-versus-most-popular trade-off in deciding between YAML and JSON. I still favor YAML in this case, largely because I can't look at people like you and honestly claim that JSON is in all respects fun to read/edit/sed over etc., and because my personal experience with JSON is that although the parsers are truly ubiquitous they have some annoying characteristics (at least the Perl one does).
> But since it doesn't relieve the need of the application programmer to understand the interface, it is merely adding more code for no gain (or even
>
> I'm not sure what you mean by this. The programmer needs to understand what the fields mean, sure. YAML/JSON helps somewhat with this, because the fields have names. Even if you do understand the existing format, that understanding will absolutely not get you a live editable version of what's in a pcb file without a lot of (pointless) additional work.

We're not talking about pcb. What you showed above is .sch.


> negative gain, given the added complexity). And neither YAML nor JSON is as universally readable and processable as the format we have.
>
> There's no added complexity to speak of for clients, and YAML is far more readable and at least as processable as what we have now. I think your view of things is strongly tied to your particular use case.

Cases. Many cases.

> It sounds like you mostly work on attributes with their own special meaning (IIRC noqsi has attributes with their own syntax),

Perhaps you mean gnet-spice-noqsi? All of the simulation flows have special attributes beyond what pcb uses. Some PCB layout flows do too (do you know why some library symbols have pins= and class=?).

> and don't have to parse everything. That's fine. I sure don't want to break anything for you.

It sounds like you do. It sounds like you're volunteering to rewrite everybody's custom scripts for things like symbol generation and refdes renumbering.


> However, if you consider the actual problem I'm hoping to address you might sympathize at least with the thought that not reinventing the parser everywhere might be worthwhile.

"Reinvent" and "parser" imply a difficulty that we don't have. Is it really that hard to compose a few strings like:

"T %d %d %d %d %d %d %d %d %d"

Or just pick out a field you need with something like (data-driven AWK):

/^T /{
    numlines = $10
    # do something with the text lines
}
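The composing direction is just as short. A Python sketch using the same nine-`%d` format string quoted above; `format_text` is a hypothetical helper, and its default color and size are simply the values from the example line earlier in the thread, not gEDA's own defaults. Deriving num_lines from the text keeps the header and the body from disagreeing.

```python
def format_text(x, y, text, color=5, size=10, visibility=1,
                show_name_value=1, angle=0, alignment=0):
    """Compose a gEDA .sch text object: the 'T ...' header plus its text lines."""
    body = text.split("\n")
    # num_lines is computed, not passed, so it always matches the body.
    header = "T %d %d %d %d %d %d %d %d %d" % (
        x, y, color, size, visibility, show_name_value, angle, alignment,
        len(body))
    return "\n".join([header] + body)


if __name__ == "__main__":
    print(format_text(44400, 49300, "device=RESISTOR", angle=90))
    # prints:
    # T 44400 49300 5 10 1 1 90 0 1
    # device=RESISTOR
```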

> I started out to write a quick parser in perl,

Why did you do that? Why didn't you just start out to write a tool in Perl, with reading happening as the tool needs it? If you approach the problem assuming it's hard, you make it hard.

> in exactly the way you seem to be proposing should be the way to do everything.

You didn't understand.

> It's a significant hassle and you end up with a slow parser that only works from one language. As you've pointed out yourself, parsing (and serialization) is a relatively trivial, thoroughly solved problem. Why reinvent the solution?

1. The project has a significant investment in the other approach (tragesym, refdes_renum).

2. Users have an enormous investment: there are lots of custom scripts that various people have mooted here over the years, and I'd guess there are many more private ones.

3. You're using heavy words: "parser", "reinvent", etc. for a lightweight job.

4. The job of decoding the output of a universal parser isn't much, if at all, simpler than just reading the file if the file encoding is simple.

5. Serialization is appropriate when you start from well-defined common binary data structures. We don't have that. We have a well-defined common text format.

> I've taken some time over this because at least one other person indicated that they shared your concern about using a generic parser rather than an arbitrary custom format. So I'd like to actually convince you, lest you convince others that doing as I propose is a bad idea for pcb.

It might not be a bad idea for pcb.

> I'd also like to apologize for bad attitude and rudeness I've shown you in the past, and hope you're able to view this issue in technical terms alone (I confess that I sometimes have difficulty doing this with your emails).

No apology needed. I'm a scientist; I'm used to this. I recall the time I was publicly accused of "witchcraft" for digging out a result using an unfamiliar statistical method. The accuser then went back to her lab, took another look at her own data, and proved I was right. We never saw a dispute as personal. Scientific research is like this: we have titanic arguments with our friends. That's how we bring all of the facts and ideas to light and get the science right.


> Britton


John Doty            Noqsi Aerospace, Ltd.

http://www.noqsi.com/

jpd AT noqsi DOT com


