On 5 Jan 2016 18:30, "DJ Delorie" <dj AT delorie DOT com> wrote:
>
> > . a binary file might be smaller, but that does not matter much
>
> I wrote an app that used a tree-like data file for storage.  It
> supported both ascii and binary formats.  Not only was the binary
> format significantly smaller, but loaded 10x faster.  Parsing text
> files and adapting to the incoming data is more expensive than you
> think.

Indeed... text representations of floating point numbers take a lot of computation to turn into the correct binary machine value. This is one of the main reasons big 3D models in STEP format are slow to load. (There are lots of irrational numbers represented in text format, base 10.)

It is very easy to write a fast ASCII to double conversion, but only if you make some assumptions and sacrifice accuracy. Doing correct conversion - which yields the closest binary floating point number to the decimal floating point number described - is hard to perform correctly, and time consuming.

Hypothetically, I think the best compromise is a format which has a lossless translation between text and binary representations.

In reality, the speed issue is for the most part irrelevant to us. We simply don't have the quantity of floating point numerical data in our files to cause enough slow down to warrant

For processing 3D STEP files - two approaches...

1. Don't perform the conversion unless the number is needed (shunt strings in and out of the system).
2. Test out the idea of hashing and caching conversions.... I've a suspicion that many coordinates and vectors get repeated a lot....

(The Autodesk DWG format special cases 0.0 and 1.0 with a very short bit pattern (3 bits I recall), which gives them enough reduction in file size to make it worthwhile for them.)

(Btw... Anyone else react with a "wtf" to realise that the DWG binary format operates on a literal BIT stream? - ie. Not even byte alignment!)

Peter
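
A minimal sketch of the two approaches listed above for STEP files (deferring the conversion, and hashing and caching conversions), in Python. The names CachedFloatParser and LazyNumber are hypothetical and made up for illustration; nothing here is taken from a real STEP loader. It assumes the cache is keyed on the exact token text, so "1." and "1.0" count as different entries, and it relies on Python's built-in float() for the slow-but-correct conversion. Whether the cache actually pays off depends on how often coordinates really repeat, which would need measuring on real files.

```python
from typing import Dict, Optional


class CachedFloatParser:
    """Memoise text -> float conversions, keyed on the exact token text."""

    def __init__(self) -> None:
        self._cache: Dict[str, float] = {}
        self.hits = 0    # tokens answered from the cache
        self.misses = 0  # tokens that needed a real conversion

    def parse(self, token: str) -> float:
        cached = self._cache.get(token)
        if cached is not None:
            self.hits += 1
            return cached
        value = float(token)  # the expensive, accurate conversion
        self._cache[token] = value
        self.misses += 1
        return value


class LazyNumber:
    """Keep the original text and convert only when the value is first read."""

    __slots__ = ("text", "_parser", "_value")

    def __init__(self, text: str, parser: CachedFloatParser) -> None:
        self.text = text  # can be written back out unchanged
        self._parser = parser
        self._value: Optional[float] = None

    @property
    def value(self) -> float:
        if self._value is None:
            self._value = self._parser.parse(self.text)
        return self._value


if __name__ == "__main__":
    parser = CachedFloatParser()
    # Hypothetical coordinate tokens; STEP files often write "0." and "1.".
    tokens = ["0.", "1.", "0.", "25.4", "1.", "0."]
    numbers = [LazyNumber(t, parser) for t in tokens]
    total = sum(n.value for n in numbers)  # conversions happen here
    print(total, parser.hits, parser.misses)
```

A number that is only read and written back never touches float() at all, since its text field can be emitted as-is, which also sidesteps any text/binary round-trip loss for those values.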
Copyright © 2019 by DJ Delorie | Updated Jul 2019 |