Convert number to string (with internal identity and no more digits than necessary)

I need to transfer numbers generated in R to a Python program, but they must be sent as text (no binary transfer allowed). No information can be lost: the exact IEEE binary image stored in R's memory must be reconstructed in Python (apart from NaNs, which can have many binary images; reconstructing any NaN will do).

The internal identity property is described in this paper as (internal form being the binary IEEE representation; and external form the text output I need):

require[ment] that conversion from internal form to external form and back be an identity function

I’ve already tried as.character(x) and sprintf("%.50f", x). The former loses information. The latter outputs unnecessary characters (a run of trailing zeroes, or trailing nines that could be rounded up, without loss of information, to the previous digit plus 1).
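To make the two failure modes concrete, here is a small sketch on the Python side. It assumes, purely for illustration, that the lossy output corresponds to 15 significant digits; the format strings stand in for whatever R actually emits.

```python
# Illustration only: the "%.15g" precision is an assumption standing in for
# the lossy as.character(x) output; "%.50f" mirrors sprintf("%.50f", x).
x = 0.1 + 0.2  # the double nearest to this sum prints as 0.30000000000000004

s15 = "%.15g" % x  # too few digits: collapses to a different double
s17 = "%.17g" % x  # 17 significant digits always identify a double uniquely
s50 = "%.50f" % x  # exact enough, but far more characters than needed

print(s15, float(s15) == x)
print(s17, float(s17) == x)
print(len(s50), float(s50) == x)
```

Seventeen significant digits are always enough for a round trip, but often more than necessary (0.1 would come out as 0.10000000000000001), which is exactly the "no more digits than necessary" part of the question.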

A decimal text representation would be ideal, but if that is not possible, octal, hex, or anything else would do, as long as it can still be parsed in Python.
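If hex turns out to be acceptable, one possible route is the C99 hexadecimal floating-point notation, which R's sprintf("%a", x) should produce and which Python can read with float.fromhex. The sketch below covers only the Python side; parse_transferred and bits are hypothetical helper names, and mapping R's "NA" token to a NaN is an assumption based on "any NaN will do".

```python
import math
import struct


def parse_transferred(text: str) -> float:
    """Parse one number sent as text: a C99-style hex float, or Inf/NaN/NA."""
    token = text.strip()
    if token == "NA":  # assumption: R's missing value may become any NaN
        return math.nan
    # float.fromhex also accepts "inf", "-inf" and "nan", case-insensitively.
    return float.fromhex(token)


def bits(x: float) -> bytes:
    """The 8-byte IEEE 754 image of x, used to compare values bit for bit."""
    return struct.pack("<d", x)


# Round trip on the Python side; value.hex() stands in for the R output.
for value in (0.1, 1 / 3, -0.0, 5e-324, 1.7976931348623157e308):
    assert bits(parse_transferred(value.hex())) == bits(value)
```

Hex output is exact by construction, since each hex digit maps to four bits of the significand, so no digit-count guessing is needed. The price is that the text is not human-friendly decimal.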

  • You want exact, yet you want to round? That seems contradictory. Do you want exact or do you want rounded? If exact, use sprintf with as many digits as you want.

  • The assumption that we can know a priori how many digits must be used for floating-point numbers is undermined by IEEE-754, where even near-integer numbers might in fact carry a long run of .9999999s (or .0000*0001). How are you certain that the “IEEE binary image” from R’s internal storage is acceptable to Python?

  • @Onyambu “exact” and “round” are not contradictory here, as long as both operations (output to text and input from text) are considered together. The “print” operation can “understand” that rounding at a smart location will necessarily drive the “parse” operation to rebuild the exact value (exact to the limit of storage, since we are not talking about infinite storage). The paper I’ve linked in the OP explains that.

  • @r2evans in fact I don’t know; I assume they use standardized IEEE 754 just because every machine (except maybe Cray) does, but you have a valid point. For the trailing “99999” or “0000*1”, I expect it to round (either up or down) at the correct location, to the shortest representation that will still yield the correct value upon parsing (see my previous comment).

  • @rslemos my apologies: I intended “whoever asked this” to refer to the person who placed such an (IMHO) silly requirement on you.

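The rounding idea discussed in the comments (print the fewest digits that still parse back to the same double) can be sketched by brute force. This is not the algorithm from the linked paper, just a simple search over precisions; shortest_roundtrip is a hypothetical name. The same loop should carry over to R with sprintf("%.*g", p, x) and as.numeric, though that is untested here.

```python
def shortest_roundtrip(x: float) -> str:
    """Return the shortest "%g"-style text that parses back to exactly x."""
    # 17 significant digits are always enough to identify a double.
    for precision in range(1, 18):
        text = "%.*g" % (precision, x)
        if float(text) == x:
            return text
    # Only NaN gets here, because NaN never compares equal to itself.
    return "nan"


for value in (0.1, 0.1 + 0.2, 1 / 3, 123456789.0, 5e-324):
    print(shortest_roundtrip(value))
```

Python's own repr(x) has produced a shortest round-tripping decimal string since version 3.1, so on the Python side repr is the idiomatic choice. The loop is mainly useful as a language-neutral description of the "shortest representation that still parses to the same value" requirement.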