This blog post is not about how to parse spreadsheets using ruby. If you are looking for that, you are not gonna find that here. This blog post is about a problem I faced while parsing decimal numbers from a spreadsheet, long story short precision related problems. Here are the details (the longer version):
Recently I had to parse an spreadsheet. The cells in the spreadsheet could have either of strings or numbers (float or otherwise). And since this application had to do with lot of calculations, even a specification being off by a single digit could lead to wrong set of results. So my point being that it was vitally important that I parse the data as it has been entered. Seems like a regular parsing situation… I had the same thought, but it turns out excel does a lot of things in the background.
FYI: I am using ‘roo’ gem for parsing the file. And these files are written in MS Excel.
Try doing this in irb:
irb(main):001:0> .09 + 0.0016 => .0916000000000001 irb(main):002:0> .09 + 0.0016 == .0916 => false
First of all this is the correct behaviour. And second of all the reason for this behaviour is because ruby uses IEEE754 doubles. And so does Excel. So if the cell has data whose value is .0916 and its format is set not set as ‘TEXT’, but something else for example: Number, General, etc, then what excel returns is .09160000000001. So even if user has entered a value like this, it could come up with a different precision altogether.
So, the question is how did I solve this problem ? Well, I did not. Not because I could not solve it, but because I was not fully satisfied with the solutions present right now and I am still trying to figure it out. But anyways I thought those solution could solve somebody else’s problem. So here are some details on those solutions:
. You could ask all your users to use the format of the data they are entering as ‘TEXT’. So excel or any other software that I am not aware of right now, will not change the values, and you would get what user has entered as it is.
. roo gem which I am using to parse the spread sheet, it has methods to find out the format of the data it is parsing. So you can just check in your code, if the format is ‘:float’, then just round the number. Here is how you could do it:
s = Roo::Excel.new('myspreadsheet.com') s.cell_type(4,2)
But again, as I said earlier, these solutions are not right or wrong, but these are trade-offs that you have to think of in terms of whats best for your application.
If you want to follow up the issue, here are some links to look at: roo gem github, same issue in a different gem and a related issue.