The problem here is that there’s really not enough context in a CSV file to determine intended data types. Unlike JSON, that specifies at least some basic hints to data types (strings must be in double quotes, numbers must not be quoted, …), the CSV format doesn’t really have any such notions.
Popular programs (like Excel) attempt some pre-scanning trickery in an attempt to guess proper data types for various columns.
IMO, about the best you can do here is to make some attempt to infer the intended data type based on your own set of rules.
If I were you, I’d probably write some sort of
detect_data_type() function that takes a CSV field as input and returns its data type (or, the converted value directly) based on rules you define for your data. It could be as simple as something like:
- if the field could be an
- if the field could be a
- otherwise, return
… or, as complex as necessary.
Armed with such a function, you could then just cyclically read your CSV, pass every field through the function, and convert the data as needed.
Thanks for the answer! I think I might not have been clear enough though: I do know what data type each column is, so that’s not the challenge. However, getting all values as strings and then looping over everything in order to convert it, is something I wanted to avoid, or at least I wanted to see if there is a more efficient approach. To rephrase: If I do know that the first column is a float, is there a way to read it as a float, or do I have to read it in as a string and then convert?
With specific knowledge of the data types of each field, you could use the various
get_* methods to read each individual value in the file (as an appropriate type). However, you’d then be responsible for parsing the file yourself (and dealing with the comma separators).
That’s likely a lot more work and more prone to errors than using
get_csv_line() and then converting the individual strings to the appropriate data type (even though it might be more efficient).
If you’re going to need this for varying sets of data, I’d still recommend you employ something like I mentioned above. The difference being that you no longer need to be aware of the data type of each field - only that it meets the rules of your
detect_data_type() method (or similar).
jgodfrey | 2022-11-07 21:01