Hello,
I'm trying to figure out why an Int is being inferred over my explicit Double
I'm parsing a CSV that contains 2 tables. I don't own the data so I'm not able to change it.
- The first row contains one cell that's used as a title for the document
- The second row is empty
- The third row contains one cell that's used as the header for the first table
- There is a header row for the table
- There's a dynamic number of rows for this table
- The an empty spacer row
- There is a row that's used as a title for the second table
- There is a header row for the table
- There's a dynamic number of rows for this table
Im able to separate and create two DataFrame's from the data without issue. And this is the initializer I'm using.
DataFrame(
csvData: csvData,
rows: rows,
types: types,
options: options
)
Column names and their CSV types looks like this
var types: [String: CSVType] {
[
// ...
"Column 38": .double,
// ...
]
}
The data in the CSV is
0
nil
nil
nil
2
And this is what the one of the columns in question looks like when printed
▿ 38 :
┏━━━━━━━━━━━┓
┃ Column 38 ┃
┃ <Int> ┃
┡━━━━━━━━━━━┩
│ 0 │
│ nil │
│ nil │
│ nil │
│ 2 │
└───────────┘
- name : "Column 38"
- count : 5
▿ contents : PackedOptionalsArray<Int>
▿ storage : <PackedOptionalsStorage<Int>: 0x600000206360>
The docs state
/// - types: A dictionary of column names and their CSV types.
/// The data frame infers the types for column names that aren't in the dictionary.
Since types contains the column name and it's still being inferred, my assumption is that the issue involves the renaming of the header row when it has empty cells occurs after the types are checked.
Edit:
After setting hasHeaderRow: false from true and adjusting my row offset, the types are now being assigned correctly.
I'd recommend opening a feedback enhancement where renaming columns occurs before type assignment.