I/O Operations
Data Type Mapping
The I/O functions support automatic data type conversion using the following mapping:
| Type | Polars DataType |
|---|---|
| bool | Boolean |
| u8 | UInt8 |
| u16 | UInt16 |
| u32 | UInt32 |
| u64 | UInt64 |
| i8 | Int8 |
| i16 | Int16 |
| i32 | Int32 |
| i64 | Int64 |
| i128 | Int128 |
| f32 | Float32 |
| f64 | Float64 |
| date | Date |
| timestamp | Datetime(Nanoseconds) |
| datetime | Datetime(Milliseconds) |
| time | Time |
| duration | Duration(Nanoseconds) |
| sym | Categorical |
| cat | Categorical |
| str | String |
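The mapping above can be read as a plain lookup table. The following Python sketch is only an illustration of how a dtypes schema dictionary might be resolved against it; the names `TYPE_TO_POLARS`, `polars_dtype`, and `resolve_schema` are hypothetical and not part of the documented API.

```python
# Hypothetical lookup table mirroring the mapping table above.
# The Polars data types are kept as plain strings for illustration.
TYPE_TO_POLARS = {
    "bool": "Boolean",
    "u8": "UInt8",
    "u16": "UInt16",
    "u32": "UInt32",
    "u64": "UInt64",
    "i8": "Int8",
    "i16": "Int16",
    "i32": "Int32",
    "i64": "Int64",
    "i128": "Int128",
    "f32": "Float32",
    "f64": "Float64",
    "date": "Date",
    "timestamp": "Datetime(Nanoseconds)",
    "datetime": "Datetime(Milliseconds)",
    "time": "Time",
    "duration": "Duration(Nanoseconds)",
    "sym": "Categorical",
    "cat": "Categorical",
    "str": "String",
}


def polars_dtype(type_name):
    """Return the Polars data type name for a type name from the table."""
    try:
        return TYPE_TO_POLARS[type_name]
    except KeyError:
        raise ValueError(f"unsupported type: {type_name!r}") from None


def resolve_schema(dtypes):
    """Translate a {column name: type name} schema into Polars type names."""
    return {column: polars_dtype(t) for column, t in dtypes.items()}


schema = resolve_schema({"price": "f64", "ticker": "sym", "ts": "timestamp"})
print(schema)
```

Note that sym and cat both resolve to Categorical, so the mapping is not reversible.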
File Reading Functions
rcsv
Reads a CSV file and returns a DataFrame.
| Parameters | Type | Description |
|---|---|---|
| path | str or sym | File path to the CSV file |
| has_header | bool | Whether the CSV has a header row |
| separator | str | Single character separator (e.g., "," for comma) |
| ignore_errors | bool | Whether to ignore parsing errors |
| dtypes | dict | Schema dictionary mapping column names to data types |
Example:
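As a rough illustration of what the has_header, separator, and dtypes parameters mean, the Python sketch below parses CSV text with the standard library. It is not the rcsv implementation: `read_csv_sketch` is a hypothetical helper, a dict of columns stands in for a DataFrame, the `column_N` names for headerless input are an assumption, and ignore_errors is not modelled.

```python
import csv
import io


def read_csv_sketch(text, has_header=True, separator=",", dtypes=None):
    """Parse CSV text into a dict of columns (a stand-in for a DataFrame)."""
    if len(separator) != 1:
        raise ValueError("separator must be a single character")
    # Tiny subset of the type table, enough for this illustration.
    converters = {"i64": int, "f64": float, "str": str}
    rows = list(csv.reader(io.StringIO(text), delimiter=separator))
    if not rows:
        return {}
    if has_header:
        names, rows = rows[0], rows[1:]
    else:
        # Assumed naming for headerless files; the real function may differ.
        names = [f"column_{i + 1}" for i in range(len(rows[0]))]
    columns = {name: [row[i] for row in rows] for i, name in enumerate(names)}
    # Columns not listed in dtypes stay as strings in this sketch.
    for name, type_name in (dtypes or {}).items():
        columns[name] = [converters[type_name](value) for value in columns[name]]
    return columns


table = read_csv_sketch(
    "sym;price\nAAPL;189.5\nMSFT;410.25\n",
    separator=";",
    dtypes={"price": "f64"},
)
print(table)
```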
rjson
Reads a JSON file and returns a DataFrame.
| Parameters | Type | Description |
|---|---|---|
| path | str or sym | File path to the JSON file |
| dtypes | dict | Schema dictionary mapping column names to data types |
Example:
rparquet
Reads a Parquet file and returns a DataFrame.
| Parameters | Type | Description |
|---|---|---|
| path | str or sym | File path to the Parquet file |
| n_rows | i64 | Number of rows to read (0 for all rows) |
| rechunk | bool | Whether to rechunk the data |
| columns | syms | Column names to select (empty for all columns) |
Example:
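The n_rows and columns parameters use sentinel values: 0 means all rows and an empty list means all columns. The Python sketch below only illustrates that convention on in-memory data; `select_sketch` is a hypothetical helper and does not read Parquet.

```python
def select_sketch(data, n_rows=0, columns=()):
    """Apply the n_rows / columns conventions to a dict of columns."""
    # An empty selection means "keep every column".
    names = list(columns) if columns else list(data)
    # n_rows == 0 means "keep every row".
    limit = None if n_rows == 0 else n_rows
    return {name: data[name][:limit] for name in names}


data = {"sym": ["a", "b", "c"], "px": [1.0, 2.0, 3.0]}

everything = select_sketch(data)
first_two_px = select_sketch(data, n_rows=2, columns=["px"])
print(everything)
print(first_two_px)
```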
rtxt
Reads a text file and returns its contents as a string.
| Parameters | Type | Description |
|---|---|---|
| path | str or sym | File path to the text file |
rbin
Reads a binary file and returns its contents as a list.
| Parameters | Type | Description |
|---|---|---|
| path | str or sym | File path to the binary file |
rdatabase (Pending)
Reads a database and returns a DataFrame.
| Parameters | Type | Description |
|---|---|---|
| database_url | str or sym | Database URL |
| sql | str | SQL query to execute |
rexcel (Pending)
Reads an Excel file and returns a DataFrame.
| Parameters | Type | Description |
|---|---|---|
| path | str or sym | File path to the Excel file |
| sheet_name | str | Sheet name to read |
File Writing Functions
wcsv
Writes a DataFrame to a CSV file.
| Parameters | Type | Description |
|---|---|---|
| path | str or sym | File path to the CSV file |
| df | dataframe | DataFrame to write |
| separator | str | Single character separator (e.g., "," for comma) |
| append | bool | Whether to append to existing file (true) or overwrite (false) |
Example:
wjson
Writes a DataFrame to a JSON file in JSON Lines format.
| Parameters | Type | Description |
|---|---|---|
| path | str or sym | File path to the JSON file |
| df | dataframe | DataFrame to write |
Example:
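JSON Lines stores one JSON object per line, one line per row. The Python sketch below shows what that layout looks like for a small table; `to_json_lines` is a hypothetical helper and a dict of columns stands in for a DataFrame, so the exact key order and number formatting produced by wjson may differ.

```python
import json


def to_json_lines(columns):
    """Serialize a dict of equal-length columns as JSON Lines text."""
    names = list(columns)
    n_rows = len(columns[names[0]]) if names else 0
    lines = []
    for i in range(n_rows):
        # One JSON object per row, written on its own line.
        lines.append(json.dumps({name: columns[name][i] for name in names}))
    return "\n".join(lines) + ("\n" if lines else "")


text = to_json_lines({"sym": ["AAPL", "MSFT"], "qty": [10, 20]})
print(text, end="")
```

Because each line is a complete JSON value, a reader can process the file row by row without loading it all at once.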
wparquet
Writes a DataFrame to a Parquet file.
| Parameters | Type | Description |
|---|---|---|
| path | str or sym | File path to the Parquet file |
| df | dataframe | DataFrame to write |
| compression_level | int | Compression level (1-22) |
Example:
wtxt
Writes text content to a file.
| Parameters | Type | Description |
|---|---|---|
| path | str or sym | File path to the text file |
| content | str | Text content to write |
| append | bool | Whether to append to existing file (true) or overwrite (false) |
Example:
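The append flag behaves like the difference between append and write mode when opening a file. The Python sketch below demonstrates those semantics with a temporary file; `write_text_sketch` is a hypothetical stand-in, not the wtxt implementation.

```python
import os
import tempfile


def write_text_sketch(path, content, append=False):
    """Write text to path, appending when append is true, else overwriting."""
    mode = "a" if append else "w"
    with open(path, mode, encoding="utf-8") as handle:
        handle.write(content)


def read_text(path):
    with open(path, encoding="utf-8") as handle:
        return handle.read()


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "notes.txt")
    write_text_sketch(path, "first\n")
    write_text_sketch(path, "second\n", append=True)
    after_append = read_text(path)      # both lines are present
    write_text_sketch(path, "replaced\n", append=False)
    after_overwrite = read_text(path)   # earlier content is gone
```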
wbin
Writes data to a binary file.
| Parameters | Type | Description |
|---|---|---|
| path | str or sym | File path to the binary file |
| data | any | Data to write |
Database Functions
wdatabase (Pending)
Writes a DataFrame to a database.
| Parameters | Type | Description |
|---|---|---|
| database_url | str or sym | Database URL |
| table_name | sym | Table name |
| df | dataframe | DataFrame to write |
wexcel (Pending)
Writes a DataFrame to an Excel file.
| Parameters | Type | Description |
|---|---|---|
| path | str or sym | File path to the Excel file |
| sheet_name | str | Sheet name to write |
| df | dataframe | DataFrame to write |
Partitioned Data Functions
wpar
Writes a DataFrame as a partitioned DataFrame and returns the file size in bytes.
| Parameters | Type | Description |
|---|---|---|
| hdb_path | str or sym | Base path for the HDB |
| partition | date or i64 | Partition identifier (date or year 1000-3999) |
| table | sym | Table name |
| df | dataframe | DataFrame to write |
| sort_columns | syms | Columns to sort by (note that Parquet files have no concept of attributes) |
| rechunk | bool | Whether to rechunk and consolidate partitions |
Tip
- Automatically creates table directories if they don't exist, but does not create the HDB path itself
- Supports date and year-based partitioning
- Automatically adds partition columns (date or year) if not present
- Handles sub-partitioning with automatic numbering
- Option to consolidate multiple sub-partitions into a single file
Example:
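The partition rules described above (a date, or a year in the range 1000-3999, with a matching date or year column added when missing) can be sketched in Python as follows. This is an illustration only: the helper names are hypothetical, the year bounds are assumed to be inclusive, a dict of columns stands in for a DataFrame, and the on-disk layout and sub-partition numbering are not modelled.

```python
import datetime


def partition_column(partition):
    """Return (column name, value) for a partition identifier."""
    if isinstance(partition, datetime.date):
        return "date", partition
    if isinstance(partition, int) and not isinstance(partition, bool):
        # Assumption: both ends of the documented 1000-3999 range are valid.
        if 1000 <= partition <= 3999:
            return "year", partition
        raise ValueError(f"year partition out of range: {partition}")
    raise TypeError("partition must be a date or an integer year")


def add_partition_column(columns, partition):
    """Add the partition column to a dict of columns if it is not present."""
    name, value = partition_column(partition)
    if name in columns:
        return columns
    n_rows = len(next(iter(columns.values()), []))
    return {**columns, name: [value] * n_rows}


by_year = add_partition_column({"px": [1.0, 2.0]}, 2024)
by_date = add_partition_column({"px": [1.0]}, datetime.date(2024, 1, 2))
print(by_year)
print(by_date)
```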
Utility Functions
hdel
Deletes a file.
| Parameters | Type | Description |
|---|---|---|
| path | str or sym | Path of the file to delete |
exists
Checks if a file or directory exists.
| Parameters | Type | Description |
|---|---|---|
| path | str or sym | Path of the file or directory to check |
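For readers more familiar with Python, the two utility functions correspond roughly to the following standard-library calls. The wrappers `exists_sketch` and `hdel_sketch` are hypothetical; how hdel reacts to a missing file or to a directory is not stated above, so this sketch simply inherits the behavior of `Path.unlink`.

```python
import tempfile
from pathlib import Path


def exists_sketch(path):
    """Return True if path names an existing file or directory."""
    return Path(path).exists()


def hdel_sketch(path):
    """Delete the file at path."""
    Path(path).unlink()


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "scratch.txt"
    target.write_text("temporary", encoding="utf-8")

    dir_exists = exists_sketch(tmp)        # directories count as existing
    before_delete = exists_sketch(target)
    hdel_sketch(target)
    after_delete = exists_sketch(target)
```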