Polars

Polars is a DataFrame library for Rust, with well-documented Python bindings available. It has recently surged in popularity in the Python world and now has its own company with substantial funding. Under the hood, Polars uses the Apache Arrow in-memory format (much as pandas builds on NumPy), which helps make it an extremely fast DataFrame library.
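Arrow is a columnar format: the values of one column sit next to each other in memory, rather than being interleaved row by row. The toy sketch below contrasts the two layouts using only the standard library. The names Row, Columns, to_columns and mean are made up for illustration, and this is not how Polars or Arrow are actually implemented.

```rust
// Illustrative only: a toy comparison of row-oriented and column-oriented
// layouts. All names here are hypothetical; Polars/Arrow internals differ.

/// Row-oriented layout: one struct per record.
#[derive(Debug, Clone, PartialEq)]
struct Row {
    sepal_length: f64,
    species: String,
}

/// Column-oriented layout: one contiguous vector per field.
#[derive(Debug, Default, PartialEq)]
struct Columns {
    sepal_length: Vec<f64>,
    species: Vec<String>,
}

/// Converts row-oriented records into the columnar layout.
fn to_columns(rows: &[Row]) -> Columns {
    let mut cols = Columns::default();
    for row in rows {
        cols.sepal_length.push(row.sepal_length);
        cols.species.push(row.species.clone());
    }
    cols
}

/// Averages one column by scanning a single contiguous buffer.
fn mean(values: &[f64]) -> Option<f64> {
    if values.is_empty() {
        None
    } else {
        Some(values.iter().sum::<f64>() / values.len() as f64)
    }
}

fn main() {
    let rows = vec![
        Row { sepal_length: 5.0, species: "setosa".to_string() },
        Row { sepal_length: 7.0, species: "versicolor".to_string() },
    ];
    let cols = to_columns(&rows);
    // Only the sepal_length buffer is touched; species is never read.
    println!("{:?}", mean(&cols.sepal_length));
}
```

An aggregation over a single column only has to walk one vector, which is one intuition for why columnar layouts tend to suit analytical workloads.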

A simple Polars example

Let's start by defining our dependencies to be added to Cargo.toml:

[package]
name = "polars-example"
version = "0.1.0"
edition = "2021"

[dependencies]
polars = { version = "0.38" }
ureq = { version = "2.7" }

And some code for our src/main.rs:

use polars::{frame::DataFrame, prelude::*};
use std::io::Read;

fn read_iris_csv() -> Result<DataFrame, Box<dyn std::error::Error>> {
    // Let's use ureq to download the iris dataset
    let response = ureq::get(
        "https://raw.githubusercontent.com/ms-robot-please/Python-for-Data-Science/master/iris.csv",
    )
    .call()?;
    // Use Content-Length as a capacity hint, falling back to 10 MB
    let len = response
        .header("Content-Length")
        .and_then(|s| s.parse::<usize>().ok())
        .unwrap_or(10_000_000);

    let mut bytes: Vec<u8> = Vec::with_capacity(len);
    response.into_reader().read_to_end(&mut bytes)?;

    // Alternatively, you could have used reqwest instead of ureq
    // with Cargo.toml dependency:
    // reqwest = { version = "0.11.6", features = ["blocking"] }
    // and code:
    // let response = reqwest::blocking::get(
    //     "https://raw.githubusercontent.com/ms-robot-please/Python-for-Data-Science/master/iris.csv",
    // )?
    // .error_for_status()?;
    // let bytes = response.bytes()?;

    // Read the downloaded bytes to a DataFrame
    let df = CsvReader::new(std::io::Cursor::new(bytes))
        .has_header(true)
        .finish()?;

    // Alternatively, you could have saved the download
    // to a file and then read the file with polars
    // let mut dest = std::fs::File::create("iris.csv")?;
    // std::io::copy(&mut response.into_reader(), &mut dest)?;
    // let df = CsvReader::from_path("iris.csv")?
    //     .has_header(true)
    //     .finish()?;

    Ok(df)
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let df = read_iris_csv()?;

    // Prints the "whole" dataframe... well, actually output is truncated
    println!("{df}");

    // Prints only the beginning of the dataframe
    // Defaults to 10 rows
    println!("{}", df.head(None));

    // Prints only the first 2 rows
    println!("{}", df.head(Some(2)));

    // Prints the dataframe shape as (rows, columns)
    println!("{:?}", df.shape());

    Ok(())
}
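The Content-Length handling in read_iris_csv can be tried without a network connection. The sketch below pulls that logic into two small helpers and feeds them an in-memory Cursor in place of the HTTP response body. The names capacity_hint, read_all and FALLBACK_CAPACITY are hypothetical and not part of ureq or Polars.

```rust
use std::io::{Cursor, Read};

/// Fallback capacity in bytes when no usable Content-Length is available.
/// Mirrors the 10_000_000 used in the example above.
const FALLBACK_CAPACITY: usize = 10_000_000;

/// Turns an optional Content-Length header value into a buffer capacity.
/// A missing or unparsable header yields the fallback.
fn capacity_hint(content_length: Option<&str>) -> usize {
    content_length
        .and_then(|s| s.parse::<usize>().ok())
        .unwrap_or(FALLBACK_CAPACITY)
}

/// Reads everything from `reader` into a buffer pre-sized with `capacity`.
fn read_all<R: Read>(mut reader: R, capacity: usize) -> std::io::Result<Vec<u8>> {
    let mut bytes = Vec::with_capacity(capacity);
    reader.read_to_end(&mut bytes)?;
    Ok(bytes)
}

fn main() -> std::io::Result<()> {
    // A Cursor over a byte slice stands in for the HTTP response body.
    let body = b"sepal_length,species\n5.1,setosa\n";
    let header = body.len().to_string();
    let bytes = read_all(Cursor::new(&body[..]), capacity_hint(Some(&header)))?;
    println!("read {} bytes", bytes.len());
    Ok(())
}
```

The capacity is only a hint: read_to_end grows the vector as needed, so a wrong or missing Content-Length costs some extra allocation but does not truncate the data.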
