Basics III
Let's continue with some more of the basics of Rust programming.
Crates and modules
As seen in the previous section, you can start a new Rust crate by running cargo init in a new empty folder (or cargo new my-cool-crate, which creates the folder for you, roughly equivalent to mkdir my-cool-crate && cd my-cool-crate && cargo init && cd ..).
The new crate will contain the following files:
- Cargo.toml
- src/main.rs
Cargo.toml is where your crate configuration is stored, and src/main.rs is the main file for your crate's binary.
You can, of course, create additional files, but they will be ignored by the compiler unless you add them as modules in your src/main.rs.
So let's do the following:
- Create a file called src/utils.rs, then add mod utils; to the top of your src/main.rs.
- Now create a function called myfunc in src/utils.rs and try to call it from src/main.rs with utils::myfunc(). It won't work yet: first you need to prefix the function definition with pub to make the function accessible to the module's parent (main).
- This will already work, but here's an optional extra: instead of calling utils::myfunc, you can add a use utils::myfunc; at the top of your src/main.rs and then simply use myfunc in your code.
Here's the outline of our files:
Cargo.toml
[package]
name = "my-cool-crate"
version = "0.1.0"
edition = "2021"
[dependencies]
src/main.rs
mod utils;
use utils::myfunc;
fn main() {
myfunc();
}
src/utils.rs
pub fn myfunc() {
println!("hi!");
}
Submodules
Additionally, we can create a folder src/utils and add more files there; they will be modules of utils, and hence submodules of main. In that case, however, it's recommended to move src/utils.rs to src/utils/mod.rs (this is optional; it has the same effect). So here's another example:
src/main.rs
mod utils;
use utils::myfunc;
fn main() {
myfunc();
}
src/utils/mod.rs
// pub is needed to allow main.rs
// to access reader
pub mod reader;
pub fn myfunc() {
println!("hi!");
}
src/utils/reader.rs
pub fn anotherfunc() {
println!("hello!");
}
Library crates
This setup is still not ideal, for two reasons:
- First, this is just a binary crate: it will generate an executable just fine, but it won't allow us to import its functions into other crates (for other internal projects, or for distributing the crate to other users).
- Second, it will create problems with cargo test.
To solve that, we can rename src/main.rs to src/lib.rs and make some tweaks:
src/lib.rs
mod utils;
pub use utils::myfunc;
pub use utils::reader::anotherfunc;
pub fn run() {
myfunc();
anotherfunc();
}
Now external crates will have access to my_cool_crate::run, my_cool_crate::myfunc and my_cool_crate::anotherfunc, because we made them all public in src/lib.rs.
For that, all you need to do is add my-cool-crate = { path="path_to/my-cool-crate" } to the [dependencies] of your other crate's Cargo.toml (assuming it's on the same machine). Moreover, publishing it to some Git repository or to the official Rust crates registry is also a possibility.
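For illustration, the consuming crate's Cargo.toml might then look like this (the crate name and relative path here are hypothetical):

```toml
[package]
name = "my-other-crate"
version = "0.1.0"
edition = "2021"

[dependencies]
my-cool-crate = { path = "../my-cool-crate" }
```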
However, we no longer have a binary crate (cargo build will still work, though). Getting it back is quite simple, because those functions are also accessible to src/main.rs out of the box, so it can be recreated with the following content:
src/main.rs
use my_cool_crate::run;
fn main() {
run();
}
Multi-threading
A simple example of multithreading with rayon:
fn get_timestamp() -> Result<u128, Box<dyn std::error::Error>> {
let timestamp = std::time::SystemTime::now()
.duration_since(std::time::SystemTime::UNIX_EPOCH)
.map_err(|e| format!("Error: {:?}", e))?
.as_micros();
Ok(timestamp)
}
fn main() -> Result<(), Box<dyn std::error::Error>> {
let start = get_timestamp()?;
let mut var1 = 5;
for _ in 0..100_000_000 {
var1 += 1;
}
let mut var2 = 7;
for _ in 0..100_000_000 {
var2 += 1;
}
let end = get_timestamp()?;
println!("Time taken (single-threaded): {}", end - start);
let start = get_timestamp()?;
let (pvar1, pvar2) = rayon::join(
|| {
let mut pvar1 = 5;
for _ in 0..100_000_000 {
pvar1 += 1;
}
pvar1
},
|| {
let mut pvar2 = 7;
for _ in 0..100_000_000 {
pvar2 += 1;
}
pvar2
},
);
let end = get_timestamp()?;
println!("Time taken (multi-threaded): {}", end - start);
println!("var1 final value: {}", var1);
println!("var2 final value: {}", var2);
println!("pvar1 final value: {}", pvar1);
println!("pvar2 final value: {}", pvar2);
Ok(())
}
Which will output something like this:
Time taken (single-threaded): 1618570
Time taken (multi-threaded): 795160
var1 final value: 100000005
var2 final value: 100000007
pvar1 final value: 100000005
pvar2 final value: 100000007
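For this example to compile, rayon must be listed in Cargo.toml; the exact version below is an assumption (any recent 1.x release should work):

```toml
[dependencies]
rayon = "1"
```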
Async
For better or for worse, a large portion of Rust ecosystem is async, so it's useful to give a brief introduction on the matter. Imagine the following scenario:
- Your program is executing a very large number of parallel tasks that are I/O bound, which means each task spends most of its time idle (but still blocking a thread) rather than consuming CPU processing power.
For example:
- A process waiting for a slow HDD to read a fragmented file, or a client communicating with some server over the network and waiting for a response.
- A web server waiting for client responses (and in this case, the number of tasks is not even fixed and determined locally, as traffic to the website varies).
In a traditional synchronous program, the number of parallel tasks is limited by the number of threads. So if you have a queue of 1000 tasks but only 10 allocated threads, then even if each task uses on average 0.1% of CPU and just a little RAM but takes around 5 seconds of waiting to complete (e.g. waiting for a client over the network to process the previous request and send a response), it will take 500 seconds to complete all those tasks. By that time there might be thousands more in the queue, which can lead to slowness and increased response times for the clients of our web service.
A solution would be to increase the number of threads, which is enough for many problems, but it has its limitations: the potential overhead of managing an excessive number of threads (in particular, context switches), the difficulty of predicting the necessary number of threads, and so on. You may even end up writing custom switching logic, to the point of recreating functionality similar to asynchronous task management.
On the other hand, async offers a simple solution to this problem: every time a task is waiting for I/O, it yields to the async executor, which can hand control of the thread to other tasks that may also have been waiting on slow I/O. This way, many tasks can share the same thread, and thousands of tasks can effectively run concurrently on just a few threads.
Presently, the most common asynchronous runtime for Rust is tokio, which is close to a de facto standard. Functions can be defined as asynchronous by prefixing them with the async keyword. Calling such a function, in general, won't by itself cause anything to be executed; rather, it will return a Future (similar to a JavaScript Promise), which you can drive to completion with .await, as seen in the following example:
src/main.rs
async fn some_operation(i: i32) {
// this yields to the tokio runtime which will be able to
// use the thread to make progress on other tasks and come
// back here later to check if the sleep task is completed
tokio::time::sleep(std::time::Duration::from_secs(3)).await;
println!("some_operation({i}) completed");
}
#[tokio::main]
async fn main() {
let task = some_operation(-2); // nothing happens yet
// starts the task and awaits it; the task takes 3 seconds,
// and only then do we proceed to the next line
task.await;
let task = some_operation(-1); // nothing happens yet
let task = tokio::spawn(task); // starts a task in background
task.await.unwrap_or_default(); // join task by awaiting
// start 100 tasks in background
let tasks = (0..100)
.map(|i| tokio::spawn(some_operation(i)))
.collect::<Vec<_>>();
// join all tasks: this blocks until every task is finished,
// but notice that the tasks run concurrently regardless of
// the number of cores in your computer, so this takes about
// 3 seconds even with a large number of tasks
// (e.g. try increasing to 1000 tasks)
// by default, tokio uses as many threads as there are cores,
// so we are running a hundred tasks concurrently on just a
// few threads: whenever some_operation() awaits (on the
// sleep), it yields to the tokio runtime, which can then
// reuse the freed thread to make progress on another task
for task in tasks {
task.await.unwrap_or_default();
}
}
Cargo.toml
[package]
name = "my-cool-crate"
version = "0.1.0"
edition = "2021"
[dependencies]
tokio = { version = "1.32", features = ["full"] }
Async HTTP requests with Reqwest
Let's now see an example of downloading and printing a JSON file with reqwest.
src/main.rs
// This is necessary on the main() function
// to start the tokio async runtime
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
let url = "https://xkcd.com/info.0.json";
let client = reqwest::Client::new();
let response = client
.get(url)
.send()
.await
.map_err(|e| format!("Failed to send request to {url}: {e}"))?;
if response.status().as_u16() != 200 {
let response_status = response.status();
Err(format!(
"Failed to fetch {url}. Status code: {response_status}"
))?;
}
let body = response
.text()
.await
.map_err(|e| format!("Failed to read response body: {e}"))?;
println!("Response body: {}", body);
Ok(())
}
Cargo.toml
[package]
name = "my-cool-crate"
version = "0.1.0"
edition = "2021"
[dependencies]
reqwest = "0.11"
tokio = { version = "1.32", features = ["full"] }
Which will output something like this:
Response body: {"month": "9", "num": 2829, "link": "", "year": "2023", "news": "", "safe_title": "Iceberg Efficiency", "transcript": "", "alt": "Our experimental aerogel iceberg with helium pockets manages true 100% efficiency, barely touching the water, and it can even lift off of the surface and fly to more efficiently pursue fleeing hubristic liners.", "img": "https://imgs.xkcd.com/comics/iceberg_efficiency.png", "title": "Iceberg Efficiency", "day": "15"}
Deserialization with Serde
Building upon the previous section, let's additionally deserialize the downloaded JSON string into a Rust struct using serde:
src/main.rs
use serde::Deserialize;
async fn get_xkcd_json(
i: i32,
) -> Result<String, Box<dyn std::error::Error>> {
let url = format!("https://xkcd.com/{i}/info.0.json");
let url = url.as_str();
let client = reqwest::Client::new();
let response = client
.get(url)
.send()
.await
.map_err(|e| format!("Failed to send request to {url}: {e}"))?;
if response.status().as_u16() != 200 {
let response_status = response.status();
Err(format!(
"Failed to fetch {url}. Status code: {response_status}"
))?;
}
let body = response
.text()
.await
.map_err(|e| format!("Failed to read response body: {e}"))?;
Ok(body)
}
#[derive(Deserialize, Debug)]
pub struct Item {
month: String,
num: i64,
year: String,
title: String,
}
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
let mut res = Vec::with_capacity(10);
for i in 1..=3 {
let body = get_xkcd_json(i).await?;
let item: Item = serde_json::from_str(&body)?;
res.push(item);
}
println!("{:?}", res);
Ok(())
}
Cargo.toml
[package]
name = "my-cool-crate"
version = "0.1.0"
edition = "2021"
[dependencies]
reqwest = "0.11"
tokio = { version = "1.32", features = ["full"] }
serde = { version = "1.0", features = ["derive"] }
serde_json = "1.0"
Which will output something like this:
[Item { month: "1", num: 1, year: "2006", title: "Barrel - Part 1" }, Item { month: "1", num: 2, year: "2006", title: "Petit Trees (sketch)" }, Item { month: "1", num: 3, year: "2006", title: "Island (sketch)" }]
Additional resources
To go beyond the basics of the previous sections and take a deeper dive into the Rust programming language, I recommend the following additional resources:
- Reddit r/learnrust for asking questions.
- The web development section of this tutorial as an initial applied project.