Rust substring

Published by Emir Buğra KÖKSALAN

When we work on a project, we need helper functions for strings. In Rust we can do many things with strings (for example getting the length, concatenating, splitting, trimming, etc.), but unfortunately there is no built-in substring method. If you search for a solution, almost everybody suggests using the range operator on string slices. Here is an example:

fn main() {
    let my_str: String = "abcde".to_string();
    let substr = &my_str[1..3];
    println!("substr: {}", substr);
}

This prints bc. It works, right? No. It works only when every character is ASCII, that is, one byte each. In practice we usually need to handle non-ASCII text as well. Let us put some multi-byte characters into the string and see what happens.

fn main() {
    let my_str: String = "ğüışöç".to_string();
    let substr = &my_str[1..3];
    println!("substr: {}", substr);
}

When you run this code, it panics. Here is the panic message:

thread 'main' panicked at src/main.rs:3:20:
byte index 1 is not a char boundary; it is inside 'ğ' (bytes 0..2) of `ğüışöç`

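As you can see, the range operator indexes a string slice by bytes, not by characters. UTF-8 is a variable-width encoding: a character takes between 1 and 4 bytes, and each of the Turkish letters here takes 2. Byte index 1 therefore falls in the middle of 'ğ', which is not a valid place to cut. So what do we do now? How can we take a substring by character position?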
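To make the byte-versus-character mismatch concrete, here is a small sketch that uses only standard library methods (len, chars().count(), is_char_boundary, get and char_indices). The helper name byte_and_char_len is made up for this illustration.

```rust
/// Hypothetical helper for this illustration: returns (byte length, char count).
fn byte_and_char_len(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

fn main() {
    let s = "ğüışöç";
    let (bytes, chars) = byte_and_char_len(s);
    println!("bytes: {}, chars: {}", bytes, chars);

    // Byte index 1 falls inside 'ğ', so it is not a valid slice boundary.
    println!("boundary at 1? {}", s.is_char_boundary(1));
    println!("boundary at 2? {}", s.is_char_boundary(2));

    // `get` returns None instead of panicking when the range is invalid.
    println!("get(1..3): {:?}", s.get(1..3));
    println!("get(2..6): {:?}", s.get(2..6));

    // char_indices pairs each character with its starting byte offset.
    for (offset, ch) in s.char_indices() {
        println!("{} starts at byte {}", ch, offset);
    }
}
```

The string has 6 characters but 12 bytes, so a byte range such as 1..3 does not line up with character positions.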

The Chars struct

If you want to work with characters without cutting through their bytes, the type to reach for is the Chars struct. Calling chars() on a string slice returns one. It implements the Iterator trait, so its adapter methods let you work character by character. Here is an example:

fn main() {
    let my_str: String = "ğüışöç".to_string();
    let chars = &my_str.chars();
    println!("chars: {:?}", chars);

    let sub_chars = chars.clone().skip(1).take(2);
    println!("sub_chars: {:?}", sub_chars);
}

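Note the clone() call on chars. The chars variable here is a reference to the iterator (&Chars), while skip() and take() take self by value, so we cannot call them through the reference; clone() gives us an owned copy to consume. Now we can call collect() on sub_chars.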

fn main() {
    let my_str: String = "ğüışöç".to_string();
    let chars = &my_str.chars();
    println!("chars: {:?}", chars);

    let sub_chars = chars.clone().skip(1).take(2);
    println!("sub_chars: {:?}", sub_chars);

    let substring: String = sub_chars.collect();
    println!("substring: {}", substring);
}

Let us look at the output:

chars: Chars(['ğ', 'ü', 'ı', 'ş', 'ö', 'ç'])
sub_chars: Take { iter: Skip { iter: Chars(['ğ', 'ü', 'ı', 'ş', 'ö', 'ç']), n: 1 }, n: 2 }
substring: üı

As you can see, we get üı without any problem. There is one small wart, though: the clone() call should not be necessary, yet this version of the code requires it. How can we get rid of it? Let us take a string slice with the range operator and store the iterator returned by chars() by value instead of writing &my_str.chars().

fn main() {
    let my_str: String = "ğüışöç".to_string();
    let my_str_ref = &my_str[0..];
    let chars = my_str_ref.chars();
    println!("chars: {:?}", chars);

    let sub_chars = chars.skip(1).take(2);
    println!("sub_chars: {:?}", sub_chars);

    let substring: String = sub_chars.collect();
    println!("substring: {}", substring);
}

As you can see, we no longer need to clone anything, because chars is now an owned Chars value rather than a reference to one. (The &my_str[0..] slice is not essential: my_str.chars() works the same way, since String dereferences to str.) Now we have everything we need to write a substring() function. Let us do that:

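// Returns the characters in [from, to) as a new String.
// `from` and `to` are character positions, not byte offsets.
fn substring(source: &str, from: usize, to: usize) -> String {
    // An empty or inverted range yields an empty string.
    if to <= from {
        return String::new();
    }
    source.chars().skip(from).take(to - from).collect()
}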

fn main() {
    let my_str: String = "ğüışöç".to_string();
    println!("substring result: {}", substring(&my_str, 1, 3));
}

As expected, the result is üı.
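The substring() function above allocates a new String on every call. If a borrowed slice is enough, one possible variant is to translate the character positions into byte offsets with char_indices() and then slice safely. This is only a sketch; the name substring_slice is an assumption made for this example.

```rust
/// Hypothetical variant: returns a borrowed slice instead of a new String.
/// `from` and `to` are character positions, as in `substring` above.
fn substring_slice(source: &str, from: usize, to: usize) -> &str {
    if to <= from {
        return "";
    }
    // Byte offset where the `from`-th character starts.
    // If `from` is past the end, the result is empty.
    let start = match source.char_indices().nth(from) {
        Some((offset, _)) => offset,
        None => return "",
    };
    // Byte offset where the `to`-th character starts,
    // or the end of the string when `to` is past the end.
    let end = source
        .char_indices()
        .nth(to)
        .map(|(offset, _)| offset)
        .unwrap_or(source.len());
    // Both offsets come from char_indices, so they are valid boundaries.
    &source[start..end]
}

fn main() {
    let my_str: String = "ğüışöç".to_string();
    println!("slice result: {}", substring_slice(&my_str, 1, 3));
}
```

Because both offsets come from char_indices(), the final range always lies on character boundaries and the slice cannot panic.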

Of course we could add more features to this function (for example the behaviours of PHP's substr() function) or change the meaning of the parameters (from-to, start-end, start-length, etc.), but I think the current behaviour is enough. I hope you got the main point: use the Chars struct on string slices.
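As a rough idea of what a start/length variant could look like, here is a minimal sketch loosely inspired by PHP's substr(). It is not a faithful port: the name substr, the isize/Option<usize> parameters and the clamping rule are choices made for this example, and negative lengths are not handled.

```rust
/// Hypothetical start/length variant, loosely inspired by PHP's substr().
/// A negative `start` counts from the end of the string, and a `length`
/// of None means "to the end". Negative lengths are not supported here.
fn substr(source: &str, start: isize, length: Option<usize>) -> String {
    let char_count = source.chars().count();
    let from = if start < 0 {
        // Clamp to 0 when the negative offset reaches past the beginning.
        char_count.saturating_sub(start.unsigned_abs())
    } else {
        start as usize
    };
    let chars = source.chars().skip(from);
    match length {
        Some(len) => chars.take(len).collect(),
        None => chars.collect(),
    }
}

fn main() {
    let my_str: String = "ğüışöç".to_string();
    println!("start 1, length 2: {}", substr(&my_str, 1, Some(2)));
    println!("last two chars: {}", substr(&my_str, -2, None));
}
```

This version walks the string twice (once to count, once to skip), which is fine for short strings but worth keeping in mind for long ones.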

Happy coding…
