Invalid UTF-8 error on first argument of command line arguments

I was showing my friend some code I wrote in Rust to count how many times a word appears in a given file. Here’s the code:

use std::fs;
use std::env;
use std::io;

const REQUIRED_ARGS: usize = 2;

fn word_in_file(word: &str, filename: &str) -> io::Result<Option<usize>> {
    let file_contents = fs::read_to_string(filename)?;

    let mut word_count: usize = 0;

    for line in file_contents.lines() {
        for text in line.split_whitespace() {
            if word == text { word_count += 1 }
        }
    } 

    match word_count {
        0 => return Ok(None),
        _ => return Ok(Some(word_count))
    }

}

fn main() {
    let cmd_args = env::args()
        //skip filepath arg because we don't need it
        .skip(1)
        .collect::<Vec<String>>();

    if cmd_args.len() < REQUIRED_ARGS { panic!("Not enough command args were passed") }

    let word = &cmd_args[0];
    let filename = &cmd_args[1];


    match word_in_file(word, filename).unwrap() {
        Some(count) => println!("Found the word {} in file {} {} times", word, filename, count),
        None => println!("Didn't find word {} in file {}", word, filename)
    }
}

My friend then told me I was being inefficient: I didn’t need to collect the command-line arguments into a vector, I could just use skip and next to get the ones I want, like this:

let word: &String = &env::args().skip(1).next().unwrap();
let filename: &String = &env::args().skip(2).next().unwrap();
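For comparison, here is a minimal sketch of reading both values from a single iterator. Each call to env::args() in the snippet above builds a fresh iterator, so the positions have to be tracked by hand. The helper name parse_args and the usage message are hypothetical, not part of the original program.

```rust
use std::env;

/// Pulls the word and the filename out of an argument iterator,
/// skipping the first item (the program path).
/// `parse_args` is a hypothetical helper used only for illustration.
fn parse_args<I: Iterator<Item = String>>(args: I) -> Option<(String, String)> {
    let mut args = args.skip(1);
    // Each `next()` advances the same iterator, so no index arithmetic is needed.
    let word = args.next()?;
    let filename = args.next()?;
    Some((word, filename))
}

fn main() {
    match parse_args(env::args()) {
        Some((word, filename)) => println!("word = {word}, file = {filename}"),
        None => eprintln!("usage: <program> <word> <file>"),
    }
}
```

Because the skip happens once, forgetting it shifts both values together rather than only one of them, which makes the mistake easier to spot.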

I accidentally forgot to skip the first argument and ran this code to read the command-line arguments:

let word: &String = &env::args().next().unwrap();
let filename: &String = &env::args().skip(1).next().unwrap();

And I got the error: called Result::unwrap() on an Err value: Error { kind: InvalidData, message: "stream did not contain valid UTF-8" } on line 27, where I call word_in_file(). For some reason, passing the first command-line argument to fs::read_to_string() results in an invalid UTF-8 error. I then went back to my original code and passed the first element of the collected arguments vector, without skipping it, and I didn’t get an invalid UTF-8 error. Instead I got the expected called Result::unwrap() on an Err value: Os { code: 2, kind: NotFound, message: "The system cannot find the file specified." }

Does anyone know why this doesn’t produce an invalid UTF-8 error:

let cmd_args = env::args()
        .collect::<Vec<String>>();
let first_arg: &String = &cmd_args[0];
fs::read_to_string(first_arg).unwrap(); //Expected error Err can't find file path

And why this does?

let first_arg: &String = &env::args().next().unwrap();
fs::read_to_string(first_arg).unwrap(); //Err invalid UTF-8

  • Have you actually tried both versions the same way, or did you try one via, say, cargo run and the other via calling the executable directly? Was the working directory the same? Have you output first_arg to the console to look at the contents?


  • They were run the same way, the directory was the same, and when I print the contents of first_arg both ways they’re the same.

  • The first value of env::args() is the executable itself, which changes every time you compile the program. Maybe by chance, when you compile your program with collect() the binary file happens to be valid UTF-8, but when you compile with next() a non-UTF-8 byte sequence is generated?

  • One run can’t open the file, the other can, but panics trying to read its contents as UTF-8 (because an executable is not UTF-8, obviously). It has nothing to do with the string in the argument. The mystery is why one version can open the file and the other can’t.

  • Wait a moment, this doesn’t make sense. The arguments to your program are “<word> <file>”, so if you shift it you get the executable name for “word” and the first real argument for “file”. So it really depends on whether the word you passed happens to be a real filename in the working directory. Can you show the exact invocations of the program?
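A minimal sketch of the distinction the comments draw, assuming nothing about the asker’s actual invocation: fs::read_to_string reports NotFound when the path cannot be opened, and InvalidData when the file opens but its bytes are not valid UTF-8. The helper read_error_kind and both file names are hypothetical and exist only for this demonstration.

```rust
use std::fs;
use std::io::ErrorKind;
use std::path::Path;

/// Returns the kind of error `fs::read_to_string` reports for `path`,
/// or `None` if the file was read successfully.
/// `read_error_kind` is a hypothetical helper used only for illustration.
fn read_error_kind(path: &Path) -> Option<ErrorKind> {
    fs::read_to_string(path).err().map(|e| e.kind())
}

fn main() {
    let dir = std::env::temp_dir();

    // Case 1: the path does not exist, so opening the file fails.
    let missing = dir.join("hypothetical_missing_file_for_demo.txt");
    let _ = fs::remove_file(&missing); // make sure it really is absent
    println!("{:?}", read_error_kind(&missing)); // Some(NotFound)

    // Case 2: the file exists, but its bytes are not valid UTF-8,
    // much like the contents of a compiled executable.
    let binary = dir.join("hypothetical_binary_for_demo.bin");
    fs::write(&binary, [0xff_u8, 0xfe, 0x00, 0x80]).expect("could not write demo file");
    println!("{:?}", read_error_kind(&binary)); // Some(InvalidData)

    let _ = fs::remove_file(&binary); // clean up the demo file
}
```

In both cases the error depends on what the path resolves to on disk, not on how the String holding the path was obtained.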
