I was showing my friend some Rust code I wrote that counts how many times a word appears in a given file. Here’s the code:
```rust
use std::fs;
use std::env;
use std::io;

const REQUIRED_ARGS: usize = 2;

fn word_in_file(word: &str, filename: &str) -> io::Result<Option<usize>> {
    let file_contents = fs::read_to_string(filename)?;
    let mut word_count: usize = 0;
    for line in file_contents.lines() {
        for text in line.split_whitespace() {
            if word == text {
                word_count += 1
            }
        }
    }
    match word_count {
        0 => return Ok(None),
        _ => return Ok(Some(word_count)),
    }
}

fn main() {
    let cmd_args = env::args()
        // skip filepath arg because we don't need it
        .skip(1)
        .collect::<Vec<String>>();
    if cmd_args.len() < REQUIRED_ARGS {
        panic!("Not enough command args were passed")
    }
    let word = &cmd_args[0];
    let filename = &cmd_args[1];
    match word_in_file(word, filename).unwrap() {
        Some(count) => println!("Found the word {} in file {} {} times", word, filename, count),
        None => println!("Didn't find word {} in file {}", word, filename),
    }
}
```
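As an aside, the nested loop can be collapsed: line breaks are whitespace, so calling `split_whitespace()` on the whole file visits the same tokens as `lines()` followed by `split_whitespace()` on each line. The sketch below assumes a hypothetical helper `count_word` and a hypothetical `example.txt`; like the original, it matches tokens exactly, so it is case-sensitive and punctuation stays attached (`end.` does not match `end`).

```rust
use std::fs;
use std::io;

/// Counts how many whitespace-separated tokens in `text` equal `word` exactly.
/// Line breaks count as whitespace, so this sees the same tokens as
/// calling `lines()` and then `split_whitespace()` on each line.
fn count_word(text: &str, word: &str) -> usize {
    text.split_whitespace().filter(|token| *token == word).count()
}

/// Same contract as the original `word_in_file`: `Some(count)` when the word
/// occurs at least once, `None` when it does not, `Err` on I/O errors.
fn word_in_file(word: &str, filename: &str) -> io::Result<Option<usize>> {
    let contents = fs::read_to_string(filename)?;
    let count = count_word(&contents, word);
    Ok((count > 0).then_some(count))
}

fn main() {
    let sample = "the cat saw the dog\nthe end.";
    println!("{}", count_word(sample, "the"));
    // "example.txt" is a hypothetical path; a missing file prints the error instead of panicking.
    match word_in_file("the", "example.txt") {
        Ok(Some(count)) => println!("found {count} times"),
        Ok(None) => println!("not found"),
        Err(e) => println!("could not read file: {e}"),
    }
}
```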
My friend then told me I was being inefficient: I didn’t need to collect the command-line arguments into a vector, because I could just use `skip` and `next` to get the ones I want, like this:

```rust
let word: &String = &env::args().skip(1).next().unwrap();
let filename: &String = &env::args().skip(2).next().unwrap();
```
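A variant of the friend’s idea that avoids calling `env::args()` twice is to take one iterator and pull values off it in order. This is only a sketch: `parse_args` is a hypothetical helper, made generic over any `String` iterator so it can be exercised without a real command line. For two or three arguments, collecting into a `Vec` is unlikely to make a measurable difference, so this is mostly a matter of style and of returning an `Option` instead of panicking.

```rust
use std::env;

/// Pulls `word` and `filename` out of an argument iterator, skipping the
/// program name that conventionally comes first.
fn parse_args<I>(mut args: I) -> Option<(String, String)>
where
    I: Iterator<Item = String>,
{
    let _program = args.next()?; // usually the path used to launch the program
    let word = args.next()?;
    let filename = args.next()?;
    Some((word, filename))
}

fn main() {
    match parse_args(env::args()) {
        Some((word, filename)) => println!("word = {word:?}, file = {filename:?}"),
        None => eprintln!("usage: <word> <file>"),
    }
}
```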
I accidentally forgot to skip the first argument and ran this code to get the command arguments:

```rust
let word: &String = &env::args().next().unwrap();
let filename: &String = &env::args().skip(1).next().unwrap();
```
And I got the error `called Result::unwrap() on an Err value: Error { kind: InvalidData, message: "stream did not contain valid UTF-8" }` on line 27, where I call `word_in_file()`. For some reason, the first command-line argument results in an invalid UTF-8 error when passed to `fs::read_to_string()`. I then tried my original code, passing the first element of the argument vector without skipping it, and I didn’t get an invalid UTF-8 error. Instead I got the expected `called Result::unwrap() on an Err value: Os { code: 2, kind: NotFound, message: "The system cannot find the file specified." }`.
Does anyone know why this doesn’t return an invalid UTF-8 error:

```rust
let cmd_args = env::args()
    .collect::<Vec<String>>();
let first_arg: &String = &cmd_args[0];
fs::read_to_string(first_arg).unwrap(); // expected error: Err, can't find the file path
```

and why this does?

```rust
let first_arg: &String = &env::args().next().unwrap();
fs::read_to_string(first_arg).unwrap(); // Err: invalid UTF-8
```
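The two errors come from different stages of `fs::read_to_string`. `NotFound` means the path could not be opened at all; `InvalidData` means the file was opened and read, but its bytes were not valid UTF-8, which is what you would expect when the path points at a compiled executable. The sketch below reproduces both kinds; the scratch file name, the `MZ`-prefixed byte string, and the helpers `read_back` and `label` are illustrative assumptions, not part of the original program.

```rust
use std::env;
use std::fs;
use std::io::{self, ErrorKind};

/// Writes `bytes` to a scratch file in the temp directory, then reads it back
/// with `fs::read_to_string`, which only succeeds if the bytes are valid UTF-8.
fn read_back(bytes: &[u8]) -> io::Result<String> {
    // Hypothetical scratch file name; any writable location would do.
    let path = env::temp_dir().join("utf8_demo_scratch.bin");
    fs::write(&path, bytes)?;
    let result = fs::read_to_string(&path);
    let _ = fs::remove_file(&path); // best-effort cleanup
    result
}

/// Labels the two error kinds seen in the question.
fn label(kind: ErrorKind) -> &'static str {
    match kind {
        ErrorKind::InvalidData => "InvalidData (file opened, contents not UTF-8)",
        ErrorKind::NotFound => "NotFound (file could not be opened)",
        _ => "some other I/O error",
    }
}

fn main() {
    // "MZ" followed by bytes that are not valid UTF-8 (0x90 is a stray
    // continuation byte), loosely imitating the start of a Windows executable.
    match read_back(&[b'M', b'Z', 0x90, 0x00]) {
        Ok(s) => println!("read {} bytes of text", s.len()),
        Err(e) => println!("binary file: {}", label(e.kind())),
    }
    // Hypothetical path that should not exist.
    match fs::read_to_string("no/such/file.txt") {
        Ok(_) => println!("unexpectedly found the file"),
        Err(e) => println!("missing file: {}", label(e.kind())),
    }
}
```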
Have you actually tried both versions the same way, or did you try one via, say, `cargo run` and the other by calling the executable directly? Was the working directory the same? Have you output `first_arg` to the console to look at its contents?

They were run the same way, the directory was the same, and when I print the contents of `first_arg` both ways, they’re the same.
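One way to answer those questions concretely is to print the working directory, the executable path, and each argument in its `{:?}` form (which adds quotes and escapes, so stray characters become visible), along with whether each argument resolves to an existing file from the current directory. This is a diagnostic sketch; `describe` is a hypothetical helper.

```rust
use std::env;
use std::path::Path;

/// Reports whether `arg` names something that exists, resolving relative
/// paths against the current working directory.
fn describe(arg: &str) -> String {
    let path = Path::new(arg);
    if path.is_file() {
        format!("{arg:?} is an existing file")
    } else if path.exists() {
        format!("{arg:?} exists but is not a regular file")
    } else {
        format!("{arg:?} does not exist here")
    }
}

fn main() {
    println!("current dir: {:?}", env::current_dir());
    println!("current exe: {:?}", env::current_exe());
    for (i, arg) in env::args().enumerate() {
        println!("args[{i}]: {}", describe(&arg));
    }
}
```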
The first value of `env::args()` is the path to the executable itself, and the executable changes every time you compile the program. Maybe by chance, when you compile your program with `collect()`, the binary file happens to be valid UTF-8, but when you compile with `next()`, a non-UTF-8 byte sequence is generated?

One run can’t open the file; the other can, but panics trying to read its contents as UTF-8 (because an executable is not valid UTF-8, obviously). It has nothing to do with the string in the argument. The mystery is why one version can open the file and the other can’t.
Wait a moment, this doesn’t make sense. The arguments to your program are “<word> <file>”, so if you shift them, you get the executable name as the “word” and the first real argument (the word) as the “file”. So it really depends on whether the word you passed happens to be a real filename in the working directory. Can you show the exact invocations of the programs?
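The shift described above can be made explicit by indexing the collected arguments at two offsets. In this sketch, `pick` is a hypothetical helper and `wordcount hello notes.txt` is a made-up invocation: with the correct offset the file is `notes.txt`, while with the shifted offset the program path becomes the word and `hello` is opened as the file, so the outcome depends on whether a file named `hello` exists in the working directory.

```rust
/// Returns the (word, filename) pair a program would use if it took
/// `args[offset]` as the word and `args[offset + 1]` as the file name.
fn pick(args: &[String], offset: usize) -> Option<(&str, &str)> {
    let word = args.get(offset)?;
    let filename = args.get(offset + 1)?;
    Some((word.as_str(), filename.as_str()))
}

fn main() {
    // Hypothetical invocation: `wordcount hello notes.txt`
    let args: Vec<String> = ["wordcount", "hello", "notes.txt"]
        .iter()
        .map(|s| s.to_string())
        .collect();
    println!("correct (offset 1): {:?}", pick(&args, 1));
    println!("shifted (offset 0): {:?}", pick(&args, 0));
}
```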