Load data functions not working correctly on CSV (Comma Delimited) file

I’m using Visual Studio to write C code for an EEG sleep study lab. The data is loaded from a CSV (Comma Delimited) Excel file that has 50 rows and 3000 columns. Each row is a time-series signal with 3000 data points. When I run the code, it reports number of rows = 500 and number of columns = 316.

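#include <stdio.h>
#include <stdlib.h>
#include <string.h>

// Forward declarations so main can call the helpers defined below it.
double** load_data(FILE* file, int numrows, int numcols);
int num_cols_in_file(FILE* file);
int num_rows_in_file(FILE* file);

int main(int argc, char* argv[]) {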

    FILE* file = fopen("EEG_SleepData_30sec_100Hz.csv", "r");
    if (file == NULL) {
        perror("Error opening file");
        return EXIT_FAILURE;
    }

    //Loads data in from file name specified
    int num_signals = num_rows_in_file(file);
    int signal_length = num_cols_in_file(file);
    printf("number of rows = %d  number of columns = %d\n", num_signals, signal_length);

    double** dataset = load_data(file, num_signals, signal_length);
    // Print the entire dataset
    for (int i = 0; i < num_signals; i++) {
        for (int j = 0; j < signal_length; j++) {
            printf("%lf ", dataset[i][j]);
        }
        printf("\n");
    }
    
    return 0;
}


double** load_data(FILE* file, int numrows, int numcols) {

    if (file != NULL) {
        double** dataset = (double**)calloc(numrows, sizeof(double*)); // Allocate each of our row pointers.

        if (dataset == NULL) {
            return NULL;
        }

        for (int i = 0; i < numrows; i++) {
            dataset[i] = (double*)calloc(numcols, sizeof(double)); // Allocate our columns.
            if (dataset[i] == NULL) {
                return NULL;
            }
        }

        for (int i = 0; i < numrows; i++) {
            for (int j = 0; j < numcols; j++) {
                fscanf(file, "%lf,", &dataset[i][j]);
            }
        }
        return dataset;
    }
    else {
        fprintf(stderr, "Unable to find file! Ensure it is in the Debug directory.");
        return NULL;
    }

}


int num_cols_in_file(FILE* file) {
    int numcols = 0;
    if (file) {
        char buf[3000]; // Make a buffer we'll use to grab a whole row.

        if (fgets(buf, sizeof(buf), file) != NULL) {
            // Tokenize our buffer, looking for how many columns we have (aka how many tokens we can create)
            char* token;
            char* next_token = NULL;

            token = strtok_s(buf, ", \n\r\t", &next_token);  // Include commas and additional whitespace as delimiters

            while (token != NULL) {
                token = strtok_s(NULL, ", \n\r\t", &next_token);
                numcols++;
            }

            rewind(file); // Reset our position to the beginning of the file.

            return numcols;
        }
        else {
            fprintf(stderr, "Failed to read first row.\n");
            return 0;
        }
    }
    else {
        fprintf(stderr, "File is unopened. Numcols only works on opened files.");
        return 0;
    }
}


int num_rows_in_file(FILE* file) {
    int numrows = 0;
    if (file) {
        char buf[3000]; // Make a buffer we'll use to grab a whole row.

        while (fgets(buf, sizeof(buf), file) != NULL) {
            numrows++;

        }

        rewind(file); // Reset our position to the beginning of the file.
        return numrows;
    }
    else {
        fprintf(stderr, "File is unopened. Numrows only works on opened files.");
        return 0;
    }
}

When I added a debug statement to see what the function was reading, the values it printed were also wrong. I have used the same load_data, num_rows_in_file, and num_cols_in_file functions all semester with no issues. What is wrong with the code?

  • char buf[100]; does not seem large enough to read a full line in a file with 3000 columns. If you open the file in a text editor, how long is the longest line?

  • Edit the question to provide a minimal reproducible example: complete source code in one block that other people can compile without editing or inserting anything, the exact text of at least one line of the CSV file for which the program fails, and the exact text you expect to be printed.

  • Looks like you are getting closer, but a buffer of 3000 bytes is still too small for 3000 columns, since the commas alone take up about 3000 bytes. Depending on how each data point is formatted, you probably need 10 * 3000 or even 20 * 3000 bytes to store one line.

  • Did you open the file with a text editor to see how long the longest line was? How many commas would a 3000 column line have? How much space is left in char buf[3000]; for anything that isn’t a comma?

  • @RetiredNinja Thank you! Three hours of struggling just for the buffer to be too small.
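
As the comments point out, fgets stops once the buffer is full, so one long CSV line gets read in several pieces and each piece is counted as a row. That would be consistent with the inflated row count and the low column count. One way to avoid guessing a buffer size is to count newlines and commas one character at a time. The following is only a minimal sketch. The helper name count_rows_cols is hypothetical (it is not part of the original code), and it assumes plain numeric fields with no quoted commas and no trailing comma at the end of a line.

```c
#include <stdio.h>

/* Hypothetical helper (not from the original post): counts the rows and
 * columns of a comma-separated file by streaming it one character at a
 * time, so no fixed-size line buffer is needed.
 * Assumptions: no quoted fields, no trailing comma; the column count is
 * taken from the first non-empty row only; blank lines are skipped.
 * Returns 0 on success, -1 if any argument is NULL. */
int count_rows_cols(FILE* file, int* num_rows, int* num_cols) {
    if (file == NULL || num_rows == NULL || num_cols == NULL) {
        return -1;
    }

    int rows = 0;
    int cols = 0;
    int commas_in_first_row = 0;
    int line_has_data = 0; /* nonzero once the current line has any character */
    int c;

    rewind(file); /* Always count from the start of the file. */

    while ((c = fgetc(file)) != EOF) {
        if (c == '\n') {
            if (line_has_data) {
                if (rows == 0) {
                    cols = commas_in_first_row + 1;
                }
                rows++;
            }
            line_has_data = 0;
        } else if (c != '\r') { /* Ignore the CR of Windows line endings. */
            line_has_data = 1;
            if (c == ',' && rows == 0) {
                commas_in_first_row++;
            }
        }
    }

    /* Count a final line that does not end with a newline. */
    if (line_has_data) {
        if (rows == 0) {
            cols = commas_in_first_row + 1;
        }
        rows++;
    }

    rewind(file); /* Leave the file ready for the caller to read the data. */

    *num_rows = rows;
    *num_cols = cols;
    return 0;
}
```

In main, this could stand in for the two separate counting calls, for example count_rows_cols(file, &num_signals, &signal_length). The fscanf loop in load_data does not use a line buffer, so it should not be affected by line length in the same way.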
