Using glob to find all csv files in a folder and its subfolders, but the returned list is empty

So my ultimate goal is to add the data from multiple .csv files to a dataframe in Jupyter notebook.
I have been trying each piece first before I add them together but can’t get past just getting the filenames. There are other non-csv files in the folders that I want to ignore.

I have a directory with the following structure (the .csv files are the ones I want):

directory: E:\Grad School\Research\Pearl_River\Data_Collection\Previous_work\CRMS_Data

| -Full_Accretion
| -Full_Accretion\Full_Accretion.csv
| -Full_Accretion\RESTORE_disclaimer.txt
| -Full_Discrete_Hydrographic
| -Full_Discrete_Hydrographic\Full_Accretion.csv
| -Full_Discrete_Hydrographic\RESTORE_disclaimer.txt
| -Full_Marsh_Vegetation
| -Full_Marsh_Vegetation\Full_Accretion.csv
| -Full_Marsh_Vegetation\RESTORE_disclaimer.txt
(plus more but that doesn’t really matter)

I have read through so many "glob returns an empty list" questions and I've tried many iterations of the code. I verified that the files exist, that I spelled everything correctly, and that the path is correct. I've tried raw string literals and escaped backslashes. It only ever returns an empty list.

Here are the latest iterations

#Combine all the CRMS data into one dataframe
import os
from glob import glob
from pathlib import Path

dfs = []
fdir = r'E:\Grad School\Research\Pearl_River\Data_Collection\Previous_work\CRMS_Data'
ftype="*.csv"
all_files = [os.path.basename(i) for i in glob(r'E:\Grad School\Research\Pearl_River\Data_Collection\Previous_work\CRMS_Data\*.csv')]

#Get file names
#for path, subdir, files in os.walk(fdir):
#    for file in glob(os.path.join(fdir, ftype)):
#        all_files.append(file)
print(all_files)

#Get data
#for file in all_files:
#    data = pd.read_csv(file, index_col=None)
#    dfs.append(data)

#Add data to dataframe
#df = pd.concat(dfs)
#df.head(5)

The stuff that is commented out is other things I’ve tried.
os.getcwd() returns 'C:\Users\w****\OneDrive – The University of Southern Mississippi\Research\Python', but I'm not trying to access the working directory.

This also did not work. Same result, empty list.

os.chdir(r'E:\Grad School\Research\Pearl_River\Data_Collection\Previous_work\CRMS_Data')
all_files = [file for file in glob('*/.csv', recursive=True)]

or

os.chdir(r'E:\Grad School\Research\Pearl_River\Data_Collection\Previous_work\CRMS_Data')
all_files = [file for file in glob(r'*\.csv', recursive=True)]

I have tried a lot of different things and I've been staring at it too long. The commented-out loop also returns an empty list with every iteration of r'.csv', r'*.csv', and r'/.csv' in both fdir and ftype.
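One thing worth noting: recursive=True only has an effect when the pattern contains a ** segment; none of the patterns above include one, so the flag was a no-op. A minimal sketch of a pattern that would match CSVs at any depth (find_csvs_recursive is just an illustrative name):

```python
from glob import glob
import os

def find_csvs_recursive(base):
    # "**" matches zero or more nested directories, but only
    # when recursive=True is passed; without "**" the flag
    # changes nothing.
    return glob(os.path.join(base, "**", "*.csv"), recursive=True)
```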

So then, lastly, I put it into Spyder (through Anaconda) so I could use the debugger, and for the first commented-out loop I noticed the following:
On the first pass of the outer loop, it sees the subfolders and puts those in subdir, while files is empty.
Then it moves into the first subfolder, 'Full_Accretion', and this time files does list the files.
There is no file variable shown, though, and that is the one that is supposed to be appended to the list.
So I changed it to this:

for path, subdir, files in os.walk(fdir):
    for file in files:
        all_files.append(file)  

It gave me filenames, but all of them, not just the CSVs. When I added *.csv to the fdir path it gave me an empty list again.
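That last loop can be filtered down to just the CSVs while keeping the full path, which pd.read_csv will need later. A sketch, with fdir standing in for the CRMS_Data path and find_csvs as an illustrative name:

```python
import os

def find_csvs(fdir):
    # os.walk yields (directory, subdirs, filenames); the
    # filenames are bare names, so join each one back onto
    # the directory it was found in.
    all_files = []
    for path, subdirs, files in os.walk(fdir):
        for file in files:
            if file.lower().endswith(".csv"):
                all_files.append(os.path.join(path, file))
    return all_files
```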

I have not used glob much in the past, so it's very likely user error. What am I missing? Thanks! (Any imports not shown here, such as pandas, are in the cells above this one.)

Edit:
@blhsing gave me the missing piece. The one-liner didn't include the full path, and the loop version looped too many times and produced duplicates. I figured it out and here is what finally worked:

import os
from glob import glob
import pandas as pd


all_files = []
fdir = r'E:\Grad School\Research\Pearl_River\Data_Collection\Previous_work\CRMS_Data'

fnames = [os.path.basename(i)
          for i in glob(r'E:\Grad School\Research\Pearl_River\Data_Collection\Previous_work\CRMS_Data\*\*.csv')
          ]

#Get file names
for fname in fnames:
    filename = os.path.join(fdir, fname)
    all_files.append(filename)
print(all_files)

It’s probably not pythonic; I’m self-taught and still learning. Thanks!
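(For what it's worth, pathlib can do the same search in one pass; a sketch assuming the same directory layout, with list_csvs as an illustrative name:)

```python
from pathlib import Path

def list_csvs(base):
    # rglob searches base and every subfolder and yields full
    # Path objects, so no separate os.path.join step is needed.
    return [str(p) for p in Path(base).rglob("*.csv")]
```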

Comments:

  • Change '*/.csv' to '*/*.csv'.

  • @blhsing That did it! Thank you. I was hitting my head against the wall for hours lol

  • @Liraell, glad you figured it out. Will you please move your edited solution into its own answer and accept that answer to "close this question"?

Looking at the edit to your question, I don't understand why you take just the filename (os.path.basename) and then join it onto the base directory: that drops any intermediate directories the CSVs were found in.

Consider this simple file tree:

- base_dir
  - a
      bar.csv
      foo.txt
  - b
      baz.csv

Running:

import os
from glob import glob


all_files = []
fdir = "base_dir"

fnames = [os.path.basename(i) for i in glob("base_dir/*/*.csv")]

# Get file names
for fname in fnames:
    filename = os.path.join(fdir, fname)
    all_files.append(filename)

print(all_files)

prints:

[
    'base_dir/bar.csv', 
    'base_dir/baz.csv',
]

Given your original question, I think you can get away with something as simple as:

all_csvs = glob("base_dir/*/*.csv")

or:

for fname in glob("base_dir/*/*.csv"):
    # do something w/fname
    print(fname)

Your sample tree only shows CSVs named Full_Accretion.csv. If that isn't just an artifact of a very small sample set of names, glob can reflect that:

glob("base_dir/*/Full_Accretion.csv")
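And to connect back to the original goal of one dataframe: since glob returns usable paths, each match can be fed straight to pd.read_csv and the results concatenated. A sketch assuming pandas is available, the files share columns, and base_dir stands in for the real path:

```python
from glob import glob
import pandas as pd

def load_all_csvs(base_dir):
    # glob returns full (relative or absolute) paths, so each
    # match can be passed directly to read_csv, then stacked
    # into one DataFrame.
    frames = [pd.read_csv(f, index_col=None)
              for f in glob(f"{base_dir}/*/*.csv")]
    return pd.concat(frames, ignore_index=True)
```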
