DataFrame Problem: AttributeError: ‘NoneType’ object has no attribute ‘columns’

I create a DataFrame in my code and fill it with data. Then, in another function, I use that data and write it to a file. But when I try to write its header to a file, I get this error:

    column_names = clsResults.columns.tolist()
                   ^^^^^^^^^^^^^^^^^^
    AttributeError: 'NoneType' object has no attribute 'columns'

How can this be possible? I assigned freakin headers to it when I made the dataframe!

Here is me making the frame and filling it with content:


def growClassifier(NUMTREES: int, DEPTH: int, X: pd.DataFrame , y: np.ndarray):
    # Split the data into training and testing sets
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, shuffle=True)

    print(f'\nBuilding classification forest with {NUMTREES} trees each {DEPTH} deep\n')

    # Initialize the weak classifier
    base_estimator = DecisionTreeClassifier(max_depth=DEPTH)

    start_time = time.time()

    ada = AdaBoostClassifier(estimator=base_estimator, n_estimators=NUMTREES)
    ada.fit(X_train, y_train)

    elapsed_time = time.time() - start_time

    # Use staged_predict to get staged predictions
    staged_test_predictions = ada.staged_predict(X_test)
    

    f1_test, accuracy_test, precision_test, recall_test, buildtime_test = [], [], [], [], []
    # Iterate over staged predictions and evaluate performance at each stage
    for i, y_pred in enumerate(staged_test_predictions, start=1):
        # print(f'i:{i}\ny_pred:{y_pred}')
        accuracy = accuracy_score(y_test, y_pred)
        accuracy_test.append(accuracy)
        
        precision = precision_score(y_test, y_pred, average="weighted", zero_division=0)
        precision_test.append(precision)

        recall = recall_score(y_test, y_pred, average="weighted", zero_division=0)
        recall_test.append(recall)

        f1 = f1_score(y_test, y_pred, average="weighted", zero_division=0)
        f1_test.append(f1)

        # print(f'accuracy:{accuracy}')

    adaClsResults = pd.DataFrame()

    numTrees, treeDepth = [], [] 
    for x in range(1, NUMTREES+1, 1):
        # print(i, x)
        numTrees.append(x) 
        treeDepth.append(DEPTH)
        buildtime_test.append(elapsed_time)

    if (i>40): 
        while (i < NUMTREES):
            accuracy_test.append(0)
            precision_test.append(0)
            recall_test.append(0)
            f1_test.append(0)
            buildtime_test[i] = 0
            i += 1

        adaClsResults['numTrees'] = numTrees
        adaClsResults['treeDepth'] = treeDepth
        adaClsResults['f1'] = f1_test
        adaClsResults['accuracy'] = accuracy_test
        adaClsResults['precision'] = precision_test
        adaClsResults['recall'] = recall_test
        adaClsResults['buildTime'] = buildtime_test
        return adaClsResults

    else: 
        print(f'\n\nfailed. Only boosted {i} times. Did not have  {NUMTREES} stages. running again \n\n')
        growClassifier(NUMTREES, DEPTH, X, y)

Ideas for improving the code would also help, but my main concern is why I get the error mentioned above when I try to access my object later, in the function below.


def classificationRuns(model: str, task: str, allDatasets: list, clsDatasets: list, ESTNUM: int, startDEPTH: int, endDEPTH: int, MAX_RUNS: int, rawDataPath: str, aggDataPath: str ):

    for dataset in allDatasets:
        if dataset in clsDatasets: 

            # Get the data 
            X,y = parse.getClsData(dataset)
            while startDEPTH < endDEPTH: 

                runNumber = 1
                while (runNumber < MAX_RUNS + 1):
                    print(f'\nRun number:\t{runNumber}')

                    # run forest building
                    clsResults = growClassifier(ESTNUM, startDEPTH, X, y)

                    column_names = clsResults.columns.tolist()

                    # Join column names with tab separators
                    header="\t".join(column_names)

                    # Set file name system for raw data
                    saveRawDataHere = os.path.join(rawDataPath, dataset, f'_{ESTNUM}_{startDEPTH}_{dataset}_{model}_{task}_')

                    # add header to raw and agg file
                    with open(saveRawDataHere, 'a') as raw_file:
                        if isEmpty(saveRawDataHere):
                            raw_file.write(f"{header}\n") 
                    
                    # Set file name system for agg data
                    saveAggDataHere = os.path.join(aggDataPath, f'_{dataset}_{model}_{task}_')
                    
                    # add header to agg data file 
                    with open(saveAggDataHere, 'a') as agg_file:
                        if isEmpty(saveAggDataHere):
                            agg_file.write(f"{header}\n")

                    
                    # write data to file
                    print(f'saving data in {saveRawDataHere}')
                    clsResults.to_csv(saveRawDataHere, mode="a", index=False, header=False, sep='\t')
                    # increment counter    
                    runNumber += 1
                startDEPTH +=1

The main concern is

                    clsResults = growClassifier(ESTNUM, startDEPTH, X, y)

                    column_names = clsResults.columns.tolist()

This is where the error occurs.

I’ve tried giving it a header when I first create it by

    adaClsResults = pd.DataFrame(columns=X_train.columns)

but still end up getting the same error.

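growClassifier doesn’t return anything on every path. When i > 40 it returns adaClsResults, but in the else branch it calls itself and throws the result away, so that call reaches the end of the function and implicitly returns None. Whenever that branch runs, you’re effectively setting clsResults to None.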

Then you try to access an attribute columns of clsResults, but since its value is None, that is not possible. This is what the error message (AttributeError: 'NoneType' object has no attribute 'columns') is telling you.
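Here is a minimal sketch of that behaviour without pandas or scikit-learn. The function retry_without_return and the SimpleNamespace object are hypothetical stand-ins for growClassifier and the DataFrame; only the shape of the if/else matches the code in the question.

```python
from types import SimpleNamespace


def retry_without_return(attempts_left: int):
    """Hypothetical stand-in for growClassifier: succeeds only when attempts_left is 0."""
    if attempts_left == 0:
        # Success path: an object with a .columns attribute is returned.
        return SimpleNamespace(columns=["numTrees", "treeDepth"])
    else:
        # Retry path: the recursive call's result is discarded,
        # so this call falls off the end and returns None.
        retry_without_return(attempts_left - 1)


direct = retry_without_return(0)      # took the success path
via_retry = retry_without_return(1)   # took the retry path once

try:
    via_retry.columns
    error_message = ""
except AttributeError as exc:
    error_message = str(exc)

print(direct.columns)
print(via_retry)
print(error_message)
```

The inner call does build a valid result, but nothing hands it back to the outer caller, which is why the error only shows up on runs where the retry happens.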

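To fix it, growClassifier should return a dataframe on every path. In the else branch, return the result of the recursive call: return growClassifier(NUMTREES, DEPTH, X, y).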
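The smallest change is to put return in front of the recursive call. As an alternative, here is a rough sketch of the same retry written as a loop, which returns on success and raises instead of recursing without limit. The names grow_with_retry, fit_once, fake_fit and max_attempts are assumptions made up for illustration; fit_once stands in for the AdaBoost fit plus staged_predict loop and just returns one score per completed stage.

```python
from typing import Callable, List


def grow_with_retry(num_trees: int,
                    fit_once: Callable[[int], List[float]],
                    max_attempts: int = 5) -> List[float]:
    """Retry fit_once until it yields num_trees stages; never returns None."""
    for attempt in range(1, max_attempts + 1):
        scores = fit_once(num_trees)
        if len(scores) == num_trees:
            # Success: the result is handed back to the caller.
            return scores
        print(f"attempt {attempt}: only {len(scores)} of {num_trees} stages, retrying")
    # Every attempt fell short: fail loudly rather than return None.
    raise RuntimeError(f"no run reached {num_trees} stages in {max_attempts} attempts")


# Hypothetical fit that stops early twice, then completes all 10 stages.
stage_counts = iter([4, 7, 10])


def fake_fit(num_trees: int) -> List[float]:
    return [0.0] * next(stage_counts)


result = grow_with_retry(10, fake_fit)
print(len(result))
```

With this shape the caller either gets a full result or an exception, so clsResults.columns can no longer fail on None.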
