I make a dataframe in my code and fill it in with data. Then in another function I call on that data and send it to a file. Only, when I call on it later and try to send its header to a file I get this error:
column_names = clsResults.columns.tolist()
^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'columns'
How can this be possible? I assigned freakin headers to it when I made the dataframe!
Here is me making the frame and filling it with content:
def growClassifier(NUMTREES: int, DEPTH: int, X: pd.DataFrame, y: np.ndarray):
    # Split the data into training and testing sets
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, shuffle=True)
    print(f'\nBuilding classification forest with {NUMTREES} trees each {DEPTH} deep\n')
    # Initialize the weak classifier
    base_estimator = DecisionTreeClassifier(max_depth=DEPTH)
    start_time = time.time()
    ada = AdaBoostClassifier(estimator=base_estimator, n_estimators=NUMTREES)
    ada.fit(X_train, y_train)
    elapsed_time = time.time() - start_time
    # Use staged_predict to get staged predictions
    staged_test_predictions = ada.staged_predict(X_test)
    f1_test, accuracy_test, precision_test, recall_test, buildtime_test = [], [], [], [], []
    # Iterate over staged predictions and evaluate performance at each stage
    for i, y_pred in enumerate(staged_test_predictions, start=1):
        # print(f'i:{i}\ny_pred:{y_pred}')
        accuracy = accuracy_score(y_test, y_pred)
        accuracy_test.append(accuracy)
        precision = precision_score(y_test, y_pred, average="weighted", zero_division=0)
        precision_test.append(precision)
        recall = recall_score(y_test, y_pred, average="weighted", zero_division=0)
        recall_test.append(recall)
        f1 = f1_score(y_test, y_pred, average="weighted", zero_division=0)
        f1_test.append(f1)
        # print(f'accuracy:{accuracy}')
    adaClsResults = pd.DataFrame()
    numTrees, treeDepth = [], []
    for x in range(1, NUMTREES+1, 1):
        # print(i, x)
        numTrees.append(x)
        treeDepth.append(DEPTH)
        buildtime_test.append(elapsed_time)
    if (i>40):
        while (i < NUMTREES):
            accuracy_test.append(0)
            precision_test.append(0)
            recall_test.append(0)
            f1_test.append(0)
            buildtime_test[i] = 0
            i += 1
        adaClsResults['numTrees'] = numTrees
        adaClsResults['treeDepth'] = treeDepth
        adaClsResults['f1'] = f1_test
        adaClsResults['accuracy'] = accuracy_test
        adaClsResults['precision'] = precision_test
        adaClsResults['recall'] = recall_test
        adaClsResults['buildTime'] = buildtime_test
        return adaClsResults
    else:
        print(f'\n\nfailed. Only boosted {i} times. Did not have {NUMTREES} stages. running again \n\n')
        growClassifier(NUMTREES, DEPTH, X, y)
Ideas about how to improve the code would also help, but my main concern is why I get the error mentioned above when I try to access my object later, in the function below.
def classificationRuns(model: str, task: str, allDatasets: list, clsDatasets: list, ESTNUM: int, startDEPTH: int, endDEPTH: int, MAX_RUNS: int, rawDataPath: str, aggDataPath: str ):
    for dataset in allDatasets:
        if dataset in clsDatasets:
            # Get the data
            X,y = parse.getClsData(dataset)
            while startDEPTH < endDEPTH:
                runNumber = 1
                while (runNumber < MAX_RUNS + 1):
                    print(f'\nRun number:\t{runNumber}')
                    # run forest building
                    clsResults = growClassifier(ESTNUM, startDEPTH, X, y)
                    column_names = clsResults.columns.tolist()
                    # Join column names with tab separators
                    header="\t".join(column_names)
                    # Set file name system for raw data
                    saveRawDataHere = os.path.join(rawDataPath, dataset, f'_{ESTNUM}_{startDEPTH}_{dataset}_{model}_{task}_')
                    # add header to raw and agg file
                    with open(saveRawDataHere, 'a') as raw_file:
                        if isEmpty(saveRawDataHere):
                            raw_file.write(f"{header}\n")
                    # Set file name system for agg data
                    saveAggDataHere = os.path.join(aggDataPath, f'_{dataset}_{model}_{task}_')
                    # add header to agg data file
                    with open(saveAggDataHere, 'a') as agg_file:
                        if isEmpty(saveAggDataHere):
                            agg_file.write(f"{header}\n")
                    # write data to file
                    print(f'saving data in {saveRawDataHere}')
                    clsResults.to_csv(saveRawDataHere, mode="a", index=False, header=False, sep='\t')
                    # increment counter
                    runNumber += 1
                startDEPTH +=1
The main concern is
clsResults = growClassifier(ESTNUM, startDEPTH, X, y)
column_names = clsResults.columns.tolist()
This is where the error occurs.
I’ve tried giving it a header when I first create it with
adaClsResults = pd.DataFrame(columns=X_train.columns)
but I still end up getting the same error.
growClassifier doesn’t return anything when it takes the else branch: the recursive call’s result is discarded and the function falls off the end, so effectively you’re setting clsResults to None.
Then you try to access an attribute columns of clsResults, but since its value is None, that is not possible. This is what the error message (AttributeError: 'NoneType' object has no attribute 'columns') is telling you.
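The same behaviour can be reproduced without scikit-learn or pandas. The sketch below is only an illustration: retry_without_return is a hypothetical function that mimics the if/else shape of growClassifier, where one branch returns a value and the other calls itself but drops the result.

```python
def retry_without_return(attempt: int):
    """Hypothetical stand-in for growClassifier's if/else structure."""
    if attempt >= 2:
        # "Success" branch: an explicit return hands a value back.
        return {"columns": ["numTrees", "treeDepth"]}
    else:
        # "Retry" branch: the recursive call produces a value, but nothing
        # returns it, so this invocation falls off the end and yields None.
        retry_without_return(attempt + 1)


first_try_ok = retry_without_return(2)   # takes the "success" branch directly
needed_retry = retry_without_return(0)   # goes through the "retry" branch

print(first_try_ok)   # {'columns': ['numTrees', 'treeDepth']}
print(needed_retry)   # None
```

This also suggests why the error may only show up on some runs: it would appear only when the first attempt lands in the retry branch.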
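To fix it, growClassifier should return a dataframe on every path. In the else branch, that means writing return growClassifier(NUMTREES, DEPTH, X, y) instead of just calling it.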
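Since you also asked for ideas on improving the code: besides putting return in front of the recursive call, another option is to replace the recursion with a loop, which returns on every path and cannot recurse forever if boosting keeps stopping early. This is a rough sketch, not a drop-in replacement. The names grow_with_retries, build_once and max_attempts are hypothetical, and build_once stands in for the fit-and-score part of growClassifier, assumed here to return None when too few stages were produced.

```python
from typing import Callable, Optional


def grow_with_retries(build_once: Callable[[], Optional[dict]],
                      max_attempts: int = 5) -> dict:
    """Call build_once until it yields a result, returning it on every path.

    build_once is a hypothetical stand-in for the fit-and-score body of
    growClassifier; it is assumed to return None when a run is unusable.
    """
    for attempt in range(1, max_attempts + 1):
        result = build_once()
        if result is not None:
            return result  # success: hand the result back to the caller
        print(f"attempt {attempt} failed, running again")
    # Fail loudly instead of silently returning None to the caller.
    raise RuntimeError(f"no usable result after {max_attempts} attempts")


# Usage: a fake builder that fails twice, then succeeds.
outcomes = iter([None, None, {"numTrees": [1, 2], "treeDepth": [3, 3]}])
results = grow_with_retries(lambda: next(outcomes))
print(results)
```

Raising an exception after a bounded number of attempts is a design choice for this sketch; the point is that the caller either gets a real result or a clear error, never None.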