How to delete text in text file using Python based on conditions?

I have a text file from which I want to delete all data up to the point where the value ‘NODATACODE’ appears.
The text in the text file is:

MMMMM ; MMMMM : MMMMMMMMMMN, AAAAAAAAAAA,52, AAAA,CCCCCC, MMMMM ; MMMMM : MMMMMMMMMMN, 
  >AAAAAAAAAAA,200, AAAA,CCCCCC,;MMMMM ; MMMMM : MMMMMMMMMMN, AAAAAAAAAAA,53, 
  >AAAA,CCCCCC,AAAA AAAAA AAAAAAAAAAA AAAAAAAAAAA AAAAAAAAAAA NODATACODE, : Food Meal

Please let me know how I can rewrite the following code in Python to perform this task.
I tried the following code but it doesn’t work:

with open('Schedule.txt', 'w') as fw:
   for line in lines:
   if line.strip('\n') = 'NODATACODE':
                      fw.write(line)

Error message that I get is below:

     Cell In[1], line 5
     if line.strip('\n') = 'NODATACODE':
        ^
     SyntaxError: cannot assign to function call here. Maybe you meant '==' instead of 
       '='?

[Screenshots of the original output and the desired output were attached here.]

Thank you in advance.

  • Did you read the error message?

  • if line.strip('\n') == 'NODATACODE':

  • Line 5 should be !=, but this is a wild guess since your question is not clear enough. In that file, are those lines separated by line breaks? Are there more lines after “NODATACODE”? The indentation is wrong. And I think you might need a read handle to get all the lines first, close it, and then a write handle to write the lines you want.

  • @AnalysisNerd, can you make a meaningful example and show the exact matching expected output?

  • @Timeless. Just making the required edits to my question. Thank you for your patience.

Here’s a revised version of your script:

# Read the file content first
with open('Schedule.txt', 'r') as file:
    data = file.read()

# Find the index where 'NODATACODE' occurs
nodata_index = data.find('NODATACODE')

# Check if 'NODATACODE' is found
if nodata_index != -1:
    # Extract the text from 'NODATACODE' to the end
    data_to_write = data[nodata_index:]

    # Write the modified data back to the file
    with open('Schedule.txt', 'w') as file:
        file.write(data_to_write)
else:
    print("'NODATACODE' not found in the file.")

In this script:

  1. The file is first opened in read mode to get all its content.
  2. We find the index of the first occurrence of ‘NODATACODE’ in the file’s data.
  3. If ‘NODATACODE’ is found, we extract all the text from this point onwards.
  4. Finally, we open the file in write mode and overwrite it with the extracted text.

This script assumes that ‘NODATACODE’ appears only once in your file. If ‘NODATACODE’ can appear multiple times and you want to delete content up to the last occurrence, you would need to adjust the logic to find the last index of ‘NODATACODE’.
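If the marker can occur more than once, str.rfind returns the index of the last occurrence rather than the first. Here is a minimal sketch of that adjustment; the helper name keep_from_last_marker and the sample string are made up for illustration and are not part of the original script:

```python
def keep_from_last_marker(data, marker='NODATACODE'):
    """Return data from the last occurrence of marker onward.

    If the marker is absent, the data is returned unchanged.
    (Hypothetical helper, named here only for illustration.)
    """
    index = data.rfind(marker)  # rfind returns -1 when marker is not present
    if index == -1:
        return data
    return data[index:]


# Hypothetical sample with two occurrences of the marker
sample = 'AAA NODATACODE, BBB NODATACODE, : Food Meal'
trimmed = keep_from_last_marker(sample)
print(trimmed)  # NODATACODE, : Food Meal
```

Swapping rfind back to find gives the first-occurrence behaviour of the script above.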

Remember, opening a file in write mode (‘w’) as you did initially will immediately truncate the file, so it’s important to read its content before overwriting it. The syntax error you encountered is also fixed in this revised script.
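To see the truncation for yourself, the following small sketch creates a throwaway temporary file (so nothing real is overwritten), opens it in 'w' mode without writing anything, and compares the file size before and after. The file content is demo data, not your actual Schedule.txt:

```python
import os
import tempfile

# Create a throwaway file with some demo content.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('AAA NODATACODE, : Food Meal')
    path = tmp.name

size_before = os.path.getsize(path)

# Opening in 'w' mode empties the file, even though nothing is written.
with open(path, 'w'):
    pass

size_after = os.path.getsize(path)
print(size_before, size_after)

os.remove(path)  # clean up the demo file
```

The second number printed is 0: the content is gone as soon as the file is opened for writing.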

This should do what you want. Note that we test whether the line begins with ‘NODATACODE’ rather than whether it is equal to it, and we use a flag so that every line after the match is written to the output file too:

with open('input_file.txt') as f_in, open('output_file.txt', 'w') as f_out:
    write_flag = False  # becomes True once the marker line is seen
    for line in f_in:  # iterate line by line instead of loading the whole file
        if line.startswith('NODATACODE'):
            write_flag = True
        if write_flag:
            f_out.write(line)

If ‘NODATACODE’ is likely to be inside a line, an approach with regex could be better:

import re

with open('input_file.txt') as f_in:
    with open('output_file.txt', 'w') as f_out:
        data = f_in.read()
        f_out.write(re.sub(r'[\w\W]*NODATACODE', 'NODATACODE', data))
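One thing to be aware of: the * quantifier is greedy, so the pattern above removes everything up to the last ‘NODATACODE’ in the data. If you want to cut at the first occurrence instead, a lazy quantifier (*?) combined with count=1 is one way to do it. A small sketch on an in-memory string (the sample text is invented for illustration):

```python
import re

# Hypothetical sample with two occurrences of the marker
data = 'AAA NODATACODE, BBB NODATACODE, : Food Meal'

# Greedy: [\w\W]* consumes as much as possible, so the match ends at the LAST marker.
greedy = re.sub(r'[\w\W]*NODATACODE', 'NODATACODE', data)

# Lazy: [\w\W]*? stops at the FIRST marker; count=1 prevents further replacements.
lazy = re.sub(r'[\w\W]*?NODATACODE', 'NODATACODE', data, count=1)

print(greedy)  # NODATACODE, : Food Meal
print(lazy)    # NODATACODE, BBB NODATACODE, : Food Meal
```

If the marker appears only once, both variants give the same result.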
