I have a yaml file like this:
-
  ip: 1.1.1.1
  status: Active
  type: 'typeA'
-
  ip: 1.1.1.1
  status: Disabled
  type: 'typeA'
-
  ip: 2.2.2.2
  status: Active
  type: 'typeC'
-
  ip: 3.3.3.3
  status: Active
  type: 'typeB'
-
  ip: 3.3.3.3
  status: Active
  type: 'typeC'
-
  ip: 2.2.2.2
  status: Active
  type: 'typeC'
-
I want to find any duplicate IPs whose type is the same.
For example, IP 1.1.1.1 has two entries and both are of type typeA, so it should be reported. IP 3.3.3.3 also has two entries, but their types differ, so it should not be.
Expected output:
IP 1.1.1.1, typeA duplicate
IP 2.2.2.2, typeC duplicate
Install PyYAML with pip install pyyaml, then run the Python script below, replacing myyaml.yaml with the path to your YAML file.
import yaml

# Load the YAML list of entries
with open('myyaml.yaml', 'r') as file:
    data = yaml.safe_load(file)

ip_type_map = {}  # most recent type recorded for each IP
for entry in data:
    if entry and 'ip' in entry and 'type' in entry:
        ip, entry_type = entry['ip'], entry['type']
        if ip in ip_type_map and entry_type == ip_type_map[ip]:
            print(f"IP {ip}, {entry_type} duplicate")
        else:
            ip_type_map[ip] = entry_type
    else:
        # Empty items (such as a trailing "-") load as None
        print("Invalid entry in YAML data.")
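If you would rather report each pair once no matter how many times it repeats, counting (ip, type) pairs is one option. The following is only a minimal sketch: find_duplicates is a name I made up, and the entries list is a hand-written stand-in for what yaml.safe_load() should return for the file in the question, so the snippet runs without the file.

```python
from collections import Counter

# Stand-in for the result of yaml.safe_load() on the question's file
# (an assumption for illustration); the trailing "-" item loads as None.
entries = [
    {"ip": "1.1.1.1", "status": "Active", "type": "typeA"},
    {"ip": "1.1.1.1", "status": "Disabled", "type": "typeA"},
    {"ip": "2.2.2.2", "status": "Active", "type": "typeC"},
    {"ip": "3.3.3.3", "status": "Active", "type": "typeB"},
    {"ip": "3.3.3.3", "status": "Active", "type": "typeC"},
    {"ip": "2.2.2.2", "status": "Active", "type": "typeC"},
    None,
]


def find_duplicates(entries):
    """Return the (ip, type) pairs that occur more than once, in first-seen order."""
    counts = Counter(
        (entry["ip"], entry["type"])
        for entry in entries
        # Skip empty items and entries missing either key
        if isinstance(entry, dict) and "ip" in entry and "type" in entry
    )
    return [pair for pair, count in counts.items() if count > 1]


for ip, entry_type in find_duplicates(entries):
    print(f"IP {ip}, {entry_type} duplicate")
```

Unlike the dictionary approach, this does not depend on the order of the entries, because a pair is reported whenever its total count is above one.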
There are a few ways you can do this. All of them require loading the YAML with a parser, using a library such as PyYAML: https://pypi.org/project/PyYAML/
import yaml

with open('yaml_file.yml', 'r') as stream:
    file = yaml.safe_load(stream)
First way: loop over each row, keeping track of the (ip, type) pairs you’ve seen in a list, along with the indexes of any duplicates, then remove those indexes. This preserves the original list order.
rows_to_remove = []
rows_seen = []
for idx, row in enumerate(file):
    if (row['ip'], row['type']) in rows_seen:
        rows_to_remove.append(idx)
        continue
    rows_seen.append((row['ip'], row['type']))

# Pop from the end so the remaining indexes stay valid
for row_idx in reversed(rows_to_remove):
    file.pop(row_idx)
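Second way: create a new list using a list comprehension.
unique_rows = [dict(t) for t in {tuple(d.items()) for d in file}]
This loops over each row in the list, turning each dictionary into a tuple and storing it in a set, which doesn’t allow duplicates. Each unique row is then turned back into a dictionary and stored in a list. Note that a set is unordered, so the original order is not preserved, and only rows that are identical in every key (including status) are treated as duplicates.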
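If you want duplicates judged by ip and type only while still keeping the original order, something like the sketch below might work. dedupe_by_ip_and_type is a hypothetical helper name, and the rows list is a hand-written stand-in for the loaded YAML, including the None that an empty item produces.

```python
def dedupe_by_ip_and_type(rows):
    """Keep the first row for each (ip, type) pair, preserving input order."""
    seen = set()
    unique = []
    for row in rows:
        if not isinstance(row, dict):
            continue  # skip empty items such as a trailing "-"
        key = (row.get("ip"), row.get("type"))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


# Stand-in for the loaded YAML data (an assumption for illustration)
rows = [
    {"ip": "1.1.1.1", "status": "Active", "type": "typeA"},
    {"ip": "1.1.1.1", "status": "Disabled", "type": "typeA"},
    {"ip": "2.2.2.2", "status": "Active", "type": "typeC"},
    {"ip": "3.3.3.3", "status": "Active", "type": "typeB"},
    {"ip": "3.3.3.3", "status": "Active", "type": "typeC"},
    {"ip": "2.2.2.2", "status": "Active", "type": "typeC"},
    None,
]

unique_rows = dedupe_by_ip_and_type(rows)
for row in unique_rows:
    print(row["ip"], row["type"], row["status"])
```

Using a set for the seen keys keeps each membership check cheap, and building a new list avoids modifying the original while iterating over it.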
I’ll point out that these questions have already been answered and could be found with some searching.
How can I parse a YAML file in Python
Remove duplicate dict in list in Python
You don’t indicate why there is a comma after the IP address in the first line of the output, but not on the second line. Is that determined by the value for key type?
@Anthon you mean in the expected output? That’s a typo, sir. Since that’s only the output, it doesn’t much matter, but I really appreciate that. I’ve edited the question and added the comma.