My dataset contains two different languages in a single column. Is there any way to split the languages into two different columns in QGIS? An example is given below for quick reference. I have added the Python script below, but it separates only one language into the other column.
import re
from qgis.core import QgsProject  # already available in the QGIS Python console
# Change these three lines to match your layer and field names
layer = QgsProject.instance().mapLayersByName("plingplong")[0]
field_to_read = "fielda"
field_to_update = "Required N"
fieldindex = layer.fields().indexFromName(field_to_update)  # index of the field to update
new_attributes = {}
# Matches whole words made of Latin letters only, so only the English words are kept
pattern = r"(?i)\b[a-z]+\b"
for feature in layer.getFeatures():
    words = " ".join(re.findall(pattern, feature[field_to_read]))
    # print(words) would show e.g. "hello", "this is a text"
    new_attributes[feature.id()] = {fieldindex: words}
# new_attributes now maps each feature id to {index of field to update: new text}, e.g.
# {0: {1: 'hello'}, 1: {1: 'this is a text'}}
layer.dataProvider().changeAttributeValues(new_attributes)
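The script above keeps only the English words because its pattern matches Latin letters only. A minimal sketch of the same `findall` idea with a second pattern for Arabic letters, writing each language to its own field, is shown below. The Arabic letter range U+0621..U+064A, the helper names `split_languages` and `build_attribute_map`, and the sample data are assumptions for illustration; the code takes plain (feature id, text) pairs so it runs without QGIS.

```python
import re

# Pattern from the question: whole words made of Latin letters (English).
LATIN_WORD = re.compile(r"(?i)\b[a-z]+\b")
# Assumed pattern for Arabic words: runs of Arabic letters U+0621..U+064A.
ARABIC_WORD = re.compile(r"[\u0621-\u064a]+")


def split_languages(text):
    """Return (english, arabic) by collecting the words of each script.

    Digits and punctuation are dropped, as in the question's script.
    """
    if not text:  # empty or missing value
        return "", ""
    english = " ".join(LATIN_WORD.findall(text))
    arabic = " ".join(ARABIC_WORD.findall(text))
    return english, arabic


def build_attribute_map(features, english_index, arabic_index):
    """Build {feature id: {field index: value}} for two target fields.

    `features` is any iterable of (feature_id, text) pairs.
    """
    new_attributes = {}
    for fid, text in features:
        english, arabic = split_languages(text)
        new_attributes[fid] = {english_index: english, arabic_index: arabic}
    return new_attributes


sample = [(0, "Hello World مرحبا بالعالم"), (1, "Riyadh الرياض")]
print(build_attribute_map(sample, 1, 2))
```

In QGIS, the pairs would come from `((f.id(), f[field_to_read]) for f in layer.getFeatures())`, the two indexes from `layer.fields().indexFromName(...)` on two existing fields (the field names are up to you), and the resulting dictionary would be passed to `layer.dataProvider().changeAttributeValues()` as in the question's script. Because `findall` collects words wherever they occur, the order of the two languages in the text does not matter.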
For example, text containing English and Arabic needs to be split into English and Arabic (two separate columns).
Is there any solution to split the languages into two different columns?
Yes: take the combined text and find the index of the first Arabic character, then use that index to split the text in two. The first part is the English text and the second part is the Arabic text.
I'm assuming the text is always English + Arabic, and never:
- English + Arabic + English
- Arabic + English
- any other combination
If you want to use regular expressions, check this answer.
Simple example:
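import re
combined_text = "Hello World مرحبا بالعالم"
# Matches one Arabic letter; U+0621 (rather than U+0627) also covers the hamza forms
pattern = re.compile("[\u0621-\u064a]")
match = pattern.search(combined_text)
if match:
    index = match.start()
    english = combined_text[:index].strip()  # strip the space before the Arabic part
    arab = combined_text[index:].strip()
else:
    # No Arabic letter found: keep the whole text as English
    english, arab = combined_text.strip(), ""
print(english)
# Hello World
print(arab)
# مرحبا بالعالم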
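The split above assumes English always comes first. As a sketch, assuming the text still consists of exactly two contiguous parts but the order may be either way, the first letter of the other script can serve as the cut point instead. The helper name `split_two_parts` and the character classes (Arabic letters U+0621..U+064A, ASCII Latin letters) are assumptions for illustration.

```python
import re

# Assumed character classes: Arabic letters U+0621..U+064A and ASCII Latin letters.
ARABIC = re.compile(r"[\u0621-\u064a]")
LATIN = re.compile(r"[A-Za-z]")


def split_two_parts(text):
    """Split text made of one English part and one Arabic part, in either order.

    Returns (english, arabic). Assumes the two scripts form two contiguous
    runs; for English + Arabic + English the trailing English ends up in
    the Arabic value.
    """
    arabic_match = ARABIC.search(text)
    latin_match = LATIN.search(text)
    if arabic_match is None:  # no Arabic letters at all
        return text.strip(), ""
    if latin_match is None:  # no Latin letters at all
        return "", text.strip()
    if latin_match.start() < arabic_match.start():
        # English first: cut at the first Arabic letter
        cut = arabic_match.start()
        return text[:cut].strip(), text[cut:].strip()
    # Arabic first: cut at the first Latin letter instead
    cut = latin_match.start()
    return text[cut:].strip(), text[:cut].strip()


print(split_two_parts("Hello World مرحبا بالعالم"))
print(split_two_parts("مرحبا بالعالم Hello World"))
```

Comparing the positions of the first Latin and first Arabic letter decides which part comes first, so the function always returns the English part first regardless of the input order.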
Is a space (" ") the only separator between the English and Arabic text?