Get a specific substring from a text string after comma or period characters in Notepad++

I need help with Notepad++ regular expressions. This seems like it should be simple, but I can't figure it out.

I have an .xls file with 4 columns of text strings, one column per language. When I copy one row from this file into Notepad++, I get a single long line with the languages separated by tabs.

Example (TAB marks a tab character):

This is a example. It's my first time here. Hello everybody. ... Last sentence. TAB Ésto es un ejemplo. Es la primera vez que busco respuesta aquí. Hola a todos. ... Última frase. TAB Substring_German01. Substring_German02. Substring_German03. ... Substring_GermanXX. TAB Substring_French01. Substring_French02. Substring_French03. ... Substring_FrenchXX.

After replacing \t with \n in Notepad++, I get:

This is a example. It's my first time here. Hello everybody. ... Last sentence. 
Ésto es un ejemplo. Es la primera vez que busco respuesta aquí. Hola a todos. ... Última frase. 
Substring_German01. Substring_German02. Substring_German03. ... Substring_GermanXX. 
Substring_French01. Substring_French02. Substring_French03. ... Substring_FrenchXX.

How can I get a specific sentence from each line using a regex?

Required results:

Search: (the regex I'm looking for)
Replace: \1

This is a example.
Ésto es un ejemplo.
Substring_German01.
Substring_French01.

Search: (the regex I'm looking for)
Replace: \2

It's my first time here.
Es la primera vez que busco respuesta aquí.
Substring_German02.
Substring_French02.

Search: (the regex I'm looking for)
Replace: \3

Hello everybody.
Hola a todos.
Substring_German03.
Substring_French03.

Thanks!

You can use parentheses, e.g. (.*) (.*) ..., in your regex to define capture groups, and refer to them in the replacement as \1, \2, and so on.

Example: Hello Dude SomeFixedString How Are You

Search for (.*) SomeFixedString (.*)

Replace with \2 foooo \1

will give you

How Are You foooo Hello Dude
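The same capture-group mechanics can be tried outside Notepad++ with Python's re module, whose re.sub also understands \1 and \2 in the replacement string. This is only a sketch of the technique, not Notepad++ itself; the spaces around SomeFixedString are matched outside the groups so the result has no leading or doubled spaces.

```python
import re

line = "Hello Dude SomeFixedString How Are You"

# Group 1 = text before the fixed string, group 2 = text after it.
# The surrounding spaces are matched outside the groups.
pattern = r"(.*) SomeFixedString (.*)"

# \2 and \1 in the replacement refer to the captured groups.
result = re.sub(pattern, r"\2 foooo \1", line)
print(result)  # How Are You foooo Hello Dude
```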

Use the following search patterns (TAB stands for the column separator, written exactly as in your example):

(?:.*?\S[.!?]){0}\s*(.*?\S[.!?])(?:\s.*?TAB|(?!.*TAB)\s.*|\s*$)

(?:.*?\S[.!?]){1}\s*(.*?\S[.!?])(?:\s.*?TAB|(?!.*TAB)\s.*|\s*$)

(?:.*?\S[.!?]){2}\s*(.*?\S[.!?])(?:\s.*?TAB|(?!.*TAB)\s.*|\s*$)

(?:.*?\S[.!?]){3}\s*(.*?\S[.!?])(?:\s.*?TAB|(?!.*TAB)\s.*|\s*$)

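…etc. The number in braces is how many sentences are skipped in each column: {0} extracts the first sentence, {1} the second, {2} the third.

Replace each match with \1\n.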
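One way to check these patterns outside Notepad++ is Python's re module, which accepts the constructs used here (lazy quantifiers, {n} repetition, negative lookahead). The sketch below runs them on the question's example row with the "... Last sentence." parts dropped and the literal text TAB kept as the separator; the helper names are hypothetical. Notepad++ uses the Boost regex engine, so results on other inputs may differ in edge cases.

```python
import re

# The question's example row, with the "... Last sentence." parts left out.
# "TAB" is kept as literal text, exactly as written in the question.
ROW = ("This is a example. It's my first time here. Hello everybody. TAB "
       "Ésto es un ejemplo. Es la primera vez que busco respuesta aquí. Hola a todos. TAB "
       "Substring_German01. Substring_German02. Substring_German03. TAB "
       "Substring_French01. Substring_French02. Substring_French03.")


def nth_sentence_pattern(n: int) -> str:
    """Build the answer's pattern: skip n sentences, capture the next one,
    then consume the rest of the column (up to and including TAB, or to
    the end of the line for the last column)."""
    return (r"(?:.*?\S[.!?]){" + str(n) + r"}\s*(.*?\S[.!?])"
            r"(?:\s.*?TAB|(?!.*TAB)\s.*|\s*$)")


def nth_sentences(row: str, n: int) -> str:
    # Replace each match with the captured sentence plus a newline (\1\n).
    return re.sub(nth_sentence_pattern(n), r"\1\n", row)


for n in range(3):
    print(nth_sentences(ROW, n))
```

With n = 1 this yields the second sentence of every column on its own line: It's my first time here., Es la primera vez que busco respuesta aquí., Substring_German02., Substring_French02.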

Try this:

([^ ][^. ]).*?\.

Run this regex search on the text from your second box (after replacing \t with \n), and each sentence should be matched as a separate substring.

It also skips the "..." parts, which I assume you don't want.
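To see exactly what this pattern matches, here is a small Python check that runs it on single lines of the split text. Note that the parentheses capture only the first two characters of each sentence, so the whole match (group 0) is the part you want; the helper name sentences is hypothetical, and Python's re stands in for the Notepad++ engine.

```python
import re

# The pattern from the answer: two non-space characters (the second also not
# a period), then everything up to the nearest period.
SENTENCE = re.compile(r"([^ ][^. ]).*?\.")


def sentences(line: str) -> list[str]:
    # Use the whole match; group 1 only holds the first two characters.
    return [m.group(0) for m in SENTENCE.finditer(line)]


english = "This is a example. It's my first time here. Hello everybody. ... Last sentence. "
print(sentences(english))
```

The "..." placeholder never matches because its second character is a period, which [^. ] rejects.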

Why not save your .xls file as an XML Spreadsheet first? Then each cell is already on its own line, and you only need a bit of code to remove the tags.
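A minimal sketch of the "remove the tags" step, assuming each cell of the saved XML Spreadsheet sits on its own line in the form <Cell><Data ss:Type="String">...</Data></Cell>. The sample markup and the cell_texts helper are illustrative assumptions; a real export may need a proper XML parser instead of a regex.

```python
import html
import re

# Hypothetical excerpt of a file saved as "XML Spreadsheet"; the exact
# markup Excel writes may differ, this only illustrates the idea.
SAMPLE = """<Row>
 <Cell><Data ss:Type="String">This is a example. It's my first time here.</Data></Cell>
 <Cell><Data ss:Type="String">Ésto es un ejemplo. Hola a todos.</Data></Cell>
</Row>"""

# Any run of characters between < and > is treated as a tag.
TAG = re.compile(r"<[^>]+>")


def cell_texts(xml_text: str) -> list[str]:
    texts = []
    for line in xml_text.splitlines():
        # Strip tags first, then decode entities such as &amp; or &lt;,
        # so an escaped "<" in the cell text is not mistaken for a tag.
        text = html.unescape(TAG.sub("", line)).strip()
        if text:  # lines that held only tags (e.g. <Row>) become empty
            texts.append(text)
    return texts


for text in cell_texts(SAMPLE):
    print(text)
```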
