remove duplicated persian word seperated with comma in a cell

osmanoca

Board Regular
Joined
Apr 16, 2016
Messages
87
hello, i have a macro to delete dublicated words and seperated with comma in same cell . but it works in english character languages. but it doesnt delete persian words. what is problem? please help.

my data is so


A B
nav اسم،اسم،نام،اسم،نام
pênûs قلم،مداد،مداد،قلم
çay چاي, چای‬, چای,جویبار, چای



i want them to beo so:
nav اسم،نام
pênûs مداد،قلم
çay جویبار, چای


please please help.
 

Excel Facts

What is the last column in Excel?
Excel columns run from A to Z, AA to AZ, AAA to XFD. The last column is XFD.
Re: remove dublicated persian word seperated with comma in a cell

Without seeing your code, I can't give you anything more than hints.

Persian is written with characters from the Arabic block of Unicode, primarily U+0600 through U+06FF. There are some other characters, too, but most texts will be covered by that range. See this English Wikipedia page: https://en.wikipedia.org/wiki/Arabic_script_in_Unicode.

If your code looks through the cell character by character, the character code for a Persian symbol does not fit into a Chr(), you have to use a wide character, a ChrW().

I wrote a script to remove Arabic characters from Kurdish in this thread: https://www.mrexcel.com/forum/excel...y-arabic-letters-post4928543.html#post4928543

I've rewritten the script to be a little more efficient. Perhaps it will point you in the right direction:
Code:
Sub TrimArabicChars()
''''
' Removes Arabic characters from the selected cells.
'   Will not remove a trailing space.
''''
    Dim ary() As Variant
    Dim glyph As String
    Dim i As Long
    Dim rx As Long, cx As Long
    Dim rng As Range
    Dim s_len As Long
    Dim str As String
    Dim swap As String

    If TypeName(Selection) <> "Range" Then Exit Sub
    Set rng = Selection

    ' Don't create a two-element array when the selection is a single cell.
    If rng.Cells.Count = 1 Then
        ReDim ary(1 To 1, 1 To 1)
        ary(1, 1) = rng.Value2
    Else
        ary = rng.Value2
    End If

    For rx = 1 To UBound(ary, 1)
        For cx = 1 To UBound(ary, 2)
            str = ary(rx, cx)
            s_len = Len(str)

            If s_len Then

                For i = 1 To s_len
                    glyph = Mid(str, i, 1)

                    ' Remove the Arabic characters.
                    If glyph < ChrW(&H600) Or glyph > ChrW(&H6FF) Then
                        swap = swap & glyph
                    End If

                Next i
                
                ary(rx, cx) = swap
                swap = ""
            End If
        Next
    Next

    rng.Value2 = ary
End Sub
 
Upvote 0
Re: remove dublicated persian word seperated with comma in a cell

this code is for deleting arabic charecters. worked. but here i need a macro for deleting dublicated words in a cell. my macro is below. working for latin characters but now working for arabic. please look and edit for arabic also:


Sub test2()
Dim c, arr, el, data, it
Dim start As Date
Dim targetRange As Range


Dim dict As Object
Set dict = CreateObject("Scripting.dictionary")


Application.ScreenUpdating = False


Set targetRange = Range("A1:G20000")


data = targetRange


start = Now
For i = LBound(data) To UBound(data)
For j = LBound(data, 2) To UBound(data, 2)
c = data(i, j)
dict.RemoveAll
arr = Split(c, ",")
For Each el In arr
On Error Resume Next
dict.Add Trim(el), Trim(el)
On Error GoTo 0
Next
c = ""
For Each it In dict.Items
c = c & it & ","
Next
If c <> "" Then c = Left(c, Len(c) - 1)
data(i, j) = c
Next j
Next i
targetRange = data
Application.ScreenUpdating = True


MsgBox "Working time: " & Format(Now - start, "hh:nn:ss")


End Sub
 
Upvote 0
Re: remove dublicated persian word seperated with comma in a cell

I'm working on a macro now. I tried regular expressions but they are crippled in VBA compared to other languages. I'm working on using commas as word separators.

The word split points you show include:
Latin commas, ","
Latin comma plus a space, ", "
and Persian (Arabic) commas, "،"
Do your original texts have any Persian commas plus a space acting as separators?

Your code line « arr = Split(c, ",") » won't work as you want – it only catches the solitary Latin comma. It doesn't catch the Arabic comma.

I don't read Persian. I've split the words you gave as a sample using a formula. Words are in between the diagonal slashes. Are these the correct points for the word separators?
nav اسم/اسم/نام/اسم/نام
pênûs قلم/مداد/مداد/قلم
çay چاي/چای‬/چای/جویبار/چای

Which comma would you prefer in the results?
 
Last edited:
Upvote 0
Re: remove dublicated persian word seperated with comma in a cell

Try this on a COPY of your spreadsheet.

Code:
Sub RemovePersianDuplicates()
    Dim ary As Variant
    Dim dict As Object
    Dim i As Long, j As Long, k As Long
    Dim latin_comma As Boolean
    Dim rng As Range
    Dim txt As String
    Dim word As Variant
    Dim word_ary As Variant
    Dim word_coll As New Collection

    With ActiveSheet
        Set rng = Intersect(Range(.Columns(1), .Columns(7)), .UsedRange)
    End With

    For i = LBound(ary, 1) To UBound(ary, 1)
        For j = LBound(ary, 2) To UBound(ary, 2)
            txt = CStr(ary(i, j))

            ' Parse txt only if it contains more than one word.
            If txt <> "" And InStr(Trim(txt), " ") > 0 Then
            
                ' Remove the POP DIRECTIONAL FORMATTING character, U+202C.
                '   We have enough problems with these bidirectional texts.
                txt = Replace(txt, ChrW(&H202C), "")
                
                ' Latin comma+space or Persian comma with no space?
                latin_comma = True
                If InStr(txt, ChrW(1548)) > 0 Then latin_comma = False
                
                ' Grab the first Latin word.
                txt = Replace(txt, " ", "|", Count:=1)
                word_ary = Split(txt, "|")
                word_coll.Add word_ary(0), word_ary(0)
                txt = word_ary(1)
                
                ' Split the words on the commas.
                txt = Replace(txt, ChrW(1548), "|")
                txt = Replace(txt, ",", "|")
                txt = Replace(txt, " ", "")
                word_ary = Split(txt, "|")
                
                ' Remove the duplicates.
                For Each word In word_ary
                    On Error Resume Next
                    word_coll.Add word, word
                    On Error GoTo 0
                Next word
                
                ' Reassemble the entire string.
                txt = word_coll(1)
                For k = 2 To word_coll.Count
                    If k = 2 Then
                        txt = txt & " " & word_coll(k)
                    ElseIf latin_comma Then
                        txt = txt & ", " & word_coll(k)
                    Else
                        txt = txt & ChrW(1548) & word_coll(k)
                    End If
                Next k
                
                ary(i, j) = txt
                Set word_coll = Nothing
            End If
        Next j
    Next i
    
    ' Send the texts to the worksheet.
    rng.Value2 = ary
End Sub

I'm having trouble with çay جویبار, چای
 
Last edited:
Upvote 0
Re: remove dublicated persian word seperated with comma in a cell

I think the VBA code, above, is the best I can do.

Looking into the problems I have deleting duplicate Persian words from the string: çay چاي, چای‬, چای,جویبار, چای
چاي is not the same as چای

One uses =UNICHAR(1610), the ARABIC LETTER YEH; the other uses =UNICHAR(1740), the ARABIC LETTER FARSI YEH. The character encoding makes the program see the two strings as not identical.

Each written human language has its own rules that make programming difficult. I uncovered two encoding-decoding difficulties when working with your three example strings. The second one I uncovered was the character =UNICHAR(8236) or U+202C. I dealt with that symbol by making the computer ignore it.
 
Last edited:
Upvote 0
Re: remove dublicated persian word seperated with comma in a cell

Oh, drat! I accidentally erased a line When I posted the code. A very necessary line.

At the top of the script, the line in red should be added:

Rich (BB code):
    With ActiveSheet
        Set rng = Intersect(Range(.Columns(1), .Columns(7)), .UsedRange)
    End With
    
    ary = rng.Value2

    For i = LBound(ary, 1) To UBound(ary, 1)
 
Upvote 0
Re: remove dublicated persian word seperated with comma in a cell

THANKS VERY MUCH DEAR thisoldman, THİS Worked well. as you said there was a problem for not deleting. you solved it. now it is working well.
really thanks. İ hope the god helps you also in your life.

see you later...

Oh, drat! I accidentally erased a line When I posted the code. A very necessary line.

At the top of the script, the line in red should be added:

Rich (BB code):
    With ActiveSheet
        Set rng = Intersect(Range(.Columns(1), .Columns(7)), .UsedRange)
    End With
    
    ary = rng.Value2

    For i = LBound(ary, 1) To UBound(ary, 1)
 
Upvote 0
Re: remove dublicated persian word seperated with comma in a cell

I'm glad to have helped. Thank you for your kind words.
 
Upvote 0
Re: remove dublicated persian word seperated with comma in a cell

Dear thisoldman, i saw i problem in macro. it is removing spaces also. the macro thinks dublicated spaces as word and removes them. and this makes problem for words have spaces. how can we solve this. it must now delete spaces.

example:
قبل از اینکه
علاوه بر آن که

<tbody>
</tbody>
these words are together. must stay spaces.
after runnig macro they made so:
قبل ازاینکه
علاوه برآنکه

<tbody>
</tbody>

<tbody>
</tbody>

please help.


I'm glad to have helped. Thank you for your kind words.


 
Upvote 0

Forum statistics

Threads
1,216,111
Messages
6,128,899
Members
449,477
Latest member
panjongshing

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top