
Thread: How to extract Arabic language words from list of different language

  1. #1
    New Member
    Join Date
    Jun 2019
    Posts
    4
    Post Thanks / Like
    Mentioned
    1 Post(s)
    Tagged
    0 Thread(s)


    Hi

    I have a sheet with a column containing lists of words in different languages. I want to extract only the Arabic words and put each one in a separate row.
    For example:
    Input:
    1- Sharifah,Sharīfah,shryft,شريفة
    2- Umm al `Ulaymat,Umm al ‘Ulaymāt,`Awayid,am almymat,أم الميمات,عوايد,‘Awāyid
    3- Al Jamaliyah,جمالية,Al Jamālīyah,Jamaliyah,Jamālīyah,aljmalyt,jmalyt,الجمالية

    Output:
    row 1: شريفة
    row 2: أم الميمات
    row 3: عوايد
    row 4: الجمالية
    row 5: جمالية

    Thank you in advance


  2. #2
    MrExcel MVP
    Join Date
    Apr 2006
    Posts
    19,690
    Post Thanks / Like
    Mentioned
    15 Post(s)
    Tagged
    2 Thread(s)

    Default Re: How to extract Arabic language words from list of different language

    Hi

    A simple solution is to write a VBA snippet that uses Unicode code points to pick out the Arabic characters.
    A quick Google search tells me that the Arabic alphabet occupies code points hex 600 to 6FF, so you just loop through the characters and extract those that fall in that range.
    If you know Arabic, you'll know better than I do whether these characters are enough or whether you need some others.
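
    For illustration, the character-range test described above could be sketched as a small VBA function. This is only a minimal sketch under stated assumptions: the name IsArabicText is hypothetical, and it assumes that hex 600 to 6FF plus the space character covers everything that should count as Arabic text.

```vba
' Hypothetical helper: True when s contains at least one character in
' hex 600-6FF and nothing else except spaces. The range is an assumption;
' widen it if your data uses other Arabic blocks.
Function IsArabicText(ByVal s As String) As Boolean
    Dim i As Long
    Dim code As Long
    Dim hasArabic As Boolean

    For i = 1 To Len(s)
        ' AscW returns a signed Integer; mask it to get 0..65535
        code = AscW(Mid$(s, i, 1)) And &HFFFF&
        If code >= &H600 And code <= &H6FF Then
            hasArabic = True
        ElseIf code <> 32 Then
            ' any other character disqualifies the item (result stays False)
            Exit Function
        End If
    Next i

    IsArabicText = hasArabic
End Function
```

    It could be called once per comma-separated item, for example on each element returned by Split(cellText, ",").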


    The only difficulty I see is what you have in row 2. In all the other rows you extracted a single word, but in row 2 you extracted two words. I'm sure that makes sense to someone who knows Arabic, but you will have to handle that kind of exception in the code.
    Or is it simply that the text you want to extract is always delimited by commas?
    Kind regards
    PGC

    To understand recursion, you must understand recursion.

  3. #3
    MrExcel MVP

    This is a quick test that extracts Arabic text delimited by commas, as in your example.

    Place the example text in A2:A4; the code writes the Arabic text to B2 and down.

    Code:
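    Sub GetArabic()
    Dim regexMatches As Object
    Dim r As Range
    Dim s As String
    Dim j As Long
    
    ' source cells; adjust to your data
    Set r = Range("A2:A4")
    ' join all cells into one comma-delimited string, with a comma at each end
    s = "," & Join(Application.Transpose(r), ",") & ","
    
    With CreateObject("VBScript.RegExp")
        ' a comma, then one or more Arabic characters (U+0600-U+06FF) or spaces,
        ' followed by a comma that is checked but not consumed
        .Pattern = ",[ \u0600-\u06FF]+(?=,)"
        .Global = True
        Set regexMatches = .Execute(s)
    End With
    
    ' write the result, dropping the leading comma of each match
    For j = 0 To regexMatches.Count - 1
        r(1).Offset(j, 1) = Mid(regexMatches(j).Value, 2)
    Next j
    
    End Sub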
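
    If the data does not sit exactly in A2:A4, the same pattern could be applied one cell at a time. The following is a sketch only, not part of the tested code above: the names ArabicItems and GetArabicAnyRange are hypothetical, and it assumes the source text is in column A from row 2 down, holds plain text (no error values), and that column B is free for the output.

```vba
' Hypothetical helper: returns the Arabic items of one comma-separated
' string as a Collection, using the same pattern as GetArabic.
Function ArabicItems(ByVal src As String) As Collection
    Dim result As New Collection
    Dim m As Object
    Dim v As String

    With CreateObject("VBScript.RegExp")
        .Pattern = ",[ \u0600-\u06FF]+(?=,)"
        .Global = True
        ' wrap the text in commas so the first and last items can match
        For Each m In .Execute("," & src & ",")
            v = Trim$(Mid$(m.Value, 2))          ' drop the leading comma
            If Len(v) > 0 Then result.Add v      ' skip items that were only spaces
        Next m
    End With

    Set ArabicItems = result
End Function

' Assumption: source in column A from row 2 down, output to column B.
Sub GetArabicAnyRange()
    Dim c As Range
    Dim item As Variant
    Dim outRow As Long

    outRow = 2
    For Each c In Range("A2", Cells(Rows.Count, "A").End(xlUp))
        For Each item In ArabicItems(CStr(c.Value))
            Cells(outRow, "B").Value = item
            outRow = outRow + 1
        Next item
    Next c
End Sub
```

    Processing each cell separately avoids building one long joined string, at the cost of creating the RegExp object once per cell.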
    Kind regards
    PGC


  4. #4
    Board Regular sandy666's Avatar
    Join Date
    Oct 2015
    Posts
    2,820
    Post Thanks / Like
    Mentioned
    32 Post(s)
    Tagged
    1 Thread(s)


    Another way, with Power Query (aka Get & Transform):

    Code:
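    // Table1
    let
        // characters with code points 0..300 (ASCII plus some accented Latin letters), used as the trim list
        C2R = List.Transform({0..300}, each Character.FromNumber(_)),
        Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
        // split each cell on commas and expand to one item per row
        Split = Table.ExpandListColumn(Table.TransformColumns(Source, {{"raw", Splitter.SplitTextByDelimiter(",", QuoteStyle.Csv), let itemType = (type nullable text) meta [Serialized.Text = true] in type {itemType}}}), "raw"),
        // remove the curly quote, which lies above code point 300
        Replace = Table.ReplaceValue(Split,"‘","",Replacer.ReplaceText,{"raw"}),
        // trimming C2R characters from both ends empties items made only of those characters
        Arabic = Table.AddColumn(Replace, "Arabic", each Text.Trim([raw],C2R)),
        // keep only the items that survived the trim
        Filter = Table.SelectRows(Arabic, each ([Arabic] <> "")),
        ROC = Table.SelectColumns(Filter,{"Arabic"})
    in
        ROC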
    Last edited by sandy666; Jun 29th, 2019 at 12:19 PM.
    I know you know but I forgot my Crystal Ball and don't know what you know



    In the first post, show the type of machine (PC / Mac) and the Office version you are working on
    impossible things we do on the spot. for miracles you need to wait for a while

  5. #5
    MrExcel MVP

    Hi sandy666

    Great to see an M solution. Always good to have different approaches.

    Just a remark

    Quote (originally posted by AlaaJ):
    I have a sheet with a column of list of words in different languages I want to get only Arabic words

    If I understand your code correctly, you assumed that the non-Arabic text uses the Latin alphabet.
    The OP may have text written in other alphabets or in ideograms.
    I tried the more common ones, such as Russian and Greek, and it did not work.
    Even with the Latin alphabet it will sometimes fail; for example, try the Polish word może (maybe). The letter ż is not processed correctly because your character list only goes up to 300.
    I did not try Chinese or others, but I guess the problem will be the same.

    The OP may not need it, but I wanted to give you a heads-up; you might want to tweak the code.
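
    To check whether a given word falls inside a character list like that, one can look at its code points. Here is a small VBA sketch for that purpose; the names MaxCodeUnit and ShowCutoffProblem are hypothetical, and the sample words are built with ChrW because the VBA editor may not display these characters.

```vba
' Hypothetical helper: returns the highest UTF-16 code unit found in s
' (0 for an empty string).
Function MaxCodeUnit(ByVal s As String) As Long
    Dim i As Long
    Dim code As Long
    Dim best As Long

    For i = 1 To Len(s)
        ' AscW returns a signed Integer; mask it to get 0..65535
        code = AscW(Mid$(s, i, 1)) And &HFFFF&
        If code > best Then best = code
    Next i

    MaxCodeUnit = best
End Function

Sub ShowCutoffProblem()
    ' Polish "może": the letter ż is U+017C
    Debug.Print MaxCodeUnit("mo" & ChrW(&H17C) & "e")    ' 380
    ' Russian "да": U+0434, U+0430
    Debug.Print MaxCodeUnit(ChrW(&H434) & ChrW(&H430))   ' 1076
    ' Chinese 中: U+4E2D
    Debug.Print MaxCodeUnit(ChrW(&H4E2D))                ' 20013
End Sub
```

    Any word whose highest value exceeds the end of the trim list contains at least one character that the list does not cover.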
    Kind regards
    PGC


  6. #6
    Board Regular sandy666's Avatar

    In the past I tried to anticipate every "what if", but that turned out to be a bad method. Now I only work from what I see in the first post: if the OP shows a representative example, I will handle it; if not, I won't.

    It all depends on the definition of C2R.

    Edit:
    And no, it is not limited to Latin.
    Last edited by sandy666; Jun 29th, 2019 at 06:22 PM.

  7. #7
    Board Regular sandy666's Avatar

    but it works with Polish, Russian and Greek text

    Last edited by sandy666; Jun 29th, 2019 at 06:36 PM.

  8. #8
    Board Regular sandy666's Avatar

    Oops! I should have said that I expanded C2R to 1568.

  9. #9
    Board Regular sandy666's Avatar

    Just for fun


  10. #10
    MrExcel MVP

    Seems great.

    Looking at your code, it did not seem to me that just expanding C2R to 1568, as you said in post #8, would be enough to handle languages like Chinese or Japanese.
    I'll give it a go when I have the time.
    Kind regards
    PGC

