Extract data from Strings/Output Uniqueness

segran

Active Member
Joined
Aug 20, 2004
Messages
335
Hi,

Example:

<html><head><title>Excel Jeanie HTML</title></head><body>

<!-- ######### Start Created Html Code To Copy ########## -->

Excel Workbook
BC
3DataOutput
4sadckads cndshc shell samdhshell
5dhywqio shell nsdkhskjshell
6apple jschkjd bsfvapple
7gdhj765 sjsdh apple shcjkhdapple
Sheet1




<!-- ######### End Created Html Code To Copy ########## -->

</body></html

I In column 1 (Data), I want to compare each string to each other from start to end of the dataset in the column, and it then must find a pattern and output the result in column 2 (Output).

Can this be done?

Thank you
 

Excel Facts

Bring active cell back into view
Start at A1 and select to A9999 while writing a formula, you can't see A1 anymore. Press Ctrl+Backspace to bring active cell into view.
I would think so, but we need some rules we can count on. Is the data actually like you have in the example? That is, will it always be just two-cell groups? (ie, we are never comparing three cells)
 
Upvote 0
Hi,

Yes it is like your thinking.

I do hope you can assist.

Just to clarify, I want all cells in DATA column to be compared to each cell in DATA column, find a pattern.

Once the pattern is determined. it must output the result in Output column (column 2) for each corresponding cell in DATA column.

Thank you
 
Upvote 0
Okay, I wasn't sure, so tried to allow for a variable number of cells, as well as if there are two or more matches.

For data like:
Excel Workbook
AB
1DataOutput
2sadckads cndshc shell samdhshell
3dhywqio shell nsdkhskj
4apple jschkjd bsfvapple
5gdhj765 sjsdh shcjkhd apple
6123 Bear 456 ChickenBear, Chicken
7asd Chicken asd Bear
8a goat was heregoat
9but the goat left
10where did the goat go?
11one two 
12three four
Sheet1



In a Standard Module:

Rich (BB code):
Option Explicit
    
Function RetDupVal(ByVal CellRng As Range) As String
Static REX  As Object ' RegExp
Dim DIC     As Object ' Dictionary
Dim rexMC   As Object ' MatchCollection
Dim rexM    As Object ' Match
Dim n       As Long
Dim i       As Long
Dim y       As Long
Dim aryTmp  As Variant
Dim ColNest As Variant
Dim Cell    As Range
    
    Set DIC = CreateObject("Scripting.Dictionary")
    If REX Is Nothing Then Set REX = CreateObject("VBScript.RegExp")
    With REX
        .Global = True
        .Pattern = "\b[a-zA-Z]+?\b"
        For Each Cell In CellRng
            If .test(Cell.Value) Then
                Set DIC.Item(Cell.Address) = CreateObject("Scripting.Dictionary")
                Set rexMC = .Execute(Cell.Value)
                For Each rexM In rexMC
                    DIC.Item(Cell.Address).Item(rexM.Value) = Empty
                Next
            End If
        Next
    End With
        
    ColNest = DIC.Items
    For n = LBound(ColNest, 1) To UBound(ColNest, 1)
        aryTmp = ColNest(n).Keys
        For i = LBound(aryTmp, 1) To UBound(aryTmp, 1)
            For y = LBound(ColNest, 1) To UBound(ColNest, 1)
                If Not y = n Then
                    If Not ColNest(y).Exists(aryTmp(i)) Then
                        ColNest(n).Remove aryTmp(i)
                        Exit For
                    End If
                End If
            Next
        Next
    Next
    aryTmp = ColNest(LBound(ColNest, 1)).Keys
    RetDupVal = Join(aryTmp, ", ")
End Function

Hope that helps,

Mark
 
Upvote 0
Hi GTO,

Thank you for the script.

Does it compare for example:

1. cell A2 of the DATA to the entire dataset (in your example A2:A12).
2. cell A3 of DATA will be compared to dataset including A2 and A3:A12.

Thank you.
 
Upvote 0
Hi All,

I was wondering if someone can please help with this request.

Many thanks to GTO for his valuable assistance to date, but it sadly does not help my cause. Many other suggestions with be appreciated.

Thank you
 
Upvote 0
...That is, will it always be just two-cell groups? (ie, we are never comparing three cells)

Hi,

Yes it is like your thinking. ...

Based on your initial sample data and response, I am afraid I misunderstood. You did add " I want all cells in DATA column to be compared to each cell in DATA column, find a pattern."

Hopefully, I have a better grasp on that.

For:
Excel Workbook
AB
1DataOutput
2123 Bear 456 ChickenBear, Chicken
3dhywqio shell nsdkhskjshell
4apple jschkjd bsfvapple
5one two 
6but the goat leftgoat, the
7asd Chicken asd BearChicken, Bear
8a goat was heregoat
9three four 
10where did the goat go?the, goat
11gdhj765 sjsdh shcjkhd appleapple
12sadckads cndshc shell samdhshell
Sheet2


A none-to-sure maybe:
Rich (BB code):
Option Explicit
    
Function RETANYDUP(ByVal CellRng As Range, ByVal CellOne As Range) As String
Static REX          As Object ' RegExp
Dim DIC             As Object ' Dictionary
Dim rexMC           As Object ' MatchCollection
Dim rexM            As Object ' Match
Dim n               As Long
Dim i               As Long
Dim y               As Long
Dim aryTmp          As Variant
Dim ColNest         As Variant
Dim ColAddresses    As Variant
Dim bolCellOneComp  As Boolean
Dim Cell            As Range
    
    Set DIC = CreateObject("Scripting.Dictionary")
    If REX Is Nothing Then Set REX = CreateObject("VBScript.RegExp")
    With REX
        .Global = True
        .Pattern = "\b[a-zA-Z]+?\b"
        For Each Cell In CellRng
            If .test(Cell.Value) Then
                Set DIC.Item(Cell.Address) = CreateObject("Scripting.Dictionary")
                Set rexMC = .Execute(Cell.Value)
                For Each rexM In rexMC
                    DIC.Item(Cell.Address).Item(rexM.Value) = Empty
                Next
            End If
        Next
    End With
    
    ColNest = DIC.Items
    ColAddresses = DIC.Keys
    DIC.RemoveAll
    
    For n = LBound(ColNest, 1) To UBound(ColNest, 1)
        If CellOne.Address = ColAddresses(n) Then
            bolCellOneComp = True
            aryTmp = ColNest(n).Keys
            For y = LBound(ColNest, 1) To UBound(ColNest, 1)
                If Not y = n Then
                    For i = LBound(aryTmp, 1) To UBound(aryTmp, 1)
                        If ColNest(y).Exists(aryTmp(i)) Then
                            DIC.Item(aryTmp(i)) = Empty
                            'Exit For
                        End If
                    Next
                End If
            Next
        End If
        If bolCellOneComp Then Exit For
    Next
    RETANYDUP = Join(DIC.Keys, ", ")
End Function

If we truly are looking through every 'word' in every other cell, while I am certain someone can show improvements (and I hope that comes to pass), I cannot imagine that this will not get awfully 'heavy' if the range is of any size.

Mark
 
Upvote 0
Hi GTO,

Thank you for your assistance. It works on my mock data - I hope it works on the read data set.

Thank you :)
 
Upvote 0
Hi,

<html><head><title>Excel Jeanie HTML</title></head><body>

<!-- ######### Start Created Html Code To Copy ########## -->

Excel Workbook
V
49DORF|I|DORF K C P L | DORF|N|DORF KETAL CHEM I PRIVATE LTD | DORF|N|DORF KETAL CHEM I PVT LTD
50SHEL|C|SHELL INT RES MIJ BV | SHEL|C|SHELL OIL CO | BHAN|I|BHAN O K | WELL|I|WELLINGTON S L
51SHEL|C|SHELL INT RES MIJ BV | SHEL|C|SHELL OIL CO | BHAN|I|BHAN O K | WELL|I|WELLINGTON S L
52PULL|C|KELLOGG LTD M W | SIMO|I|SIMONS H
53BRPE|C|BP EXPLORATION OPERATING CO LTD
54CALI|C|CHEVRON USA INC | TEXC|C|TEXACO INC
55CALI|C|CHEVRON USA INC | TEXC|C|TEXACO INC
56FARK|I|FARKAS G
57ESSO|C|EXXONMOBIL RES & ENG CO | ESSO|C|EXXONMOBIL RES&ENG CO | HOUZ|I|HOU Z | LEWI|I|LEWIS E | MIZA|I|MIZAN T
58PULL|C|KELLOGG BROWN&ROOT LLC | FLOY|I|FLOYD R | SUBR|I|SUBRAMANIAN A
59HEAD|N|HEADWATERS HEAVY OIL LLC | HEAD|N|HEADWATERS TECHNOLOGY INNOVATION LLC
yyy




<!-- ######### End Created Html Code To Copy ########## -->

</body></html

It does not work with my data. Please help.

It is seems to be after YYYY|XX|data i want is here
example:

DORF|I|DORF K C P L
SHEL|C|SHELL INT RES MIJ BV
 
Upvote 0
Hi GTO,

I hope you well.

Example:

CHENG S
CANEBA G
BERG M C
KOKAYEFF P
DORF K C P L

Using the UDF for the above selection, yield the following:


S
G
M C
P
K C P DORF L

Please help in correcting the code. I would like the following results:

CHENG S
CANEBA G
BERG M C
KOKAYEFF P
DORF K C P L

Thank you.
 
Upvote 0

Forum statistics

Threads
1,224,514
Messages
6,179,223
Members
452,896
Latest member
IGT

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top