Need program/VBA to run that will help me extract portions of a string of text and put it in the next column

Maggie Barr

Board Regular
Joined
Jan 28, 2014
Messages
188
Hello There,
I am in need of a program/VBA to run that will help me extract portions of a string of text and put it in the next column. I regularly have to compare species lists. Plant species, beetles, butterflies etc., and I usually get these lists in a format that contains the authority of the species within the name, so I have, in the past, had to use text to columns and manually reorganize the data so that the authorities are removed in order to use VLOOKUP to merge databases based on the name as the authorities are not always entered the same way. This can take a lot of time. I am in hopes that someone can help me with the task. I am using a PC with Excel 2010.

Below is a sample list of species and how they are often submitted (Given Name) and how I need the format (Returned Name). What I need the code to do is take the first two words in the cell (genus species, space delimited), and then search the rest of the cell for ‘var.’, ‘ssp.’, or both, and return those terms WITH the word following those terms. Sometimes there will be a variety with a subspecies and sometimes a subspecies with a variety; therefore, some species/cells will have both of those terms and need both brought over with the name/word following the term in the correct order. As well there can be hybrids that are reported with × (this is not a normal x). These could be attached to the species name (the second name with no space), or they could be two genus species names with × in between them spaced accordingly. If the × is spaced, I need the two words following it returned with the ×. Note with hybrids there are not always authorities to worry about but sometimes.

This is a complex thing for me to explain, and I hope I have made it relatively clear, but do not hesitate to ask questions.
Thank you so much, in advance, for taking the time to consider the puzzle I have put forth. I am open to any advice.
Best,
Maggie :eek:
Let me illustrate:

GIVEN NAMERETURNED NAME
Abies balsamea var. phanerolepis Fern.Abies balsamea var. phanerolepis
Abutilon abutilon (L.) RusbyAbutilon abutilon
Acer rubrum × A. saccharinumAcer rubrum × A. saccharinum
Achillea millefolium ssp. borealis (Bong.)Achillea millefolium ssp. borealis
Agrostis borealis Hartman ssp. americana (Hartman) TzvelevAgrostis borealis ssp. americana
Alisma plantago-aquatica L. var. americanum J.A. SchultesAlisma plantago-aquatica var. americanum
Acnida tamariscina (Nutt.) Wood var. prostrata Uline & BrayAcnida tamariscina var. prostrata
Asplenium trichomanes ssp. quadrivalens × Asplenium trichomanes ssp. trichomanesAsplenium trichomanes ssp. quadrivalens × Asplenium trichomanes ssp. trichomanes
Aster ×blakei (Porter) HouseAster ×blakei
Carex viridula Michx. ssp. viridula var. viridulaCarex viridula ssp. viridula var. viridula
Salix eriocephala Michx. ssp. eriocephala var. eriocephalaSalix eriocephala ssp. eriocephala var. eriocephala
Solidago simplex Kunth. ssp. randii (Porter) Ringius var. monticola (Porter) RingiusSolidago simplex ssp. randii var. monticola
Symphyotrichum lanceolatum (Willd.) Nesom ssp. lanceolatum var. latifolium (Semple & Chmielewski) NesomSymphyotrichum lanceolatum ssp. lanceolatum var. latifolium
Diphasiastrum alpinum (L.) Holub × D. complanatumDiphasiastrum alpinum × D. complanatum
Echinochloa crus-galli var. frumentacea (Link) W. WightEchinochloa crus-galli var. frumentacea
Polygonum arifolium L. var. lentiforme Fern. & Grisc.Polygonum arifolium var. lentiforme
Potamogeton perfoliatus × P. pusillus ssp. tenuissimusPotamogeton perfoliatus × P. pusillus ssp. tenuissimus

<tbody>
</tbody>

<tbody>
</tbody>
 
Last edited:

Excel Facts

Workdays for a market open Mon, Wed, Friday?
Yes! Use "0101011" for the weekend argument in NETWORKDAYS.INTL or WORKDAY.INTL. The 7 digits start on Monday. 1 means it is a weekend.
What you are asking for seems difficult. Let me start by using this example text.
example (delete) del second
You want the result of this text to be
example second
That is easy. Use the REPLACE function in VBA to replace the open parenthesis and close parenthsis. Turn those parenthesis into the same character such as a slash "/"
The string will read example /delete/ del second
then use the SPLIT function to split the text. The output will then be
example
delete
del second
Then tell vba to split the third variable in the split. The third variable is del second
Have vba enter a loop to merge all the splits together into one variable. Exclude the splits that you don't want.

Now here is what I won't be able to help with. Some of your examples in your dataset exclude more than 1 word from the original string after the parenthesis. for example
example (delete) del no second
and you want to delete what is in the parenthesis and the del no part.
the output should be
example second
This is a problem because how it vba suppose to know when to exclude only one word after the parenthesis and when to exclude two words.? Without this information, no one will be able to help you.
 
Upvote 0
Thank you for the response WP,
I have to admit I found what you were saying somewhat confusing, but I will do my best to clarify. I think part of your question is that the () in the cells are not part of any of the code needed. The () in the cell is sort of irrelevant because some times the authority names of a species will have a parentheses around it, sometimes not, but they are never part of the species name. However, if because of their presence VBA will glitch I can, as you said, replace them with something else. The other part that you spoke of sounds like it is the inconsistency of location of the words I want, "Now here is what I won't be able to help with. Some of your examples in your data set exclude more than 1 word from the original string after the parenthesis. for example...". See what I need is a search through a variable length string of text for the ssp. and var. within the cells returning, as I said, those terms and the subsequent/following word (remembering that some cells have both and both need to be reported). I hope these comments help clarify. I knew this was not an easy request, hence why I have done this manually for years, but I have had some luck lately getting help online, so I thought I would try.
I appreciate your interest and time.
Maggie
 
Upvote 0
OMG I just spent about 30 minutes writing a message to you about how to solve this. I was just about finished and my browser crashed. That sucks. The message contained an explination of each line of code. I am not doing that again. I'm just going to write the code and you'll have to fixure it out.
 
Upvote 0
Step 1: Find out how many rows there are in your column we are evaluating. I'm assuming that your dataset is located in column A and begins with row 1.
Code:
lastRow = Range("A" & Rows.Count).Row
Step 2: Create a Do Until loop that will continue to run the code until it finishes evaluating the last row in your dataset. If your dataset begins in A1 then i = 1. If your dataset has headers, then i = 2
Code:
i = 1
Do Until i =lastRow
'enter code here
i = i + 1
Loop
Step 3: Split the text of the currently evaluating cell, and use a space as the delimiter.
Code:
originalText = Range("A" & i).value
mySplit = SPLIT(originalText, " ")
Step 4: Count how many objects are in the array "mySplit". In your text value "Abies balsamea var. phanerolepis Fern." there are 5 objects because there are 5 words. You'll have to research how to do this because I forgot.
Code:
mySplitMax = 'the code to find how many objects there are in the array
Step 5: I noticed from your dataset that the first word of the original text is always part of your output string. This is good because you capitalized the first letter of the first word and I'm going to be using code that does not include words with capital letters unless it is the first word in the text. Insert the first word of the original text into the output string.
Code:
outputString = mySplit(0)
Step 6: Create a Do Until loop that will evaluate each word in the text value.
Code:
s = 1
Do Until s > mySplitMax
'enter code here
s = s + 1
Loop
Step 7: Create an if statement that will check if the word has an open parenthesis.
Code:
output = TRUE
openParenthesis = LEFT(mySplit(s), 1)
If openParenthesis = "(" Then
output = FALSE
End If
Step 8: Create an if statement that will check if the word begins with a capital letter.
Code:
If output = TRUE Then
capitalLetter = LCASE(mySplit(s))
[/COLOR]If capitalLetter <> mySplit(s) Then
output = FALSE
End If
Step 9: Create an if statement that adds the word to the output string.
If output = TRUE Then
outputString = outputString & " " & mySplit(s)
End If
Step 10: Create an if statement that will add the next word if the word that was just added in the last if statement was an x. Look at this example from your dataset. "Potamogeton perfoliatus × P. pusillus ssp. tenuissimus" . So far, the x has been added to the output string, but the P will not be added because it is a capital letter. So we need to override it here.
Code:
countLen = LEN(mySplit(s))
If countLen = 1 AND mySplit(s) = "x" Then
outputString = outputString & " " & mySplit(s + 1)
End If

I think that covers everything. I did this all from memory, so I hope I didn't give you code with syntax errors. Also, the last IF statement I gave you, I am assuming that "x" is the letter x. If it is not, then I don't know how to override it. You'll have to ask someone else how to search for that character and apply it to that if statement to work.
 
Upvote 0
I forgot to add code at the bottom of the last if statement to add the output string to another cell.
Code:
Range("B" & i).value = outputString
Yeah I know, it's easy and common sense.
 
Upvote 0
I always reread what I write to look for errors and I found one. In step 2, this should be the code.

Code:
i = 1
Do Until i > lastRow
'enter code here
i = i + 1
Loop
 
Upvote 0
I found another error in the code in step 8. The code should be this.
Code:
[COLOR=#333333][COLOR=#333333]
If output = TRUE Then
capitalLetter = LCASE(mySplit(s))[/COLOR][/COLOR][COLOR=#333333]
If capitalLetter <> mySplit(s) Then
output = FALSE
End If
End If[/COLOR]
 
Upvote 0
In step 9, I forgot to put the code in code format.
Code:
[COLOR=#333333]If output = TRUE Then[/COLOR]
[COLOR=#333333]outputString = outputString & " " & mySplit(s)[/COLOR]
[COLOR=#333333]End If[/COLOR]
 
Upvote 0
Try this:-
Results column "B".
Code:
[COLOR="Navy"]Sub[/COLOR] MG10Sep23
[COLOR="Navy"]Dim[/COLOR] Rng             [COLOR="Navy"]As[/COLOR] Range
[COLOR="Navy"]Dim[/COLOR] Dn              [COLOR="Navy"]As[/COLOR] Range
[COLOR="Navy"]Dim[/COLOR] Txt             [COLOR="Navy"]As[/COLOR] Variant
[COLOR="Navy"]Dim[/COLOR] Str             [COLOR="Navy"]As[/COLOR] [COLOR="Navy"]String[/COLOR]
[COLOR="Navy"]Dim[/COLOR] n               [COLOR="Navy"]As[/COLOR] [COLOR="Navy"]Long[/COLOR]
[COLOR="Navy"]Set[/COLOR] Rng = Range(Range("A2"), Range("A" & Rows.Count).End(xlUp))
[COLOR="Navy"]For[/COLOR] [COLOR="Navy"]Each[/COLOR] Dn [COLOR="Navy"]In[/COLOR] Rng
    Txt = Split(Dn.Value, " ")
    Str = Str & Txt(0) & " " & Txt(1)
    [COLOR="Navy"]For[/COLOR] n = 2 To UBound(Txt)
        [COLOR="Navy"]Select[/COLOR] [COLOR="Navy"]Case[/COLOR] Txt(n)
            [COLOR="Navy"]Case[/COLOR] Chr(215): Str = Str & " " & Txt(n) & " " & Txt(n + 1)
            [COLOR="Navy"]Case[/COLOR] Left(Txt(n), 2 = 1) = Chr(215): Str = Str & " " & Txt(n)
            [COLOR="Navy"]Case[/COLOR] "var.": Str = Str & " " & Txt(n) & " " & Txt(n + 1)
            [COLOR="Navy"]Case[/COLOR] "ssp.": Str = Str & " " & Txt(n) & " " & Txt(n + 1)
            [COLOR="Navy"]Case[/COLOR] "A.": Str = Str & " " & Txt(n + 1)
            [COLOR="Navy"]Case[/COLOR] "D.": Str = Str & " " & Txt(n + 1)
        [COLOR="Navy"]End[/COLOR] Select
    [COLOR="Navy"]Next[/COLOR] n
Dn.Offset(, 1).Value = Str: Str = ""
[COLOR="Navy"]Next[/COLOR] Dn
[COLOR="Navy"]End[/COLOR] [COLOR="Navy"]Sub[/COLOR]
Regards Mick
 
Upvote 0

Forum statistics

Threads
1,214,386
Messages
6,119,214
Members
448,874
Latest member
b1step2far

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top