Parsing Text in Excel - Is it possible?

raghuram.star

Board Regular
Joined
Sep 5, 2012
Messages
102
Dear sir/madam

I'm wondering is there any way to parse the text in Excel. In a tedious process, this problem is one which is consuming most of the time. I read PDF Form's to Excel files and then process it.

Problem: I'm reading a Text Field from PDF Form using VBA, to an excel cell, where the data from text field looks like this

Code:
 Difference listing, ValDoc report, Latency Report
SCD S641Z001SH40 REV  -2
SRDD_6_2_8_8_1_AHM_ITE_WRN_COND.DOC; GEN 97
SCD_6_2_8_8_2_1_ITE_WARN.DOC; GEN 57
HRX_6_2_8_8_3_1_ITE_WARN_OUTPUTS.DOC ; GEN 16
SRDD_3_4_9_WHL_RIG_TEST.DOC- GEN 19
SRDD_3_8_6_7.DOC- GEN 115
BDS_3_4_3_4_RIGFLT_CSOL.DOC- GEN 71
SRDD_3_4_38_EMCU_RIG_INTERFACE.DOC GEN 57
KRI_3_4_40_COMMON_MAINT.DOC GEN 77
SRDD_3_4_2_4_PFC_ITE_START_TEST.DOC/ GEN 56
TOOL_3_4_60_ACTUATOR_COMMAND_SWITCH.DOC/ GEN 134
PF_3_10_01_RIGGING.DOC GEN 152
SRS_04_01_03.DOC GEN 55
SRDD_3_4_2_6_PFC_ITE_SWITCH.DOC GEN 110
AF_3_4_2_5_PF_ITE_DISPLAYS.DOC GEN 19
SRDD_3_6_2_2_11_PF_OP_BMP.DOC GEN 199
SRDD_3_6_1_6_PF_INPUTS_NVM.DOC GEN 167
GHE_3_8_3_25.DOC GEN 13
SRDD_3_8_6_7_4.DOC GEN 21

This file is for minor TPP-GTH blueline updates. RFS was performed under DD SRF 8568.35
Till now, what I'm doing is, manually copying the required text field and then processing it. On an average I process 70 - 100 files a day.
It's really pain full doing the same thing whole day.

What I look in text is, There will be Some File Name (SRDD_6_2_8_8_1_AHM_ITE_WRN_COND.DOC; ), At the END which will have a WORD "GEN" followed with some number (GEN 97)

Flaws:
1) No standard naming convention,
2) No standard file extention, (In this case all are ".DOC" files, it could be ".H ", " . CD " or any thing for that matter)
3) No limit for number of files (In this case it has 18 files, some time it has 50 to 60 files or even more)

It would be great, if someone knows, if there is a way to automate the process

This is what finally I take up on from the text field

Code:
SRDD_6_2_8_8_1_AHM_ITE_WRN_COND.DOC/GEN=97
SCD_6_2_8_8_2_1_ITE_WARN.DOC/GEN=57
HRX_6_2_8_8_3_1_ITE_WARN_OUTPUTS.DOC/GEN=16
SRDD_3_4_9_WHL_RIG_TEST.DOC/GEN=19
SRDD_3_8_6_7.DOC/GEN=115
BDS_3_4_3_4_RIGFLT_CSOL.DOC/GEN=71
SRDD_3_4_38_EMCU_RIG_INTERFACE.DOC/GEN=57
KRI_3_4_40_COMMON_MAINT.DOC/GEN=77
SRDD_3_4_2_4_PFC_ITE_START_TEST.DOC/GEN=56
TOOL_3_4_60_ACTUATOR_COMMAND_SWITCH.DOC/GEN=134
PF_3_10_01_RIGGING.DOC/GEN=152
SRS_04_01_03.DOC/GEN=55
SRDD_3_4_2_6_PFC_ITE_SWITCH.DOC/GEN=110
AF_3_4_2_5_PF_ITE_DISPLAYS.DOC/GEN=19
SRDD_3_6_2_2_11_PF_OP_BMP.DOC/GEN=199
SRDD_3_6_1_6_PF_INPUTS_NVM.DOC/GEN=167
GHE_3_8_3_25.DOC/GEN=13
SRDD_3_8_6_7_4.DOC/GEN=21
If there is some or the other way to do it, please let me know. please sir please.... Your help would lot my life on repeated tasks.

Please let me know if you need more inputs.

Thanks a million in advance
 
Last edited:

Some videos you may like

Excel Facts

Format cells as date
Select range and press Ctrl+Shift+3 to format cells as date. (Shift 3 is the # sign which sort of looks like a small calendar).

raghuram.star

Board Regular
Joined
Sep 5, 2012
Messages
102
Forgot to say one more thing, the whole text is contained in Single Cell only (Not in multiple cells), which is continuous in nature
 

pjmorris

Well-known Member
Joined
Aug 2, 2012
Messages
1,838
Office Version
2016
Platform
Windows
Hi raghuram

I think this needs a User Defined Function to sort it out - the difficult bit is getting rid of the unwanted characters between the file extension and the GEN. I'm working on it.

Do you know how to enter a UDF using the Visual Basic Editor? if not I'll include some advice.

Regars
 

raghuram.star

Board Regular
Joined
Sep 5, 2012
Messages
102
Hi raghuram

I think this needs a User Defined Function to sort it out - the difficult bit is getting rid of the unwanted characters between the file extension and the GEN. I'm working on it.

Do you know how to enter a UDF using the Visual Basic Editor? if not I'll include some advice.

Regars
Thank you PJMorris for taking up the initiative. I don't know about UDF, I know only working with Modules and Sheet Modules.
 

pjmorris

Well-known Member
Joined
Aug 2, 2012
Messages
1,838
Office Version
2016
Platform
Windows
Hi Raghuram,

Excellent if you know how to access a Module.

Open a module and then copy and paste the following code:

Code:
Function parseFile(srce As String) As String
    Dim nn As Integer
    Dim mm As Integer
    
    nn = InStr(srce, ".")
    mm = nn + 1
    While UCase(Mid(srce, mm, 1)) Like "[A-Z]"
        mm = mm + 1
    Wend
    
    parseFile = Left(srce, mm - 1) & "/G" & Right(srce, Len(srce) - InStr(srce, "GEN"))
End Function
Then assuming your source data is in cell A1, in cell B1: =parseFile(A1)

then copy this file down your list.

Grateful if you could let me know if this works.

Regards
 

raghuram.star

Board Regular
Joined
Sep 5, 2012
Messages
102
Hi Raghuram,

Excellent if you know how to access a Module.

Open a module and then copy and paste the following code:

Code:
Function parseFile(srce As String) As String
    Dim nn As Integer
    Dim mm As Integer
    
    nn = InStr(srce, ".")
    mm = nn + 1
    While UCase(Mid(srce, mm, 1)) Like "[A-Z]"
        mm = mm + 1
    Wend
    
    parseFile = Left(srce, mm - 1) & "/G" & Right(srce, Len(srce) - InStr(srce, "GEN"))
End Function
Then assuming your source data is in cell A1, in cell B1: =parseFile(A1)

then copy this file down your list.

Grateful if you could let me know if this works.

Regards
Hi PJMorris

Sorry this one did not worked, no change in the output cell. Both cells A1 & A2 are same (I have placed whole text content in " A1 " cell and formula " =parsefile(A1) " in " A2 " cell)

I think you missed this, The below complete text will be in Single cell only
Difference listing, ValDoc report, Latency Report
SCD S641Z001SH40 REV -2
SRDD_6_2_8_8_1_AHM_ITE_WRN_COND.DOC; GEN 97
SCD_6_2_8_8_2_1_ITE_WARN.DOC; GEN 57
HRX_6_2_8_8_3_1_ITE_WARN_OUTPUTS.DOC ; GEN 16
SRDD_3_4_9_WHL_RIG_TEST.DOC- GEN 19
SRDD_3_8_6_7.DOC- GEN 115
BDS_3_4_3_4_RIGFLT_CSOL.DOC- GEN 71
SRDD_3_4_38_EMCU_RIG_INTERFACE.DOC GEN 57
KRI_3_4_40_COMMON_MAINT.DOC GEN 77
SRDD_3_4_2_4_PFC_ITE_START_TEST.DOC/ GEN 56
TOOL_3_4_60_ACTUATOR_COMMAND_SWITCH.DOC/ GEN 134
PF_3_10_01_RIGGING.DOC GEN 152
SRS_04_01_03.DOC GEN 55
SRDD_3_4_2_6_PFC_ITE_SWITCH.DOC GEN 110
AF_3_4_2_5_PF_ITE_DISPLAYS.DOC GEN 19
SRDD_3_6_2_2_11_PF_OP_BMP.DOC GEN 199
SRDD_3_6_1_6_PF_INPUTS_NVM.DOC GEN 167
GHE_3_8_3_25.DOC GEN 13
SRDD_3_8_6_7_4.DOC GEN 21

This file is for minor TPP-GTH blueline updates. RFS was performed under DD SRF 8568.35
 

pjmorris

Well-known Member
Joined
Aug 2, 2012
Messages
1,838
Office Version
2016
Platform
Windows
Your right I did miss that subtle point!

From your data it looks as if the file name only consists of characters, numbers and underscores? I know there may be other characters in the string, but I'm specifically referring to the filename as that is the basis on which I'm working, but grateful for confirmation.

Regards
 

pjmorris

Well-known Member
Joined
Aug 2, 2012
Messages
1,838
Office Version
2016
Platform
Windows
Hi Raghuram,

This might solve the problem:

Assuming your long text is in cell A1.
1. In cell B1: =substitute(A1, char(10), "¬")
2. This will replace all the linefeed codes with the symbol '¬', which is unlikely to be found in the text. You may need to replace char(10) with char(13) as I'm not sure which is the CR/LF code. If this doesn't work please send me an exact copy of the string in a private message.
3. Then copy and PasteSpecial/Value cell B1 to a blank cell with plenty of blank columns to the right of it (enough for all the file names), it needs to be a PasteSpecial/Value as it is the text we need to work with.
4. Click in the cell with the new string and choose 'Text To Columns' from the Data ribbon. Choose 'Delimited', click next, in the list of delimiters make sure than only 'Other' is selected and then type ¬ into the box, click Finish as you don't need the next page of choices.
5. You have now split the single text string into separate elements. Select all of them and copy them. Select a blank cell that has sufficient room beneath it to take all the entries and then PasteSpecial/Transpose.
6. In the cell to the right of the top of the data enter the formula =parseFile(A1) and copy down as necessary.

In summary, I'm using substitute to replace the line feeds at the end of each row (as you've displayed them) with a character that the Text to Columns wizard can understand. I then copy the output of the wizard to a column and use the parsefile() function to tidy them up.

Hope this goes some way to solving the problem.

Regards
 

RoryA

MrExcel MVP, Moderator
Joined
May 2, 2008
Messages
35,025
Office Version
365, 2019, 2016, 2010
Platform
Windows, MacOS
It appears this has been solved here.

@raghuram.star
Please note the forum rules on cross-posting and add links to any cross-posts in future. Thank you. :)
 

pjmorris

Well-known Member
Joined
Aug 2, 2012
Messages
1,838
Office Version
2016
Platform
Windows
RoryA

You advice about not cross-posting is double edged - with a stroke you dismiss the effort I've expended to try and solve raghuram's issue, but since its on a different board I can't see the solution without creating another login.

That said - it is exactly the same question so why has Raghuram been wasting my time!
 

Watch MrExcel Video

Forum statistics

Threads
1,099,626
Messages
5,469,785
Members
406,670
Latest member
Jimborusk13

This Week's Hot Topics

Top