Parsing Text in Excel - Is it possible?

raghuram.star

Board Regular
Joined
Sep 5, 2012
Messages
102
Dear sir/madam

I'm wondering is there any way to parse the text in Excel. In a tedious process, this problem is one which is consuming most of the time. I read PDF Form's to Excel files and then process it.

Problem: I'm reading a Text Field from PDF Form using VBA, to an excel cell, where the data from text field looks like this

Code:
 Difference listing, ValDoc report, Latency Report
SCD S641Z001SH40 REV  -2
SRDD_6_2_8_8_1_AHM_ITE_WRN_COND.DOC; GEN 97
SCD_6_2_8_8_2_1_ITE_WARN.DOC; GEN 57
HRX_6_2_8_8_3_1_ITE_WARN_OUTPUTS.DOC ; GEN 16
SRDD_3_4_9_WHL_RIG_TEST.DOC- GEN 19
SRDD_3_8_6_7.DOC- GEN 115
BDS_3_4_3_4_RIGFLT_CSOL.DOC- GEN 71
SRDD_3_4_38_EMCU_RIG_INTERFACE.DOC GEN 57
KRI_3_4_40_COMMON_MAINT.DOC GEN 77
SRDD_3_4_2_4_PFC_ITE_START_TEST.DOC/ GEN 56
TOOL_3_4_60_ACTUATOR_COMMAND_SWITCH.DOC/ GEN 134
PF_3_10_01_RIGGING.DOC GEN 152
SRS_04_01_03.DOC GEN 55
SRDD_3_4_2_6_PFC_ITE_SWITCH.DOC GEN 110
AF_3_4_2_5_PF_ITE_DISPLAYS.DOC GEN 19
SRDD_3_6_2_2_11_PF_OP_BMP.DOC GEN 199
SRDD_3_6_1_6_PF_INPUTS_NVM.DOC GEN 167
GHE_3_8_3_25.DOC GEN 13
SRDD_3_8_6_7_4.DOC GEN 21

This file is for minor TPP-GTH blueline updates. RFS was performed under DD SRF 8568.35

Till now, what I'm doing is, manually copying the required text field and then processing it. On an average I process 70 - 100 files a day.
It's really pain full doing the same thing whole day.

What I look in text is, There will be Some File Name (SRDD_6_2_8_8_1_AHM_ITE_WRN_COND.DOC; ), At the END which will have a WORD "GEN" followed with some number (GEN 97)

Flaws:
1) No standard naming convention,
2) No standard file extention, (In this case all are ".DOC" files, it could be ".H ", " . CD " or any thing for that matter)
3) No limit for number of files (In this case it has 18 files, some time it has 50 to 60 files or even more)

It would be great, if someone knows, if there is a way to automate the process

This is what finally I take up on from the text field

Code:
SRDD_6_2_8_8_1_AHM_ITE_WRN_COND.DOC/GEN=97
SCD_6_2_8_8_2_1_ITE_WARN.DOC/GEN=57
HRX_6_2_8_8_3_1_ITE_WARN_OUTPUTS.DOC/GEN=16
SRDD_3_4_9_WHL_RIG_TEST.DOC/GEN=19
SRDD_3_8_6_7.DOC/GEN=115
BDS_3_4_3_4_RIGFLT_CSOL.DOC/GEN=71
SRDD_3_4_38_EMCU_RIG_INTERFACE.DOC/GEN=57
KRI_3_4_40_COMMON_MAINT.DOC/GEN=77
SRDD_3_4_2_4_PFC_ITE_START_TEST.DOC/GEN=56
TOOL_3_4_60_ACTUATOR_COMMAND_SWITCH.DOC/GEN=134
PF_3_10_01_RIGGING.DOC/GEN=152
SRS_04_01_03.DOC/GEN=55
SRDD_3_4_2_6_PFC_ITE_SWITCH.DOC/GEN=110
AF_3_4_2_5_PF_ITE_DISPLAYS.DOC/GEN=19
SRDD_3_6_2_2_11_PF_OP_BMP.DOC/GEN=199
SRDD_3_6_1_6_PF_INPUTS_NVM.DOC/GEN=167
GHE_3_8_3_25.DOC/GEN=13
SRDD_3_8_6_7_4.DOC/GEN=21

If there is some or the other way to do it, please let me know. please sir please.... Your help would lot my life on repeated tasks.

Please let me know if you need more inputs.

Thanks a million in advance
 
Last edited:
I don't follow you - I did not give advice about not cross-posting, nor did I dismiss anything.
 
Upvote 0

Excel Facts

Copy PDF to Excel
Select data in PDF. Paste to Microsoft Word. Copy from Word and paste to Excel.
It appears this has been solved here.

@raghuram.star
Please note the forum rules on cross-posting and add links to any cross-posts in future. Thank you. :)

@Rory,

****! Why didn't I refresh this thread so that I could have seen your message.


@raghuram.star

You should definite alert the forums you post to that you have posted the question elsewhere and include links to those other threads. Anyway, since I developed it, I'll post it... here is my solution to your question (assumes your text is in A1 and outputs to B1):
Code:
Sub LatencyReport()
  Dim X As Long, Latency As String, GEN() As String
  Latency = Replace(Range("A1"), vbCr, "")
  Latency = Mid(Latency, InStr(InStr(Latency, vbLf) + 2, Latency, vbLf) + 1)
  Latency = Replace(Latency, " GEN ", "/GEN=")
  GEN = Split(Latency, "/GEN=")
  For X = 0 To UBound(GEN) - 1
    Do While Right(GEN(X), 1) Like "[!A-Za-z]"
      GEN(X) = Left(GEN(X), Len(GEN(X)) - 1)
    Loop
  Next
  Range("B1") = Join(GEN, "/GEN=")
  Range("B1").WrapText = True
End Sub
 
Upvote 0
How is that either dismissive or advising not to cross post? That's what I don't get.
 
Upvote 0
Raghuram, Rick

Following Rick's repost of his previous advice I realise that I'm not clear about the required output - I'd assumed that each filename needed to be in a separate cell rather than just remove the unnecessary characters. I can see how your solution works Rick, but its now time to go home for the weekend, so I offer my solutio, which appears somewhat inelegant, but extracts each into a separate row:

Code:
Sub parseFile2()
    Dim nn As Integer
    Dim mm As Integer
    Dim outputRow As Integer
    Dim outputCol As Integer
    Dim srce As String
    
    outputRow = ActiveCell.Row ' used to send the output to the rows immediately to the right and below the active cell
    outputCol = ActiveCell.Column + 1 'used to send the output to the column to the right of the active cell.
    srce = UCase(ActiveCell)
    
    For nn = 1 To Len(srce)
        If Mid(srce, nn, 1) = " " Or Mid(srce, nn, 1) = Chr(10) Then
            mm = nn + 1
            
            While Mid(srce, mm, 1) Like "[.0-9A-Z_]"
                If Mid(srce, mm, 1) = "." Then
                    mm = mm + 1
                    While Mid(srce, mm, 1) Like "[A-Z]"
                        mm = mm + 1
                    Wend
                    If Val(Mid(srce, mm + 3 + InStr(Mid(srce, mm), "GEN "))) <> 0 Then
                        Cells(outputRow, outputCol) = Mid(srce, nn + 1, mm - nn - 1) & "/GEN=" & Trim(Val(Mid(srce, mm + 3 + InStr(Mid(srce, mm), "GEN "))))
                        outputRow = outputRow + 1
                    End If
                End If
                mm = mm + 1
            Wend
        End If
    Next
End Sub

Have a great weekend.

Regards
 
Upvote 0
I'd assumed that each filename needed to be in a separate cell rather than just remove the unnecessary characters.
And you have assumed correctly... I just ran the "solution" from the other thread and that is what it did. So, for the sake of completeness, here is my code modified to do that...
Code:
Sub LatencyReport()
  Dim X As Long, Latency As String, GEN() As String
  Latency = Replace(Range("A1"), vbCr, "")
  Latency = Mid(Latency, InStr(InStr(Latency, vbLf) + 2, Latency, vbLf) + 1)
  Latency = Replace(Latency, " GEN ", "/GEN=")
  GEN = Split(Latency, "/GEN=")
  For X = 0 To UBound(GEN) - 1
    Do While Right(GEN(X), 1) Like "[!A-Za-z]"
      GEN(X) = Left(GEN(X), Len(GEN(X)) - 1)
    Loop
  Next
  Latency = Join(GEN, "/GEN=")
  Range("B1").Resize(UBound(GEN)) = Application.Transpose(Split(Latency, vbLf))
End Sub
 
Upvote 0
It appears this has been solved here.

@raghuram.star
Please note the forum rules on cross-posting and add links to any cross-posts in future. Thank you. :)

I'm Sorry Rory A, I new to forums, one of my friend introduced me to these forums where I can get Expert advice and help. so i posted in these forums.

As I visit this forum regularly, In case I post any question in future, where can I mention the link.

I really sorry, this mistake was done in ignorance. Please excuse me. I will not repeat this again.
 
Upvote 0
Thank you PJMorris, you are such a great help. and I'm sorry for not mentioning about post in other forum. Please excuse me, i have done it without knowing.

Thank you so much


Raghuram, Rick

Following Rick's repost of his previous advice I realise that I'm not clear about the required output - I'd assumed that each filename needed to be in a separate cell rather than just remove the unnecessary characters. I can see how your solution works Rick, but its now time to go home for the weekend, so I offer my solutio, which appears somewhat inelegant, but extracts each into a separate row:

Code:
Sub parseFile2()
    Dim nn As Integer
    Dim mm As Integer
    Dim outputRow As Integer
    Dim outputCol As Integer
    Dim srce As String
    
    outputRow = ActiveCell.Row ' used to send the output to the rows immediately to the right and below the active cell
    outputCol = ActiveCell.Column + 1 'used to send the output to the column to the right of the active cell.
    srce = UCase(ActiveCell)
    
    For nn = 1 To Len(srce)
        If Mid(srce, nn, 1) = " " Or Mid(srce, nn, 1) = Chr(10) Then
            mm = nn + 1
            
            While Mid(srce, mm, 1) Like "[.0-9A-Z_]"
                If Mid(srce, mm, 1) = "." Then
                    mm = mm + 1
                    While Mid(srce, mm, 1) Like "[A-Z]"
                        mm = mm + 1
                    Wend
                    If Val(Mid(srce, mm + 3 + InStr(Mid(srce, mm), "GEN "))) <> 0 Then
                        Cells(outputRow, outputCol) = Mid(srce, nn + 1, mm - nn - 1) & "/GEN=" & Trim(Val(Mid(srce, mm + 3 + InStr(Mid(srce, mm), "GEN "))))
                        outputRow = outputRow + 1
                    End If
                End If
                mm = mm + 1
            Wend
        End If
    Next
End Sub

Have a great weekend.

Regards
 
Upvote 0
And you have assumed correctly... I just ran the "solution" from the other thread and that is what it did. So, for the sake of completeness, here is my code modified to do that...
Code:
Sub LatencyReport()
  Dim X As Long, Latency As String, GEN() As String
  Latency = Replace(Range("A1"), vbCr, "")
  Latency = Mid(Latency, InStr(InStr(Latency, vbLf) + 2, Latency, vbLf) + 1)
  Latency = Replace(Latency, " GEN ", "/GEN=")
  GEN = Split(Latency, "/GEN=")
  For X = 0 To UBound(GEN) - 1
    Do While Right(GEN(X), 1) Like "[!A-Za-z]"
      GEN(X) = Left(GEN(X), Len(GEN(X)) - 1)
    Loop
  Next
  Latency = Join(GEN, "/GEN=")
  Range("B1").Resize(UBound(GEN)) = Application.Transpose(Split(Latency, vbLf))
End Sub

Thank you Rick Rothstein, you have gone a step ahead and helped me. Truly I'm thank full to you. And Please excuse me for posting in other forum.

Thank you
 
Upvote 0
Thank you Rick Rothstein, you have gone a step ahead and helped me. Truly I'm thank full to you. And Please excuse me for posting in other forum.

You are welcome. As for posting in the other forum... there is nothing wrong with doing that HOWEVER to be fair to the people you are asking to help you, mention that you have posted the same question elsewhere and give a link to the message you posted; that way, people reading your question in one forum can look in the other forum to see if your question has been answered yet before they invest their valuable time developing their own solution for your question. There is a side benefit for you in doing this, by the way... let's say you get what you think is the answer to your question, but that it is flawed in some way... someone from the other forum looking at the solution you say you got may see the flaw and comment on it... if you don't provide the cross-post link, that person would never know of the other solution and, hence, would never alert you to the flaw in the code you chose to use.
 
Upvote 0

Forum statistics

Threads
1,216,027
Messages
6,128,375
Members
449,445
Latest member
JJFabEngineering

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top