Split the content of a cell into separate rows

Bering

Board Regular
Joined
Aug 22, 2018
Messages
185
Office Version
  1. 2016
Platform
  1. Windows
Hello,

I have to review a reconciliation file similar to the one below. The problem is that in column 5 (Comment) there could be cells that include a number of items, which makes it impossible for me to filter/analyse the data, (i.e. comments for item #3 ).

Whenever a comment contains several items, each item is identified by a consecutive number followed by a dot.

For example, for item 3, the comment provides a break-down of the -51 difference as:

1. 5 missing instruction
2. 15 incorrect posting,
3. 31 timing difference


ItemOur quantityTheir quantityDifferenceComment
12030-10Timing difference
230255Under investigation
3960-511. 5 missing instruction 2. 15 incorrect posting, 3. 31 timing difference
41010-No difference

<tbody>
</tbody>


Is there a way (vba, formulae, text to column..) to split the content of such cells into separate rows so that the result looks be something like this:


ItemOur quantityTheir quantityDifferenceComment
12030-10Timing difference
230255Under investigation
3960-511. 5 missing instruction
3960-512. 15 incorrect posting,
3960-513. 31 timing difference
41010-No difference

<tbody>
</tbody>

Thanks for any suggestions.
 
I was wondering about situations like that where you have other parts in the cell where digits are followed by a "." besides the numbered sections you are interested in.
A few comments/questions to try to clarify before the next suggestion.

1. ReDim b(1 To UBound(a) * 20, 1 To uba2)
Not that important but that 20 change is only needed if the average number of points per cell exceeds 10. So, if you only had that cell and one other cell which contained 3 numbered points (17 in total), the 10 would still have been enough.

2. Just confirming that what you have shown in post 10 is all in one cell?

3. If so, are the numbered points each on a separate line within the cell like you have shown? That is, is there an Alt+Enter (CHAR(10)) between each numbered point?
If not, is there always a space after the end of one 'sentence' before the next numbered point starts?

4. Can you confirm that there is always a space after the numberered point & its "." before the 'sentence' starts as shown in all of the post 10 lines?

5. I have also amended this (as few as possible) (.+?) to this (.+)
I think you have misunderstood how that works and it will likely get changed back.
 
Upvote 0

Excel Facts

Square and cube roots
The =SQRT(25) is a square root. For a cube root, use =125^(1/3). For a fourth root, use =625^(1/4).
Hi Peter,

Please find below my comments.

Also, if there is anything that could be helpful to simplify this, I could ask the provider of the file to make some changes (for example end each sentence with a special character...). Just let me know.


1. Thanks for the clarification: the file I need to review has many rows and several cells containing more than one item, so the average can easily exceed 10. I changed this the first time I debugged the code and that seemed to do the trick.

2. Correct, that would be the content of one cell. In the same file there could be several cells following the same pattern.

3. Nope, they are not in a separate line. Yes, there is always a space after the end of each sentence and before the next numbered item.

4. Yep, there is always a space after each number + dot, as in this list I have written, i.e. 1. fshghg, 2. eztrttiz 3. etc etc

5. Okie :p, looking forward for my book on Regex to arrive. I find this topic fascinating.

I hope this helps, many thanks!
 
Upvote 0
.. I could ask the provider of the file to make some changes (for example end each sentence with a special character...). Just let me know.
That would make things much simpler (eg # between points, or a line-break character)

As it stands, I haven't been able to come up with a fool-proof pattern to get what you want.

Since the points are not on separate lines, my approach was to look for where there was a space followed by 1 or more digits followed by a "." followed by a space
This is how each point starts (except the 1st one which is dealt with in a different way). However, as well as that pattern occurring where I have marked in blue (which are the starts of points), it also occurs where I have marked in red (which are not starts of points). As yet I can't see a logical way to automatically differentiate between the two circumstances.
Further, it looks like there may be other places where such a pattern could occur too. For example if the green P&L figure was only less than 1000, the pattern would have occurred there too.


...discrepancy of 3,05 4. 54.60 Incorrect commission posted as at 27/04/2018 SJ036309 to be reversed. 5. 48 P&L discrepancy. As per statement as at 04052018 there is a P&L of 453,815.98 but yet it is posted as 453,767.96. Hence this leaves a discrepancy of 48. 6. -25 P&L discrepancy. As per statement as at 31052018 there is a P&L of 56,845. But yet it is posted as 56,820. Hence this leaves a discrepancy of 25. 7. -16.89 P&L discrepancy. As per statement as at 13062018 there is a P&L of 45362.50 but yet it is posted as 45345.61. Hence this leaves a discrepancy of 16.89.
 
Last edited:
Upvote 0
No problem. Would something like this help?

1. -67.57 P&L discrepancy. As per statement for 280218 there is a P&L of 96015, yet it is posted in ICON as 95947.4 which leaves a discrepancy of 67,57#
2. -13.09 P&L Discrepancy. As per statement as at 14032018 there is a P&L of 519,725, yet it is posted in ICON as 519,738.09 which leaves a discrepancy of -13.09#
3. -3.05 Commission discrepancy. As per statement as at 20042018 there is a commission of 161.65 but yet it is posted as 158.60. Hence this leaves a discrepancy of 3,05#
4. 54.60 Incorrect commission posted as at 27/04/2018 SJ036309 to be reversed#
5. 48 P&L discrepancy. As per statement as at 04052018 there is a P&L of 453,815.98 but yet it is posted as 453,767.96. Hence this leaves a discrepancy of 48#
6. -25 P&L discrepancy. As per statement as at 31052018 there is a P&L of 56,845. But yet it is posted as 56,820. Hence this leaves a discrepancy of 25#
7. -16.89 P&L discrepancy. As per statement as at 13062018 there is a P&L of 45362.50 but yet it is posted as 45345.61. Hence this leaves a discrepancy of 16.89#
8. -3.85 Commission discrepancy. As per statement as at 11062018 there is a commission of 204.05 but yet it is posted as 200.20. Hence this leaves a discrepancy of 3.85#
9. 161.4 P&L discrepancy. As per statement as at 11072018 there is a P&L of 310,245 but yet it is posted as 310,406.40. Hence this leaves a discrepancy of 161.4#
10. 28.42 - P&L discrepancy. As per statement as at 180718 there is a P&L of 1,395, yet it is posted in ICON as 1,423.42. This leaves a discrepancy of 28.42#
11. -75.46 - P&L Discrepancy. As per statement as at 260718 there is a P&L of 87,620, yet it is posted as 87,695.46. This leaves a discrepancy of 75.46#
12. -113.90 - P&L Discrepancy. As per statement as at 030818 there is a P&L of 113,185, yet it is posted as 113,298.90. This leaves a discrepancy of 113.90#
13. 71.1 P&L Discrepancy. As per statement as at 07.18.18 there is a P&L of 23,780, yet it is posted as 23,708.9. This leaves a discrepancy of 71.1#
14. -71 - P&L Discrepancy as at 16.08.18. As per statement there is a P&L of 120,830, yet it is posted in ICON as 120,759. This leaves a discrepancy of 71#
 
Upvote 0
No problem. Would something like this help?
That would make things eminently simple (but not nearly as much fun ;)) as Regular Expressions would be consigned to the bin & the code becomes much simpler. :)
Code:
Sub Split_Rows_v2()
  Dim a As Variant, b As Variant, itm As Variant
  Dim i As Long, j As Long, k As Long, uba2 As Long
  
  a = Range("A1", Range("E" & Rows.Count).End(xlUp)).Value
  uba2 = UBound(a, 2)
  ReDim b(1 To UBound(a) * 20, 1 To uba2)
  For i = 1 To UBound(a)
    For Each itm In Split(a(i, uba2), "#")
      k = k + 1
      For j = 1 To uba2 - 1
        b(k, j) = a(i, j)
      Next j
      b(k, uba2) = Trim(itm)
    Next itm
  Next i
  Range("A" & Rows.Count).End(xlUp).Offset(3).Resize(k, uba2).Value = b
End Sub
 
Upvote 0
Thank you very much for all your help!!! :pray:

I now have 2 options:

1. have each sentence ending with a # character and use the code you provided above

or

2. have each sentence starting with a numberered point + a special character, say, 1* , 2*, instead of 1. , 2. I believe, that should be enough to create a "unique pattern" to be used with Regular Expressions.
I changed your code as follows and worked perfectly:

Rich (BB code):
Rich (BB code):
.Pattern = "(\d+\*)(.+?)(?=(\d+\*)|$)"


You cannot immagine how much you have helped me... going through those files was pure hell!





 
Upvote 0
I now have 2 options:
If it was me and I had the choice about how to indicate the numbered points, I would put the asterisk immediately before the point number, not after, like this

Excel Workbook
E
6*1. -67.57 P&L discrepancy. As per statement for 280218 there is a P&L of 96015, yet it is posted in ICON as 95947.4 which leaves a discrepancy of 67,57 *2. -13.09 P&L Discrepancy. As per statement as at 14032018 there is a P&L of 519,725, yet it is posted in ICON as 519,738.09 which leaves a discrepancy of 13.09- *3 -3.05 Commission discrepancy. As per statement as at 20042018 there is a commission of 161.65 but yet it is posted as 158.60. Hence this leaves a discrepancy of 3,05 *4. 54.60 Incorrect commission posted as at 27/04/2018 SJ036309 to be reversed. *5. 48 P&L discrepancy. As per statement as at 04052018 there is a P&L of 453,815.98 but yet it is posted as 453,767.96. Hence this leaves a discrepancy of 48. *6 -25 P&L discrepancy. As per statement as at 31052018 there is a P&L of 56,845. But yet it is posted as 56,820. Hence this leaves a discrepancy of 25. *7. -16.89 P&L discrepancy. As per statement as at 13062018 there is a P&L of 45362.50 but yet it is posted as 45345.61. Hence this leaves a discrepancy of 16.89. *8. -3.85 Commission discrepancy. As per statement as at 11062018 there is a commission of 204.05 but yet it is posted as 200.20. Hence this leaves a discrepancy of 3.85 *9. 161.4 P&L discrepancy. As per statement as at 11072018 there is a P&L of 310,245 but yet it is posted as 310,406.40. Hence this leaves a discrepancy of 161.4 *10. 28.42 - P&L discrepancy. As per statement as at 180718 there is a P&L of 1,395, yet it is posted in ICON as 1,423.42. This leaves a discrepancy of 28.42 *11. -75.46 - P&L Discrepancy. As per statement as at 260718 there is a P&L of 87,620, yet it is posted as 87,695.46. This leaves a discrepancy of 75.46 *12. -113.90 - P&L Discrepancy. As per statement as at 030818 there is a P&L of 113,185, yet it is posted as 113,298.90. This leaves a discrepancy of 113.90 *13. 71.1 P&L Discrepancy. As per statement as at 07.18.18 there is a P&L of 23,780, yet it is posted as 23,708.9. This leaves a discrepancy of 71.1 *14. -71 - P&L Discrepancy as at 16.08.18. As per statement there is a P&L of 120,830, yet it is posted in ICON as 120,759. This leaves a discrepancy of 71
Sample



Since regular expressions can be trickier to understand & maintain and can be more resource-hungry that some other options, I would then use this slightly modified version of the _v2 code suggested earlier

Code:
Sub Split_Rows_v3()
  Dim a As Variant, b As Variant, itm As Variant
  Dim i As Long, j As Long, k As Long, uba2 As Long
  
  a = Range("A1", Range("E" & Rows.Count).End(xlUp)).Value
  uba2 = UBound(a, 2)
  ReDim b(1 To UBound(a) * 20, 1 To uba2)
  For i = 1 To UBound(a)
    For Each itm In Split(a(i, uba2), "*")
      If Len(itm) > 0 Then
        k = k + 1
        For j = 1 To uba2 - 1
          b(k, j) = a(i, j)
        Next j
        b(k, uba2) = Trim(itm)
      End If
    Next itm
  Next i
  Range("A" & Rows.Count).End(xlUp).Offset(3).Resize(k, uba2).Value = b
End Sub


If you did really want to use regular expressions, having that marker in front of any numbered points would make the Pattern string much simpler:
(An asterisk followed by a non-asterisk 1 or more times)
Code:
.Pattern = "\*[^\*]+"

All of the above assumes that an asterisk will not appear anywhere else in the data apart from marking the start of a point you want a separate line for in the results.
 
Upvote 0
I will discuss both options with the provider to ensure that the requirements are fully understood and they will not mess things up, whichever we decide to implement.

Thanks a lot for all your help with this, much appreciated.

Have a great day!
 
Upvote 0
Hi Peter... I was wondering if you could help me again. Not sure if I should open a new thread...

I would like amend the code
Code:
.Pattern = "\*[^\*]+"
to include the scenario where there is an Alt+Enter (CHAR(10)) between each numbered point

Thank you.
 
Upvote 0

Forum statistics

Threads
1,214,527
Messages
6,120,054
Members
448,940
Latest member
mdusw

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top