Convert PDF with PQ: splits text into two rows

JuliaGr

New Member
Joined
Sep 21, 2021
Messages
2
Hello,

I am converting PDF with Power Query and run into the problem.
The conversion separates text into two rows for the column "details" and keeps the rest columns blank.
The issue is not regular; sometimes it appears after every row, other times after couple of rows.

How may combine this rows?

Thank you in advance!

PageDateDR CRDetailsAmount
1
26/08/2021​
CPAYMENT FROM
1000​
BOB
1
26/08/2021​
CPAYMENT FROM
140​
BOB
1
30/08/2021​
CPAYMENT FROM
15​
BOB
1
30/08/2021​
CREAL TIME TRANSFER FROM
300​
BOB
2
31/08/2021​
DBALANCE BROUGHT FORWARD
0​
2
31/08/2021​
CREAL TIME TRANSFER FROM
370​
CAPITEC
2
31/08/2021​
DMONTHLY FEE
-100​
ACC
2
04/09/2021​
CCREDIT TRANSFER
120​
2
04/09/2021​
CREAL TIME TRANSFER FROM
350​
 

Excel Facts

Which Excel functions can ignore hidden rows?
The SUBTOTAL and AGGREGATE functions ignore hidden rows. AGGREGATE can also exclude error cells and more.
If there is always a Page number ([Page]<>null) on the rows you want to keep, try:

Power Query:
let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    tblAddIndex = Table.AddIndexColumn(Source, "Index", 0, 1, Int64.Type),
    tblAddCustom = Table.AddColumn(tblAddIndex, "Custom", each try if tblAddIndex[Page]{_[Index]+1} = null then 
        _[Details] & " " & tblAddIndex[Details]{_[Index]+1} 
        else if [Page]<>null then [Details] else null otherwise [Details]),
    tblRemoveNulls = Table.SelectRows(tblAddCustom, each [Custom] <> null),
    tblRemoveOrigDetails = Table.RemoveColumns(tblRemoveNulls,{"Details","Index"}),
    tblRenameCustom = Table.RenameColumns(tblRemoveOrigDetails,{{"Custom", "Details"}}),
    tblReorder = Table.ReorderColumns(tblRenameCustom,Table.ColumnNames(Source))
in
    tblReorder

Book5
ABCDEFGHIJK
1PageDateDR CRDetailsAmountPageDateDR CRDetailsAmount
2126/08/2021CPAYMENT FROM1000126/08/2021CPAYMENT FROM BOB1000
3BOB126/08/2021CPAYMENT FROM BOB140
4126/08/2021CPAYMENT FROM140130/08/2021CPAYMENT FROM BOB15
5BOB130/08/2021CREAL TIME TRANSFER FROM BOB300
6130/08/2021CPAYMENT FROM15231/08/2021DBALANCE BROUGHT FORWARD0
7BOB231/08/2021CREAL TIME TRANSFER FROM CAPITEC370
8130/08/2021CREAL TIME TRANSFER FROM300231/08/2021DMONTHLY FEE ACC-100
9BOB204/09/2021CCREDIT TRANSFER120
10231/08/2021DBALANCE BROUGHT FORWARD0204/09/2021CREAL TIME TRANSFER FROM350
11231/08/2021CREAL TIME TRANSFER FROM370
12CAPITEC
13231/08/2021DMONTHLY FEE-100
14ACC
15204/09/2021CCREDIT TRANSFER120
16204/09/2021CREAL TIME TRANSFER FROM350
Sheet1
 
Upvote 0
Solution
Here's another way, it's shorter but may not be faster:

Power Query:
let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    tcn = Table.ColumnNames(Source),
    col = List.PositionOf(tcn,"Details"),
    lstCombine = List.Generate(()=> 0, (x)=> x < Table.RowCount(Source), (x)=> x +1, 
        (x) => Source[Details]{x} & (try if Source[Page]{x+1} = null then " " & Source[Details]{x+1} else "" otherwise "")),
    tblFromColumns = let ttc = Table.ToColumns(Source) in Table.FromColumns(List.FirstN(ttc,col) & {lstCombine} & List.Skip(ttc,col+1),tcn),
    tblOut = Table.SelectRows(tblFromColumns,each ([Page] <> null))
in
    tblOut
 
Upvote 0
Not knowing how all the data looks like, one caveat you have to worry about that using the Index approach could have a performance issue on larger datasets.
Again, without seeing the rest of the data, here's another solution:

Power Query:
let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    #"Added Conditional Column" = Table.AddColumn(Source, "Custom", each if [Page] = null then [Details] else null),
    #"Filled Up" = Table.FillUp(#"Added Conditional Column",{"Custom"}),
    #"Merged Columns" = Table.CombineColumns(#"Filled Up",{"Details", "Custom"},Combiner.CombineTextByDelimiter(" ", QuoteStyle.None),"Details"),
    #"Filtered Rows" = Table.SelectRows(#"Merged Columns", each ([Page] <> null)),
    #"Removed Other Columns" = Table.SelectColumns(#"Filtered Rows",{"Page", "Date", "DR CR", "Details", "Amount"})
in
    #"Removed Other Columns"
 
Upvote 0

Forum statistics

Threads
1,215,006
Messages
6,122,666
Members
449,091
Latest member
peppernaut

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top