CsJHUN
Active Member
- Joined
- Jan 13, 2015
- Messages
- 360
- Office Version
- 365
- 2021
- 2019
- Platform
- Windows
- Mobile
Hi guys,
I have a 26 column, 30k rows base table
30k rows belong to 106 doc
There is duplication in 30k rows (concatenate each column for each row) Removing duplication not work, always miss ~100 rows, and there is duplication within that 100.
I tried the built-in remove duplicates, used macro and used arrays to remove duplicated items. Looks like not working (11925 unique row) or the next step missing something
The concatenate cell lenght (as value) is over 500, some even over 1k character long
I want to create a matrix, with unique rows and docs on column.
Here is a sample from start
(30k rows, 26 column)
And the result
(~11k row, 106 column)
step i tried:
- create the 30k rows from the 106 doc (each separate xlsx file)
- concatenate the data for each row (its a must, imho, thats makes it unique)
- create unique list of docs (106 pcs)
- create unique list of the 30k row. (~11k pcs)
- find the 30k row if its fit inside the 11k, which is not! some of the 30k has 0 result on 11k
- mark the 1st occurence of the 11k in each column of the 106 docs
Any advice is welcome.
Cheers
J
I have a 26 column, 30k rows base table
30k rows belong to 106 doc
There is duplication in 30k rows (concatenate each column for each row) Removing duplication not work, always miss ~100 rows, and there is duplication within that 100.
I tried the built-in remove duplicates, used macro and used arrays to remove duplicated items. Looks like not working (11925 unique row) or the next step missing something
The concatenate cell lenght (as value) is over 500, some even over 1k character long
I want to create a matrix, with unique rows and docs on column.
Here is a sample from start
start | |
Item | doc |
a | 1 |
b | 1 |
c | 1 |
b | 2 |
c | 2 |
d | 2 |
a | 3 |
d | 3 |
(30k rows, 26 column)
And the result
Result | |||
Item / Doc | 1 | 2 | 3 |
a | x | x | |
b | x | x | |
c | x | x | |
d | x | x | |
(~11k row, 106 column)
step i tried:
- create the 30k rows from the 106 doc (each separate xlsx file)
- concatenate the data for each row (its a must, imho, thats makes it unique)
- create unique list of docs (106 pcs)
- create unique list of the 30k row. (~11k pcs)
- find the 30k row if its fit inside the 11k, which is not! some of the 30k has 0 result on 11k
- mark the 1st occurence of the 11k in each column of the 106 docs
Any advice is welcome.
Cheers
J