Hi,
I have a large dataset of text and I'm trying to find the co-occurrence of words. Alternative combinations of words are in two different columns. For instance, column A has a co-word "apple_orange" and column B of the same row has its flipped co-word "orange_apple", and both are equivalent. This also means that all the values in column A are present somewhere in column B and vice versa, but in different rows. For instance, consider the following:
Col A | Col B |
apple_fruit | fruit_apple |
apple_mango | mango_apple |
apple_orange | orange_apple |
fruit_apple | apple_fruit |
juice_mango | mango_juice |
juice_orange | orange_juice |
mango_apple | apple_mango |
mango_juice | juice_mango |
orange_apple | apple_orange |
orange_juice | juice_orange |
I need to accurately identify and remove all the duplicate rows, where the value in Column A of one row appears in Column B of another row and vice versa. This means that half of the rows in the matrix have to be removed, but the challenge is how to identify which rows to remove. For instance, in the above example, rows 1 and 4 are duplicates of each other, as are rows 2 and 7, and so forth, and one row of each pair needs to be removed.
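To make the intended result concrete, one way to express the rule is to treat the words of each co-word as an unordered pair, so that "apple_fruit" and "fruit_apple" map to the same key. The following is only a minimal sketch in Python, assuming the data has been exported as a list of (Col A, Col B) pairs and that a single underscore separates the words; the function name `drop_flipped_duplicates` and the `rows` list are hypothetical and used for illustration only.

```python
def drop_flipped_duplicates(rows, sep="_"):
    """Keep the first row of each flipped pair and drop the later one."""
    seen = set()
    kept = []
    for col_a, col_b in rows:
        # Sorting the words gives "apple_fruit" and "fruit_apple" the same key.
        key = tuple(sorted(col_a.split(sep)))
        if key not in seen:
            seen.add(key)
            kept.append((col_a, col_b))
    return kept


# The ten example rows from the table above (assumed input layout).
rows = [
    ("apple_fruit", "fruit_apple"),
    ("apple_mango", "mango_apple"),
    ("apple_orange", "orange_apple"),
    ("fruit_apple", "apple_fruit"),
    ("juice_mango", "mango_juice"),
    ("juice_orange", "orange_juice"),
    ("mango_apple", "apple_mango"),
    ("mango_juice", "juice_mango"),
    ("orange_apple", "apple_orange"),
    ("orange_juice", "juice_orange"),
]

kept = drop_flipped_duplicates(rows)
for col_a, col_b in kept:
    print(col_a, "|", col_b)
```

On the example table this keeps rows 1, 2, 3, 5 and 6 and drops rows 4, 7, 8, 9 and 10, which is half of the rows. Which row of each pair survives depends only on row order here; a different tie-break would need a different rule.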
I have tried different formulae and techniques but failed. Any help would be highly appreciated.
Best regards,
guest2013