I had to merge several wordclouds into one larger wordcloud based on all their keywords and counts, but I only had access to the text files containing each keyword and its count, not a list of all the keywords.
To re-count the keywords and build the new list, I needed a list in which each keyword appears exactly as many times as its count.
Luckily, the structure was the same for all pairs, and since each pair contained only one token (the keyword) followed by its count, the whole conversion could be done in a single loop.
An alternative way to solve this problem would be to build a dict of tokens and add up the count each time a matching token is found.
You can see the code below.
```python
"""
Converting a word-cloud laid out as an unstructured list of keywords with
associated counts into a list containing each keyword repeated by its count,
also taking into account duplicate entries.
"""

__author__ = 'Miklas Njor - iAmGoldenboy - https://miklasnjor.com'

from collections import Counter

# A string in the form of "keyword count keyword count ...".
# Note the duplicate entries.
wordcloud_keywords_n_count = "apples 12 oranges 10 bananas 10 bananas 8 bananas 50 chairs 4 boats 3 orange 1 apple 1 banana 1"


def wordcloudStringToList(wordcloudString):
    """
    Converting a word-cloud laid out as an unstructured list of keywords with
    associated counts into a list containing each keyword repeated by its count,
    also taking into account duplicate entries.

    :param wordcloudString: A string in the form of "keyword count keyword count ...".
    :return: A list in which each keyword appears as many times as its counts add up to.
    """
    full_list = []  # a list to collect all the keywords
    split_wordcloud = wordcloudString.split()  # split the string on whitespace

    # Look at each token together with the token that follows it.
    # Stopping one short of the end replaces the old try/except IndexError.
    for index in range(len(split_wordcloud) - 1):
        keyword = split_wordcloud[index]            # candidate keyword
        keyword_count = split_wordcloud[index + 1]  # the token right after it
        if keyword_count.isnumeric():  # when we hit a number...
            # ... add the keyword keyword_count times to the list.
            full_list.extend([keyword] * int(keyword_count))

    return full_list


print(wordcloudStringToList(wordcloud_keywords_n_count))

# Sanity check: the re-count should add up the duplicate entries.
print(Counter(wordcloudStringToList(wordcloud_keywords_n_count)))
```
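The dict-based alternative mentioned above could look roughly like the sketch below. It is not the code I originally used: the function names `wordcloud_string_to_counts` and `merge_wordclouds` are hypothetical, and the sketch assumes the same "keyword count keyword count ..." format, reusing the rule from the listing above that a keyword is any token directly followed by a numeric token. Because it keeps running totals instead of repeating each keyword, several wordclouds can be merged simply by adding their counts together.

```python
from collections import Counter


def wordcloud_string_to_counts(wordcloud_string):
    """Sum keyword counts from a "keyword count keyword count ..." string.

    Hypothetical helper: a token counts as a keyword when the token right
    after it is numeric (same rule as wordcloudStringToList). Duplicate
    keywords have their counts added up.
    """
    counts = {}  # keyword -> running total
    tokens = wordcloud_string.split()
    # Walk over adjacent token pairs: (t0, t1), (t1, t2), ...
    for keyword, count in zip(tokens, tokens[1:]):
        if count.isnumeric():
            counts[keyword] = counts.get(keyword, 0) + int(count)
    return counts


def merge_wordclouds(*wordcloud_strings):
    """Hypothetical helper: combine several wordcloud strings into one Counter."""
    total = Counter()
    for wordcloud_string in wordcloud_strings:
        # Counter.update() with a mapping adds the counts instead of replacing them.
        total.update(wordcloud_string_to_counts(wordcloud_string))
    return total
```

For example, `merge_wordclouds("apples 12 bananas 3", "apples 2 pears 5")` gives apples 14, pears 5 and bananas 3. If a flat keyword list is still needed afterwards, `list(counter.elements())` expands the totals back into repeated keywords.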