I wanted to analyze some chat logs I got from a Twitch stream (with permission from the streamer).
The chat messages are all saved in files named after the day they were sent, and all of these files are in one single folder. I built a short script that reads through all of them line by line, splits each message into the words the user wrote, and counts how many times each word was said (by any user). So basically the number of appearances of each word over all the days the streamer was streaming.
I first wrote it in Python3, but then I wondered how NodeJS, the tool with which I capture the chat messages, would compare to it.
TL;DR: Both run at approximately the same speed (about 8.4 seconds for 74 files with 2 million lines, +/- 100 ms).
Both the NodeJS and the Python3 program read each file in, split it into lines, then split each message into the words the user wrote and add them to a dictionary.
The key of that dictionary is the word, and its value is the number of appearances. To sort this dictionary, I append each key and value, as a two-item list, to a list, and sort that list by the value, i.e. the second item of each inner list.
To illustrate:
File-Lines: 19:37 - UserName - Chat Message Chat
Dictionary: stats["Message"] = 1
Sort-List: list = [["Message",1],["Chat",2]]
Then the list is sorted by the second value (the number of appearances of the word) and written to a file as a JSON string.
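The steps above can be sketched in a few lines of Python. This is only a minimal sketch, not the actual speedtest.py: the function names count_words and sorted_pairs are made up for illustration, the line layout "HH:MM - UserName - message" is taken from the example line above, and the ascending sort order is an assumption.

```python
import json


def count_words(lines):
    """Count how often each word appears across all chat messages."""
    stats = {}
    for line in lines:
        # Assumed layout: "HH:MM - UserName - message text".
        # maxsplit=2 keeps any " - " inside the message itself intact.
        parts = line.rstrip("\n").split(" - ", 2)
        if len(parts) < 3:
            continue  # skip lines that do not match the assumed layout
        for word in parts[2].split():
            stats[word] = stats.get(word, 0) + 1
    return stats


def sorted_pairs(stats):
    """Turn the dictionary into [word, count] lists, sorted by count."""
    pairs = [[word, count] for word, count in stats.items()]
    pairs.sort(key=lambda pair: pair[1])  # ascending order is an assumption
    return pairs


lines = ["19:37 - UserName - Chat Message Chat"]
stats = count_words(lines)
pairs = sorted_pairs(stats)
print(json.dumps(pairs))  # [["Message", 1], ["Chat", 2]]
```

Passing reverse=True to sort() would put the most frequent words first instead.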
This particular test case was made from 74 days' worth of chat logs, with individual files ranging from 20 KB to 4 MB in size.
This added up to over 2 million lines of chat messages.
So that is 2 million lines read in, split into words, counted in a dictionary, copied into a list, sorted, and dumped as a .json file.
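For the whole folder, the same pipeline might look roughly like the sketch below. It is not the script that was benchmarked: count_folder and write_sorted are hypothetical names, collections.Counter stands in for the plain dictionary, and the UTF-8 encoding and the "HH:MM - UserName - message" layout are assumptions.

```python
import json
from collections import Counter
from pathlib import Path


def count_folder(folder):
    """Count words over every file in `folder`; also tally files and lines."""
    stats = Counter()  # swapped in for the plain dict described above
    files = 0
    lines = 0
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file():
            continue
        files += 1
        # Encoding is an assumption; undecodable bytes are replaced.
        with path.open(encoding="utf-8", errors="replace") as handle:
            for line in handle:
                lines += 1
                parts = line.rstrip("\n").split(" - ", 2)
                if len(parts) == 3:
                    stats.update(parts[2].split())
    return stats, files, lines


def write_sorted(stats, out_path):
    """Sort (word, count) pairs by count and dump them as JSON."""
    pairs = sorted(stats.items(), key=lambda item: item[1])
    with open(out_path, "w", encoding="utf-8") as handle:
        json.dump(pairs, handle)  # tuples are written as JSON arrays


# Usage (paths are placeholders):
# stats, files, lines = count_folder("logs/streamername")
# write_sorted(stats, "streamername.json")
# print("Files:", files, "Lines:", lines)
```

Reading each file line by line keeps memory use bounded by the dictionary rather than by the size of the largest log file.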
The test result:
[user@home testcase]$ time python3 speedtest.py streamername
Starting to count...
Finished counting.
Appending...
Finished appending.
Starting the sorting...
Sorting finished.
Starting to write to files...
Writing finished.
Files: 74 Lines: 2023748

real 0m8.338s
user 0m7.773s
sys 0m0.545s
[user@home testcase]$
[user@home testcase]$ time node speedtest.js streamername
Starting to count...
Finished counting.
Appending...
Finished appending.
Starting the sorting...
Sorting finished.
Starting to write to files...
Writing finished.
Files: 74 Lines: 2023748

real 0m8.468s
user 0m8.045s
sys 0m0.446s
[user@home testcase]$