I need some assistance parsing a very large CSV with a lot of rows and information.
Information on the CSV File
I have a CSV file that gets updated daily with numerous lines. The file itself contains around 80 000 rows and around 91 columns. I know, a massive file and not one that is very nice to work with.
What I would like to achieve is the following.
Remove the headers and respective column data that I don't need in order to make the file "workable", reducing it from 91 columns to approx. 7.
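Since the goal is just dropping unwanted columns, one way to sidestep the file size is to stream the CSV row by row instead of loading it all at once. A minimal sketch using Python's standard `csv` module, assuming hypothetical column names in `KEEP` (swap in whichever ~7 headers you actually need):

```python
import csv
import io

# Hypothetical column names -- replace with the ~7 headers you actually need.
KEEP = ["id", "name", "price"]

def reduce_columns(src, dst, keep):
    """Stream src row by row, writing only the columns in `keep` to dst.
    Only one row is held in memory at a time, so total file size barely matters."""
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=keep, extrasaction="ignore")
    writer.writeheader()
    for row in reader:
        writer.writerow({k: row.get(k, "") for k in keep})

# Tiny in-memory demo; in practice pass open("big.csv") and open("small.csv", "w", newline="")
sample = io.StringIO("id,name,colour,price,junk\n1,apple,red,0.5,x\n2,pear,green,0.7,y\n")
out = io.StringIO()
reduce_columns(sample, out, KEEP)
print(out.getvalue())
```

Because nothing is buffered beyond a single row, this should stay well within a small container's memory even at 80 000 rows.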
@Jon I had thought about doing that but, correct me if I am wrong, for me to extract the information I need I would first need to read the file content, and that is where the issue starts. There is simply too much data for Docker to read through. My rough maths is 81 columns with 80 000 rows, which equates to around 6.4 million fields of text. If you know of another way I am all ears.
You are not wrong there, but no matter what you do the data will still have to be read. When you try to open the file, does it show any errors? I would imagine the UI will be very slow to show the data, but when running in the background it should be OK, assuming the resources are set correctly for what you are doing.
Outside of this though… How did the file get so large and have you thought about using a database or something like Baserow instead?
Thanks for the suggestion, and you are correct about the database: that's exactly what I'm trying to do, get all this information into a database, which is proving to be difficult. The issue is that I need this done daily, as the file I receive comes from an external source and contains information I need to parse every day.
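For the daily database load, the same streaming idea works: insert rows in batches as you read them, so memory stays flat regardless of file size. A sketch using SQLite from Python's standard library, with a hypothetical `daily` table and hypothetical columns (any SQL database with an `executemany`-style bulk insert would follow the same pattern; schedule the script via cron or your workflow tool):

```python
import csv
import io
import sqlite3

# Hypothetical subset of columns kept from the large file.
KEEP = ["id", "name", "price"]

def load_csv(conn, src, batch=1000):
    """Insert rows from a CSV stream into SQLite in batches, so the whole
    file never sits in memory; suitable for a daily scheduled run."""
    conn.execute("CREATE TABLE IF NOT EXISTS daily (id TEXT, name TEXT, price TEXT)")
    reader = csv.DictReader(src)
    buf = []
    for row in reader:
        buf.append(tuple(row.get(k, "") for k in KEEP))
        if len(buf) >= batch:
            conn.executemany("INSERT INTO daily VALUES (?, ?, ?)", buf)
            buf.clear()
    if buf:  # flush the final partial batch
        conn.executemany("INSERT INTO daily VALUES (?, ?, ?)", buf)
    conn.commit()

# In-memory demo; in practice use sqlite3.connect("daily.db") and open("big.csv")
conn = sqlite3.connect(":memory:")
sample = io.StringIO("id,name,colour,price\n1,apple,red,0.5\n2,pear,green,0.7\n")
load_csv(conn, sample)
print(conn.execute("SELECT COUNT(*) FROM daily").fetchone()[0])
```

Batching keeps the insert count per round trip reasonable; 80 000 rows in batches of 1 000 is only 80 `executemany` calls.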