Analysts often depend on log files to dig deeper into the issues that break systems. Working through a huge log file of more than 10 GB line by line in search of the root cause (RCA) is tedious and time-consuming.
Here is one way to pick up only the unread lines from a large file to which the server is appending logs every second.
A text file linecount.txt stores the number of lines that have already been read from logfile.txt. For example, suppose logfile.txt contains 5,000 lines and is being read for the first time: linecount.txt holds 0 because the file has never been read before. When the code runs, it reads lines 0 through 4999 and stores the count 5000 in linecount.txt. If 100 more lines are then appended to logfile.txt, the next run reads lines 5000 through 5099 and stores the latest count, i.e. 5100.
# Check whether linecount.txt exists;
# if it doesn't, create it and store 0 in it.
from itertools import islice
from pathlib import Path

linecountpath = "D:\\linecount.txt"
fileA = "D:\\logfile.txt"

my_file = Path(linecountpath)
if not my_file.is_file():
    with open(linecountpath, "w") as lc:
        lc.write("0")

# Read file A starting from the nth line,
# where n is the number of lines already read.
with open(linecountpath, "r") as lc:
    n = int(lc.read().strip() or "0")

with open(fileA, "r", errors="ignore") as f:
    for line in islice(f, n, None):
        line = line.replace("<br />", "")
        # do what you want with the unread lines here
# Get the total number of lines in a large file.
def length_file(filename):
    i = -1
    with open(filename, errors="ignore") as f:
        for i, l in enumerate(f):
            pass
    return i + 1
# Write the latest line count into linecount.txt
# (opening with mode "w" truncates the old value).
i = length_file(fileA)
with open(linecountpath, "w") as lc:
    lc.write(str(i))
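Skipping n lines on every run still re-reads the start of the file, which adds up on a 10 GB+ log. One alternative is to persist a byte offset instead of a line count and seek() straight to the unread portion. The sketch below follows that idea; the offset.txt path is hypothetical, and the file is opened in binary mode so that tell() stays reliable while iterating.

# Alternative sketch: track a byte offset instead of a line count.
from pathlib import Path

offsetpath = "D:\\offset.txt"   # hypothetical companion to linecount.txt
fileA = "D:\\logfile.txt"

# Load the last saved offset; default to 0 on the first run.
offset = 0
if Path(offsetpath).is_file():
    saved = Path(offsetpath).read_text().strip()
    if saved:
        offset = int(saved)

# Binary mode keeps seek()/tell() reliable while iterating line by line.
with open(fileA, "rb") as f:
    f.seek(offset)
    for raw in f:
        line = raw.decode(errors="ignore")
        # do what you want with the unread lines here
    offset = f.tell()

# Persist the new offset for the next run.
with open(offsetpath, "w") as fp:
    fp.write(str(offset))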
There are other efficient ways to count the lines of a large file. You can check a few of them at the link below: https://stackoverflow.com/questions/845058/how-to-get-line-count-of-a-large-file-cheaply-in-python
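One technique discussed at that link is to count newline characters while reading the file in fixed-size binary chunks, which avoids per-line overhead. A minimal sketch of that idea, assuming the log uses \n line endings (a file without a trailing newline may be off by one compared with length_file):

# Sketch: count lines by scanning the file in 1 MB binary chunks.
def length_file_buffered(filename, chunk_size=1024 * 1024):
    count = 0
    with open(filename, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            count += chunk.count(b"\n")
    return count

# e.g. i = length_file_buffered(fileA) as a drop-in for length_file(fileA)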