The log has the following format:
Code:
201.179.162.179 - - [17/Sep/2019:06:30:49 -0300] "teSubmit=Save" 400 0 "-" "-"
201.179.162.179 - - [17/Sep/2019:06:30:49 -0300] "POST /cgi-bin/ViewLog.asp HTTP/1.1" 404 0 "-" "Ankit"
80.95.44.9 - - [17/Sep/2019:06:31:55 -0300] "GET / HTTP/1.1" 200 12101 "http://netlab.ice.ufjf.br/" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
50.31.26.18 - - [17/Sep/2019:06:32:14 -0300] "GET /wp-login.php HTTP/1.1" 200 1514 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:62.0) Gecko/20100101 Firefox/62.0"
50.31.26.18 - - [17/Sep/2019:06:32:14 -0300] "POST /wp-login.php HTTP/1.1" 200 1897 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:62.0) Gecko/20100101 Firefox/62.0"
Code:
import re
import pandas as pd

columns = ['ip', 'identd', 'userid', 'time', 'request', 'status', 'size', 'Referer', 'User_agent']
# One named group per field of the log line; Referer and User_agent are quoted in the log.
regc = re.compile(r'(?P<ip>\S+) (?P<identd>\S+) (?P<userid>\S+) \[(?P<time>.*?)\] "(?P<request>.*?)" (?P<status>\d+) (?P<size>\d+|-) "(?P<Referer>.*?)" "(?P<User_agent>.*?)"')
rows = []
with open('access.txt') as file:
    for line in file:
        m = regc.match(line)
        if m is None:
            # Skip lines that do not fit the pattern instead of failing on m.group().
            continue
        rows.append(m.groupdict())
# DataFrame.append returned a new frame rather than modifying logs, and it was removed in pandas 2.0,
# so the frame is built once from the collected rows.
logs = pd.DataFrame(rows, columns=columns)
logs
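
As a quick check against the sample lines shown earlier, here is a minimal sketch that uses only the standard library. The names `LOG_PATTERN` and `parse_line` are made up for illustration, and the pattern assumes the log follows the Apache combined format, which is inferred from the sample rather than confirmed. Treating a size of `-` as 0 is also an assumption.

```python
import re
from datetime import datetime

# Pattern for the Apache combined log format (assumed from the sample lines).
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) (?P<identd>\S+) (?P<userid>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>.*?)" (?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>.*?)" "(?P<user_agent>.*?)"'
)

def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    m = LOG_PATTERN.match(line)
    if m is None:
        return None
    row = m.groupdict()
    # Convert the numeric and timestamp fields to native Python types.
    row['status'] = int(row['status'])
    row['size'] = 0 if row['size'] == '-' else int(row['size'])
    row['time'] = datetime.strptime(row['time'], '%d/%b/%Y:%H:%M:%S %z')
    return row

sample = [
    '201.179.162.179 - - [17/Sep/2019:06:30:49 -0300] "teSubmit=Save" 400 0 "-" "-"',
    '80.95.44.9 - - [17/Sep/2019:06:31:55 -0300] "GET / HTTP/1.1" 200 12101 '
    '"http://netlab.ice.ufjf.br/" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"',
    'not a log line',
]

# Keep only the lines that matched; the malformed third line is dropped.
rows = [row for row in map(parse_line, sample) if row is not None]
for row in rows:
    print(row['ip'], row['status'], row['size'], row['request'])
```

The resulting list of dicts can be passed to `pd.DataFrame(rows)` if a DataFrame is needed.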
More details here: https://stackoverflow.com/questions/585 ... apache-log