Pandas: drop duplicated but consecutive rows and keep the first row in the group (Python)
I have a df as below:
df = pd.DataFrame({
    'ID': ['James', 'James', 'James', 'James', 'Max', 'Max', 'Max', 'Max', 'Max',
           'Park', 'Park', 'Park', 'Park', 'Tom', 'Tom', 'Tom', 'Tom'],
    'From_num': [578, 420, 420, 'Started', 298, 78, 36, 298, 'Started',
                 28, 28, 311, 'Started', 60, 520, 99, 'Started'],
    'To_num': [96, 578, 578, 420, 36, 298, 78, 36, 298,
               112, 112, 28, 311, 150, 60, 520, 99],
    'Date': ['2020-05-12', '2020-02-02', '2020-02-01', '2019-06-18', '2019-08-26',
             '2019-06-20', '2019-01-30', '2018-10-23', '2018-08-29', '2020-05-21',
             '2020-05-20', '2019-11-22', '2019-04-12', '2019-10-16', '2019-08-26',
             '2018-12-11', '2018-10-09']})

I wish to drop only the CONSECUTIVE duplicate rows (ignoring the 'Date' field) within each ID group. For example, lines 1 and 2 have the same values, so I wish to drop the 2nd duplicate; likewise for lines 9 and 10, drop line 10. The df looks like this:
       ID From_num  To_num        Date
0   James      578      96  2020-05-12
1   James      420     578  2020-02-02
2   James      420     578  2020-02-01   # Drop this duplicated row (ignore date)
3   James  Started     420  2019-06-18
4     Max      298      36  2019-08-26
5     Max       78     298  2019-06-20
6     Max       36      78  2019-01-30
7     Max      298      36  2018-10-23
8     Max  Started     298  2018-08-29
9    Park       28     112  2020-05-21
10   Park       28     112  2020-05-20   # Drop this duplicate row (ignore date)
11   Park      311      28  2019-11-22
12   Park  Started     311  2019-04-12
13    Tom       60     150  2019-10-16
14    Tom      520      60  2019-08-26
15    Tom       99     520  2018-12-11
16    Tom  Started      99  2018-10-09

I wrote loop conditions, but they are very redundant and slow, and I assume there is an easier way to do this, so please help if you have ideas. Many thanks. The expected result is below; please note that Max also has two NON-consecutive duplicate rows (lines 4 and 7), and I wish to keep both of them:
       ID From_num  To_num        Date
0   James      578      96  2020-05-12
1   James      420     578  2020-02-02
2   James  Started     420  2019-06-18
3     Max      298      36  2019-08-26
4     Max       78     298  2019-06-20
5     Max       36      78  2019-01-30
6     Max      298      36  2018-10-23
7     Max  Started     298  2018-08-29
8    Park       28     112  2020-05-21
9    Park      311      28  2019-11-22
10   Park  Started     311  2019-04-12
11    Tom       60     150  2019-10-16
12    Tom      520      60  2019-08-26
13    Tom       99     520  2018-12-11
14    Tom  Started      99  2018-10-09
Source: https://stackoverflow.com/questions/630 ... thin-group
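
Not part of the original post: below is a minimal sketch of one common pandas pattern for this kind of task, comparing each row with the row directly above it inside the same ID group via groupby(...).shift() and keeping the row only when From_num or To_num changes. Variable names such as cols, prev, mask, and result are illustrative.

import pandas as pd

df = pd.DataFrame({
    'ID': ['James', 'James', 'James', 'James', 'Max', 'Max', 'Max', 'Max', 'Max',
           'Park', 'Park', 'Park', 'Park', 'Tom', 'Tom', 'Tom', 'Tom'],
    'From_num': [578, 420, 420, 'Started', 298, 78, 36, 298, 'Started',
                 28, 28, 311, 'Started', 60, 520, 99, 'Started'],
    'To_num': [96, 578, 578, 420, 36, 298, 78, 36, 298,
               112, 112, 28, 311, 150, 60, 520, 99],
    'Date': ['2020-05-12', '2020-02-02', '2020-02-01', '2019-06-18', '2019-08-26',
             '2019-06-20', '2019-01-30', '2018-10-23', '2018-08-29', '2020-05-21',
             '2020-05-20', '2019-11-22', '2019-04-12', '2019-10-16', '2019-08-26',
             '2018-12-11', '2018-10-09']})

# Columns that define a "duplicate"; Date is deliberately ignored.
cols = ['From_num', 'To_num']

# Shift within each ID group so every row is compared with the previous row of the
# same group; the first row of each group compares against NaN and is always kept.
prev = df.groupby('ID')[cols].shift()

# Keep a row if at least one compared column differs from the row above it.
mask = (df[cols] != prev).any(axis=1)

result = df[mask].reset_index(drop=True)
print(result)

Because only the immediately preceding row within the group is compared, non-consecutive repeats (such as Max's lines 4 and 7) survive, while the second row of each consecutive run is dropped.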