[
  {
    "student_id": 1234,
    "room_id": "abc",
    "enrolled": false
  },
  {
    "student_id": 4321,
    "room_id": "def",
    "enrolled": true,
    "enrollment": {
      "type": "home",
      "date": "01-01-2020"
    }
  },
  {
    "student_id": 678,
    "room_id": "htf",
    "sports": {
      "team": "hockey",
      "position": "forward"
    }
  }
]
I can partially flatten it by running:
rdd = sc.parallelize(data).map(lambda x: json.dumps(x))  # requires `import json`
df = spark.read.json(rdd)  # nested objects become struct columns
| student_id | room_id | enrolled | enrollment         | sports            |
|------------|---------|----------|--------------------|-------------------|
| 1234       | abc     | false    | null               | null              |
| 4321       | def     | true     | {home, 01-01-2020} | null              |
| 678        | htf     | null     | null               | {hockey, forward} |
How can I flatten this further to get:
| student_id | room_id | enrolled | type | date       | team   | position |
|------------|---------|----------|------|------------|--------|----------|
| 1234       | abc     | false    | null | null       | null   | null     |
| 4321       | def     | true     | home | 01-01-2020 | null   | null     |
| 678        | htf     | null     | null | null       | hockey | forward  |
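
One possible way to reach that shape is to flatten each record in plain Python before handing the data to Spark. The sketch below is only an illustration under stated assumptions: `flatten_record` and `COLUMNS` are hypothetical names that do not come from the question, nesting is assumed to be at most one level deep, and the nested field names are assumed not to collide with each other or with the top-level keys.

```python
import json

# Sample records from the question, written as Python literals.
data = [
    {"student_id": 1234, "room_id": "abc", "enrolled": False},
    {"student_id": 4321, "room_id": "def", "enrolled": True,
     "enrollment": {"type": "home", "date": "01-01-2020"}},
    {"student_id": 678, "room_id": "htf",
     "sports": {"team": "hockey", "position": "forward"}},
]

# Target column order, taken from the desired table in the question.
COLUMNS = ["student_id", "room_id", "enrolled", "type", "date", "team", "position"]


def flatten_record(record):
    """Lift the keys of one-level nested dicts to the top level (hypothetical helper)."""
    flat = {}
    for key, value in record.items():
        if isinstance(value, dict):
            # e.g. "enrollment" contributes "type" and "date"
            flat.update(value)
        else:
            flat[key] = value
    # Columns absent from this record become None, which Spark displays as null.
    return {col: flat.get(col) for col in COLUMNS}


flat_rows = [flatten_record(r) for r in data]
for row in flat_rows:
    print(json.dumps(row))
```

The resulting `flat_rows` could then go through the same `sc.parallelize(...).map(json.dumps)` step as in the question. If the data is already in a DataFrame with struct columns, selecting `"enrollment.*"` and `"sports.*"` alongside the scalar columns in `df.select(...)` should expand the struct fields in a similar way, though that variant is not exercised here.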
More details here: https://stackoverflow.com/questions/794 ... in-pyspark