PYTHON - DATA PROCESSING
Project Presentation
As part of my first year of the BUT in Networks & Telecommunications at the IUT of Annecy, I completed an individual Python project titled SAE 105: Data Processing. The objective was to manipulate, process, and present data on French cities stored in a CSV file of more than 36,000 rows and 27 columns.
Project Objective
- Develop a Python program that adheres to precise specifications.
- Learn to read and process large CSV files.
- Implement an interactive menu offering different features (statistics, distances between cities, mapping, etc.).
- Produce graphical visualizations (histograms, interactive maps with Folium).
- Get used to working independently with regular deliverables.
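To illustrate the CSV-reading objective, here is a minimal sketch using only the standard `csv` module. It is not the project's actual code: the layout of the 27 columns of villes_france.csv is not described on this page, so the column names (`name`, `department`, `population_2010`, `latitude`, `longitude`) and the sample rows are placeholders chosen for the example.

```python
import csv
import io

# Placeholder sample: the column names and values below are assumptions
# made for this example, not the real layout of villes_france.csv.
SAMPLE = """name,department,population_2010,latitude,longitude
Ville A,74,52000,45.90,6.12
Ville B,74,1200,46.05,6.40
Ville C,13,850000,43.30,5.37
"""


def load_cities(handle):
    """Read city rows from an open CSV file and convert the numeric fields."""
    cities = []
    for row in csv.DictReader(handle):
        cities.append({
            "name": row["name"],
            "department": row["department"],
            "population": int(row["population_2010"]),
            "lat": float(row["latitude"]),
            "lon": float(row["longitude"]),
        })
    return cities


def cities_in_department(cities, department):
    """Keep only the cities that belong to the given department code."""
    return [c for c in cities if c["department"] == department]


cities = load_cities(io.StringIO(SAMPLE))
print(len(cities), "cities loaded")
print([c["name"] for c in cities_in_department(cities, "74")])
```

With a real file, the `io.StringIO(SAMPLE)` argument would be replaced by a handle from `open(path, newline="", encoding="utf-8")`. The encoding is also an assumption. The file is read row by row, so it is traversed only once.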
Project Process and Accomplishments
Skills Used
- Python programming: loops, functions, list management, sorting (bubble sort), interactive menus.
- Data processing: information extraction and cleaning, statistical calculations (mean, standard deviation, population growth).
- Visualization: using matplotlib, folium, and branca libraries to plot maps and graphs.
- Project management: respecting deadlines, working autonomously, delivering progressive results.
- Analytical mindset: structuring a complex problem into clear steps.
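The sorting and statistics skills listed above can be sketched as follows. This is an illustrative version written with basic loops and lists, not a copy of the submitted program. The helper names (`bubble_sort`, `mean`, `std_dev`) and the sample values are hypothetical, and the standard deviation shown is the population form (division by n), which is an assumption about the formula used.

```python
import math


def bubble_sort(items, key):
    """Return a new list sorted in ascending order of key(item) using bubble sort."""
    result = list(items)
    n = len(result)
    for i in range(n - 1):
        swapped = False
        # After each pass, the largest remaining element has moved to the end.
        for j in range(n - 1 - i):
            if key(result[j]) > key(result[j + 1]):
                result[j], result[j + 1] = result[j + 1], result[j]
                swapped = True
        if not swapped:
            break  # No swap during a full pass: the list is already sorted.
    return result


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def std_dev(values):
    """Population standard deviation (divides by n, not n - 1)."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / len(values))


# Placeholder (name, population) pairs, not values from the real file.
cities = [("Ville A", 52000), ("Ville B", 1200), ("Ville C", 850000), ("Ville D", 300)]

by_population = bubble_sort(cities, key=lambda c: c[1])
least_populated = by_population[:2]
most_populated = by_population[::-1][:2]
print(least_populated)
print(most_populated)

populations = [2, 4, 4, 4, 5, 5, 7, 9]
print(mean(populations), std_dev(populations))
```

Bubble sort needs on the order of n² comparisons, so it is better suited to the cities of a single department than to the full file.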
Features Developed
- Data extraction from the CSV file (cities, departments, populations, density, GPS coordinates, altitudes).
- Statistics:
  - 5 most/least populated cities in a department.
  - 10 cities with the highest/lowest density.
  - Cities with the largest population increase or decline between 1999 and 2012.
- Cartographic visualization: displaying cities on OpenStreetMap with circles proportional to their density/population.
- Histogram: distribution of cities according to their number of inhabitants in 2010.
- Distances: calculation of the Euclidean distance between two cities (e.g., Paris – Marseille).
- Pathfinding algorithm: searching for a path between two cities using geographical proximity.
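The distance and pathfinding features can be sketched as below. The exact algorithm used in the project is not detailed here, so `greedy_path` is only one possible reading of "geographical proximity": at each step it hops to the city within `max_hop` that is closest to the destination. The function names, the `max_hop` parameter, and the rounded coordinates are assumptions made for the example.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two (latitude, longitude) points, in degrees."""
    return math.sqrt((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2)


def greedy_path(cities, start, goal, max_hop):
    """Walk from start to goal, hopping each time to the unvisited city within
    max_hop that is closest to the goal. Returns the list of city names,
    or None when no reachable city brings the path closer to the goal."""
    path = [start]
    current = start
    while current != goal:
        best = None
        best_dist = euclidean_distance(cities[current], cities[goal])
        for name, coords in cities.items():
            if name in path:
                continue  # Never revisit a city.
            if euclidean_distance(cities[current], coords) > max_hop:
                continue  # Too far to reach in a single hop.
            d = euclidean_distance(coords, cities[goal])
            if d < best_dist:
                best, best_dist = name, d
        if best is None:
            return None
        path.append(best)
        current = best
    return path


# Approximate, rounded coordinates used only for illustration.
cities = {
    "Paris": (48.86, 2.35),
    "Lyon": (45.76, 4.84),
    "Marseille": (43.30, 5.37),
    "Lille": (50.63, 3.06),
}

print(euclidean_distance(cities["Paris"], cities["Marseille"]))
print(greedy_path(cities, "Paris", "Marseille", max_hop=4.5))
# -> ['Paris', 'Lyon', 'Marseille']
```

The distance here is expressed in degrees, not kilometres, and a degree of longitude covers less ground as latitude increases. A great-circle formula such as haversine would be needed for real distances.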
Tools Used
- Python 3
- Libraries: matplotlib, folium, branca
- CSV file: villes_france.csv (≈ 36,700 rows)
- Moodle: submission and tracking of deliverables
Difficulties Encountered
- Manipulating a large file (processing optimization).
- Implementing sorting and statistical calculation algorithms using only concepts seen in class.
- Using new libraries like Folium to display interactive maps.
- Managing my time, with a deliverable to submit at each session.
Overall Project Conclusion
SAE 105 was a formative experience that allowed me to put my knowledge of Python programming and data processing into practice. I learned to manage a large file, extract useful information, and represent it through statistics, graphs, and interactive maps. This project also taught me to work independently, follow a set of specifications, and overcome technical difficulties. Ultimately, I was able to create a complete and structured program, which is a significant first experience in the field of data processing and analysis applied to networks.