So I’m currently learning how to code and working on a project for my university. It’s supposed to allow transit routing with realtime data.
The transit router has a repository with all the needed data on how public transport is scheduled to drive. From here I need the StopTime list, which contains around 600.000 entries. A StopTime is basically the combination of one vehicle doing one stop on one trip at a specific time. On the other hand I get a list of about 30.000 entries with realtime data, which aren’t StopTimes.
Now I have to compare the variables of every entry of both lists to find a matching stopID and tripID. These are variables in both StopTimes and realtime entries.
Currently I’m using the following lines to compare the variables:
for (StopTime s: times)
for (GTFSRealtimeEntry g: rtTimes)
if (Objects.equals(g.getTripId(), s.trip.id) && Objects.equals(g.getStopId(), s.stop.id))
However this takes ages to compute. Is there a quicker way to do this than going through two for-loops?
I’ve tried replacing the Object.equals with ==, but that didn’t help. I’ve also tried to split up the comparing into smaller chunks (around 250 StopTimes at a time), but that didn’t improve performance either.
We need details. What are your data types? Give a simplified version of the source code of your domain classes. Give some example data.
It would help to have more details. We can infer
times
is either aStopTime []
or some sort ofCollection<StopTime>
. From the question header, we suppose it is aList<StopTime>
. We can draw the same inferences forrtTimes
. But, we shouldn’t have to: The question should include code showing that. More importantly, it would help if we had code showing (the relevant) fields and methods ofStopTime
andGTFSRealtimeEntry
. Can you provide a minimal reproducible example?If you have working code, but need to improve it, the question might be suitable for Code Review. Check out the On-Topic and How to Ask for Code Review. (Links at the top of the tour page.)
Since no updating for the StopTime list, you may create separate thread for entries of StopTime to search their occurrences in the other list. Maybe not for each entry but each thread may search for ~1000 entries or something. Searching in parallel may improve the searching time.
Also if the data is temporal data, it might already sorted by time, then you can do binary search for the temporal data, where you split and search within specific time frame. This will reduce the search complexity.